Discover how Model Soups improve accuracy and robustness by averaging weights of Ultralytics YOLO models. Learn to boost performance without increasing latency.
Model Soups refer to a machine learning technique where the weights of multiple neural networks, fine-tuned from the same pre-trained base model using different hyperparameters, are averaged together to create a single, more robust model. This approach allows developers to improve overall accuracy and generalization without increasing the computational cost during inference.
When fine-tuning a model, practitioners typically run a wide hyperparameter tuning sweep to find the best-performing configuration. Traditionally, the single best model is selected, and the rest are discarded. However, creating a model soup capitalizes on the diverse features learned by all the models in the sweep. By directly averaging their model weights, the resulting network often outperforms the single best model, effectively combining their strengths while minimizing overfitting. This process is highly efficient and can be easily managed within collaborative environments like the Ultralytics Platform.
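The sweep-based workflow above also admits a "greedy soup" variant from the original Model Soups paper: rank the checkpoints by validation score, then add each one to the running average only if doing so does not hurt that score. Below is a minimal sketch using toy `nn.Linear` modules as stand-ins for fine-tuned checkpoints; the validation data and `accuracy` metric are illustrative placeholders, not part of any Ultralytics API.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for fine-tuned checkpoints: three models with identical architecture.
# In practice these would be checkpoints from a real hyperparameter sweep.
models = [nn.Linear(4, 2) for _ in range(3)]

# Placeholder validation set and metric, used only to drive the greedy selection.
x_val = torch.randn(32, 4)
y_val = torch.randint(0, 2, (32,))


def accuracy(state_dict):
    probe = nn.Linear(4, 2)
    probe.load_state_dict(state_dict)
    with torch.no_grad():
        preds = probe(x_val).argmax(dim=1)
    return (preds == y_val).float().mean().item()


def average(dicts):
    # Uniform average of a list of state dictionaries, key by key.
    return {k: torch.stack([d[k].float() for d in dicts]).mean(0) for k in dicts[0]}


# Greedy soup: sort checkpoints by validation score, then keep each candidate
# only if adding it to the running average does not lower the metric.
state_dicts = sorted((m.state_dict() for m in models), key=accuracy, reverse=True)
ingredients = [state_dicts[0]]
for candidate in state_dicts[1:]:
    if accuracy(average(ingredients + [candidate])) >= accuracy(average(ingredients)):
        ingredients.append(candidate)

greedy_soup = average(ingredients)
```

By construction, the greedy soup's validation score is never worse than that of the best single checkpoint, which makes it a safe default when some runs in the sweep performed poorly.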
Model Soups are highly effective in scenarios where computational resources are restricted, but high accuracy and robustness are required.
To navigate the landscape of deep learning optimization, it is important to distinguish Model Soups from similar techniques such as model ensembling, which keeps every member model and averages their predictions at inference time, and Stochastic Weight Averaging (SWA), which averages checkpoints collected along a single training run. A model soup, by contrast, averages the weights of independently fine-tuned models once, before deployment, producing a single network with the inference cost of one.
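The latency difference between an ensemble and a soup is easy to see in code: an ensemble runs every input through all member networks and averages their outputs, while a soup averages the weights once up front and then runs a single network. Below is a minimal sketch with toy `nn.Linear` models; note that for a single linear layer the two averages coincide exactly, whereas for deep nonlinear networks they diverge, which is why a soup is a single-model approximation of the ensemble rather than its equivalent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two toy "fine-tuned" models with identical architecture.
m1, m2 = nn.Linear(8, 3), nn.Linear(8, 3)
x = torch.randn(5, 8)

# Ensemble: every input passes through BOTH networks, and predictions are averaged.
# Inference cost scales with the number of members.
with torch.no_grad():
    ensemble_out = (m1(x) + m2(x)) / 2.0

# Model soup: the WEIGHTS are averaged once, yielding a single network.
# Inference cost is identical to running one model.
soup = nn.Linear(8, 3)
soup.load_state_dict(
    {k: (m1.state_dict()[k] + m2.state_dict()[k]) / 2.0 for k in m1.state_dict()}
)
with torch.no_grad():
    soup_out = soup(x)
```

Because a linear map of an average equals the average of the linear maps, `soup_out` matches `ensemble_out` here to numerical precision, while requiring only one forward pass.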
Creating a uniform model soup involves accessing the PyTorch state dictionary of each trained model and averaging the corresponding tensors. Below is a concise example of how this can be done in an Ultralytics YOLO26 workflow, which is natively built on the PyTorch framework.
import torch

# Load the PyTorch state dictionaries from two fine-tuned YOLO26 models.
# weights_only=False is required to unpickle full checkpoints in PyTorch >= 2.6.
model1 = torch.load("yolo26_run1.pt", map_location="cpu", weights_only=False)["model"].state_dict()
model2 = torch.load("yolo26_run2.pt", map_location="cpu", weights_only=False)["model"].state_dict()

# Create a uniform model soup by averaging the model weights.
# Averaging in float32 and casting back preserves each tensor's original dtype,
# including integer buffers such as BatchNorm's num_batches_tracked.
soup_dict = {
    key: ((model1[key].float() + model2[key].float()) / 2.0).to(model1[key].dtype)
    for key in model1
}

The resulting soup_dict can now be loaded into a new YOLO26 instance.
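The two-model recipe above generalizes to any number of checkpoints from a sweep. Below is a self-contained sketch using toy `nn.Linear` modules in place of the YOLO26 checkpoints, including the final `load_state_dict` step; the tiny models here are illustrative stand-ins, not the Ultralytics API.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for N fine-tuned checkpoints with identical architecture.
checkpoints = [nn.Linear(4, 4).state_dict() for _ in range(3)]

n = len(checkpoints)
soup_dict = {
    # Average in float32 and cast back so every tensor keeps its original dtype
    # (important for integer buffers such as BatchNorm's num_batches_tracked).
    key: (sum(sd[key].float() for sd in checkpoints) / n).to(checkpoints[0][key].dtype)
    for key in checkpoints[0]
}

# Load the uniform soup into a fresh model instance.
soup_model = nn.Linear(4, 4)
soup_model.load_state_dict(soup_dict)
```

With an Ultralytics model, the averaged dictionary can typically be loaded through the wrapped PyTorch module, e.g. `YOLO("yolo26_run1.pt").model.load_state_dict(soup_dict)`, though the exact attribute path may vary between library versions.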
By leveraging this technique, computer vision practitioners can boost accuracy and out-of-distribution robustness without sacrificing the deployment speed required for modern, edge-first AI architectures.