Discover how optimization algorithms like SGD and AdamW drive ML training. Learn to minimize loss and improve Ultralytics YOLO26 performance for AI applications.
An optimization algorithm serves as the core computational engine that drives the training process of machine learning (ML) and deep learning (DL) models. Its primary responsibility is to iteratively adjust the internal model weights and biases to minimize the error between predicted outcomes and actual targets. You can visualize this process as a hiker attempting to navigate down a foggy mountain to reach the lowest point in the valley. The optimization algorithm acts as the guide, determining the direction and the size of the step the hiker should take to reach the bottom, which corresponds to the state where the loss function is minimized and the model's predictive accuracy is maximized.
The training of a neural network involves a repetitive cycle of prediction, error calculation, and parameter updates. The optimization algorithm controls the "update" phase of this loop. Once a batch of training data is processed, the system calculates a gradient—a vector that points in the direction of the steepest increase in error—using a method called backpropagation.
The optimizer then updates the model parameters in the opposite direction of the gradient to reduce the error. The magnitude of this update is governed by a crucial hyperparameter known as the learning rate. If the step is too large, the model might overshoot the global minimum; if it is too small, training may become prohibitively slow or get stuck in a local minimum. Advanced resources like the Stanford CS231n optimization notes offer deeper technical insights into these dynamics.
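The update rule described above, stepping opposite the gradient and scaling by the learning rate, can be sketched in a few lines of plain Python. The one-parameter quadratic loss, starting point, and learning rate here are illustrative assumptions chosen so the minimum is known in advance:

```python
# Toy gradient descent on a one-parameter convex loss, (w - 3)^2,
# whose minimum sits at w = 3.0 and whose gradient is 2 * (w - 3).
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # the step-size hyperparameter

for _ in range(100):
    # Step in the opposite direction of the gradient to reduce the loss
    w -= learning_rate * gradient(w)

print(round(w, 4))  # prints 3.0
```

With a much larger learning rate the iterates would oscillate past the minimum, and with a much smaller one they would crawl toward it, which is exactly the trade-off the hiker analogy describes.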
Different problems require different strategies. While there are many variations, a few key algorithms dominate modern AI development: Stochastic Gradient Descent (SGD), which updates parameters using gradients computed on small batches of data; Adam, which adapts the step size per parameter using running estimates of the gradient's first and second moments; and AdamW, a variant of Adam with decoupled weight decay that often generalizes better in practice.
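To illustrate how an adaptive method differs from plain gradient descent, here is a minimal, self-contained sketch of the Adam update in plain Python. The hyperparameter defaults follow the values commonly cited for Adam, while the toy quadratic loss and the learning rate of 0.1 are assumptions for demonstration:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update combining momentum (m) with adaptive scaling (v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t)
print(round(w, 2))  # close to the minimum at 3.0
```

Note how the effective step size is rescaled per parameter by the second-moment estimate, which is what lets Adam-family methods make progress even when gradient magnitudes vary widely across layers.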
Optimization algorithms operate silently behind the scenes of almost every successful AI solution, translating data into actionable intelligence.
It is important to differentiate the optimization algorithm from other components of the learning process to understand the workflow effectively. The loss function quantifies the error, the optimization algorithm is the procedure that reduces that error, and the learning rate is a hyperparameter the optimizer consumes to scale each update.
In modern frameworks, selecting an optimization algorithm is often done via a single argument. The following example demonstrates how to train a YOLO26 model using the AdamW optimizer within the ultralytics package. Users can also leverage the Ultralytics Platform for a no-code approach to managing these training sessions.
```python
from ultralytics import YOLO

# Load the latest YOLO26 model (recommended for new projects)
model = YOLO("yolo26n.pt")

# Train the model using the 'AdamW' optimization algorithm
# The optimizer iteratively updates weights to minimize loss on the dataset
results = model.train(data="coco8.yaml", epochs=5, optimizer="AdamW")
```
For those interested in the lower-level mechanics, frameworks like PyTorch Optimizers and TensorFlow Keras Optimizers offer extensive documentation on how to implement and customize these algorithms for research architectures.
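To make the customization idea concrete, the following plain-Python sketch mimics the step-based interface that such frameworks expose. The class name, hyperparameter values, and toy loss are illustrative assumptions and not part of any framework's actual API:

```python
class MomentumSGD:
    """Toy SGD-with-momentum optimizer using a framework-style step interface."""

    def __init__(self, lr=0.05, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0

    def step(self, w, grad):
        # Accumulate a decaying moving average of past gradients, then update
        self.velocity = self.momentum * self.velocity - self.lr * grad
        return w + self.velocity

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3)
opt = MomentumSGD()
w = 0.0
for _ in range(100):
    w = opt.step(w, 2.0 * (w - 3.0))
print(round(w, 3))  # converges near the minimum at 3.0
```

Real framework optimizers follow the same pattern at heart: they hold per-parameter state (here, the velocity) and apply an update rule on each call, which is why swapping optimizers in training code is usually a one-line change.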