
Anchor Boxes

Learn how anchor boxes act as priors in anchor-based object detection, how they feed classification, box regression, and NMS, and where they are applied in autonomous driving and retail.

Anchor boxes are a foundational component in many anchor-based object detection models, serving as a predefined set of reference boxes with specific heights and widths. These boxes act as priors, or educated guesses, about the potential location and scale of objects in an image. Instead of searching for objects blindly, models use these anchors as starting points, predicting offsets to refine their position and size to match the actual objects. This approach transforms the complex task of object localization into a more manageable regression problem, where the model learns to adjust these templates rather than generating boxes from scratch.
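
To make the "predict offsets, not boxes" idea concrete, here is a minimal sketch of one widely used offset parameterization (the Faster R-CNN style center/size encoding). The function names `decode_box` and `encode_box` are hypothetical, and a specific detector may scale or clamp the offsets differently.

```python
import math


def decode_box(anchor, offsets):
    """Turn an anchor (cx, cy, w, h) plus predicted offsets into a box.

    Assumed parameterization: centre shifts are expressed as fractions of
    the anchor size, and size changes are predicted in log space.
    """
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = offsets
    cx = cx_a + tx * w_a          # shift centre by a fraction of anchor width
    cy = cy_a + ty * h_a          # shift centre by a fraction of anchor height
    w = w_a * math.exp(tw)        # exp keeps the decoded width positive
    h = h_a * math.exp(th)
    return (cx, cy, w, h)


def encode_box(anchor, box):
    """Inverse of decode_box: the regression target for a matched anchor."""
    cx_a, cy_a, w_a, h_a = anchor
    cx, cy, w, h = box
    return (
        (cx - cx_a) / w_a,
        (cy - cy_a) / h_a,
        math.log(w / w_a),
        math.log(h / h_a),
    )


# Example: a 20x40 anchor nudged towards a slightly wider ground-truth box.
anchor = (50.0, 50.0, 20.0, 40.0)
ground_truth = (53.0, 48.0, 26.0, 38.0)
targets = encode_box(anchor, ground_truth)
recovered = decode_box(anchor, targets)
```

Because the targets are small numbers relative to the anchor, the regression head only has to learn corrections to a template instead of absolute image coordinates.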

How Anchor Boxes Work

The core mechanism involves tiling an image with a dense grid of anchor boxes. At each grid position, multiple anchors with different scales and aspect ratios are placed so that objects of diverse shapes and sizes can be covered. In a forward pass, the detector's backbone first extracts a feature map from the input image. The detection head then uses these features to perform two tasks for each anchor box:

  • Classification: It predicts the probability that an anchor box contains an object of interest, assigning a class label and a confidence score.
  • Regression: It calculates the precise adjustments (or offsets) needed to transform the anchor box into a final bounding box that tightly encloses the object.
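
The tiling described above can be sketched in a few lines. This is an illustrative example, not the anchor generator of any particular model: the function name `generate_anchors`, the square image, the single stride, and the convention that an aspect ratio means width divided by height are all assumptions.

```python
import math


def generate_anchors(image_size, stride, scales, aspect_ratios):
    """Tile (cx, cy, w, h) anchors over a square image.

    One anchor is created per (grid cell, scale, aspect ratio) triple.
    Each anchor keeps an area of roughly scale**2 while its shape varies.
    """
    anchors = []
    cells = image_size // stride
    for row in range(cells):
        for col in range(cells):
            # Centre of the grid cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in aspect_ratios:  # ratio = width / height (assumed)
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


# A 64x64 image with stride 16 gives a 4x4 grid; 3 ratios -> 48 anchors.
anchors = generate_anchors(
    image_size=64, stride=16, scales=[32], aspect_ratios=[0.5, 1.0, 2.0]
)
```

In a real detector the classification and regression heads would emit one score vector and one set of four offsets for every anchor in this list, which is why the anchor count directly affects the size of the prediction tensor.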

During training, anchors are matched to ground-truth objects using Intersection over Union (IoU): anchors that overlap a ground-truth box strongly are treated as positive examples for that object, while the rest are treated as background. After prediction, a post-processing step called Non-Maximum Suppression (NMS) removes redundant, overlapping boxes for the same object and keeps the highest-scoring one.
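
The two steps in this paragraph, IoU-based matching and NMS, can be sketched as follows. This is a minimal greedy version under stated assumptions: boxes are `(x1, y1, x2, y2)` corner tuples, the 0.5 thresholds are illustrative defaults, and the helper names `iou`, `match_anchors`, and `nms` are hypothetical. Production detectors typically use vectorized, per-class implementations.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, pos_threshold=0.5):
    """For each anchor, return the index of its best ground-truth box,
    or -1 (background) if the best IoU is below pos_threshold."""
    if not gt_boxes:
        return [-1] * len(anchors)
    matches = []
    for anchor in anchors:
        overlaps = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: overlaps[j])
        matches.append(best if overlaps[best] >= pos_threshold else -1)
    return matches


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the boxes that survive
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.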

Anchor Boxes vs. Other Concepts

It is important to distinguish anchor boxes from related terms in computer vision:

  • Bounding Box: An anchor box is a pre-defined template used during the detection process, while a bounding box is the final, refined output that precisely localizes a detected object.
  • Anchor-Free Detectors: While anchor-based models like YOLOv5 and the Faster R-CNN family rely on these presets, modern architectures have increasingly shifted towards anchor-free detectors. Models like Ultralytics YOLO11 predict object locations directly by identifying keypoints or centers, which simplifies the model design and can improve performance on objects with unconventional shapes. You can read more about the benefits of an anchor-free design in YOLO11.
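
For contrast with the anchor-based approach, here is a sketch of one common anchor-free parameterization, in which each feature-map location predicts its distances to the four sides of the box. It is a generic illustration in the style of center-based detectors, not the actual prediction head of YOLO11, and the function name `decode_point` is hypothetical.

```python
def decode_point(point, distances):
    """Decode an anchor-free prediction into an (x1, y1, x2, y2) box.

    point     -- (x, y) location on the image that makes the prediction
    distances -- (left, top, right, bottom) distances to the box sides
    """
    x, y = point
    left, top, right, bottom = distances
    return (x - left, y - top, x + right, y + bottom)


# A location at (50, 40) predicting a box that extends further to the right.
box = decode_point((50, 40), (10, 5, 20, 15))
```

No predefined widths, heights, or aspect ratios appear anywhere in this decoding, which is the design simplification that anchor-free models trade for the shape priors anchors provide.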

Real-World Applications

The structured approach of anchor boxes makes them effective in scenarios where objects have predictable shapes and sizes.

  1. Autonomous Driving: In solutions for the automotive industry, anchor-based detectors excel at identifying cars, pedestrians, and traffic signs. The relatively consistent aspect ratios of these objects align well with predefined anchors, enabling reliable detection for systems developed by companies like NVIDIA and Tesla.
  2. Retail Analytics: For AI-driven inventory management, these models can efficiently scan shelves to count products. The uniform size and shape of packaged goods make them ideal candidates for an anchor-based approach, helping automate stock monitoring and reduce manual effort.

These models are typically developed using powerful deep learning frameworks such as PyTorch and TensorFlow. For continued learning, platforms like DeepLearning.AI offer comprehensive courses on computer vision fundamentals.
