Learn how Feature Pyramid Networks (FPN) enable multi-scale object detection—boosting accuracy for small and large objects in YOLO11 and modern CV systems.
A Feature Pyramid Network (FPN) is a specialized architecture used in computer vision (CV) to improve the detection of objects at different scales. It serves as a critical component in many modern object detection architectures, designed to overcome the limitations of traditional detectors that struggle to recognize small objects. By generating a multi-scale feature pyramid from a single-resolution input image, FPNs enable models to detect both large structures and tiny details with high accuracy. This architecture typically sits between the backbone (which extracts features) and the detection head (which predicts classes and boxes), effectively enriching the semantic information passed to the final layers.
The primary goal of an FPN is to leverage the inherent multi-scale, pyramidal hierarchy of deep Convolutional Neural Networks (CNNs) while reducing the computational cost associated with processing multiple image scales separately. The architecture consists of three main components that process visual data:

- **Bottom-up pathway:** the standard feed-forward computation of the backbone CNN, which produces feature maps at several scales with decreasing spatial resolution and increasing semantic strength.
- **Top-down pathway:** upsamples the semantically strong but low-resolution feature maps from higher pyramid levels back toward higher resolutions.
- **Lateral connections:** merge each upsampled map with the corresponding bottom-up map (after a 1x1 convolution to match channel dimensions), combining strong semantics with precise spatial localization.
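The top-down merge can be sketched in a few lines of NumPy. This is a minimal illustration, not a real implementation: the feature-map shapes, channel widths, and random "1x1 convolution" weight matrices below are assumptions chosen only to show how each pyramid level is upsampled and added to a lateral projection of the backbone map.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral(x, w):
    """1x1 convolution as a channel-mixing matmul: (C_out, C_in) applied per pixel."""
    return np.einsum("oc,chw->ohw", w, x)

rng = np.random.default_rng(0)

# Illustrative backbone (bottom-up) maps C3, C4, C5:
# channels grow while spatial resolution shrinks.
c3 = rng.standard_normal((64, 32, 32))
c4 = rng.standard_normal((128, 16, 16))
c5 = rng.standard_normal((256, 8, 8))

d = 256  # shared pyramid channel width
w3, w4, w5 = (rng.standard_normal((d, c.shape[0])) * 0.01 for c in (c3, c4, c5))

# Top-down pathway with lateral connections
p5 = lateral(c5, w5)
p4 = lateral(c4, w4) + upsample2x(p5)
p3 = lateral(c3, w3) + upsample2x(p4)

print([p.shape for p in (p3, p4, p5)])
```

After the merge, every pyramid level P3–P5 has the same channel width at its native resolution, so a single detection head can be applied at each scale.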
Before FPNs, object detectors generally had to choose between using only the top-most layer (good for large objects, bad for small ones) or processing an image pyramid (slow and computationally expensive). FPNs provide a "best of both worlds" solution. This capability is vital for real-time inference, allowing advanced models like YOLO26 and YOLO11 to maintain high frame rates while accurately identifying objects that occupy only a few pixels of the screen.
The ability to handle multi-scale data makes FPNs indispensable across various industries relying on artificial intelligence (AI).
While the original FPN revolutionized multi-scale feature fusion, newer architectures have refined the concept. PANet adds an extra bottom-up augmentation path to shorten the information flow for localization, and BiFPN (introduced with EfficientDet) makes the fusion weights learnable so the network can decide how much each input scale contributes. Many modern YOLO necks build on these PAN/FPN ideas.
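BiFPN's "fast normalized fusion" is simple enough to sketch directly. The snippet below is an illustrative NumPy rendering of the published formula, not EfficientDet's actual implementation; the input maps and weight values are made-up assumptions.

```python
import numpy as np

def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """BiFPN-style fusion: O = sum_i(w_i * I_i) / (eps + sum_i w_i).

    Weights are clamped to be non-negative (ReLU in the paper), so the
    output is a normalized weighted average of the input feature maps.
    """
    w = np.maximum(weights, 0.0)  # enforce w_i >= 0
    num = sum(wi * x for wi, x in zip(w, inputs))
    return num / (eps + w.sum())

# Two same-shape feature maps being merged at one pyramid level
a = np.ones((256, 16, 16)) * 2.0
b = np.ones((256, 16, 16)) * 4.0

fused = fast_normalized_fusion([a, b], np.array([1.0, 3.0]))
print(fused[0, 0, 0])  # close to the weighted average (1*2 + 3*4) / 4 = 3.5
```

Because the weights are normalized, the fused map stays on the same scale as its inputs regardless of how many levels feed into it, which keeps training stable.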
Deep learning libraries and the Ultralytics framework handle the complexities of FPNs internally. The following example demonstrates how to load a model that utilizes a feature pyramid structure to detect objects.
```python
from ultralytics import YOLO

# Load the YOLO26 model, which utilizes an advanced feature pyramid architecture
# The 'n' suffix stands for nano, a lightweight version of the model
model = YOLO("yolo26n.pt")

# Run inference on an image to detect objects ranging from small to large
# The model internally uses its FPN neck to aggregate features at multiple scales
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Display the resulting bounding boxes and class labels
results[0].show()
```