Learn how Feature Pyramid Networks (FPN) enable multi-scale object detection—boosting accuracy for small and large objects in YOLO11 and modern CV systems.
A Feature Pyramid Network (FPN) is a fundamental architecture in modern computer vision (CV) designed to detect objects at varying scales with high precision. Traditional deep learning (DL) models often struggle to recognize small objects because they rely on deep layers where spatial resolution is lost. FPN addresses this by building a pyramidal structure of feature maps that combines low-resolution, semantically strong features with high-resolution, spatially detailed features. This design acts as a crucial "neck" in many object detection architectures, connecting the initial feature extractor—known as the backbone—to the final prediction layers, or the detection head. By efficiently sharing information across different levels, FPNs enable models like YOLO11 to accurately identify both tiny, distant details and large, prominent subjects within a single image.
The core innovation of a Feature Pyramid Network lies in how it processes visual information through three distinct stages: a bottom-up pathway in which the backbone computes progressively deeper feature maps, a top-down pathway that upsamples the semantically rich deep maps, and lateral connections that merge them with the higher-resolution maps from the backbone. This structure allows the network to maintain a rich representation of the image across multiple resolutions without incurring a massive computational cost.
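To make these stages concrete, the sketch below builds a minimal FPN neck in PyTorch: 1x1 lateral convolutions project the backbone maps to a common width, deeper maps are upsampled and added in a top-down pass, and 3x3 convolutions smooth the results. It is an illustrative sketch only; the class name, channel counts, and dummy input shapes are assumptions for the example, not the actual layout inside YOLO11.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal FPN neck: lateral 1x1 convs plus a top-down pathway (illustrative sketch)."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # Lateral connections project each backbone stage to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convs smooth each merged map before it reaches the detection head
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)

    def forward(self, feats):
        # feats are backbone outputs ordered from high resolution to low resolution
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample each deeper map and add it to the shallower one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(laterals[i], scale_factor=2, mode="nearest")
        return [conv(x) for conv, x in zip(self.smooth, laterals)]

# Dummy backbone features for a 640x640 input at strides 8, 16, and 32
c3, c4, c5 = torch.randn(1, 256, 80, 80), torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20)
p3, p4, p5 = SimpleFPN()([c3, c4, c5])
print([p.shape for p in (p3, p4, p5)])  # every level now has 256 channels at its original resolution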
In real-world scenarios, objects appear in vastly different sizes depending on their distance from the camera. A detector operating on a single feature map might easily spot a car filling the frame but miss a pedestrian in the background. FPNs solve this by assigning prediction tasks to different levels of the pyramid: large objects are detected on the low-resolution, deep feature maps, while small objects are detected on the high-resolution, fused feature maps. This capability is essential for achieving high accuracy and recall in diverse environments, distinguishing FPN-equipped models from older single-scale detectors.
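As a concrete illustration of this assignment, the original FPN paper maps a region of width w and height h to pyramid level k = floor(k0 + log2(sqrt(w * h) / 224)), so larger boxes land on deeper, coarser levels. The helper below is a hedged sketch of that heuristic; the function name and clamping range are our own choices, and YOLO-style detectors use different assignment rules.

import math

def fpn_level_for_box(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick a pyramid level for a box: large boxes map to deep, low-resolution levels."""
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical_size))
    return max(k_min, min(k_max, k))  # clamp to the levels the pyramid actually provides

print(fpn_level_for_box(600, 450))  # a large object lands on P5 (coarse, semantically strong)
print(fpn_level_for_box(40, 30))    # a small object lands on P2 (fine spatial detail)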
The ability to handle multi-scale data makes FPNs indispensable across various industries relying on artificial intelligence (AI).
While FPN revolutionized feature extraction, newer architectures have refined the concept. A notable evolution is the Bi-directional Feature Pyramid Network (BiFPN), introduced by Google Research in the EfficientDet architecture. Unlike the standard FPN, whose fusion path flows only one way (top-down), BiFPN adds bottom-up paths and learns a weight for each connection so the network can prioritize more informative features. However, standard FPN designs and their variants remain the foundation for high-performance models like YOLO11, balancing speed and accuracy effectively for most real-time inference tasks.
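For intuition, the snippet below sketches the "fast normalized fusion" that EfficientDet applies at each BiFPN node, where every incoming feature map gets a learnable, non-negative weight that is normalized before the maps are summed. The class name and tensor shapes are arbitrary choices for the example, not code from EfficientDet or Ultralytics.

import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion: a learnable, normalized weight per incoming feature map."""

    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one weight per incoming connection
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.weights)      # keep the weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so they sum to roughly one
        return sum(wi * x for wi, x in zip(w, inputs))

# Fuse a top-down feature with a same-level lateral feature (shapes chosen for the demo)
fuse = WeightedFusion(num_inputs=2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
print(out.shape)  # torch.Size([1, 64, 40, 40])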
Modern libraries handle the complexities of FPNs internally. The following example shows how to use the Ultralytics YOLO package, which incorporates advanced feature pyramid structures to detect objects of all sizes seamlessly.
from ultralytics import YOLO
# Load the YOLO11 model, which utilizes a feature pyramid architecture for multi-scale detection
model = YOLO("yolo11n.pt")
# Run inference on an image to detect objects ranging from small to large
results = model.predict("path/to/street_scene.jpg")
# Display the resulting bounding boxes and class labels
results[0].show()