
Feature Pyramid Network (FPN)

Learn how Feature Pyramid Networks (FPN) enable multi-scale object detection, improving accuracy on both small and large objects in YOLO11 and modern CV systems.

A Feature Pyramid Network (FPN) is a component within deep learning models, particularly object detection architectures, designed to improve the detection of objects at various scales. In any given image, objects can appear large or small depending on their size and distance from the camera. FPN addresses this challenge by efficiently creating a multi-scale representation of features, allowing a model to simultaneously recognize a small, distant car and a large, nearby truck with high accuracy. It acts as a bridge, or "neck," between the main feature extractor and the final prediction component of a network.

How a Feature Pyramid Network Works

An FPN operates by combining low-resolution, semantically strong features with high-resolution, semantically weak features. This process is typically achieved through a structure with two pathways and lateral connections.

  1. Bottom-up Pathway: This is the standard forward pass of a Convolutional Neural Network (CNN), which serves as the model's backbone. As an image passes through successive layers, the resulting feature maps decrease in spatial size but increase in semantic depth, meaning they capture more abstract concepts.
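  2. Top-down Pathway: The network then takes the feature map from the deepest layer, which is spatially small but semantically rich, and upsamples it step by step, typically by a factor of two per level. The original FPN paper uses simple nearest-neighbor upsampling for this.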
  3. Lateral Connections: As the top-down pathway reconstructs larger feature maps, it merges each one with the bottom-up feature map of the same spatial size. In the original design, a 1x1 convolution first brings the bottom-up map to a common channel count, and the two maps are then combined by element-wise addition. This fusion enriches the upsampled layers with the finer, more localized details from the earlier layers. The outcome is a "pyramid" of feature maps, each carrying both strong semantics and spatial detail, which is then fed to the detection head for prediction. The original FPN research paper provides a detailed technical explanation of this process.
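The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used by any particular library: the function names, the channel counts, and the random weights standing in for learned ones are all assumptions made for the example, and the 3x3 smoothing convolution that the original paper applies after each merge is left out for brevity.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: mix channels independently at every pixel.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def upsample2x(x):
    """Nearest-neighbor upsampling by a factor of 2 in height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(backbone_maps, lateral_weights):
    """Hypothetical helper: fuse backbone maps ordered from shallow to deep.

    Each map is assumed to be exactly half the height and width of the
    previous one. Returns the fused maps in the same order.
    """
    # Lateral connections: project every level to a common channel count.
    laterals = [conv1x1(x, w) for x, w in zip(backbone_maps, lateral_weights)]

    # Top-down pathway: start from the deepest map and work back up,
    # adding the upsampled coarser map to each finer lateral map.
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.append(lateral + upsample2x(merged[-1]))
    return merged[::-1]


# Stand-in backbone outputs (bottom-up pathway); shapes are illustrative.
rng = np.random.default_rng(0)
c3 = rng.standard_normal((64, 32, 32))
c4 = rng.standard_normal((128, 16, 16))
c5 = rng.standard_normal((256, 8, 8))

# Random weights stand in for learned 1x1 convolution kernels.
lateral_weights = [rng.standard_normal((32, c.shape[0])) for c in (c3, c4, c5)]

pyramid = fpn_top_down([c3, c4, c5], lateral_weights)
for level in pyramid:
    print(level.shape)
```

Every output level keeps the spatial size of its backbone input but shares one channel count (32 here), which is what lets a single detection head run on all levels.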

The Role of FPN in Object Detection

In a typical object detection model, the architecture is split into a backbone, neck, and head. The FPN is a popular choice for the neck component. Its primary role is to aggregate the features extracted by the backbone before they are used for the final detection task. By providing a rich, multi-scale feature representation, FPNs enable models like YOLO11 to perform robustly across a wide range of object sizes. This approach is more computationally efficient than processing an image at multiple resolutions separately, as it reuses features computed in the backbone's single forward pass. Many state-of-the-art models leverage this concept, as seen in various YOLO model comparisons.
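To make the multi-scale representation concrete, the sketch below computes the grid size of each pyramid level for a given input image. The strides of 8, 16, and 32 are an assumption chosen because they are a common configuration for three-level detection necks; the function name is hypothetical, and the rounding assumes padded stride-2 downsampling.

```python
def pyramid_grid_sizes(image_size, strides=(8, 16, 32)):
    """Return {stride: (rows, cols)} for an image given as (height, width).

    Uses ceiling division, which matches padded stride-2 downsampling
    (an assumption of this sketch).
    """
    height, width = image_size
    return {s: (-(-height // s), -(-width // s)) for s in strides}


grids = pyramid_grid_sizes((640, 640))
for stride, (rows, cols) in grids.items():
    print(f"stride {stride}: {rows}x{cols} grid, {rows * cols} cells")
```

For a 640x640 input this gives 80x80, 40x40, and 20x20 grids. The fine grid has enough cells to localize small objects, while each cell of the coarse grid covers a 32x32 pixel region and suits large ones. All three come from one backbone pass rather than three separate passes over resized images.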

Real-World Applications

FPNs are integral to many modern computer vision (CV) applications where multi-scale object detection is critical.

  • Autonomous Vehicles: Self-driving cars must detect pedestrians, vehicles, traffic signs, and lane markings at various distances. An FPN helps the vehicle's perception system identify a distant pedestrian and a nearby car within the same frame, which is essential for safe navigation. Institutions like Carnegie Mellon University publish resources that describe these perception systems in detail.
  • Medical Image Analysis: In radiology, FPNs can help analyze medical scans to detect anomalies of different sizes, such as small lesions and large tumors. This multi-scale capability allows for more comprehensive and accurate automated diagnostics in fields like pathology and oncology, as discussed in research published by the National Institutes of Health (NIH).

FPN vs. BiFPN

While FPN marked a significant advancement, newer architectures have evolved the concept. A notable example is the Bi-directional Feature Pyramid Network (BiFPN), introduced in the EfficientDet paper by Google Research. FPN fuses features along a single top-down pathway and merges them by plain addition, so every input contributes equally. BiFPN instead passes information in both directions (top-down and bottom-up) and uses weighted feature fusion: each input feature map gets a learnable weight, so the network learns how much each resolution should contribute. The EfficientDet authors report that this improves accuracy at lower computational cost, a trade-off also explored in comparisons like EfficientDet vs. YOLO11. FPN remains the foundational concept, and BiFPN is best understood as a refinement of its multi-scale feature fusion.
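The weighted fusion can be sketched as follows. This follows the "fast normalized fusion" rule described in the EfficientDet paper, where each non-negative weight is divided by the sum of all weights plus a small epsilon. The function name and the toy inputs are assumptions for illustration, and the sketch omits the resizing step and the convolution that follow fusion in the full BiFPN layer.

```python
import numpy as np


def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-shape feature maps using non-negative, normalized weights.

    In a real network raw_weights would be learned; here they are fixed
    numbers chosen for illustration.
    """
    # ReLU keeps every weight non-negative.
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)
    # Normalize so the weights sum to (almost) 1; eps avoids division by zero.
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))


low = np.ones((4, 4))         # stand-in for one input feature map
high = 3.0 * np.ones((4, 4))  # stand-in for another, already resized to match

# Weights 1 and 3 normalize to roughly 0.25 and 0.75.
fused = fast_normalized_fusion([low, high], [1.0, 3.0])

# A negative raw weight is clipped to zero, so only `high` contributes.
clipped = fast_normalized_fusion([low, high], [-1.0, 2.0])
```

With these inputs every element of `fused` is close to 2.5, whereas the plain addition used in FPN would give 4.0 regardless of how useful each input is.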
