Capsule Networks (CapsNet)

Explore Capsule Networks (CapsNets) and how they address the limitations of CNNs. Learn about dynamic routing, spatial hierarchies, and how CapsNets compare to YOLO26.

Capsule Networks, often abbreviated as CapsNets, represent an advanced architecture in the field of deep learning designed to overcome specific limitations found in traditional neural networks. Introduced by Geoffrey Hinton and his team, CapsNets attempt to mimic the biological neural organization of the human brain more closely than standard models. Unlike a typical convolutional neural network (CNN), which excels at detecting features but often loses spatial relationships due to downsampling, a Capsule Network organizes neurons into groups called "capsules." These capsules encode not just the probability of an object's presence, but also its specific properties, such as orientation, size, and texture, effectively preserving the hierarchical spatial relationships within visual data.

The Limitation of Traditional CNNs

To understand the innovation of CapsNets, it is helpful to look at how standard computer vision models operate. A conventional CNN uses layers of feature extraction followed by pooling layers—specifically max pooling—to reduce computational load and achieve translational invariance. This means a CNN can identify a "cat" regardless of where it sits in the image.

However, this process often discards precise location data, leading to the "Picasso problem": a CNN might classify a face correctly even if the mouth is on the forehead, simply because all the necessary features are present. CapsNets address this by removing pooling layers and replacing them with a process that respects the spatial hierarchies of objects.
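The loss of location data can be seen in a minimal sketch. The plain-Python function below is an illustrative 2x2 max pooling with stride 2, not code from any particular library, and the names `max_pool_2x2`, `map_a`, and `map_b` are assumptions made for this example. The same feature fires at two different positions inside one pooling window, and the pooled outputs are identical.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Two 4x4 activation maps: the same feature (value 9) fires at different
# positions inside the top-left 2x2 window.
map_a = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
map_b = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_a = max_pool_2x2(map_a)
pooled_b = max_pool_2x2(map_b)
print(pooled_a)              # [[9, 0], [0, 0]]
print(pooled_b)              # [[9, 0], [0, 0]]
print(pooled_a == pooled_b)  # True
```

After pooling, later layers can no longer tell which of the two positions produced the activation. That is helpful for translational invariance, but it is also the mechanism behind the "Picasso problem".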

How Capsule Networks Work

The core building block of this architecture is the capsule, a nested set of neurons that outputs a vector rather than a scalar value. In vector mathematics, a vector has both magnitude and direction. In a CapsNet:

  • Magnitude (Length): Represents the probability that a specific entity exists in the current input.
  • Direction (Orientation): Encodes the instantiation parameters, such as the object's pose (position and orientation), scale, and rotation.
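For the length of a vector to behave like a probability, it has to lie between 0 and 1. The original dynamic routing paper handles this with a "squash" nonlinearity, which is not described in this article and is brought in here as an assumption. The sketch below is a plain-Python illustration, with `squash` and `length` as names chosen for this example.

```python
import math


def squash(s, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # Short vectors shrink toward 0, long vectors approach length 1.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length, read here as the probability that the entity is present."""
    return math.sqrt(sum(x * x for x in v))


weak = squash([0.1, 0.0])    # weak evidence for the entity
strong = squash([6.0, 8.0])  # strong evidence for the entity

print(round(length(weak), 3))    # 0.01
print(round(length(strong), 3))  # 0.99
# The direction (ratio of components) is unchanged by squashing.
print(round(strong[1] / strong[0], 3))  # 1.333
```

The length carries the "is it there?" signal, while the unchanged direction remains free to encode the instantiation parameters.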

Capsules in lower layers (detecting simple shapes like edges) predict the output of capsules in higher layers (detecting complex objects like eyes or tires). This communication is managed by an algorithm called "dynamic routing" or "routing by agreement." If a lower-level capsule's prediction aligns with the higher-level capsule's output, the coupling between them is strengthened, so more of the lower capsule's output is routed to that parent. This is intended to let the network recognize objects from different 3D viewpoints with less reliance on the heavy data augmentation usually needed to teach CNNs about rotation and scale.
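The routing loop described above can be sketched in plain Python. This is a simplified illustration that follows the general shape of routing by agreement, under several assumptions: the prediction vectors in `u_hat` are hand-written instead of being produced by learned weight matrices, the `squash` nonlinearity comes from the original dynamic routing paper, and all function and variable names are hypothetical.

```python
import math


def squash(s, eps=1e-9):
    """Keep the direction of s and map its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / (math.sqrt(sq_norm) + eps)
    return [scale * x for x in s]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def route(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's predicted vector for upper capsule j."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_upper for _ in range(n_lower)]
    for _ in range(iterations):
        # Each lower capsule splits its output across the upper capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_upper):
            weighted_sum = [
                sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_lower))
                for d in range(dim)
            ]
            outputs.append(squash(weighted_sum))
        # Agreement (dot product) between prediction and output raises the logit.
        for i in range(n_lower):
            for j in range(n_upper):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return coupling, outputs


# Three lower capsules, two upper capsules, 2D vectors.
# All three roughly agree about upper capsule 0 and conflict about upper capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.0, -1.0]],
    [[0.9, 0.0], [1.0, 0.0]],
]

coupling, outputs = route(u_hat)
for i, row in enumerate(coupling):
    print(f"lower capsule {i} coupling:", [round(c, 3) for c in row])
print("upper capsule lengths:", [round(norm(v), 3) for v in outputs])
```

In this toy setup, every lower capsule ends up routing more of its output to upper capsule 0, where the predictions agree, and that capsule's output vector is the longer of the two.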

Key Differences: CapsNets vs. CNNs

While both architectures are fundamental to computer vision (CV), they differ in how they process and represent visual data:

  • Scalar vs. Vector: CNN neurons use scalar outputs to signify feature presence. CapsNets use vectors to encode presence (length) and pose parameters (orientation).
  • Routing vs. Pooling: CNNs use pooling to downsample data, often losing location details. CapsNets use dynamic routing to preserve spatial data, which suits tasks where the relative position and orientation of an object's parts matter.
  • Data Efficiency: Because capsules implicitly understand 3D viewpoints and affine transformations, they can often generalize from less training data compared to CNNs, which may require extensive examples to learn every possible rotation of an object.
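The data-efficiency argument rests on the fact that a change of viewpoint acts linearly on pose. The idealized sketch below uses 2D rigid poses as 3x3 homogeneous matrices to show that one fixed part-to-whole matrix yields the correct whole-object pose from any viewpoint. Everything here is an assumption made for illustration: the names `pose`, `mouth_in_face`, and `mouth_to_face` are hypothetical, and a real CapsNet learns such matrices from data instead of deriving them from known geometry.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def pose(angle_deg, tx, ty):
    """2D rigid pose as a 3x3 homogeneous matrix (rotation plus translation)."""
    t = math.radians(angle_deg)
    return [
        [math.cos(t), -math.sin(t), tx],
        [math.sin(t), math.cos(t), ty],
        [0.0, 0.0, 1.0],
    ]


def inverse_rigid(m):
    """Inverse of a rigid 2D pose: transposed rotation, negated rotated offset."""
    r00, r01, tx = m[0]
    r10, r11, ty = m[1]
    return [
        [r00, r10, -(r00 * tx + r10 * ty)],
        [r01, r11, -(r01 * tx + r11 * ty)],
        [0.0, 0.0, 1.0],
    ]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


# Fixed geometry: where the mouth sits in the face's own coordinate frame.
mouth_in_face = pose(0, 0.0, -1.0)
# Stand-in for a learned weight: maps an observed mouth pose to a face pose.
mouth_to_face = inverse_rigid(mouth_in_face)

checks = []
for face_pose in (pose(0, 5, 5), pose(90, -2, 3), pose(37, 1, 8)):
    mouth_pose = matmul(face_pose, mouth_in_face)       # what a mouth capsule would observe
    predicted_face = matmul(mouth_pose, mouth_to_face)  # its vote for the face pose
    checks.append(close(predicted_face, face_pose))

print(checks)  # [True, True, True]
```

Because the same `mouth_to_face` matrix works for every viewpoint, the model does not need a separate training example per rotation, which is the intuition behind the claim in the list above.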

Real-World Applications

While CapsNets are often more computationally expensive than optimized models like YOLO26, they offer distinct advantages in specialized domains:

  1. Medical Image Analysis: In healthcare, the precise orientation and shape of an anomaly are critical. Researchers have applied CapsNets to brain tumor segmentation, where the model must distinguish a tumor from surrounding tissue based on subtle spatial hierarchies that standard CNNs might smooth over. You can explore related research on Capsule Networks in Medical Imaging.

  2. Overlapping Digit Recognition: CapsNets performed strongly on MultiMNIST, a variant of the MNIST dataset in which two digits overlap in the same image. Because the network tracks the "pose" of each digit, it can disentangle two overlapping numbers (e.g., a '3' on top of a '5') as distinct objects rather than merging them into a single confused feature map.

Practical Context and Implementation

Capsule Networks are primarily a classification architecture. While they offer theoretical robustness, modern industry applications often favor high-speed CNNs or Transformers for real-time performance. Even so, it is useful to understand the classification benchmarks on which CapsNets were evaluated, such as MNIST.

The following example demonstrates how to train a modern YOLO classification model on the MNIST dataset using the ultralytics package. This parallels the primary benchmark task used to validate Capsule Networks.

from ultralytics import YOLO

# Load a YOLO26 classification model (optimized for speed and accuracy)
model = YOLO("yolo26n-cls.pt")

# Train the model on the MNIST dataset
# This dataset helps evaluate how well a model learns handwritten digit features
results = model.train(data="mnist", epochs=5, imgsz=32)

# Run inference on a sample digit image
# NOTE: "path/to/digit.png" is a placeholder; point it at a real image file
# The model predicts the digit class (0-9)
predictions = model("path/to/digit.png")

Future of Capsules and Vision AI

The principles behind Capsule Networks continue to influence AI safety and interpretability research. By explicitly modeling part-whole relationships, capsules offer a "glass box" alternative to the "black box" nature of deep neural networks, making decisions more explainable. Future developments look to combine the spatial robustness of capsules with the inference speed of architectures like YOLO11 or the newer YOLO26 to improve performance in 3D object detection and robotics. Researchers have also explored Matrix Capsules with EM Routing, which represents each capsule's pose as a matrix and replaces routing by agreement with an expectation-maximization procedure.

For developers looking to manage datasets and train models efficiently, the Ultralytics Platform provides a unified environment to annotate data, train in the cloud, and deploy models that balance the speed of CNNs with the accuracy required for complex vision tasks.
