Meet YOLO26: next-gen vision AI.
Ultralytics
Back to Ultralytics Glossary

Mixture of Depths (MoD)

Explore how Mixture of Depths (MoD) optimizes AI efficiency by dynamically routing tokens. Learn how this technique reduces FLOPs in Ultralytics YOLO26 and LLMs.

In deep learning architectures, computational efficiency is paramount, especially when processing long sequences or high-resolution inputs. A novel approach dynamically allocates compute resources by allowing the network to decide which parts of the input require full processing and which can safely bypass certain layers. This dynamic routing strategy reduces overall computational complexity without sacrificing the model's predictive power or accuracy.

Link to this sectionUnderstanding the Concept#

Mixture of Depths (MoD) is an architectural technique primarily applied to Transformer architectures where the model learns to dynamically skip computation for specific tokens at various layers. Traditional transformers process every token through every layer, whether it is a crucial piece of information or filler content. In contrast, an MoD model uses a router mechanism to evaluate tokens and assigns them a score. Only the top-scoring tokens—up to a predefined capacity limit—are passed through the heavy computation blocks, such as attention mechanisms or dense feed-forward layers. The remaining tokens bypass the block via residual connections, effectively creating a "mixture of depths" where different tokens experience varying levels of processing depth.

This method, popularized by recent DeepMind research and documented extensively in the arXiv repository, drastically reduces the total number of floating-point operations (FLOPs) required during both training and inference.

Link to this sectionDifferentiating From Mixture of Experts (MoE)#

It is easy to confuse this concept with a Mixture of Experts (MoE). While both use routing mechanisms, they solve different problems:

  • MoE routes tokens to different sub-networks (experts) within a layer. The computational depth remains the same for all tokens, but the model's parameter count increases.
  • MoD routes tokens to either the computation block or a skip connection. The parameter count remains strictly constant, but the computational depth decreases for less important tokens, directly improving inference latency.

Link to this sectionReal-World Applications#

The ability to dynamically budget compute makes this technique highly valuable across multiple domains of computer vision and natural language processing.

  1. Context Optimization in Language Models: Modern Large Language Models (LLMs) from organizations like OpenAI and Anthropic process massive context windows. By employing dynamic depth routing, these models can skip structural or repetitive filler words, reserving deep computation for complex reasoning steps and factual extraction.

  2. High-Resolution Vision AI: In advanced vision systems like the Ultralytics YOLO26 model, processing large images for object detection and image segmentation requires immense memory. Depth routing allows the network to bypass feature extraction on uniform backgrounds (like empty skies or blank walls), focusing computational power on intricate foreground objects. This is crucial for deploying models to resource-constrained edge AI hardware optimized by CUDA optimization libraries.

Link to this sectionImplementation Example#

Below is a conceptual PyTorch snippet demonstrating how a basic routing mechanism might skip computation for a portion of input tokens, simulating a depth-routing behavior.

import torch
import torch.nn as nn


class MixtureOfDepthsBlock(nn.Module):
    def __init__(self, d_model, capacity_factor=0.5):
        super().__init__()
        self.capacity_factor = capacity_factor
        self.router = nn.Linear(d_model, 1)
        self.heavy_compute = nn.Sequential(nn.Linear(d_model, d_model * 4), nn.GELU(), nn.Linear(d_model * 4, d_model))

    def forward(self, x):
        # x shape: (batch_size, seq_len, d_model)
        seq_len = x.size(1)
        capacity = int(seq_len * self.capacity_factor)

        # 1. Compute routing scores
        scores = self.router(x).squeeze(-1)  # Shape: (batch_size, seq_len)

        # 2. Identify top-k tokens to process
        topk_indices = torch.topk(scores, capacity, dim=1).indices

        # 3. Create an output tensor mirroring the input (residual baseline)
        output = x.clone()

        # 4. Apply heavy computation only to the selected tokens
        for b in range(x.size(0)):
            selected_tokens = x[b, topk_indices[b]]
            processed_tokens = self.heavy_compute(selected_tokens)
            output[b, topk_indices[b]] += processed_tokens

        return output


# Example usage
dummy_input = torch.randn(2, 64, 128)  # Batch=2, Seq=64, Dim=128
mod_block = MixtureOfDepthsBlock(d_model=128, capacity_factor=0.5)
output = mod_block(dummy_input)
print(f"Output shape: {output.shape}")  # Expect (2, 64, 128)

By leveraging frameworks like the PyTorch framework or TensorFlow, developers can integrate these custom model optimization blocks. Furthermore, tools like the Ultralytics Platform help teams manage the training data required to accurately train these routers, alongside integrating seamlessly with enterprise ecosystems like Google Cloud AI.

Explore solutions

Real-time AI that works with your team

AI in Robotics

Power smarter machines with Ultralytics YOLO models. Vision AI in robotics drives autonomous navigation, perception, object tracking, and real-time control.

Learn more
Real-time AI that works with your team

AI in Logistics

Streamline logistics with Ultralytics YOLO models. Vision AI enables package inspection, sorting, vehicle tracking, and real-time warehouse safety monitoring.

Learn more
Real-time AI that works with your team

AI in Retail

Reimagine retail with Ultralytics YOLO models. Vision AI powers inventory tracking, shelf monitoring, queue management, and smarter customer insights.

Learn more
Real-time AI that works with your team

AI in Healthcare

Build healthcare solutions with Ultralytics YOLO models. Vision AI in healthcare powers faster medical imaging, smarter diagnostics, and patient monitoring.

Learn more
Real-time AI that works with your team

AI in Manufacturing

Optimize manufacturing with Ultralytics YOLO models. Vision AI drives quality control, defect detection, PPE compliance, and assembly line automation.

Learn more
Real-time AI that works with your operation

AI in Automotive

Apply computer vision in automotive with Ultralytics YOLO models. Vision AI elevates road safety, driver assistance, and vehicle automation for smarter roads.

Learn more
Real-time AI tailored to your operation

AI in Agriculture

Bring vision AI to smart agriculture with Ultralytics YOLO models. Power crop monitoring, livestock tracking, and precision farming for higher, smarter yields.

Learn more
Real-time AI that works with your team

AI in Robotics

Power smarter machines with Ultralytics YOLO models. Vision AI in robotics drives autonomous navigation, perception, object tracking, and real-time control.

Learn more
Real-time AI that works with your team

AI in Logistics

Streamline logistics with Ultralytics YOLO models. Vision AI enables package inspection, sorting, vehicle tracking, and real-time warehouse safety monitoring.

Learn more
Real-time AI that works with your team

AI in Retail

Reimagine retail with Ultralytics YOLO models. Vision AI powers inventory tracking, shelf monitoring, queue management, and smarter customer insights.

Learn more
Real-time AI that works with your team

AI in Healthcare

Build healthcare solutions with Ultralytics YOLO models. Vision AI in healthcare powers faster medical imaging, smarter diagnostics, and patient monitoring.

Learn more
Real-time AI that works with your team

AI in Manufacturing

Optimize manufacturing with Ultralytics YOLO models. Vision AI drives quality control, defect detection, PPE compliance, and assembly line automation.

Learn more
Real-time AI that works with your operation

AI in Automotive

Apply computer vision in automotive with Ultralytics YOLO models. Vision AI elevates road safety, driver assistance, and vehicle automation for smarter roads.

Learn more
Real-time AI tailored to your operation

AI in Agriculture

Bring vision AI to smart agriculture with Ultralytics YOLO models. Power crop monitoring, livestock tracking, and precision farming for higher, smarter yields.

Learn more
Real-time AI that works with your team

AI in Robotics

Power smarter machines with Ultralytics YOLO models. Vision AI in robotics drives autonomous navigation, perception, object tracking, and real-time control.

Learn more
Real-time AI that works with your team

AI in Logistics

Streamline logistics with Ultralytics YOLO models. Vision AI enables package inspection, sorting, vehicle tracking, and real-time warehouse safety monitoring.

Learn more
Real-time AI that works with your team

AI in Retail

Reimagine retail with Ultralytics YOLO models. Vision AI powers inventory tracking, shelf monitoring, queue management, and smarter customer insights.

Learn more
Real-time AI that works with your team

AI in Healthcare

Build healthcare solutions with Ultralytics YOLO models. Vision AI in healthcare powers faster medical imaging, smarter diagnostics, and patient monitoring.

Learn more
Real-time AI that works with your team

AI in Manufacturing

Optimize manufacturing with Ultralytics YOLO models. Vision AI drives quality control, defect detection, PPE compliance, and assembly line automation.

Learn more
Real-time AI that works with your operation

AI in Automotive

Apply computer vision in automotive with Ultralytics YOLO models. Vision AI elevates road safety, driver assistance, and vehicle automation for smarter roads.

Learn more
Real-time AI tailored to your operation

AI in Agriculture

Bring vision AI to smart agriculture with Ultralytics YOLO models. Power crop monitoring, livestock tracking, and precision farming for higher, smarter yields.

Learn more

Let's build the future of AI together!

Begin your journey with the future of machine learning