Ultralytics

Continuous Batching

Learn how continuous batching optimizes GPU throughput and reduces latency. Discover how to use Ultralytics YOLO26 to maximize efficiency in production ML tasks.

Continuous batching is an inference scheduling technique used in machine learning (ML) to maximize hardware utilization and throughput. In traditional static batching, an inference engine waits for a predetermined number of requests to accumulate and then processes them together. This is inefficient when requests take different amounts of time: the batch holds its resources until its longest-running request finishes, so the slots of requests that completed early sit idle. Continuous batching, also known as in-flight or iteration-level batching (and sometimes loosely called dynamic batching), instead schedules work at the granularity of a single model iteration, such as one token-generation step of a language model. As soon as a request completes, it leaves the batch and a waiting request takes its slot. This significantly reduces GPU idle time and improves overall efficiency.
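To make the difference concrete, the sketch below counts iterations under both policies in a toy model. It assumes each request needs a fixed, known number of steps, that every step costs the same no matter how many slots are occupied, and it ignores prefill cost and memory limits. The function names and request lengths are hypothetical and do not come from any serving framework.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Toy model: each batch runs until its longest request finishes."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch)  # every slot is held until the slowest request ends
    return total


def continuous_batching_steps(lengths, batch_size):
    """Toy model: freed slots are refilled from the waiting queue every step."""
    waiting = deque(lengths)
    active = []  # remaining steps for each request currently in the batch
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots before this step runs
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One iteration advances every active request by one step
        active = [remaining - 1 for remaining in active]
        steps += 1
        # Evict finished requests so their slots can be reused next step
        active = [remaining for remaining in active if remaining > 0]
    return steps


# Hypothetical per-request step counts (for example, output tokens per request)
lengths = [3, 10, 2, 7, 4, 1, 5, 2]
print(static_batching_steps(lengths, batch_size=4))      # 15
print(continuous_batching_steps(lengths, batch_size=4))  # 10
```

With these hypothetical lengths, the 34 steps of real work take 15 iterations under static batching (about 57% utilization of four slots) versus 10 under continuous batching (85%), and 10 is also the floor set by the longest request.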

To better understand how data is processed during model deployment, it is helpful to differentiate continuous batching from other related terms in the glossary:

  • Batch Size: This refers to the fixed number of samples processed simultaneously during training or inference. Traditional batch processing workflows rely on static sizes, whereas continuous batching allows the effective batch size to fluctuate dynamically based on incoming traffic.
  • Real-Time Inference: This concept focuses on minimizing inference latency for immediate predictions, processing single inputs as they arrive. Continuous batching bridges the gap between high-throughput static batching and low-latency real-time inference by maintaining high throughput without forcing fast requests to wait for slower ones.
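A simpler, request-level relative of continuous batching illustrates the spectrum these terms sit on. The hypothetical helper below, a minimal sketch using only Python's standard library, gathers queued requests until either a size cap or a waiting deadline is reached. With max_wait_s=0 it behaves like one-at-a-time real-time inference, and with a very long deadline it degenerates into static batching. Note that this is time-window batching, not the iteration-level scheduling described above.

```python
import queue
import time


def collect_batch(pending, max_batch_size, max_wait_s):
    """Hypothetical helper: gather up to max_batch_size queued items,
    waiting at most max_wait_s after the first one is taken."""
    batch = [pending.get()]  # block until at least one request is queued
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waiting window closed: run with what we have
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # no further requests arrived before the deadline
    return batch


requests_queue = queue.Queue()
for i in range(10):
    requests_queue.put(f"request-{i}")

print(len(collect_batch(requests_queue, max_batch_size=4, max_wait_s=0.05)))  # 4: size cap hit
print(len(collect_batch(requests_queue, max_batch_size=4, max_wait_s=0)))     # 1: real-time style
print(len(collect_batch(requests_queue, max_batch_size=8, max_wait_s=0.05)))  # 5: deadline hit
```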

Real-World Applications

Continuous batching is critical for production systems that handle high volumes of unpredictable requests. Here are two concrete examples of its application:

  1. High-Throughput Text Generation: When serving Large Language Models (LLMs), generating responses for different users takes varying amounts of time depending on the output length. Serving frameworks that implement continuous batching, such as vLLM (which can be deployed on Ray Serve), stream newly generated tokens and immediately replace finished conversations with new prompts. This approach, popularized by research on iteration-level scheduling, can substantially improve text generation throughput.

  2. Asynchronous Video Analytics: In video understanding tasks, such as tracking vehicles across a city's network of traffic cameras, frames from different cameras arrive at different intervals. Continuous batching lets object tracking models process whatever frames are waiting as soon as the GPU frees up, rather than waiting for a fixed-size batch to fill. This keeps hardware acceleration pipelines busy while keeping per-frame latency low for smart city dashboards.
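For single-pass vision models, each frame finishes in one forward pass, so iteration-level scheduling reduces to refilling the batch with whatever frames are ready whenever the model becomes free. The simulation below compares that greedy policy with waiting for a fixed-size batch, using hypothetical arrival times in milliseconds. It assumes a constant inference cost per batch regardless of batch size and flushes a final partial fixed-size batch as soon as its last frame arrives; real GPU timings will differ.

```python
def greedy_batching(arrivals_ms, infer_ms, max_batch):
    """Whenever the model is free, batch every frame that has already arrived."""
    arrivals = sorted(arrivals_ms)
    clock, i, latencies = 0, 0, []
    while i < len(arrivals):
        clock = max(clock, arrivals[i])  # idle until at least one frame is ready
        batch = []
        while i < len(arrivals) and arrivals[i] <= clock and len(batch) < max_batch:
            batch.append(arrivals[i])
            i += 1
        clock += infer_ms  # assumption: fixed cost per batch, whatever its size
        latencies.extend(clock - t for t in batch)
    return latencies


def fixed_size_batching(arrivals_ms, infer_ms, batch_size):
    """Wait until batch_size frames have arrived, then run them together."""
    arrivals = sorted(arrivals_ms)
    clock, latencies = 0, []
    for start in range(0, len(arrivals), batch_size):
        batch = arrivals[start:start + batch_size]
        # The last frame to arrive gates the batch (a final partial batch is
        # flushed when its last frame arrives, a simplifying assumption)
        clock = max(clock, batch[-1]) + infer_ms
        latencies.extend(clock - t for t in batch)
    return latencies


# Hypothetical arrival times (ms) of frames from several traffic cameras
arrivals = [0, 2, 4, 30, 31, 33, 34, 60]
print(max(greedy_batching(arrivals, infer_ms=10, max_batch=4)))       # 19
print(max(fixed_size_batching(arrivals, infer_ms=10, batch_size=4)))  # 40
```

With these numbers, the worst-case frame latency is 19 ms for the greedy policy versus 40 ms for fixed batches of four, because the earliest frames in each fixed batch sit idle until the batch fills.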

Continuous Processing in Vision Tasks

In high-traffic deployments, processing results as a stream captures one of the goals of dynamic batching: memory is released progressively instead of being held until an entire job finishes. Streaming is not a batching scheduler in itself, but it keeps the pipeline from blocking on large inputs. The following Python example uses the stream=True option of the model prediction API to handle a sequence of images through a generator.

from ultralytics import YOLO

# Load the latest Ultralytics YOLO26 model
model = YOLO("yolo26n.pt")

# stream=True makes predict() return a generator that yields one Results
# object at a time instead of accumulating every result in memory
results = model.predict(source=["img1.jpg", "img2.jpg", "img3.jpg"], stream=True)

# Process each result as soon as it completes
for result in results:
    print(f"Detected {len(result.boxes)} objects in this frame.")

Resource scheduling at the system level requires balancing raw speed against operational cost. Teams deploying large computer vision (CV) and language models increasingly rely on dedicated serving frameworks to manage these dynamic batches. For enterprise teams looking to streamline their infrastructure, the Ultralytics Platform offers tools for training, monitoring, and exporting models into optimized production environments.
