
Using Ultralytics YOLO11 to run batch inferences

Explore the difference between real-time inferencing and batch inferencing when using Ultralytics YOLO11 for various computer vision applications.

If you've seen a self-driving car in action, you've witnessed real-time AI inferencing. A self-driving car typically uses cameras, sensors, and AI to process its surroundings and make almost instant decisions. However, real-time inferencing is resource-heavy, and when quick responses aren't needed, that cost is hard to justify.

A better option in these cases is batch inference. Instead of processing data continuously in real time, batch inference handles large sets of data at scheduled intervals. This approach helps save resources, reduce power consumption, and cut down on infrastructure costs.

For instance, in computer vision applications, models like Ultralytics YOLO11 can be used for real-time tasks like object detection and instance segmentation. However, processing large volumes of visual data in real time can be demanding. 

Fig 1. An example of segmenting objects in an image using YOLO11.

With batch inferencing, YOLO11 can be run on visual data in batches, reducing the strain on the system and improving efficiency without sacrificing performance. This makes it easier to deploy Vision AI solutions at scale without overwhelming the hardware or increasing costs.

In this article, we’ll explore batch inferencing, its benefits, and how it can be applied with YOLO11 in computer vision applications. Let’s get started!

A look at batch inferencing in computer vision

You can think of batch inferencing as tackling a big task all at once instead of handling it piece by piece as it comes in. Instead of constantly processing data in real time, batch inferencing allows you to process large groups of data at set intervals. This approach is much more efficient when immediate responses aren’t necessary, helping to save on computing resources, reduce energy use, and cut costs.

When it comes to computer vision, there are certain applications where low latency is vital. Low latency refers to the minimal delay between receiving input (such as an image or video frame) and the system's response. For example, in real-time security monitoring, even small delays can result in safety risks.

However, in many other computer vision scenarios, low latency isn't as critical. This is where batch inferencing shines: when the system doesn't need to react instantly. Batch inferencing works by feeding visual data to a computer vision model in groups or batches, enabling the system to process large datasets at once rather than continuously in real time.

Understanding how batch inferencing works

Here’s a closer look at the steps involved in batch inferencing:

  • Data collection: Visual data is collected over a period of time. This could include security footage, product images, or customer data, depending on the application.
  • Batch preparation: The collected data is then grouped into batches. During this step, the data is formatted as required by the model. For example, images might be resized, normalized, or converted to the appropriate format for the model to process.
  • Prediction: Once the data is prepared, the entire batch is fed into the model (such as YOLO11), which processes the whole batch at once. This enables the model to make predictions for all the data in the batch simultaneously, making the process more efficient compared to handling each data point individually.
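
To make these three steps concrete, here's a minimal Python sketch of the workflow using the Ultralytics package. The folder name and batch size are illustrative choices for this example, not fixed requirements, and resizing and normalization are handled internally by the model.

```python
from pathlib import Path

from ultralytics import YOLO

# Load a pretrained YOLO11 model (the nano variant is shown here)
model = YOLO("yolo11n.pt")

# Step 1 - Data collection: gather the image paths accumulated over time
# ("collected_images/" is a hypothetical folder standing in for your data)
image_paths = sorted(Path("collected_images").glob("*.jpg"))

# Step 2 - Batch preparation: group the images into batches of 8
# (Ultralytics resizes and normalizes the images internally)
BATCH_SIZE = 8
for i in range(0, len(image_paths), BATCH_SIZE):
    batch = [str(p) for p in image_paths[i : i + BATCH_SIZE]]

    # Step 3 - Prediction: one forward pass over the whole batch
    results = model(batch)
    for result in results:
        print(result.path, "->", len(result.boxes), "objects detected")
```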

When to use batch inferencing?

Now that we’ve covered what batch inferencing is and how it differs from real-time inferencing, let’s take a closer look at when to use it.

Batch inferencing is ideal for analyzing historical data. Let’s say you have surveillance footage from a metro station over the past month, and you’re trying to identify specific patterns, such as the number of people entering and exiting at different times of the day. 

Instead of processing each frame in real time, batch inferencing allows you to process the entire month’s worth of footage in batches, identifying key events or trends without the need for immediate results. This way, you can analyze large volumes of data more efficiently and gain insights into long-term patterns, without overwhelming the system or requiring constant monitoring.

Batch inferencing is also an optimal solution when system resources are limited. By running the inference during off-peak hours (such as overnight), you can save on computing costs and ensure the system isn’t overloaded during peak usage times. This makes it an efficient and cost-effective approach for businesses or projects that need to process large datasets but don’t require real-time analysis.
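
As a rough illustration, a batch job like the sketch below could be triggered overnight by a standard scheduler such as cron. The folder name and schedule are assumptions for this example, not part of the Ultralytics API.

```python
# batch_job.py - run nightly, e.g. via a cron entry like:
#   0 2 * * *  python batch_job.py
from pathlib import Path

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# "queued_footage/" is a hypothetical folder where the day's videos accumulate
for video in sorted(Path("queued_footage").glob("*.mp4")):
    # stream=True yields results one frame at a time, keeping memory use low
    for result in model.predict(source=str(video), batch=8, stream=True):
        # Aggregate whatever matters for your analysis, e.g. object counts
        print(video.name, len(result.boxes))
```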

Batch inferencing using Ultralytics YOLO11

The Ultralytics Python package supports batch inferencing for models like YOLO11. With YOLO11, you can easily run batch inference by specifying the ‘batch’ argument, which determines how many images or video frames are processed at once. 

During the batch inferencing process, predictions are generated for all the images in the batch simultaneously. By default, the batch size is set to 1, but you can raise it to whatever your hardware’s memory allows.

For example, if the batch size is set to 5, YOLO11 will process five images or video frames at a time and generate predictions for all five at once. Larger batch sizes typically improve throughput, since processing multiple images together is more efficient than handling them one by one.
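
In practice, that looks something like the sketch below. The source path is a placeholder for your own image folder; note that the batch argument takes effect for sources such as directories and video files, where the model loads the data itself.

```python
from ultralytics import YOLO

# Load a pretrained YOLO11 model
model = YOLO("yolo11n.pt")

# Run inference over a folder of images, five at a time
# ("data/images" is a placeholder path for your own dataset)
results = model.predict(source="data/images", batch=5, stream=True)

# stream=True returns a generator, so results arrive as each batch finishes
for result in results:
    print(result.path, len(result.boxes), "detections")
```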

Computer vision applications enabled by batch inferencing

Next, let's explore some real-world computer vision use cases for batch inferencing.

Enhancing diagnostics and research in healthcare

In medical research, working with large amounts of visual data is very common. Here, batch inferencing can help scientists analyze data more easily across fields like chemistry, biology, and genetics. Instead of analyzing samples one at a time, the data is processed in batches, saving time and effort.

For instance, at medical facilities, batch inferencing can be especially useful for analyzing large sets of medical images like MRIs or CT scans. Hospitals can collect these scans throughout the day and process them in batches overnight. 

This approach allows hospitals to make better use of their hardware and staff, reduce operational costs, and ensure that all scans are reviewed in a consistent and uniform manner. It's also beneficial for large research projects and long-term studies, where handling vast amounts of data is necessary.

Fig 2. Running detection on a medical scan using YOLO11.

Improving autonomous vehicles using simulations

Self-driving cars use AI technologies like computer vision to process the world around them. With the help of advanced models like YOLO11, a car's onboard systems can recognize other vehicles, lane lines, road signs, and pedestrians. While real-time inferencing is critical on the road, self-driving technology also relies heavily on batch inferencing behind the scenes.

Fig 3. YOLO11 can easily detect pedestrians on the road.

After a car completes a trip, the data it collects, such as hours of camera footage, sensor readings, and LIDAR scans, can be processed later in large batches. This makes it possible for engineers to update the car's AI models, enhance system safety, and improve its ability to handle various driving conditions.

Batch inferencing is also used in autonomous driving simulations to test how self-driving cars would react in different situations, such as navigating busy intersections or responding to unpredictable pedestrian movements. This approach saves time, reduces costs, and avoids the risks associated with testing every scenario in real life.

Retail data analysis driven by batch inferencing

Similarly, for retail stores, batch inferencing with computer vision models like YOLO11 can significantly enhance operational efficiency. For example, camera systems in stores can capture thousands of images throughout the day, which can then be processed in batches overnight. 

This allows stores to analyze what’s happening in the store, such as customer behavior, traffic patterns, and product interactions, without the need for real-time processing, which can be challenging for smaller stores. 

Another interesting example is using batch inferencing to generate heatmaps, which visualize areas of high and low customer activity within the store. By analyzing these heatmaps, retailers can identify which areas attract the most foot traffic and which parts of the store might need more attention or product placement optimization. This data can help retailers make better decisions on store layout, product positioning, and even promotional strategies to improve customer experience and sales.
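
As one way to build such a heatmap, the sketch below accumulates YOLO11 person detections from batched footage into a simple grid with NumPy. The grid size, source folder, and person class index (0 for COCO-pretrained weights) are assumptions for this example; Ultralytics also ships dedicated heatmap solutions you could use instead.

```python
import numpy as np

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Coarse grid that the store floor maps onto (dimensions are illustrative)
GRID_ROWS, GRID_COLS = 36, 64
heatmap = np.zeros((GRID_ROWS, GRID_COLS), dtype=np.float64)

# classes=[0] keeps only "person" detections for COCO-pretrained weights;
# "store_frames/" is a hypothetical folder of frames captured during the day
for result in model.predict(source="store_frames", classes=[0], batch=16, stream=True):
    h, w = result.orig_shape  # original frame height and width
    for box in result.boxes.xywh:  # (x_center, y_center, width, height)
        row = min(int(float(box[1]) / h * GRID_ROWS), GRID_ROWS - 1)
        col = min(int(float(box[0]) / w * GRID_COLS), GRID_COLS - 1)
        heatmap[row, col] += 1  # one more person seen in this cell

# Normalize to [0, 1] so the map can be rendered or compared across days
heatmap /= max(heatmap.max(), 1.0)
```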

Fig 4. Heatmaps can help retailers identify popular areas in stores.

Pros and cons of batch inferencing

Here are some of the key benefits batch inferencing can bring to various industries:

  • Ease of integration: Batch inferencing can be easily integrated into existing workflows, particularly for industries like retail, security, or healthcare, where large volumes of data need to be processed in bulk.
  • Easier data management: When working with large amounts of data, batch inferencing can streamline data management since the data is grouped into manageable chunks. This makes it easier to track, review, and organize data over time.
  • Reduced network load: When data is processed in batches, the amount of data transferred at any given moment can be minimized, reducing the strain on network resources in cloud-based systems or distributed computing environments.

While there are many advantages to using batch inferencing, there are also some limitations to consider. Here are a few factors to keep in mind:

  • Storage requirements: Storing large datasets for batch processing can significantly increase storage costs, particularly with high-resolution images, videos, or large volumes of data.
  • Potential for backlog: If data accumulates quickly or large batches are not processed on time, a backlog can develop. This can lead to delays in delivering insights and processing new data in a timely manner.
  • Resource spikes: Large batches, especially those involving high-resolution images, can cause spikes in memory or compute usage. If not properly managed, these spikes may overwhelm systems, leading to slowdowns or crashes.

Key takeaways

Batch inferencing is an efficient way to process large volumes of visual data that don’t require immediate results. Rather than analyzing each image in real time, it processes them in batches at scheduled times, like overnight.

This method is cost-effective, reduces computational load, and still provides accurate results. From helping stores manage inventory to assisting doctors with medical scan analysis and enhancing self-driving car technologies, batch inferencing makes computer vision more accessible, affordable, and practical for real-world applications.

Ready to dive deep into AI? Explore our GitHub repository, connect with our community, and check out our licensing options to get started on your computer vision journey. Find out more about innovations like AI in manufacturing and computer vision in the logistics industry on our solutions pages.

