Discover how real-time inference with Ultralytics YOLO enables instant predictions for AI applications like autonomous driving and security systems.
Real-time inference is the process of using a trained machine learning (ML) model to make predictions on new, live data with minimal delay. In the context of AI and computer vision (CV), this means the system can process information, such as a video stream, and generate an output almost instantaneously. The goal is to keep inference latency low enough that the results are immediately useful for decision-making. This capability is crucial for applications where timing is critical, transforming how industries from automotive to healthcare leverage AI.
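To make "low enough latency" concrete, the sketch below times each prediction and compares it against the time available per frame at a target frame rate. It is a minimal illustration only: `stand_in_model`, `measure_latency_ms`, `frame_budget_ms`, and `meets_budget` are hypothetical names, and the stand-in function does trivial arithmetic in place of a real model.

```python
import time


def stand_in_model(frame):
    """Hypothetical placeholder for a trained model's forward pass."""
    return {"frame": frame, "score": sum(range(1000)) % 7}


def measure_latency_ms(predict, frames):
    """Run predict on each frame and return per-frame latencies in milliseconds."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        predict(frame)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def meets_budget(latencies_ms, fps):
    """True if even the slowest frame finished within the per-frame budget."""
    return max(latencies_ms) <= frame_budget_ms(fps)


if __name__ == "__main__":
    latencies = measure_latency_ms(stand_in_model, range(100))
    print(f"budget at 30 FPS: {frame_budget_ms(30):.1f} ms")
    print(f"worst frame: {max(latencies):.3f} ms")
    print(f"keeps up with 30 FPS: {meets_budget(latencies, 30)}")
```

Checking the worst frame rather than the average is a deliberately strict choice: a live stream stutters on the slow frames, even when the mean looks comfortable.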
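It is important to distinguish real-time inference from batch inference. The key difference lies in how data is processed: real-time inference handles each input as it arrives and returns a result immediately, whereas batch inference collects many inputs and processes them together, trading latency for throughput.
While both use a trained model to make predictions, their use cases differ fundamentally in how urgently the results are needed.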
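The contrast can be sketched in a few lines. This is an illustrative toy, not a production pattern: `toy_predict` is a hypothetical stand-in that doubles numbers, and `realtime_inference` and `batch_inference` are names invented for this example.

```python
def toy_predict(inputs):
    """Hypothetical model: takes a list of inputs, returns one output per input."""
    return [x * 2 for x in inputs]


def realtime_inference(stream, predict):
    """Handle each item as soon as it arrives and yield its result immediately."""
    for item in stream:
        yield predict([item])[0]


def batch_inference(items, predict, batch_size):
    """Process already-collected items in fixed-size groups and return all results."""
    results = []
    for start in range(0, len(items), batch_size):
        results.extend(predict(items[start:start + batch_size]))
    return results


if __name__ == "__main__":
    # Real-time: results become available one by one while the stream is still open.
    for result in realtime_inference(iter([1, 2, 3]), toy_predict):
        print("live result:", result)

    # Batch: nothing is returned until every group has been processed.
    print("batch results:", batch_inference([1, 2, 3, 4, 5], toy_predict, batch_size=2))
```

The generator form of `realtime_inference` mirrors the point above: a caller can act on the first result before the second input even exists, which a batch job cannot do.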
The ability to make instant decisions enables a wide range of applications across sectors such as autonomous driving, security systems, and healthcare.
Making models run fast enough for real-time applications often requires significant optimization.
Models like Ultralytics YOLO are designed with efficiency and accuracy in mind, making them well-suited for real-time object detection tasks. Platforms like Ultralytics HUB provide tools to train, optimize (e.g., export to ONNX or TensorRT formats), and deploy models, facilitating the implementation of real-time inference solutions across various deployment options.
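As a rough sketch of the export step mentioned above, the snippet below uses the `ultralytics` Python package to convert a model to ONNX or TensorRT. It assumes the package is installed and that the `yolo11n.pt` weights file is available or downloadable. The helper names `resolve_format` and `export_for_realtime` are hypothetical wrappers written for this example. TensorRT export additionally assumes an NVIDIA GPU with TensorRT installed.

```python
# Friendly names mapped to the format strings passed to export().
# Only the two formats named in the text are listed here.
EXPORT_FORMATS = {"onnx": "onnx", "tensorrt": "engine"}


def resolve_format(name):
    """Translate a friendly format name into the string used for export."""
    key = name.strip().lower()
    if key not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {name!r}")
    return EXPORT_FORMATS[key]


def export_for_realtime(weights="yolo11n.pt", target="onnx"):
    """Load a YOLO model and export it; returns the path of the exported file.

    The weights name is an assumption chosen for illustration.
    """
    from ultralytics import YOLO  # imported lazily so the helpers work without it

    model = YOLO(weights)
    return model.export(format=resolve_format(target))


if __name__ == "__main__":
    print(export_for_realtime("yolo11n.pt", "onnx"))
```

Whether an exported model actually runs faster depends on the hardware and runtime, so it is worth measuring latency before and after the export on the target device.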