
Real-Time Inference

Discover how real-time inference with Ultralytics YOLO enables instant predictions for AI applications such as autonomous driving and security systems.

Real-time inference refers to the process where a trained machine learning (ML) model accepts live input data and generates predictions almost instantaneously. Unlike offline processing, where data is collected and analyzed in bulk at a later time, real-time inference occurs on the fly, enabling systems to react to their environment with speed and agility. This capability is the heartbeat of modern Artificial Intelligence (AI) applications, allowing devices to perceive, interpret, and act upon data within milliseconds.

The Importance of Low Latency

The primary metric for evaluating real-time performance is inference latency. This measures the time delay between the moment data is input into the model—such as a frame from a video camera—and the moment the model produces an output, such as a bounding box or classification label. For an application to be considered "real-time," the latency must be low enough to match the speed of the incoming data stream.

For example, in video understanding tasks running at 30 frames per second (FPS), the system has a strict time budget of approximately 33 milliseconds to process each frame. If inference takes longer, the system introduces lag, potentially leading to dropped frames or delayed responses. Achieving this often requires hardware acceleration using GPUs or specialized Edge AI devices like the NVIDIA Jetson.
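The budget arithmetic above can be sketched in a few lines of plain Python (no ML dependencies; the latency figures passed in are illustrative):

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def meets_real_time(latency_ms: float, fps: float) -> bool:
    """True if per-frame inference latency keeps up with the stream."""
    return latency_ms <= frame_budget_ms(fps)


budget = frame_budget_ms(30)  # ~33.3 ms per frame at 30 FPS
print(f"Budget at 30 FPS: {budget:.1f} ms")
print(meets_real_time(25.0, 30))  # a 25 ms model keeps up -> True
print(meets_real_time(40.0, 30))  # a 40 ms model falls behind -> False
```

In practice the measured latency should also include pre-processing and post-processing, not just the model's forward pass.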

Real-Time Inference vs. Batch Inference

It is helpful to distinguish real-time workflows from batch processing. While both involve generating predictions, their goals and architectures differ significantly:

  • Real-time Inference: Prioritizes low latency. It processes single data points (or very small batches) as soon as they arrive. This is essential for interactive applications like autonomous vehicles, where a car must instantly detect a pedestrian to brake in time.
  • Batch Inference: Prioritizes high throughput. It collects a large volume of data and processes it all at once. This is suitable for non-urgent tasks, such as generating nightly inventory reports or analyzing historical big data trends.
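The trade-off between the two modes can be illustrated with a toy cost model. The timings below are assumptions for illustration only: each model call pays a fixed overhead (setup, data transfer) plus a per-item compute cost, and batching amortizes that overhead:

```python
OVERHEAD_MS = 5.0  # assumed fixed cost per model call
PER_ITEM_MS = 2.0  # assumed compute cost per item


def streaming_first_result() -> float:
    # Real-time: one item per call, so a result is ready almost immediately.
    return OVERHEAD_MS + PER_ITEM_MS


def streaming_total(n: int) -> float:
    # But the call overhead is paid again for every item.
    return n * (OVERHEAD_MS + PER_ITEM_MS)


def batch_first_result(n: int) -> float:
    # Batch: no result is available until the whole batch finishes.
    return OVERHEAD_MS + PER_ITEM_MS * n


def batch_total(n: int) -> float:
    # Overhead paid once, so total throughput is higher.
    return OVERHEAD_MS + PER_ITEM_MS * n


print(streaming_first_result())  # 7.0 ms to the first prediction
print(batch_first_result(1000))  # 2005.0 ms before any prediction appears
print(streaming_total(1000))     # 7000.0 ms total when streaming
print(batch_total(1000))         # 2005.0 ms total when batching
```

Real-time systems accept the lower total throughput of streaming in exchange for the near-instant first result; batch systems accept the delayed first result in exchange for throughput.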

Real-World Use Cases

The ability to make split-second decisions has transformed various industries by enabling automation in dynamic environments.

  • Smart Manufacturing: In AI in manufacturing, cameras positioned over conveyor belts use real-time inference to perform automated quality control. An object detection model can instantly identify defects or foreign objects in products moving at high speeds. If an anomaly is detected, the system triggers a robotic arm to remove the item immediately, ensuring only high-quality goods reach packaging.
  • Surveillance and Security: Modern security systems rely on computer vision to monitor perimeters. Instead of just recording footage, these cameras run real-time person detection or face recognition to alert security personnel of unauthorized access the moment it happens.
  • Robotics: In the field of AI in robotics, robots use pose estimation to navigate complex physical spaces. A warehouse robot must continuously infer the location of obstacles and human workers to plan its path safely and efficiently.

Optimization and Deployment

Deploying models for real-time applications often requires optimization to ensure they run efficiently on target hardware. Techniques such as model quantization reduce the precision of the model's weights (e.g., from float32 to int8) to decrease memory usage and increase inference speed with minimal impact on accuracy.
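The idea behind int8 quantization can be sketched in plain Python. This is a minimal illustration of symmetric quantization with a single scale factor, not how production toolchains (e.g., TensorRT or TFLite) implement it; the example weights are made up:

```python
def quantize_int8(weights):
    """Map float weights to int8 values using one symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.015, -0.51, 0.333, 1.27, -0.879]  # illustrative float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(q)                            # integers in [-127, 127], 4x smaller than float32
print(f"max error: {max_err:.4f}")  # small precision loss per weight
```

Each weight now needs 1 byte instead of 4, and integer arithmetic is typically faster on edge hardware; the cost is the small rounding error shown above, which is why quantized models are usually validated against the original for accuracy drift.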

Developers can utilize the Ultralytics Platform to streamline this process. The platform simplifies training and allows users to export models to optimized formats like TensorRT for NVIDIA GPUs, OpenVINO for Intel CPUs, or TFLite for mobile deployment.

Code Example

The following Python snippet demonstrates how to run real-time inference on a webcam feed using the ultralytics library. It uses the YOLO26 Nano model, which is engineered specifically for high-speed performance on edge devices.

from ultralytics import YOLO

# Load the YOLO26 Nano model, optimized for speed and real-time tasks
model = YOLO("yolo26n.pt")

# Run inference on the default webcam (source="0")
# 'stream=True' returns a generator for memory-efficient processing
# 'show=True' displays the video feed with bounding boxes in real-time
results = model.predict(source="0", stream=True, show=True)

# Iterate through the generator to process frames as they arrive
for result in results:
    # Example: Print the number of objects detected in the current frame
    print(f"Detected {len(result.boxes)} objects")
