Inference Engine

Discover how inference engines power AI by delivering real-time predictions, optimizing models, and enabling cross-platform deployment.

An inference engine is a specialized software component designed to execute trained machine learning models and generate predictions from new data. Unlike the training phase, which focuses on learning patterns through computationally intensive processes like backpropagation, an inference engine is strictly optimized for the operational phase known as model deployment. Its primary goal is to run computations as efficiently as possible, minimizing inference latency and maximizing throughput on target hardware, whether that be a scalable cloud server or a battery-powered Edge AI device. By stripping away the overhead required for training, these engines allow complex neural networks to function in real-time applications.
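
The impact on latency and throughput is easiest to see by measuring it. The sketch below is an illustrative micro-benchmark (not an official Ultralytics utility) that times repeated predictions on a dummy image after a warm-up run:

import time

import numpy as np
from ultralytics import YOLO

# A dummy 640x640 BGR frame stands in for real camera input.
image = np.zeros((640, 640, 3), dtype=np.uint8)

model = YOLO("yolo26n.pt")
model.predict(image, verbose=False)  # warm-up run to trigger lazy initialization

# Time repeated predictions to estimate latency and throughput.
runs = 20
start = time.perf_counter()
for _ in range(runs):
    model.predict(image, verbose=False)
elapsed = time.perf_counter() - start

print(f"Average latency: {elapsed / runs * 1000:.1f} ms")
print(f"Throughput: {runs / elapsed:.1f} images/s")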

How Inference Engines Optimize Performance

The transition from a training environment to an inference engine typically involves several optimization steps to streamline the model's structure. Because the model no longer needs to learn, the engine can discard data required for gradient updates, effectively freezing the model weights. Common techniques used by inference engines include layer fusion, where multiple operations are combined into a single step to reduce memory access, and model quantization, which converts weights from high-precision floating-point formats to lower-precision integers (e.g., INT8).
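
To make these ideas concrete, the minimal sketch below uses plain PyTorch to "freeze" a toy model and apply dynamic INT8 quantization. The three-layer network is a stand-in, not a YOLO model, and production engines such as TensorRT perform far more aggressive rewrites, including layer fusion:

import torch

# A toy network standing in for a trained model.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Freeze the model: eval mode plus disabled gradients discards the
# bookkeeping that is only needed during training.
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

# Dynamic quantization converts the Linear weights from FP32 to INT8.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    out = quantized(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])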

These optimizations allow advanced architectures like Ultralytics YOLO26 to run at very high speeds without significant loss in accuracy. Different engines are often tailored to specific hardware ecosystems to unlock maximum performance (a short export sketch follows the list):

  • NVIDIA TensorRT: Delivers high-performance inference on NVIDIA GPUs by utilizing hardware-specific kernels and optimizing the network graph.
  • Intel OpenVINO: Optimizes deep learning performance on Intel architectures, including CPUs and integrated graphics, making it ideal for edge computing.
  • ONNX Runtime: A cross-platform accelerator that supports models in the ONNX format, providing a bridge between different frameworks and hardware backends.
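
With the ultralytics package, targeting each of these engines is typically a one-line export. The sketch below uses the documented format strings; each call assumes the corresponding backend (for example, TensorRT for "engine") is installed on the machine:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ONNX: a portable format that ONNX Runtime can execute on many backends.
model.export(format="onnx")

# OpenVINO: optimized for Intel CPUs and integrated graphics.
model.export(format="openvino")

# TensorRT engine: requires an NVIDIA GPU with TensorRT installed.
model.export(format="engine")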

Real-World Applications

Inference engines are the silent drivers behind many modern AI conveniences, enabling computer vision systems to react instantly to their environment.

  1. Autonomous Vehicles: In self-driving cars, object detection models must identify pedestrians, traffic signs, and other vehicles in milliseconds. An inference engine running locally on the car's hardware ensures that this processing happens with real-time inference speeds, as relying on a cloud connection would introduce dangerous delays.
  2. Smart Manufacturing: Factories utilize industrial IoT cameras to inspect products on assembly lines. An inference engine processes video feeds to perform anomaly detection, instantly flagging defects. This automation reduces waste and ensures strict quality control without slowing down production, as sketched below.
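
The sketch below illustrates the streaming pattern such a video-inspection system might use; "line_feed.mp4" is a placeholder path, and the defect logic is reduced to simply counting detections:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# stream=True yields results frame by frame instead of buffering the
# whole video in memory, which suits continuous camera feeds.
for result in model.predict(source="line_feed.mp4", stream=True):
    boxes = result.boxes  # detections for the current frame
    if len(boxes):
        print(f"Flagged {len(boxes)} objects in this frame")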

Inference Engine vs. Training Framework

It is helpful to distinguish between the software used to create the model and the engine used to run it. A Training Framework (like PyTorch or TensorFlow) provides the tools for designing architectures, calculating loss, and updating parameters through backpropagation during supervised learning. It prioritizes flexibility and debugging capabilities.

In contrast, the Inference Engine takes the finished artifact from the training framework and prioritizes execution speed and memory efficiency. While you can run inference within a training framework, it is rarely as efficient as using a dedicated engine, especially for deployment on mobile phones or embedded devices via tools like TensorFlow Lite or Apple Core ML.
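
As an illustration of this split, the sketch below runs a model through ONNX Runtime's standalone API rather than through a training framework. "yolo26n.onnx" assumes a prior ONNX export like the one shown earlier, and the random tensor stands in for a preprocessed image:

import numpy as np
import onnxruntime as ort

# Load the exported artifact with a dedicated inference engine.
session = ort.InferenceSession("yolo26n.onnx", providers=["CPUExecutionProvider"])

# YOLO-style detectors typically expect a normalized NCHW float32 tensor.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

# session.run executes the optimized graph with no training overhead.
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)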

Using an Inference Engine with YOLO26

The ultralytics package abstracts much of the complexity of inference engines, allowing users to seamlessly run predictions. Under the hood, it handles the pre-processing of images and the execution of the model. For users looking to scale, the Ultralytics Platform simplifies the process of training and exporting models to optimized formats compatible with various inference engines.

The following example demonstrates how to load a pre-trained YOLO26 model and run inference on an image:

from ultralytics import YOLO

# Load the YOLO26n model (nano version for speed)
model = YOLO("yolo26n.pt")

# Run inference on an image from a URL
# The 'predict' method acts as the interface to the inference process
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Display the results
results[0].show()
