Optimisez les performances de l'IA grâce à la quantification des modèles. Réduisez la taille, augmentez la vitesse et améliorez l'efficacité énergétique pour les déploiements dans le monde réel.
Model quantization is a sophisticated model optimization technique used to reduce the computational and memory costs of running deep learning models. In standard training workflows, neural networks typically store parameters (weights and biases) and activation maps using 32-bit floating-point numbers (FP32). While this high precision ensures accurate calculations during training, it is often unnecessary for inference. Quantization converts these values into lower-precision formats, such as 16-bit floating-point (FP16) or 8-bit integers (INT8), effectively shrinking the model size and accelerating execution speed without significantly compromising accuracy.
The primary driver for quantization is the need to deploy powerful AI on resource-constrained hardware. As computer vision models like YOLO26 become more complex, their computational demands increase. Quantization addresses three critical bottlenecks:
Il est important de différencier la quantification des autres techniques d'optimisation, car elles modifient le modèle de manière distincte :
Quantization enables computer vision and AI across various industries where efficiency is paramount.
The Ultralytics library simplifies the export process, allowing developers to convert models like the cutting-edge YOLO26 into quantized formats. The Ultralytics Platform also provides tools to manage these deployments seamlessly.
The following example demonstrates how to export a model to TFLite with INT8 quantization enabled. This process involves a calibration step where the model observes sample data to determine the optimal dynamic range for the quantized values.
from ultralytics import YOLO
# Load a standard YOLO26 model
model = YOLO("yolo26n.pt")
# Export to TFLite format with INT8 quantization
# The 'int8' argument triggers Post-Training Quantization
# 'data' provides the calibration dataset needed for mapping values
model.export(format="tflite", int8=True, data="coco8.yaml")
Les modèles optimisés sont souvent déployés à l'aide de normes interopérables telles que ONNX ou des moteurs d'inférence haute performance tels que OpenVINO, garantissant une large compatibilité entre divers écosystèmes matériels.