Quantization-Aware Training (QAT)

Learn how Quantization-Aware Training (QAT) optimizes [YOLO26](https://docs.ultralytics.com/models/yolo26/) for edge devices. Recover accuracy and reduce latency for efficient INT8 deployment.

Quantization-Aware Training (QAT) is a specialized technique used during the training phase of machine learning models to prepare them for lower-precision environments. In standard deep learning workflows, models typically operate using high-precision 32-bit floating-point numbers (FP32). While this precision offers excellent accuracy, it can be computationally expensive and memory-intensive, especially on edge devices. QAT simulates the effects of quantization—reducing precision to formats like 8-bit integers (INT8)—while the model is still training. By introducing these quantization errors during the learning process, the model learns to adapt its weights and effectively recover accuracy that might otherwise be lost during post-training conversion.
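To make the idea concrete, the sketch below shows one way a single tensor can be "fake quantized": values are rounded onto an INT8 grid and immediately mapped back to floating point, so the rounding error becomes visible to the rest of the network during training. The function name and the symmetric per-tensor scheme are illustrative assumptions, not the exact scheme used by any particular framework.

import torch


def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round a float tensor onto a signed integer grid, then map it back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    scale = x.abs().max().clamp(min=1e-8) / qmax  # symmetric per-tensor scale
    x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return x_int * scale  # dequantized values now carry the rounding error


weights = torch.randn(4, 4)
print((weights - fake_quantize(weights)).abs().max())  # the error the model learns to absorb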

Why QAT Matters for Edge Deployment

Deploying computer vision models to resource-constrained devices requires balancing inference speed against accuracy. The standard quantization approach, Post-Training Quantization (PTQ), applies precision reduction only after the model is fully trained. While PTQ is fast, it can degrade the accuracy of sensitive models because the neural network weights are significantly altered without a chance to adjust.

QAT solves this by allowing the model to "practice" being quantized. During the forward pass of training, the weights and activations are simulated as low-precision values. This allows the gradient descent process to update the model parameters in a way that minimizes the loss specifically for the quantized state. The result is a robust model that retains high accuracy even when deployed on hardware like microcontrollers or mobile processors.
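As a rough illustration of how gradient descent can still update weights through the rounding step, the hypothetical layer below applies the fake_quantize sketch from above to its weights and uses a straight-through estimator so gradients bypass the non-differentiable rounding. QATLinear and its details are assumptions for illustration, not how any specific framework implements QAT.

import torch
import torch.nn as nn


class QATLinear(nn.Module):
    """Hypothetical linear layer that trains against fake-quantized weights."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.weight)  # reuse the helper sketched above
        # Straight-through estimator: quantized weights in the forward pass,
        # but gradients flow to the underlying FP32 weights unchanged.
        w_ste = self.weight + (w_q - self.weight).detach()
        return x @ w_ste.t()


layer = QATLinear(8, 4)
layer(torch.randn(2, 8)).sum().backward()  # gradients reach layer.weight despite the rounding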

Differentiating QAT from Post-Training Quantization (PTQ)

It is helpful to distinguish QAT from the other main approach to model quantization, Post-Training Quantization (PTQ):

  • Post-Training Quantization (PTQ): The model is trained normally in FP32. After training is complete, the weights are converted to INT8. This is faster and requires no retraining but may result in higher accuracy loss for complex architectures.
  • Quantization-Aware Training (QAT): The quantization process is emulated during the fine-tuning stage. The model adjusts its internal parameters to accommodate the noise introduced by lower precision, typically yielding better accuracy than PTQ (see the workflow sketch after this list).
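
To make the difference in workflow concrete, the sketch below uses PyTorch's eager-mode quantization utilities for the QAT path: fake-quantization observers are inserted, the model is briefly fine-tuned, and only then is it converted to real INT8 modules. A PTQ flow would instead skip the fine-tuning and replace it with a short calibration pass over representative data. TinyNet and the placeholder loop are illustrative assumptions, not an Ultralytics API.

import torch
import torch.nn as nn
from torch.ao.quantization import DeQuantStub, QuantStub, convert, get_default_qat_qconfig, prepare_qat


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # FP32 activations enter the quantized region here
        self.fc = nn.Linear(16, 4)
        self.dequant = DeQuantStub()  # and return to FP32 here

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


model = TinyNet()
model.qconfig = get_default_qat_qconfig("fbgemm")
model_prepared = prepare_qat(model.train())  # insert fake-quant observers

for _ in range(3):  # placeholder for a real fine-tuning loop
    model_prepared(torch.randn(8, 16)).sum().backward()

model_int8 = convert(model_prepared.eval())  # swap in real INT8 modules for deployment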

Real-World Applications

QAT is essential for industries where real-time inference on edge hardware is critical.

  • Autonomous Drones: In AI drone operations, battery life and onboard processing power are severely limited. Drones using models optimized via QAT can detect obstacles or track objects with high precision while using INT8 accelerators, significantly extending flight times compared to FP32 models.
  • Smart Retail Cameras: Supermarkets use computer vision in retail to monitor shelf inventory or manage checkout lines. These systems often run on low-power edge gateways. QAT ensures that the object detection models running on these devices maintain the accuracy needed to distinguish between similar products without requiring expensive cloud connectivity.

Implementing QAT with Ultralytics

The Ultralytics Platform and the YOLO ecosystem support exporting models to quantized formats. While QAT is a complex training procedure, modern frameworks facilitate the preparation of models for quantized inference.

Below is an example of how you might export a trained YOLO26 model to an INT8-quantized TFLite format, applying these quantization principles for efficient edge deployment.

from ultralytics import YOLO

# Load a trained YOLO26 model
model = YOLO("yolo26n.pt")

# Export the model to TFLite format with INT8 quantization
# This prepares the model for efficient execution on edge devices
model.export(format="tflite", int8=True)
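
After export, the quantized model can be loaded back through the same YOLO interface for inference. The file path below is an assumption: the exact name and location of the generated .tflite file depend on the model and Ultralytics version, and are printed by the export call.

from ultralytics import YOLO

# Hypothetical path: check the export log for the actual file name
tflite_model = YOLO("yolo26n_saved_model/yolo26n_int8.tflite")

# Run inference with the INT8 model exactly as with the original PyTorch weights
results = tflite_model("https://ultralytics.com/images/bus.jpg")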

Integration with Edge Ecosystems

Models optimized via quantization techniques are designed to run on specialized inference engines. QAT-trained models are frequently deployed using ONNX Runtime for cross-platform compatibility or OpenVINO for optimization on Intel hardware. This ensures that whether the target is a Raspberry Pi or a dedicated Edge TPU, the model operates with the highest possible efficiency and speed.
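
As a brief sketch, the same export interface can target these runtimes; the OpenVINO format accepts an int8 flag analogous to the TFLite example above, while ONNX models are typically quantized downstream with ONNX Runtime's own tooling.

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# OpenVINO export with INT8 quantization for Intel hardware
model.export(format="openvino", int8=True)

# ONNX export for ONNX Runtime; further quantization can be applied with ONNX Runtime tooling
model.export(format="onnx")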

Key Concepts Related to QAT

To fully understand QAT, it helps to be familiar with several related machine learning concepts:

  • Precision: Refers to the level of detail used to represent numbers. Half-precision (FP16) and INT8 are common targets for quantization.
  • Calibration: The process of determining the dynamic range (min/max) of activation values so that floating-point numbers can be mapped to integers effectively. This is a crucial step in deploying quantized YOLO models; a minimal sketch follows after this list.
  • Inference Latency: One of the primary benefits of QAT is reducing inference latency, allowing for faster decision-making in real-time systems.
  • Fine-Tuning: QAT is often performed as a fine-tuning step on a pre-trained model rather than training from scratch, saving computational resources.
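
To show what calibration produces in practice: given an observed activation range, an affine mapping to 8-bit integers is defined by a scale and a zero point. The helper name and the unsigned INT8 target below are assumptions for illustration.

def compute_qparams(x_min: float, x_max: float, num_bits: int = 8) -> tuple[float, int]:
    """Derive an affine scale and zero point from a calibrated activation range."""
    qmin, qmax = 0, 2**num_bits - 1  # 0..255 for unsigned 8-bit
    scale = (x_max - x_min) / (qmax - qmin)  # float step per integer step
    zero_point = int(round(qmin - x_min / scale))  # integer that represents 0.0
    return scale, zero_point


scale, zero_point = compute_qparams(-1.2, 3.8)  # e.g. scale ~= 0.0196, zero_point == 61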

By integrating Quantization-Aware Training into the MLOps pipeline, developers can bridge the gap between high-accuracy research models and highly efficient, production-ready edge AI applications.
