Deploying quantized Ultralytics YOLOv8 models on edge devices with DeGirum
Discover deploying quantized YOLOv8 models with DeGirum. Learn challenges, solutions, and deployment techniques for edge devices. Shape the future with us!

Welcome to the recap of another insightful talk from our YOLO VISION 2023 (YV23) event, held at the vibrant Google for Startups Campus in Madrid. This talk was delivered by Shashi Chilappagari, Chief Architect and Co-Founder at DeGirum. It covered quantization and the deployment of quantized models, exploring key challenges, solutions, and future possibilities.
Introduction to quantization and deploying quantized models
Shashi provided a comprehensive overview of quantization, highlighting its importance in optimizing Ultralytics YOLO models for deployment on edge devices. From discussing the basics to exploring approaches for improving quantization, attendees gained valuable insights into the intricacies of model porting and deployment.
Challenges in quantizing YOLO models
Quantization often poses challenges, particularly for YOLO models exported to TFLite. Our audience learned about the significant drop in accuracy observed when all outputs are quantized with the same scale and zero point, which highlights how hard it is to maintain model accuracy during the quantization process.
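To make the shared scale/zero point problem concrete, here is a minimal sketch in plain Python. It assumes (this is our illustration, not a figure from the talk) that box coordinates are in pixels on a 640-pixel image and class scores lie in [0, 1], and it applies standard asymmetric uint8 quantization. The sample values and helper names are hypothetical.

```python
def quant_params(lo, hi, levels=256):
    """Return (scale, zero_point) for asymmetric uint8 quantization of [lo, hi]."""
    scale = (hi - lo) / (levels - 1)
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quantize(values, scale, zero_point, levels=256):
    """Quantize each value to an integer level, then map it back to float."""
    restored = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(0, min(levels - 1, q))  # clamp to the uint8 range
        restored.append((q - zero_point) * scale)
    return restored


# Hypothetical sample outputs: pixel coordinates and class scores.
boxes = [12.0, 305.5, 630.0]
scores = [0.02, 0.45, 0.91]

# One scale/zero point for everything, sized for the widest output (assumed 0..640).
scale, zero_point = quant_params(0.0, 640.0)
shared_boxes = fake_quantize(boxes, scale, zero_point)
shared_scores = fake_quantize(scores, scale, zero_point)

print(f"step size: {scale:.3f}")
print("boxes :", [round(b, 2) for b in shared_boxes])
print("scores:", shared_scores)  # every score rounds to level 0
```

With a step of roughly 2.5 units per level, the boxes survive with sub-pixel to pixel-level error, but every class score falls into the lowest level and dequantizes to 0.0, so confidence information is lost.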
Improving quantization of YOLO models
Fortunately, solutions exist to address these challenges. The introduction of the DeGirum fork offers a quantization-friendly approach by separating outputs and optimizing bounding box decoding. With these enhancements, quantized model accuracy sees a significant improvement from baseline levels.
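The idea of separating outputs can be sketched as follows. This is not code from the DeGirum fork. It is a small plain-Python illustration that assumes two output tensors, hypothetically named `boxes` and `scores`, each quantized to uint8 with its own range. The ranges are assumptions; a real converter would derive them from calibration data.

```python
def fake_quantize(value, lo, hi, levels=256):
    """Quantize `value` to uint8 over [lo, hi], then map it back to float."""
    scale = (hi - lo) / (levels - 1)
    zero_point = round(-lo / scale)
    q = max(0, min(levels - 1, round(value / scale) + zero_point))
    return (q - zero_point) * scale


# Hypothetical per-output ranges (assumed, not measured).
OUTPUT_RANGES = {"boxes": (0.0, 640.0), "scores": (0.0, 1.0)}

# Hypothetical sample values for each output tensor.
outputs = {
    "boxes": [12.0, 305.5, 630.0],
    "scores": [0.02, 0.45, 0.91],
}

max_error = {}
for name, values in outputs.items():
    lo, hi = OUTPUT_RANGES[name]
    max_error[name] = max(abs(v - fake_quantize(v, lo, hi)) for v in values)
    print(f"{name}: max round-trip error = {max_error[name]:.4f}")
```

Because each output gets a step size matched to its own range, the scores are now resolved in steps of 1/255 instead of sharing the coarse step needed for pixel coordinates.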
More quantization-friendly model architectures
Exploring new model architectures is key to minimizing quantization loss. Attendees discovered how replacing SiLU with bounded ReLU6 activation leads to minimal quantization loss, offering promising results for maintaining accuracy in quantized models.
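One way to see why a bounded activation helps is to compare the uint8 step size each activation would need. The sketch below uses the standard definitions of SiLU and ReLU6. The pre-activation values, including the single large outlier, are hypothetical and chosen only for illustration, not taken from the talk.

```python
import math


def silu(x):
    """SiLU (swish): x * sigmoid(x). Unbounded above."""
    return x / (1.0 + math.exp(-x))


def relu6(x):
    """ReLU6: clamps the output to the fixed range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def uint8_step(values):
    """Step size needed to cover the observed value range with 256 levels."""
    return (max(values) - min(values)) / 255


# Hypothetical pre-activation values with one large outlier.
pre_activations = [-4.0, -1.0, 0.5, 2.0, 5.0, 40.0]

silu_step = uint8_step([silu(x) for x in pre_activations])
relu6_step = uint8_step([relu6(x) for x in pre_activations])

print(f"SiLU  step size: {silu_step:.4f}")
print(f"ReLU6 step size: {relu6_step:.4f}")
```

With SiLU, the outlier stretches the range the quantizer has to cover, so every other value is represented more coarsely. ReLU6 caps the range at 6 no matter what the inputs look like. The trade-off is that values above 6 are clipped, which is why the model is trained with the new activation rather than having it swapped in afterwards.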
Deploying quantized models
Deploying quantized models has never been easier, with just five lines of code needed to run any model on the DeGirum cloud platform. A live code demo showcased the simplicity of detecting objects with a quantized Ultralytics YOLOv5 model, highlighting the seamless integration of quantized models into real-world applications.
To this end, Ultralytics provides a variety of model deployment options, enabling end-users to deploy their applications effectively on embedded and edge devices. Supported export formats include OpenVINO, TorchScript, TensorRT, CoreML, TFLite, and TFLite Edge TPU, offering versatility and compatibility.
This integration with third-party applications for deployment allows users to assess the performance of our models in real-world scenarios.
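As a rough sketch of what exporting to these formats can look like, the snippet below maps the format names listed above to the strings accepted by the `format` argument of the Ultralytics `YOLO.export` method. The helper `export_for_edge` and the `yolov8n.pt` weights file in the usage comment are our own illustrative choices. The `int8` flag requests INT8 quantization and is only honored by some formats, so treat this as a starting point rather than a complete recipe.

```python
# Article's format names mapped to Ultralytics `format` strings.
EXPORT_FORMATS = {
    "OpenVINO": "openvino",
    "TorchScript": "torchscript",
    "TensorRT": "engine",
    "CoreML": "coreml",
    "TFLite": "tflite",
    "TFLite Edge TPU": "edgetpu",
}


def export_for_edge(weights, target, int8=False):
    """Export `weights` to `target` and return the path of the exported model.

    Hypothetical helper for illustration; requires `pip install ultralytics`.
    """
    from ultralytics import YOLO  # imported lazily so the mapping works without it

    model = YOLO(weights)
    return model.export(format=EXPORT_FORMATS[target], int8=int8)


# Example usage (downloads weights and runs the exporter, so it is left commented out):
# path = export_for_edge("yolov8n.pt", "TFLite", int8=True)
```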
Using different models on different hardware
Attendees also gained insights into the versatility of deploying different models on various hardware platforms, showcasing how a single codebase can support multiple models across different accelerators. Examples of running different detection tasks on diverse hardware platforms demonstrated the flexibility and scalability of our approach.
Resources and documentation
To empower attendees further, we introduced a comprehensive resources section, providing access to our cloud platform, examples, documentation, and more. Our goal is to ensure that everyone has the tools and support they need to succeed in deploying quantized models effectively.
Wrapping up
As the field of quantization evolves, it's essential to stay informed and engaged. We're committed to providing ongoing support and resources to help you navigate this exciting journey. Watch the full talk!
Join us as we continue to explore the latest trends and innovations in machine learning and artificial intelligence. Together, we're shaping the future of technology and driving positive change in the world.






