ONNX (Open Neural Network Exchange)

Discover how ONNX enhances AI model portability and interoperability, enabling seamless deployment of Ultralytics YOLO models across diverse platforms.

Open Neural Network Exchange (ONNX) is an open-source standard designed to represent machine learning (ML) models in a way that ensures portability across different frameworks and hardware. Originally developed by Microsoft and Facebook (now Meta), ONNX serves as a "universal translator" for AI. It allows developers to train a model in one ecosystem, such as PyTorch, and seamlessly deploy it in another, like TensorFlow or a specialized inference engine. This interoperability eliminates the need to rebuild or retrain networks when moving from research environments to production applications, significantly streamlining the model deployment pipeline.

How ONNX Works

At its core, ONNX defines a common set of operators—the building blocks of deep learning (DL) and machine learning models—and a standard file format. When a model is converted to ONNX, its computational structure is mapped to a static computation graph. In this graph, nodes represent mathematical operations (like convolutions or activation functions), and edges represent the flow of data tensors between them.
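As a minimal sketch of this structure, the snippet below uses the onnx Python package to load a model file and print its first few graph nodes; it assumes a file named yolo11n.onnx, matching the export example later on this page.

import onnx

# Load the model and run ONNX's built-in consistency checks
model = onnx.load("yolo11n.onnx")
onnx.checker.check_model(model)

# Each node in the graph is an operator; its inputs and outputs
# name the tensors flowing along the edges
for node in model.graph.node[:5]:
    print(node.op_type, list(node.input), list(node.output))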

Because this graph representation is standardized, hardware manufacturers can build optimized execution providers for ONNX. This means a single .onnx file can be accelerated on diverse hardware, including a CPU, GPU (Graphics Processing Unit), or specialized TPU (Tensor Processing Unit), often using the high-performance ONNX Runtime.
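For example, the ONNX Runtime Python API accepts a list of execution providers in order of preference. The sketch below requests CUDA first and falls back to the CPU; which providers are actually available depends on how onnxruntime was installed.

import onnxruntime as ort

# Execution providers compiled into this onnxruntime build
print(ort.get_available_providers())

# Request GPU acceleration first, falling back to CPU if CUDA is unavailable
session = ort.InferenceSession(
    "yolo11n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # providers actually in use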

Exporting Models to ONNX

For users of the ultralytics package, converting a trained model to the ONNX format is a straightforward process. The library handles the complex mapping of layers to the ONNX standard automatically. The following code snippet demonstrates how to export a YOLO11 model, preparing it for broader deployment.

from ultralytics import YOLO

# Load a pretrained YOLO11 model
model = YOLO("yolo11n.pt")

# Export the model to ONNX format
# This creates 'yolo11n.onnx' in the current directory
model.export(format="onnx")

Real-World Applications

The flexibility of ONNX makes it a critical component in modern AI infrastructure, particularly for computer vision (CV) tasks.

  1. Cross-Platform Mobile Deployment: A developer might train an object detection model using PyTorch on a powerful workstation. However, the final application needs to run on both iOS and Android devices. By exporting the model to ONNX, the developer can integrate the same model file into mobile applications using the ONNX Runtime for Mobile, ensuring consistent behavior across different operating systems without maintaining separate codebases.
  2. Integration with Legacy Systems: Many industrial applications are built using languages like C++ or C# for performance and stability. While Python is the standard for training, integrating a Python-based model into a C++ production environment can be slow and error-prone. ONNX bridges this gap. A manufacturing facility using computer vision in robotics can train a model in Python, export it to ONNX, and then load it directly into their C++ control software for high-speed real-time inference on the factory floor.

ONNX vs. Related Concepts

Understanding how ONNX interacts with other tools helps in selecting the right deployment strategy.

  • ONNX vs. TensorRT: While ONNX is a file format for representing models, TensorRT is a high-performance optimization SDK developed by NVIDIA specifically for NVIDIA GPUs. The two often work together; developers export models to ONNX and then use TensorRT to ingest that ONNX file, applying aggressive model optimization techniques like layer fusion and calibration for maximum speed on NVIDIA hardware (see the first sketch after this list).
  • ONNX vs. Framework Formats (e.g., .pt, .h5): Native formats like PyTorch's .pt or Keras's .h5 are excellent for training and saving model weights within their specific ecosystems. However, they often require the original framework to be installed to run the model. ONNX decouples the model from the training framework, making it easier to perform edge AI deployments where installing a full training library is impractical due to storage or memory constraints.
  • ONNX vs. Quantization: ONNX is a format, whereas model quantization is a technique to reduce model size and increase speed by lowering precision (e.g., from float32 to int8). The ONNX standard supports quantized operators, allowing developers to store and run quantized models efficiently (a quantization sketch follows this list).
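As a convenience, the ultralytics package can drive this ONNX-to-TensorRT pipeline in a single call. The sketch below assumes an NVIDIA GPU with TensorRT installed.

from ultralytics import YOLO

# Load the same pretrained model used in the export example above
model = YOLO("yolo11n.pt")

# Exports to ONNX internally, then builds a TensorRT engine from it
model.export(format="engine")  # creates 'yolo11n.engine'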
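To illustrate, ONNX Runtime ships quantization utilities that write a lower-precision copy of an ONNX file. The sketch below applies dynamic quantization; the output filename is illustrative.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization stores weights as int8, shrinking the file
# and often speeding up CPU inference
quantize_dynamic(
    model_input="yolo11n.onnx",
    model_output="yolo11n-int8.onnx",
    weight_type=QuantType.QInt8,
)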
