Seamlessly deploy Ultralytics YOLO11 using OpenVINO™

Abirami Vina

5 min read

July 1, 2025

Learn how exporting Ultralytics YOLO11 to the OpenVINO™ format enables lightning-fast inference on Intel® hardware, enhancing speed, scalability, and accuracy.

AI adoption depends on AI solutions being accessible, and a huge part of that is making them easy to deploy on the hardware people already have. Running AI models on GPUs (graphics processing units) is a great option in terms of performance and parallel processing power. 

However, the reality is that not everyone has access to high-end GPUs, especially in edge environments or on everyday laptops. That’s why it’s so important to optimize models to run efficiently on more widely available hardware like central processing units (CPUs), integrated GPUs, and neural processing units (NPUs).

Computer vision, for example, is a branch of AI that enables machines to analyze and understand images and video streams in real time. Vision AI models like Ultralytics YOLO11 support key tasks such as object detection and instance segmentation that power applications from retail analytics to medical diagnostics.

Fig 1. Using Ultralytics YOLO11 to detect and segment objects in a retail store.

To make computer vision more broadly accessible, Ultralytics has released an updated integration with the OpenVINO toolkit, which is an open-source project for optimizing and running AI inference across CPUs, GPUs, and NPUs. 

With this integration, it’s easier to export and deploy YOLO11 models, with up to 3× faster inference on CPUs and accelerated performance on Intel GPUs and NPUs. In this article, we’ll walk through how to use the Ultralytics Python package to export YOLO11 models to the OpenVINO format and use them for inference. Let’s get started!

An overview of Ultralytics YOLO11

Before we dive into the details of the OpenVINO integration supported by Ultralytics, let’s take a closer look at what makes YOLO11 a reliable and impactful computer vision model. YOLO11 is the latest model in the Ultralytics YOLO series, offering significant enhancements in both speed and accuracy. 

One of its key highlights is efficiency. For instance, Ultralytics YOLO11m has 22% fewer parameters than Ultralytics YOLOv8m, yet it achieves higher mean average precision (mAP) on the COCO dataset. This means it runs faster and also detects objects more accurately, making it ideal for real-time applications where performance and responsiveness are critical.

Fig 2. Ultralytics YOLO11’s performance benchmarks.

Beyond object detection, YOLO11 supports various advanced computer vision tasks such as instance segmentation, pose estimation, image classification, object tracking, and oriented bounding box detection. YOLO11 is also developer-friendly, with the Ultralytics Python package providing a simple and consistent interface for training, evaluating, and deploying models. 

In addition to this, the Ultralytics Python package supports various integrations and multiple export formats, including OpenVINO, ONNX, and TorchScript, allowing you to easily slot YOLO11 into different deployment pipelines. Whether you're targeting cloud infrastructure, edge devices, or embedded systems, the export process is straightforward and adaptable to your hardware needs.

What is OpenVINO™?

OpenVINO™ (Open Visual Inference and Neural Network Optimization) is an open-source toolkit for optimizing and deploying AI inference across a wide range of hardware. It enables developers to run high-performance inference applications efficiently across various Intel platforms, including CPUs, integrated and discrete GPUs, NPUs, and field-programmable gate arrays (FPGAs).

OpenVINO provides a unified runtime interface that abstracts hardware differences through device-specific plugins. This means developers can write code once and deploy across multiple Intel hardware targets using a consistent API. 
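
To make this concrete, you can list the devices OpenVINO detects on a machine with just a couple of lines of Python. This is a minimal sketch; the exact device names returned depend on your hardware and installed drivers.

import openvino as ov

core = ov.Core()

# Lists the inference devices available on this machine,
# e.g. ['CPU', 'GPU', 'NPU'], depending on hardware and drivers
print(core.available_devices)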

Here are some of the key features that make OpenVINO a great choice for deployment:

  • Model converter: This tool converts and prepares models from popular frameworks such as PyTorch, ONNX, TensorFlow, PaddlePaddle, and others, so they can be optimized for efficient inference on Intel hardware.
  • Heterogeneous execution: You don’t need to rewrite your code for different Intel hardware. OpenVINO makes it easy to run the same model on any supported hardware, from CPUs to GPUs.
  • Quantization support: The toolkit supports reduced-precision formats like FP16 (default) and INT8, which help decrease model size and speed up inference without significantly affecting accuracy.

Fig 3. OpenVINO enables diverse deployment options.

Exploring the Ultralytics x OpenVINO integration

Now that we’ve explored what OpenVINO is and its significance, let’s discuss how to export YOLO11 models to the OpenVINO format and run efficient inference on Intel hardware.

Step 1: Install the Ultralytics Python package

To export a model to the OpenVINO format, you’ll first need to install the Ultralytics Python package. This package provides everything you need to train, evaluate, and export YOLO models, including YOLO11. 

You can install it by running the command "pip install ultralytics" in your terminal or command prompt. If you're working in an interactive environment like Jupyter Notebook or Google Colab, just add an exclamation mark before the command. 
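
For reference, the two forms of the command look like this:

pip install ultralytics   # terminal or command prompt
!pip install ultralytics  # Jupyter Notebook or Google Colab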

Also, if you run into any issues during installation or while exporting, the Ultralytics documentation and troubleshooting guides are great resources to help you get back on track.

Step 2: Export your YOLO11 model to OpenVINO format

Once the Ultralytics package is set up, the next step is to load your YOLO11 model and convert it into a format compatible with OpenVINO. 

In the example below, we’re using a pre-trained YOLO11 model (“yolo11n.pt”) and calling its export method to convert it to the OpenVINO format. After running this code, the converted model will be saved in a new directory named “yolo11n_openvino_model”.

from ultralytics import YOLO

# Load a pre-trained YOLO11 nano model
model = YOLO("yolo11n.pt")

# Export to the OpenVINO format; the converted model is saved
# in a new directory named "yolo11n_openvino_model"
model.export(format="openvino")

Step 3: Run inference with the exported model

Once your YOLO11 model is exported to OpenVINO format, you can run inference in two ways: using the Ultralytics Python package or the native OpenVINO Runtime.

Using the Ultralytics Python package

The exported YOLO11 model can be easily deployed using the Ultralytics Python package, as shown in the code snippet below. This method is ideal for quick experimentation and streamlined deployment on Intel hardware. 

You can also specify which device to use for inference, such as "intel:cpu", "intel:gpu", or "intel:npu", depending on the Intel hardware available on your system.

# Load the exported OpenVINO model
ov_model = YOLO("yolo11n_openvino_model/")

# Run inference; save=True writes the annotated image to disk
results = ov_model("https://ultralytics.com/images/bus.jpg", device="intel:gpu", save=True)

After running the above code, the output image will be saved in the "runs/detect/predict" directory.
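
If you’d like to work with the predictions programmatically rather than just viewing the saved image, each item in the returned results exposes the detections. Here’s a minimal sketch:

# Inspect the detections from the results returned above
for result in results:
    print(result.boxes.xyxy)  # bounding box coordinates (x1, y1, x2, y2)
    print(result.boxes.conf)  # confidence scores
    print(result.boxes.cls)   # class indices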

Fig 4. Using the exported YOLO11 model to detect objects in an image.

Using the native OpenVINO Runtime

If you are looking for a customizable way to run inference, especially in production environments, the OpenVINO Runtime gives you more control over how your model is executed. It supports advanced features such as asynchronous execution (running multiple inference requests in parallel) and load balancing (distributing inference workloads efficiently across Intel hardware).

To use the native runtime, you’ll need the exported model files: a .xml file (which defines the network architecture) and a .bin file (which stores the model’s trained weights). You can also configure additional parameters like input dimensions or preprocessing steps depending on your application.

A typical deployment flow includes initializing the OpenVINO core, loading and compiling the model for a target device, preparing the input, and executing inference. For detailed examples and step-by-step guidance, refer to the official Ultralytics OpenVINO documentation.
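
To give you a feel for that flow, here’s a minimal sketch using the OpenVINO Python API. It assumes the export from Step 2 produced “yolo11n_openvino_model/yolo11n.xml” (with its matching .bin file alongside) and that the model expects 640×640 input; the raw output still needs post-processing, such as confidence filtering and non-maximum suppression, to become final detections.

import cv2
import numpy as np
import openvino as ov

# Initialize the OpenVINO core and load the exported model
core = ov.Core()
model = core.read_model("yolo11n_openvino_model/yolo11n.xml")

# Compile for a target device; "AUTO" lets OpenVINO pick the best available one
compiled_model = core.compile_model(model, "AUTO")

# Prepare the input: BGR -> RGB, resize to 640x640, scale to [0, 1], HWC -> NCHW
image = cv2.cvtColor(cv2.imread("bus.jpg"), cv2.COLOR_BGR2RGB)
blob = cv2.resize(image, (640, 640)).astype(np.float32) / 255.0
blob = np.expand_dims(blob.transpose(2, 0, 1), 0)

# Run inference and read the raw output tensor
result = compiled_model([blob])
output = result[compiled_model.output(0)]
print(output.shape)  # raw predictions that still need post-processing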

Why choose the Ultralytics x OpenVINO integration?

While exploring Ultralytics integrations, you’ll notice that the Ultralytics Python package supports exporting YOLO11 models to a variety of formats such as TorchScript, CoreML, TensorRT, and ONNX. So, why choose the OpenVINO integration?

Here are some reasons why the OpenVINO export format is a great fit for deploying models on Intel hardware:

  • Performance gains: You can experience up to 3× faster inference on Intel CPUs, with additional acceleration available on integrated GPUs and NPUs.
  • No retraining needed: You can export your existing YOLO11 models directly to OpenVINO format without modifying or retraining them.
  • Built for scale: The same exported model can be deployed across low-power edge devices and large-scale cloud infrastructure, simplifying scalable AI deployment.

You can also evaluate performance benchmarks for the YOLO11 model across a range of Intel® platforms on the OpenVINO™ Model Hub, a resource for developers to evaluate AI models on Intel hardware and discover the performance advantage of OpenVINO across Intel CPUs, integrated GPUs, NPUs, and discrete graphics.

Fig 5. OpenVINO™ Model Hub performance benchmarks for the YOLO11 model across a range of Intel® platforms.

Applications of YOLO11 and the OpenVINO export format

With the help of the OpenVINO integration, deploying YOLO11 models across Intel hardware in real-world situations becomes much simpler. 

A great example is smart retail, where YOLO11 can help detect empty shelves in real time, track which products are running low, and analyze how customers move through the store. This enables retailers to improve inventory management and optimize store layouts for better shopper engagement.

Similarly, in smart cities, YOLO11 can be used to monitor traffic by counting vehicles, tracking pedestrians, and detecting red-light violations in real time. These insights can support traffic flow optimization, improve road safety, and assist in automated enforcement systems.

Fig 6. Counting vehicles using YOLO11.

Another interesting use case is industrial inspection, where YOLO11 can be deployed on production lines to automatically detect visual defects such as missing components, misalignment, or surface damage. This boosts efficiency, lowers costs, and supports better product quality.

Key factors to consider when using the OpenVINO toolkit

When deploying YOLO11 models with OpenVINO, keep a few important things in mind to get the best results:

  • Check hardware compatibility: Ensure your Intel hardware, whether it’s a CPU, integrated GPU, or NPU, is supported by OpenVINO so the model can run efficiently.

  • Install the right drivers: If you're using Intel GPUs or NPUs, double-check that all required drivers are properly installed and up to date.

  • Understand precision tradeoffs: OpenVINO supports FP32, FP16, and INT8 model precisions. Each one comes with a tradeoff between speed and accuracy, so it's important to choose the right option based on your performance goals and available hardware, as shown in the example below.
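
For instance, precision can be chosen when exporting with the Ultralytics package. This is a minimal sketch: half=True requests FP16, while int8=True runs INT8 quantization using a small calibration dataset (here the “coco8.yaml” sample dataset that ships with Ultralytics).

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# FP16 export: smaller model, usually faster, minimal accuracy loss
model.export(format="openvino", half=True)

# INT8 export: quantizes the model using a small calibration dataset
model.export(format="openvino", int8=True, data="coco8.yaml")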

Key takeaways

Exporting Ultralytics YOLO11 to the OpenVINO format makes it easy to run fast, efficient Vision AI models on Intel hardware. You can deploy across CPUs, GPUs, and NPUs without needing to retrain or change your code. It’s a great way to boost performance while keeping things simple and scalable.

With support built into the Ultralytics Python package, exporting and running inference with OpenVINO is straightforward. In just a few steps, you can optimize your model and run it across a variety of Intel platforms. Whether you're working on smart retail, traffic monitoring, or industrial inspection, this workflow helps you move from development to deployment with speed and confidence.

Join the YOLO community and check out the Ultralytics GitHub repository to learn more about impactful integrations supported by Ultralytics. Also, take a look at the Ultralytics licensing options to get started with computer vision today!

Register for our upcoming webinar to see the Ultralytics × OpenVINO integration in action, and visit the OpenVINO website to explore tools for optimizing and deploying AI at scale.
