
Deploy Ultralytics YOLO models using the ExecuTorch integration

Abirami Vina

5 min read

November 4, 2025

Explore how to export Ultralytics YOLO models like YOLO11 to ExecuTorch format for efficient, PyTorch-native deployment on edge and mobile devices.

Certain computer vision applications, like automated quality inspection, autonomous drones, or smart security systems, perform best when Ultralytics YOLO models, such as YOLO11, run close to the sensor capturing the images. In other words, these models need to process data directly where it’s generated, on cameras, drones, or embedded systems, rather than sending it to the cloud.

This approach, known as edge AI, enables models to perform inference directly on the device where data is captured. By processing information locally instead of relying on remote servers, systems can achieve lower latency, enhanced data privacy, and greater reliability, even in environments with limited or no internet connectivity.

For example, a manufacturing camera that inspects thousands of products every minute, or a drone navigating complex environments, can’t afford the delays that come with cloud processing. Running YOLO11 directly on the device enables instant, on-device inference.

To make running Ultralytics YOLO models on the edge easier and more efficient, the new ExecuTorch integration supported by Ultralytics provides a streamlined way to export and deploy models directly to mobile and embedded devices. ExecuTorch is part of the PyTorch Edge ecosystem and provides an end-to-end solution for running AI models directly on mobile and edge hardware, including phones, wearables, embedded boards, and microcontrollers.

This integration makes it easy to take an Ultralytics YOLO model, such as YOLO11, from training to deployment on edge devices. By combining YOLO11’s vision capabilities with ExecuTorch’s lightweight runtime and PyTorch export pipeline, users can deploy models that run efficiently on edge hardware while preserving the accuracy and performance of PyTorch-based inference.

In this article, we’ll take a closer look at how the ExecuTorch integration works, why it’s a great fit for edge AI applications, and how you can start deploying Ultralytics YOLO models with ExecuTorch. Let’s get started!

What is ExecuTorch?

Typically, when you train a model in PyTorch, it runs on powerful servers or Graphics Processing Units (GPUs) in the cloud. However, deploying that same model to a mobile or embedded device, such as a smartphone, drone, or microcontroller, requires a specialized solution that can handle limited computing power, memory, and connectivity.

That’s exactly what ExecuTorch brings to the table. ExecuTorch is an end-to-end solution developed as part of the PyTorch Edge ecosystem that enables efficient on-device inference across mobile, embedded, and edge platforms. It extends PyTorch’s capabilities beyond the cloud, letting AI models run directly on local devices.

Bringing PyTorch inferencing to the edge

At its core, ExecuTorch provides a lightweight C++ runtime that allows PyTorch models to execute directly on the device. ExecuTorch uses the PyTorch ExecuTorch (.pte) model format, an optimized export designed for faster loading, smaller memory footprint, and improved portability. 

It supports XNNPACK as the default backend for efficient Central Processing Unit (CPU) inference and extends compatibility across a wide range of hardware backends, including Core ML, Metal, Vulkan, Qualcomm, MediaTek, Arm Ethos-U, OpenVINO, and others.

These backends enable optimized acceleration on mobile, embedded, and specialized edge devices. ExecuTorch also integrates with the PyTorch export pipeline, providing support for advanced features such as quantization and dynamic shape handling to improve performance and adaptability across different deployment environments.

Quantization reduces model size and boosts inference speed by converting high-precision values (such as 32-bit floats) into lower-precision ones, while dynamic shape handling is used to enable models to process variable input sizes efficiently. Both features are crucial for running AI models on resource-limited edge devices.
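To make the quantization idea concrete, here is a minimal sketch of affine int8 quantization, the general technique this kind of optimization is based on. The scale and zero-point values below are illustrative, not taken from an actual ExecuTorch export:

```python
# Minimal sketch of affine int8 quantization: map float32 values to 8-bit
# integers using a scale and zero-point, then map them back. The numbers
# below are illustrative, not from a real ExecuTorch export.

def quantize(values, scale, zero_point):
    """Convert float values to int8, clamping to the int8 range [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Recover approximate float values from int8."""
    return [(q - zero_point) * scale for q in q_values]

weights = [0.31, -1.24, 0.05, 2.87]
scale, zero_point = 0.025, 0

q = quantize(weights, scale, zero_point)        # [12, -50, 2, 115]
approx = dequantize(q, scale, zero_point)       # close to the originals

# Each int8 value costs 1 byte instead of 4, at the price of a small
# rounding error per weight.
```

In a real deployment the scale and zero-point are chosen per tensor (or per channel) from the observed value range, which is what keeps accuracy close to the full-precision model.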

Fig 1. A look at how ExecuTorch works (Source)

A unified layer for edge hardware

Beyond its runtime, ExecuTorch also acts as a unified abstraction layer for multiple hardware backends. Simply put, it abstracts away hardware-specific details and manages how models interact with different processing units, including CPUs, GPUs, and Neural Processing Units (NPUs).

Once a model is exported, ExecuTorch can be configured to target the most suitable backend for a given device. Developers can deploy models efficiently across diverse hardware without writing custom device-specific code or maintaining separate conversion workflows.

Because of its modular, portable design and seamless PyTorch integration, ExecuTorch is a great option for deploying computer vision models like Ultralytics YOLO11 to mobile and embedded systems. It bridges the gap between model training and real-world deployment, making edge AI faster, more efficient, and easier to implement.

Key features of ExecuTorch

Before we look at how to export Ultralytics YOLO models to the ExecuTorch format, let’s explore what makes ExecuTorch a reliable option for deploying AI on edge.

Here’s a glimpse of some of its key features:

  • Quantization support: ExecuTorch supports model quantization, a technique that converts high-precision values into lower-precision ones to reduce model size and speed up inference. This helps models run faster and use less memory on edge devices while maintaining nearly the same level of accuracy.
  • Efficient use of memory: One of ExecuTorch’s biggest advantages is how it handles memory. Instead of relying on dynamic memory allocation, which can introduce latency and power overhead, ExecuTorch uses Ahead-of-Time (AOT) memory planning. During export, it analyzes the model graph and precomputes how much memory is needed for each operation. This allows the runtime to execute models using a static memory plan, ensuring predictable performance and preventing slowdowns or crashes on devices with limited RAM or processing capacity.
  • Built-in model metadata: When exporting using the integration supported by Ultralytics, each model includes a YAML file that contains important metadata such as input image size, class names, and configuration parameters. This additional file simplifies model integration into various applications and ensures consistent behavior across different edge platforms.
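As an illustration of how that metadata can be used in an application, here is a small sketch. The keys and values below (imgsz, names) are assumptions modeled on typical Ultralytics export metadata, not the exact file contents:

```python
# Hypothetical sketch of using exported model metadata in an application.
# The keys and values below are assumptions modeled on typical Ultralytics
# export metadata, not the actual contents of a specific export.

metadata = {
    "imgsz": [640, 640],               # expected input size (height, width)
    "names": {0: "person", 5: "bus"},  # class-id -> label mapping
}

def label_for(class_id, meta):
    """Map a predicted class id to a human-readable label."""
    return meta["names"].get(class_id, f"class_{class_id}")

# An application would resize frames to metadata["imgsz"] before inference,
# then use label_for() to turn raw class ids into display names.
print(label_for(5, metadata))  # "bus"
```

Keeping this information next to the model file means the same deployment code can load different models without hard-coding input sizes or class lists.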

How to export Ultralytics YOLO models to ExecuTorch format

Now that we have a better understanding of what ExecuTorch offers, let’s walk through how to export Ultralytics YOLO models to the ExecuTorch format.

Step 1: Install the Ultralytics Python package

To get started, you’ll need to install the Ultralytics Python package using pip, Python’s package installer. You can do this by running “pip install ultralytics” in your terminal or command prompt.

If you’re working in a Jupyter Notebook or Google Colab environment, simply add an exclamation mark before the command, like "!pip install ultralytics". Once installed, the Ultralytics package provides all the tools you need to train, test, and export computer vision models, including Ultralytics YOLO11.

If you face any issues during installation or while exporting your model, the official Ultralytics documentation and Common Issues guide have detailed troubleshooting steps and best practices to help you get up and running smoothly.

Step 2: Export Ultralytics YOLO11

After installing the Ultralytics package, you can load a variant of the YOLO11 model and export it to the ExecuTorch format. For example, you can use a pre-trained model such as “yolo11n.pt” and export it by calling the export function with the format set to “executorch”. 

This creates a directory named “yolo11n_executorch_model”, which includes the optimized model file (.pte) and a separate metadata YAML file containing important details such as image size and class names.

Here’s the code to export your model:

from ultralytics import YOLO

# Load a pre-trained YOLO11 model
model = YOLO("yolo11n.pt")

# Export the model to ExecuTorch format
model.export(format="executorch")

Step 3: Run inference with the exported model

Once exported, the model is ready to be deployed on edge and mobile devices using the ExecuTorch runtime. The exported .pte model file can be loaded into your application to run real-time, on-device inference without needing a cloud connection.

For example, the code snippet below shows how to load the exported model and run inference. Inference simply means using a trained model to make predictions on new data. Here, the model is tested on an image of a bus taken from a public URL.

# Load the exported ExecuTorch model
executorch_model = YOLO("yolo11n_executorch_model")

# Run inference on an image; save=True writes the annotated result to disk
results = executorch_model.predict("https://ultralytics.com/images/bus.jpg", save=True)

After running the code, you’ll find the output image with the detected objects saved in the “runs/detect/predict” folder. 
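Beyond saving annotated images, you’ll often want to work with detections programmatically. The sketch below shows a generic way to filter raw detections by confidence and attach class labels; the sample data is made up for illustration, and in practice you would read boxes, scores, and class ids from the Results object returned by predict():

```python
# Generic post-processing sketch: filter raw detections by confidence and
# attach class labels. The sample data below is made up for illustration;
# a real application would read boxes, scores, and class ids from the
# Results object returned by predict().

def filter_detections(detections, names, conf_threshold=0.5):
    """Keep detections above the confidence threshold, with labels attached."""
    kept = []
    for x1, y1, x2, y2, score, class_id in detections:
        if score >= conf_threshold:
            kept.append({
                "box": (x1, y1, x2, y2),
                "label": names.get(class_id, str(class_id)),
                "score": score,
            })
    return kept

# Hypothetical raw output: (x1, y1, x2, y2, confidence, class_id)
raw = [
    (48, 230, 200, 520, 0.91, 0),   # person
    (20, 120, 780, 600, 0.88, 5),   # bus
    (500, 300, 540, 360, 0.32, 0),  # low-confidence detection, dropped
]
names = {0: "person", 5: "bus"}

detections = filter_detections(raw, names)
# Two detections survive the 0.5 threshold: a "person" and a "bus".
```

A confidence threshold like this is one of the simplest knobs for trading recall against false positives in an on-device pipeline.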

Fig 2. Detecting objects using an exported YOLO11 model in ExecuTorch format.

Benefits of using the ExecuTorch integration

While exploring the different export options supported by Ultralytics, you might wonder what makes the ExecuTorch integration unique. The key difference is how well it combines performance, simplicity, and flexibility, making it easy to deploy powerful AI models directly on mobile and edge devices.

Here’s a look at some of the key advantages of using the ExecuTorch integration:

  • Flexible deployment options: ExecuTorch models can be deployed across mobile applications, embedded systems, IoT (Internet of Things) devices, and specialized edge AI hardware. This flexibility enables developers to build scalable AI solutions that perform consistently across diverse platforms and environments.
  • Benchmark-proven performance: Tests on devices like the Raspberry Pi 5 show that YOLO11 models exported to ExecuTorch format run roughly 2x faster than their PyTorch counterparts, with nearly identical accuracy.
  • Flexible integration APIs: ExecuTorch provides Kotlin APIs for Android, Objective-C APIs for iOS, and C++ APIs for embedded Linux, allowing developers to integrate YOLO models directly into native apps.
  • Hardware acceleration support: ExecuTorch supports multiple hardware acceleration backends, including Vulkan and Metal for mobile GPUs, with optional integration for OpenCL and other vendor-specific APIs. It can also leverage dedicated accelerators such as NPUs and DSPs to achieve substantial speedups over CPU-only inference.

Real-world applications of YOLO11 and the ExecuTorch export

Recently, Ultralytics was recognized as a PyTorch ExecuTorch success story, highlighting our early support for on-device inference and ongoing contributions to the PyTorch ecosystem. This recognition reflects a shared goal of making high-performance AI more accessible on mobile and edge platforms.

From cloud to edge: How ExecuTorch and YOLO11 bring Vision AI to life

In action, this looks like real-world Vision AI solutions running efficiently on everything from smartphones to embedded systems. For instance, in manufacturing, edge devices play a crucial role in monitoring production lines and detecting defects in real time.

Fig 3. An example of using YOLO11 to analyze a manufacturing assembly line. (Source)

Instead of sending images or sensor data to the cloud for processing, which can introduce delays and depend on internet connectivity, the ExecuTorch integration enables YOLO11 models to run directly on local hardware. This means factories can detect quality issues instantly, reduce downtime, and maintain data privacy, all while operating with limited compute resources.

Here are a few other examples of how the ExecuTorch integration and Ultralytics YOLO models can be applied:

  • Smart cities: By running YOLO11 models locally with ExecuTorch, cities can make faster, data-driven decisions, from detecting traffic jams to identifying hazards, improving overall mobility and safety.
  • Retail and warehousing: With on-device inference, retailers can automate shelf monitoring, track inventory, and inspect packages quickly and securely without relying on cloud connections.
  • Robotics and drones: Edge-optimized YOLO11 models enable robots and drones to recognize objects, navigate environments, and make real-time decisions even without internet access.

Fig 4. Detecting and counting cars in traffic using YOLO11 (Source)

Key takeaways

Exporting Ultralytics YOLO models to ExecuTorch format makes it easy to deploy computer vision models across many devices, including smartphones, tablets, and embedded systems such as the Raspberry Pi. This means it is possible to run optimized, on-device inference without relying on cloud connectivity, improving speed, privacy, and reliability.

Along with ExecuTorch, Ultralytics supports a wide range of integrations, including TensorRT, OpenVINO, CoreML, and more, giving developers the flexibility to run their models across platforms. As Vision AI adoption grows, these integrations simplify the deployment of intelligent systems built to perform efficiently in real-world conditions.

Curious about AI? Check out our GitHub repository, join our community, and explore our licensing options to kickstart your Vision AI project. Learn more about innovations like AI in retail and computer vision in logistics by visiting our solutions pages.
