Model Serving

Learn how model serving bridges the gap between trained models and production. Explore deployment strategies for Ultralytics YOLO26 on the Ultralytics Platform.

Model serving is the process of hosting a trained machine learning model and making its functionality available to software applications via a network interface. It acts as the bridge between a static model file saved on a disk and a live system that processes real-world data. Once a model has completed the machine learning (ML) training phase, it must be integrated into a production environment where it can receive inputs—such as images, text, or tabular data—and return predictions. This is typically achieved by wrapping the model in an Application Programming Interface (API), allowing it to communicate with web servers, mobile apps, or IoT devices.
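As a minimal sketch of what "wrapping the model in an API" can look like, the following uses only the Python standard library. The `fake_predict` function, the `/predict` path, the `values` field, and port 8000 are all assumptions made for illustration; a real service would call a trained model and would typically use a production-grade web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


def fake_predict(values: list) -> dict:
    """Hypothetical stand-in for a trained model: reports how many values it saw."""
    return {"num_values": len(values), "label": "placeholder"}


def handle_predict_request(body: bytes) -> Tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, response payload)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict) or not isinstance(payload.get("values"), list):
        return 400, {"error": "expected an object with a 'values' list"}
    return 200, fake_predict(payload["values"])


class PredictHandler(BaseHTTPRequestHandler):
    """Routes POST /predict to the validation and prediction logic above."""

    def do_POST(self):
        if self.path != "/predict":
            status, payload = 404, {"error": "unknown endpoint"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_predict_request(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) the server; call serve_forever() to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` would start listening locally. Keeping the request logic in `handle_predict_request`, separate from the HTTP handler, makes it possible to test validation and prediction without opening a socket.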

The Role of Model Serving in AI

The primary goal of model serving is to operationalize predictive modeling, that is, to make a trained model's predictions available to real applications. While training focuses on accuracy and loss minimization, serving focuses on performance metrics such as latency (how quickly a prediction is returned) and throughput (how many requests can be handled per second). Robust serving infrastructure keeps computer vision (CV) systems reliable under heavy load. It often involves containerization with tools such as Docker, which package the model with its dependencies to ensure consistent behavior across different computing environments.
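Latency and throughput can be estimated with a simple timing loop. The sketch below is illustrative only: `measure_serving_metrics` is a hypothetical helper, it sends requests one at a time, and it measures the prediction function in-process, so it ignores network overhead and concurrency.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_serving_metrics(predict: Callable[[object], object],
                            inputs: Sequence[object]) -> dict:
    """Time each call to `predict` and summarize latency and throughput."""
    if len(inputs) == 0:
        raise ValueError("at least one input is required")

    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        predict(item)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start

    latencies.sort()
    # Nearest-rank 95th percentile: the value below which ~95% of requests fall.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": len(latencies),
        "mean_latency_ms": 1000 * statistics.mean(latencies),
        "p95_latency_ms": 1000 * latencies[p95_index],
        "throughput_rps": len(latencies) / total if total > 0 else float("inf"),
    }
```

For example, `measure_serving_metrics(lambda x: time.sleep(0.001), range(20))` times a placeholder "model" that simply sleeps. Tail latency (here the 95th percentile) is reported alongside the mean because a few slow requests can dominate user experience even when the average looks healthy.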

Real-World Applications

Model serving powers ubiquitous AI features across various industries by enabling immediate decision-making based on data.

  • Smart Manufacturing: In industrial settings, AI in manufacturing systems uses served models to inspect products on assembly lines. High-resolution images of components are sent to a local server, where a YOLO26 model detects defects such as scratches or misalignments and triggers immediate alerts so that faulty items can be removed.
  • Retail Automation: Retailers use AI in retail to enhance customer experiences. Camera feeds are sent to served object detection models that identify products in a checkout zone, and the total cost is tallied automatically without manual barcode scanning.
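To illustrate the retail case, the sketch below shows how an application might turn the class labels returned by a served detection model into a checkout total. The price table, the label names, and the `tally_checkout` helper are all hypothetical; a real system would look up prices in a product database and handle duplicate or uncertain detections more carefully.

```python
from collections import Counter

# Hypothetical price list in cents; a real system would query a product database.
PRICES_CENTS = {"apple": 50, "milk": 199, "bread": 249}


def tally_checkout(detected_labels, prices=PRICES_CENTS):
    """Return (total_cents, known_item_counts, unknown_labels) for detected labels."""
    counts = Counter(detected_labels)
    known_counts = {label: n for label, n in counts.items() if label in prices}
    # Labels without a price are reported separately instead of being guessed.
    unknown = sorted(label for label in counts if label not in prices)
    total = sum(prices[label] * n for label, n in known_counts.items())
    return total, known_counts, unknown
```

Working in integer cents avoids floating-point rounding in the total, and returning unknown labels lets the application flag items for manual review.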

Practical Implementation

To serve a model effectively, it is often beneficial to export models to a standardized format like ONNX, which promotes interoperability between different training frameworks and serving engines. The following example demonstrates how to load a model and run inference, simulating the logic that would exist inside a serving endpoint using Python.

from ultralytics import YOLO

# Load the YOLO26 model (this typically happens once when the server starts)
model = YOLO("yolo26n.pt")

# Simulate an incoming API request with an image source URL
image_source = "https://ultralytics.com/images/bus.jpg"

# Run inference to generate predictions for the user
results = model.predict(source=image_source)

# Process results (e.g., simulating a JSON response to a client)
print(f"Detected {len(results[0].boxes)} objects in the image.")
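The example above prints a count, whereas a serving endpoint would usually return structured data. The sketch below shows one possible way to shape detections into a JSON response. The `Detection` dataclass, the field names, and the 0.25 confidence threshold are assumptions for illustration and are not part of the Ultralytics API; in practice the values would be read from the prediction results.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Detection:
    """Hypothetical plain-data view of one detected object."""
    label: str
    confidence: float
    box_xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def build_response(detections: Iterable[Detection], min_confidence: float = 0.25) -> str:
    """Drop low-confidence detections and serialize the rest as a JSON string."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    payload = {
        "count": len(kept),
        "detections": [
            {
                "label": d.label,
                "confidence": round(d.confidence, 3),
                "box_xyxy": list(d.box_xyxy),
            }
            for d in kept
        ],
    }
    return json.dumps(payload)
```

Converting results into plain Python types before serialization keeps the response independent of the framework that produced the predictions, so clients only need a JSON parser.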

Choosing the Right Strategy

The choice of serving strategy depends heavily on the specific use case. Online Serving provides immediate responses via protocols like REST or gRPC, which is essential for user-facing web applications. Conversely, Batch Serving processes large volumes of data offline, suitable for tasks like nightly report generation. For applications requiring privacy or low latency without internet dependence, Edge AI moves the serving process directly to the device, utilizing optimized formats like TensorRT to maximize performance on constrained hardware. Many organizations leverage the Ultralytics Platform to simplify the deployment of these models to various endpoints, including cloud APIs and edge devices.
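The difference between online and batch serving can be sketched in a few lines. In this illustration, `predict_batch` is a hypothetical stand-in for a model that accepts a list of inputs, and the batch size of 4 is an arbitrary default.

```python
from typing import Iterable, Iterator, List


def predict_batch(inputs: List[float]) -> List[str]:
    """Hypothetical model: labels each number by comparing it with a threshold."""
    return ["high" if x >= 0.5 else "low" for x in inputs]


def serve_online(item: float) -> str:
    """Online serving: one request in, one prediction out, immediately."""
    return predict_batch([item])[0]


def serve_batch(items: Iterable[float], batch_size: int = 4) -> Iterator[List[str]]:
    """Batch serving: walk through a large dataset in fixed-size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    chunk: List[float] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == batch_size:
            yield predict_batch(chunk)
            chunk = []
    if chunk:  # final, possibly smaller, chunk
        yield predict_batch(chunk)
```

Here `serve_online(0.9)` returns `"high"`, while `list(serve_batch([0.1, 0.6, 0.7, 0.2, 0.9], batch_size=2))` returns `[["low", "high"], ["high", "low"], ["high"]]`. Online serving favors low latency per request; batch serving trades latency for throughput by amortizing per-call overhead over many inputs.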

While closely related, model serving is distinct from model deployment and inference, and it is often implemented with microservices.

  • Model Deployment: This refers to the broader lifecycle stage of releasing a model into a production environment. Serving is the specific mechanism or software (like NVIDIA Triton Inference Server or TorchServe) used to execute the deployed model.
  • Inference: This is the mathematical act of calculating a prediction from an input. Model serving provides the infrastructure (networking, scalability, and security) that allows inference to happen reliably for end-users.
  • Microservices: Serving is often architected as a set of microservices, where the model runs as an independent service that other parts of an application can query, often exchanging data in lightweight formats like JSON.

