Monitoring deployed computer vision models on Ultralytics Platform
Discover how to monitor computer vision models in production with Ultralytics Platform. Track metrics, detect issues, and improve reliability.
Testing computer vision models that analyze images and video isn't the same as running them in production. During development, these models are evaluated on clean, well-prepared datasets where conditions are controlled and predictable.
Once deployed, things become more dynamic. Models are exposed to real-world traffic, where request volumes can vary, response times can change, and occasional failures can happen.
At this stage, the focus shifts to whether the system is running reliably, with endpoints that stay available, responsive, and stable under changing conditions.
That’s why monitoring is essential. It provides a clear view of how deployed endpoints behave in production through metrics such as request volume, latency, error rates, and overall system health.
At this stage, the right tooling becomes just as important as the model itself. Recently, Ultralytics introduced Ultralytics Platform, our new end-to-end environment that brings together the full computer vision workflow, from data and training to deployment and monitoring.

With monitoring built directly into this workflow, users can track endpoint health, inspect request behavior, and maintain reliable systems without relying on separate tools. In this article, we’ll explore how to use Ultralytics Platform to monitor deployed model endpoints and keep production systems running smoothly. Let’s get started!
In an AI model lifecycle, monitoring refers to observing how deployed systems behave once a model is live and serving real-world requests. While training and validation show how a machine learning model (ML model) performs on prepared datasets, monitoring focuses on how the deployed endpoint operates in a production environment.
A crucial part of monitoring is tracking system-level metrics that reflect reliability and responsiveness. Metrics such as latency and uptime help indicate how well the system is handling incoming requests. Latency measures how long it takes to process a request and return a response, while uptime highlights how consistently the endpoint remains available.
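To make these two signals concrete, here is a minimal sketch that summarizes latency percentiles and uptime from recorded observations. The `(latency_ms, status_code)` sample format is an assumption for illustration, not a platform API; it stands in for whatever measurements your monitoring setup collects.

```python
import statistics

def summarize(samples):
    """Summarize latency and uptime from (latency_ms, status_code) samples.

    Uptime here is approximated as the fraction of requests that returned
    a 2xx status, a simple proxy for endpoint availability.
    """
    latencies = sorted(ms for ms, _ in samples)
    ok = sum(1 for _, code in samples if 200 <= code < 300)
    return {
        "p50_ms": statistics.median(latencies),
        # Nearest-rank p95: index into the sorted latencies.
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "uptime_pct": 100.0 * ok / len(samples),
    }

# Hypothetical observations from a deployed endpoint.
samples = [(42, 200), (55, 200), (48, 200), (310, 500), (51, 200)]
print(summarize(samples))  # → {'p50_ms': 51, 'p95_ms': 55, 'uptime_pct': 80.0}
```

Percentiles matter more than averages here: a single slow request (310 ms above) barely moves the median but would dominate a mean.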
Another key aspect is observability, which provides visibility into how requests are handled. Each time an input, such as an image or video frame, is sent to a deployed model, it is processed as an inference request.
Logs capture these requests along with details like timestamps, response times, and status codes. These logs make it easier to trace requests, debug issues, and investigate failures when they occur. They are especially useful for identifying patterns such as repeated errors, slow responses, or unexpected system behavior.
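The pattern-spotting described above can be sketched with a short scan over log entries. The dictionary schema (`status`, `latency_ms`, `path`) is illustrative, not the platform's actual log format; adapt the field names to whatever your logs contain.

```python
from collections import Counter

def find_patterns(logs, slow_ms=250):
    """Scan request logs for repeated errors and slow responses.

    Returns a Counter of error status codes and the list of entries
    whose latency exceeded the slow_ms threshold.
    """
    errors = Counter(e["status"] for e in logs if e["status"] >= 400)
    slow = [e for e in logs if e["latency_ms"] > slow_ms]
    return errors, slow

# Hypothetical log entries for an inference endpoint.
logs = [
    {"path": "/predict", "status": 200, "latency_ms": 48},
    {"path": "/predict", "status": 503, "latency_ms": 12},
    {"path": "/predict", "status": 503, "latency_ms": 15},
    {"path": "/predict", "status": 200, "latency_ms": 320},
]
errors, slow = find_patterns(logs)
print(errors)               # repeated 503s stand out immediately
print(len(slow), "slow request(s)")
```

A repeated status code like the 503s above is exactly the kind of pattern that is invisible in aggregate metrics but obvious in logs.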
By combining metrics and logs, monitoring helps users understand how their systems are operating in production and quickly respond to issues as they arise.
Before diving into production monitoring, let’s discuss the difference between model performance metrics and system metrics.
Typically, model monitoring is associated with evaluation metrics, also known as model metrics, such as accuracy, precision, recall, and mAP (mean average precision). These metrics describe model behavior and assess the quality of its predictions, often against new production or input data. They can be particularly useful for identifying edge cases or outliers.
However, this is different from monitoring a deployed system in production. In this context, monitoring focuses on how the system is running rather than directly evaluating model predictions.
Instead of model metrics, deployment monitoring relies on system-level signals such as request volume, latency, error rates, and uptime. These metrics provide visibility into how endpoints handle input data, how consistently they respond, and how they operate in production environments.
Next, let’s look at a real-world example that highlights the need for a monitoring system in computer vision deployments.
Consider a vision solution that uses pose estimation, a computer vision task used to identify and analyze human body movements, to monitor construction workers for safety compliance. During initial model deployment, such a system might perform well under controlled conditions with clear visibility and standard scenarios.
However, real construction sites introduce additional complexity. Request volumes can vary throughout the day, network conditions may be inconsistent, and multiple cameras or endpoints may be sending data at the same time. These factors can lead to slower responses or occasional failures if the system isn’t properly monitored.

In a production environment like this, it becomes important to understand how reliably the system is operating. Monitoring provides visibility into whether endpoints are available, how quickly they respond to incoming requests, and how consistently they handle traffic over time.
For instance, an increase in latency may point to higher load or resource constraints, while a rise in error rates can indicate issues with request handling or system stability. Logs add more context by showing how individual requests are processed and where failures occur.
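The interpretation above can be turned into a simple rolling check. This is a sketch under stated assumptions: the `(latency_ms, status_code)` window and the threshold values are illustrative choices, not platform defaults.

```python
def check_health(window, max_p95_ms=200, max_error_rate=0.05):
    """Flag a deployment when recent metrics cross simple thresholds.

    window: a recent batch of (latency_ms, status_code) observations.
    Returns a list of human-readable alert strings (empty when healthy).
    """
    latencies = sorted(ms for ms, _ in window)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    errors = sum(1 for _, code in window if code >= 500)
    error_rate = errors / len(window)

    alerts = []
    if p95 > max_p95_ms:
        alerts.append(f"high latency: p95={p95}ms")
    if error_rate > max_error_rate:
        alerts.append(f"high error rate: {error_rate:.0%}")
    return alerts

# One server error in four requests trips the error-rate threshold.
print(check_health([(40, 200), (55, 200), (500, 500), (45, 200)]))
```

In practice these checks run continuously against a sliding window, so a sustained regression raises an alert while a single outlier does not.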
By tracking these signals, AI enthusiasts and data scientists can identify issues early, troubleshoot more effectively, and ensure that their systems continue to run reliably as real-world conditions change.
In many deep learning workflows, monitoring is often handled using separate tools for logging, metrics, and system health. This fragmented setup can make it difficult to get a clear view of how deployed endpoints are operating in production and adds complexity to managing deployments.
Ultralytics Platform simplifies this by bringing monitoring directly into a unified environment that covers the full computer vision workflow, from data ingestion and annotation to training, deployment, and monitoring.
With this integrated setup, users can track how their deployed endpoints handle real-world traffic without setting up external logging systems or additional dashboards. Everything is available in one place, making it easier to observe system behavior and maintain reliable deployments over time.
Monitoring features can be accessed directly from the Deploy tab. From a single dashboard, users can track key metrics, analyze request-level behavior, and visualize trends. These built-in visualizations make it easier to understand how solutions are performing without switching between tools.
By bringing monitoring, deployment, and model management together within the broader workflow, the platform reduces complexity. This makes it more seamless to focus on managing deployments, optimizing system performance, and maintaining reliability.
In real-world deployments, monitoring depends on having clear visibility into how systems are running as conditions change over time. It goes beyond tracking a few metrics and involves understanding how deployed endpoints behave across environments and managing multiple deployments effectively.
Inspired by feedback from the vision AI community on common computer vision challenges, the Ultralytics Platform includes several capabilities that make monitoring more practical and scalable.
Here’s an overview of some of these key features:
Next, we’ll walk through some of these features in more detail and see how they can be used to monitor deployed endpoints in production.
Once a model is deployed, monitoring starts with tracking key system metrics. While metrics such as accuracy and recall are useful during development, production monitoring focuses on system-level signals like response time and error rates, which provide measurable insights into how reliably endpoints are handling real-world traffic.
Ultralytics Platform provides a centralized dashboard that offers a clear view of endpoint activity and system behavior. In particular, the Deployment dashboard includes four key metrics that show how endpoints are being used and how they respond to incoming requests.
Here’s a closer look at these metrics:
Simply put, these metrics provide a clear view of how deployed endpoints operate in production. By analyzing usage patterns, teams and individuals can understand traffic distribution, identify peak load periods, and ensure that systems remain responsive and reliable as usage grows.
While metrics provide a high-level view of system performance, logs offer a more detailed look at how deployed endpoints handle individual requests. Logs record each request sent to an endpoint along with the corresponding response.
They are useful for tracking issues, inspecting failures, and understanding how requests are processed. Within the Ultralytics Platform, you can select any deployment to view its details, including logs.

A log entry in the platform is displayed in a structured format, making it easier to understand what happened during each request. Each entry includes a severity level, which indicates how important the event is, along with a timestamp showing when it occurred.
It also contains a message describing the event and HTTP-related details such as status codes and latency. This information helps trace requests, support troubleshooting, and debug issues more effectively. On top of this, logs are grouped by severity, so users can prioritize deployments that need attention.
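As a rough illustration of the fields such an entry carries, here is a sketch that parses one structured log line into severity, timestamp, message, and HTTP details. The line format is invented for this example; the platform renders these fields in its UI rather than as raw text.

```python
import re
from datetime import datetime

# Illustrative line format: "<iso-timestamp> <SEVERITY> <message> status=<code> latency_ms=<ms>"
LOG_PATTERN = re.compile(
    r"(?P<ts>\S+) (?P<severity>[A-Z]+) (?P<message>.*?) "
    r"status=(?P<status>\d{3}) latency_ms=(?P<latency>\d+)"
)

def parse_line(line):
    """Parse one structured log line into its fields, or None on mismatch."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    return {
        "timestamp": datetime.fromisoformat(m["ts"]),
        "severity": m["severity"],
        "message": m["message"],
        "status": int(m["status"]),
        "latency_ms": int(m["latency"]),
    }

entry = parse_line(
    "2025-01-15T09:30:12 ERROR inference timeout status=504 latency_ms=5000"
)
print(entry["severity"], entry["status"])  # → ERROR 504
```

With entries in this shape, grouping by the `severity` field is a one-liner, which is what lets high-severity deployments surface first.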
Monitoring also involves understanding the overall health of deployed endpoints, including whether they are running properly, responding on time, and consistently handling requests without errors. The Ultralytics Platform provides a clear view of each deployment’s health status, making it straightforward to verify that endpoints are operating as expected.
The platform includes visual health indicators for each deployment, displayed on individual deployment cards.
For example, a green indicator shows that the endpoint is healthy and responding normally, while a red indicator signals issues or downtime. A spinning icon indicates that the system is actively checking the deployment’s status.
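The indicator logic just described boils down to a three-state mapping. This is a simplified model of the behavior, not the platform's implementation; the `last_check` shape is an assumption for illustration.

```python
def indicator_state(last_check):
    """Map a health-check result to a dashboard indicator state.

    last_check: None while a probe is still in flight, otherwise a dict
    with an 'ok' boolean from the most recent check.
    """
    if last_check is None:
        return "checking"  # spinning icon: probe in progress
    return "green" if last_check["ok"] else "red"

print(indicator_state(None))           # → checking
print(indicator_state({"ok": True}))   # → green
print(indicator_state({"ok": False}))  # → red
```

Keeping the mapping this simple is deliberate: an at-a-glance indicator should answer "is it up?" and leave the details to metrics and logs.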
By tracking endpoint health over time, it becomes possible to detect issues early, maintain consistent performance, and ensure a stable experience for applications running in production.
Model monitoring isn't just about tracking metrics. It creates a feedback loop that supports continuous improvement over time. As endpoints handle real-world traffic, patterns begin to emerge in metrics and logs that can reveal issues such as increased latency, higher error rates, or inconsistent system behavior.
Monitoring highlights areas that need attention. For example, consistently high latency may indicate the need for better resource allocation or scaling, while a rising error rate can point to issues with request handling or system stability.
Once these issues are identified, steps can be taken to improve reliability. This may involve adjusting infrastructure, scaling resources, or fixing issues in how requests are processed. After these changes, systems can continue to be monitored to confirm that performance has improved.
By linking monitoring with ongoing improvements, users can maintain robust systems as usage grows and conditions change.
To get a better understanding of the impact of monitoring in a real-world scenario, let’s explore how it applies to the automation of aviation ground operations.
Take a vision system designed to monitor luggage loading and unloading during aircraft ground operations. In this setup, an object detection model like Ultralytics YOLO26 can be used to detect whether baggage falls off conveyor belts or handling equipment.
During testing and early deployment, the real-time system may perform well, accurately identifying luggage and responding quickly.