Observability

Explore the core concepts of AI observability. Learn how to debug YOLO26 models, track metrics, and ensure reliability in production using the Ultralytics Platform.

Observability refers to the capability of understanding the internal state of a complex system based solely on its external outputs. In the rapidly evolving fields of Artificial Intelligence (AI) and Machine Learning (ML), observability goes beyond simple status checks to provide deep insights into why a model behaves in a certain way. As modern Deep Learning (DL) architectures—such as the state-of-the-art YOLO26—become increasingly sophisticated, they can often function as "black boxes." Observability tooling creates a transparent window into these systems, allowing engineering teams to debug unexpected behaviors, trace the root causes of errors, and ensure reliability in production environments.

Observability vs. Monitoring

While often used interchangeably, observability and model monitoring serve distinct but complementary purposes within the MLOps lifecycle.

  • Model Monitoring is reactive and focuses on "known unknowns." It involves tracking predefined metrics like inference latency, CPU usage, or error rates against established thresholds. Monitoring answers the question: "Is the system healthy?"
  • Observability is proactive and addresses "unknown unknowns." It provides granular data—logs, traces, and high-cardinality events—needed to investigate novel issues that were not anticipated during the training data preparation. As described in the Google SRE Book, an observable system enables you to understand new behaviors without shipping new code. It answers the question: "Why is the system acting this way?"
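
To make the distinction concrete, the sketch below contrasts the two in plain Python. The threshold check is monitoring: it answers only the single question it was written for. The structured event is raw material for observability: because it records high-cardinality context with every request, it can be sliced later to answer questions nobody anticipated. The field names, the camera_id value, and the 50 ms threshold are hypothetical choices for illustration, not part of any Ultralytics API.

```python
import json
import time
import uuid

# Monitoring: compare one predefined metric against a fixed threshold.
# The 50 ms value is a hypothetical example, not a recommendation.
LATENCY_THRESHOLD_MS = 50.0


def is_healthy(latency_ms: float) -> bool:
    """Answers 'Is the system healthy?' for one known failure mode only."""
    return latency_ms <= LATENCY_THRESHOLD_MS


# Observability: emit a wide event carrying enough context to investigate
# questions that were not anticipated when the threshold above was chosen.
# All field names here are hypothetical.
def build_inference_event(latency_ms: float, image_shape: tuple, model_name: str,
                          camera_id: str, num_detections: int) -> dict:
    return {
        "event_id": str(uuid.uuid4()),  # unique per request (high cardinality)
        "timestamp": time.time(),
        "model": model_name,
        "camera_id": camera_id,
        "image_shape": list(image_shape),
        "latency_ms": latency_ms,
        "num_detections": num_detections,
    }


event = build_inference_event(72.5, (480, 640, 3), "yolo26n", "cam-17", 0)
print("healthy:", is_healthy(event["latency_ms"]))
print(json.dumps(event))
```

With events like this stored in a queryable backend, an engineer could, for example, check whether slow requests with zero detections cluster on a single camera, a question the latency threshold alone cannot answer.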

The Three Pillars of Observability

To achieve true observability in Computer Vision (CV) pipelines, systems typically rely on three primary types of telemetry data:

  1. Logs: Timestamped, immutable records of discrete events. In a detection pipeline, a log might capture the input image resolution or the specific hyperparameter tuning configuration used during a run. Structured logging, often in JSON format, allows for complex querying and analysis.
  2. Metrics: Aggregated numerical data measured over time, such as average precision, memory consumption, or GPU utilization. Prometheus is commonly used to store this time-series data, and Grafana to visualize trends in it.
  3. Traces: Tracing follows the lifecycle of a request as it flows through various microservices. For distributed AI applications, standards like OpenTelemetry help map the path of a request, highlighting bottlenecks in the inference engine or network delays. Specialized tools like Jaeger help visualize these distributed transactions.
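
The three signal types can be illustrated without any external services. The following is a minimal, dependency-free sketch using only the Python standard library: JSON log lines through logging, a latency metric aggregated across requests, and nested timing spans that share a trace ID. It is not the Prometheus or OpenTelemetry API; the span and log_event helpers and the time.sleep stand-ins for preprocessing and inference are hypothetical, and a production pipeline would export these signals to the tools named above rather than keep them in memory.

```python
import json
import logging
import statistics
import time
import uuid
from contextlib import contextmanager
from typing import Optional

# Logs: one structured JSON record per discrete event.
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("detector")


def log_event(name, **fields):
    record = json.dumps({"event": name, "ts": time.time(), **fields})
    logger.info(record)
    return record


# Metrics: numeric samples aggregated over many requests.
latency_samples_ms = []

# Traces: timed spans for each stage of a request, linked by a shared trace ID.
spans = []


@contextmanager
def span(name: str, trace_id: str, parent: Optional[str] = None):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "parent": parent,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_request(image_shape):
    trace_id = uuid.uuid4().hex
    with span("request", trace_id):
        with span("preprocess", trace_id, parent="request"):
            time.sleep(0.001)  # hypothetical stand-in for resizing/normalization
        with span("inference", trace_id, parent="request"):
            time.sleep(0.002)  # hypothetical stand-in for the model forward pass
    # The root span closes last, so its duration is the end-to-end latency.
    root = next(s for s in spans if s["trace_id"] == trace_id and s["name"] == "request")
    latency_samples_ms.append(root["duration_ms"])
    log_event("inference_done", trace_id=trace_id, image_shape=list(image_shape),
              latency_ms=round(root["duration_ms"], 3))


for _ in range(3):
    handle_request((640, 640, 3))

print("mean latency (ms):", round(statistics.mean(latency_samples_ms), 2))
```

Because every log record and span carries the same trace_id, a slow request flagged by the latency metric can be traced back to the stage that caused it.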

Implementing Observability in Python

You can enhance observability in your training pipelines by using callbacks to log specific internal states. The following example demonstrates how to add a custom callback to a YOLO26 training session that reports validation metrics at the end of every epoch.

from ultralytics import YOLO

# Load the YOLO26 model
model = YOLO("yolo26n.pt")


# Define a custom callback for observability.
# "on_fit_epoch_end" runs after the epoch's validation pass, so trainer.metrics
# holds this epoch's results ("on_train_epoch_end" fires before validation).
def on_fit_epoch_end(trainer):
    # Read the box mAP@0.5 reported by validation for the current epoch
    map50 = trainer.metrics.get("metrics/mAP50(B)", 0)
    print(f"Observability Log - Epoch {trainer.epoch + 1}: mAP50 is {map50:.4f}")


# Register the callback and start training
model.add_callback("on_fit_epoch_end", on_fit_epoch_end)
model.train(data="coco8.yaml", epochs=3)

Real-World Applications

Observability is critical for deploying high-performance models in dynamic environments where test data might not perfectly match real-world conditions.

  • Autonomous Vehicles: In autonomous vehicle development, observability lets engineers reconstruct the exact state of the system at the moment of a disengagement event. By correlating object detection outputs with sensor logs and control commands, teams can determine whether a braking error was caused by sensor noise, a model misprediction, or a logic error in the planning module.
  • Healthcare Diagnostics: In healthcare AI, consistent performance is vital for patient safety. Observability tools can surface data drift, for example when a model's performance degrades on images from a new type of MRI scanner. Traces can reveal whether the issue stems from a change in image data preprocessing or a shift in the input distribution, enabling rapid remediation without compromising AI safety.
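
One common way to quantify the kind of input drift described in the healthcare example is the Population Stability Index (PSI), which compares the binned distribution of a feature between a reference sample and a current sample: PSI = Σ (p_cur - p_ref) · ln(p_cur / p_ref). The sketch below is an illustrative assumption, not a feature of any particular observability product: it uses per-image mean pixel intensity as a hypothetical drift feature, synthetic Gaussian data in place of real scanner images, and equal-width bins over the combined range (production implementations often derive bins from reference quantiles instead).

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of a scalar feature."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the max value into the last bin
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


random.seed(0)
# Hypothetical per-image mean intensities (0-255) from the original scanner
reference = [random.gauss(110, 15) for _ in range(2000)]
# Fresh images from the same scanner: only sampling noise should differ
same_scanner = [random.gauss(110, 15) for _ in range(2000)]
# Hypothetical new scanner producing brighter, higher-contrast images
new_scanner = [random.gauss(135, 25) for _ in range(2000)]

print(f"PSI same scanner: {psi(reference, same_scanner):.3f}")
print(f"PSI new scanner:  {psi(reference, new_scanner):.3f}")
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; these cutoffs are conventions rather than standards. A drift alert only says that the inputs changed, not why, which is where logs and traces come in.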

Integration with Modern Tools

Modern workflows often integrate observability directly into the training platform. Users of the Ultralytics Platform benefit from built-in visualization of loss curves, system performance, and dataset analysis. Additionally, standard integrations with tools like TensorBoard and MLflow allow data scientists to maintain rigorous experiment tracking and observability across the entire model lifecycle.
