
Observability

Explore the core concepts of AI observability. Learn how to debug YOLO26 models, track metrics, and ensure reliability in production using the Ultralytics Platform.

Observability refers to the capability of understanding the internal state of a complex system based solely on its external outputs. In the rapidly evolving fields of Artificial Intelligence (AI) and Machine Learning (ML), observability goes beyond simple status checks to provide deep insights into why a model behaves in a certain way. As modern Deep Learning (DL) architectures—such as the state-of-the-art YOLO26—become increasingly sophisticated, they can often function as "black boxes." Observability tooling creates a transparent window into these systems, allowing engineering teams to debug unexpected behaviors, trace the root causes of errors, and ensure reliability in production environments.

Observability vs. Monitoring

While often used interchangeably, observability and model monitoring serve distinct but complementary purposes within the MLOps lifecycle.

  • Model Monitoring is reactive and focuses on "known unknowns." It involves tracking predefined metrics like inference latency, CPU usage, or error rates against established thresholds. Monitoring answers the question: "Is the system healthy?"
  • Observability is proactive and addresses "unknown unknowns." It provides granular data—logs, traces, and high-cardinality events—needed to investigate novel issues that were not anticipated during the training data preparation. As described in the Google SRE Book, an observable system enables you to understand new behaviors without shipping new code. It answers the question: "Why is the system acting this way?"
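The distinction can be sketched in code. A monitoring check compares one predefined metric against a threshold, while an observable system emits rich, structured events that can answer questions nobody thought to ask in advance. The threshold value, field names, and model name below are illustrative assumptions, not part of any Ultralytics API:

```python
import json
import time

# Monitoring: reactive check of a predefined metric ("known unknown")
# against an established threshold. Answers: "Is the system healthy?"
LATENCY_THRESHOLD_MS = 50.0


def check_latency(latency_ms: float) -> bool:
    """Return True if the system passes this one health check."""
    return latency_ms <= LATENCY_THRESHOLD_MS


# Observability: emit a structured, high-cardinality event so questions
# not anticipated today can still be answered from the data later.
def emit_event(**fields) -> str:
    """Serialize a timestamped event as a structured JSON log line."""
    return json.dumps({"timestamp": time.time(), **fields})


record = emit_event(
    model="yolo26n",  # illustrative field values, not real telemetry
    latency_ms=62.3,
    image_size=[640, 480],
    num_detections=7,
    confidence_min=0.21,
)
print(check_latency(62.3))  # False -> the monitoring alert fires
print(record)  # the event remains queryable for later investigation
```

The monitoring check can only say that latency exceeded 50 ms; the structured event lets an engineer later ask, for example, whether slow requests correlate with large images or low-confidence detections.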

The Three Pillars of Observability

To achieve true observability in Computer Vision (CV) pipelines, systems typically rely on three primary types of telemetry data:

  1. Logs: Timestamped, immutable records of discrete events. In a detection pipeline, a log might capture the input image resolution or the specific hyperparameter tuning configuration used during a run. Structured logging, often in JSON format, allows for complex querying and analysis.
  2. Metrics: Aggregated numerical data measured over time, such as average precision, memory consumption, or GPU utilization. Tools like Prometheus and Grafana are the standard for storing this time-series data and visualizing trends.
  3. Traces: Traces follow the lifecycle of a request as it flows through various microservices. For distributed AI applications, standards like OpenTelemetry help map the path of a request, highlighting bottlenecks in the inference engine or network delays. Specialized tools like Jaeger help visualize these distributed transactions.
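The tracing idea can be sketched with nothing but the standard library. The context manager below records nested spans that share one trace ID, mimicking in miniature what OpenTelemetry and Jaeger do at scale; the stage names and sleep timings are illustrative assumptions:

```python
import time
import uuid
from contextlib import contextmanager

# One trace ID ties all spans of a single request together.
TRACE_ID = uuid.uuid4().hex
spans = []


@contextmanager
def span(name: str):
    """Record the wall-clock duration of one named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": TRACE_ID,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


# Simulated request flowing through a two-stage vision pipeline.
with span("request"):
    with span("preprocess"):
        time.sleep(0.01)  # stand-in for image preprocessing
    with span("inference"):
        time.sleep(0.02)  # stand-in for model inference

for s in spans:
    print(s["name"], f"{s['duration_ms']:.1f} ms")
```

Because inner spans finish first, the recorded order itself reveals the call structure, and comparing durations shows where a request spent its time.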

Implementing Observability in Python

You can enhance observability in your training pipelines by using callbacks to log specific internal states. The following example demonstrates how to add a custom callback to a YOLO26 training session to monitor performance metrics in real-time.

from ultralytics import YOLO

# Load the YOLO26 model
model = YOLO("yolo26n.pt")


# Define a custom callback for observability
def on_train_epoch_end(trainer):
    # Access and print specific metrics at the end of each epoch
    map50 = trainer.metrics.get("metrics/mAP50(B)", 0)
    print(f"Observability Log - Epoch {trainer.epoch + 1}: mAP50 is {map50:.4f}")


# Register the callback and start training
model.add_callback("on_train_epoch_end", on_train_epoch_end)
model.train(data="coco8.yaml", epochs=3)

Real-World Use Cases

Observability is critical for deploying high-performance models in dynamic environments where test data might not perfectly match real-world conditions.

  • Autonomous Vehicles: When developing autonomous vehicles, observability enables engineers to reconstruct the exact state of the system during a disengagement event. By correlating object detection outputs with sensor logs and control commands, teams can determine whether a braking failure was caused by sensor noise, a model prediction error, or a logic bug in the planning module.
  • Healthcare Diagnostics: In AI in healthcare, ensuring consistent performance is vital for patient safety. Observability tools can detect data drift if a model's performance degrades when applied to images from a new type of MRI scanner. Traces can reveal if the issue stems from a change in image data preprocessing or a shift in the input distribution, enabling rapid remediation without compromising AI safety.
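The drift scenario above can be illustrated with a deliberately naive check: flag drift when the mean confidence of recent predictions falls several standard deviations below a baseline captured at deployment time. This is a minimal sketch with fabricated example values; production systems apply proper statistical tests (such as a Kolmogorov-Smirnov test) to full input and output distributions:

```python
import statistics


def drift_detected(baseline, recent, k=2.0):
    """Flag drift if recent mean falls k std devs below the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return statistics.mean(recent) < mu - k * sigma


# Illustrative confidence scores, not real measurements.
baseline_conf = [0.82, 0.79, 0.85, 0.81, 0.78, 0.84, 0.80, 0.83]
healthy = [0.81, 0.80, 0.83]   # consistent with the baseline
degraded = [0.42, 0.38, 0.45]  # e.g., images from a new MRI scanner

print(drift_detected(baseline_conf, healthy))   # False
print(drift_detected(baseline_conf, degraded))  # True
```

Wiring such a check into the structured logs described earlier lets an alert fire before degraded predictions reach end users.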

Integration with Modern Tools

Modern workflows often integrate observability directly into the training platform. Users of the Ultralytics Platform benefit from built-in visualization of loss curves, system performance, and dataset analysis. Additionally, standard integrations with tools like TensorBoard and MLflow allow data scientists to maintain rigorous experiment tracking and observability across the entire model lifecycle.
