AI Safety
Learn about AI Safety, the vital field for preventing unintended harm from AI systems. Discover its key pillars, real-world applications, and role in responsible AI.
AI Safety is a dedicated field within Artificial Intelligence (AI) focused on ensuring that AI systems operate reliably, predictably, and without causing unintended harm. As deep learning (DL) models become more autonomous and integrated into critical infrastructure, the potential consequences of system failures grow significantly. The primary objective of AI safety is to identify, analyze, and mitigate risks arising from technical glitches, unexpected behaviors, or misalignment between the AI's goals and human values. This discipline encompasses a wide range of practices, from rigorous model testing to the development of mathematical guarantees for system behavior.
Core Pillars of AI Safety
To build trustworthy systems, researchers and engineers focus on several foundational pillars that ensure machine learning (ML) models function correctly under varying conditions.
- Robustness: A robust system must maintain performance even when encountering unexpected data or adversarial conditions. This involves defending against adversarial attacks, where malicious inputs are crafted to deceive a model. For example, a computer vision (CV) system should not misclassify a stop sign simply because of a sticker or bad lighting. A minimal robustness check is sketched after this list.
- Alignment: This refers to the challenge of designing AI systems whose objectives accurately reflect human intentions. Misalignment can occur if a model finds a "shortcut" to achieve a high score on its loss function while violating safety constraints, a concept extensively studied by the Center for Human-Compatible AI.
- Interpretability: Also known as Explainable AI (XAI), this principle emphasizes creating models that humans can understand. If a decision-making system fails, engineers must be able to inspect the internal model weights or activation maps to diagnose the error and prevent recurrence.
- Monitoring: Continuous model monitoring is essential to detect data drift, where the data a model encounters in the real world diverges from its training data, potentially leading to unsafe predictions. A simple drift check is also sketched after this list.
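As a concrete illustration of the robustness pillar, the following sketch runs the same detector on a clean image and a noise-perturbed copy, then compares the predicted classes. It is a minimal example under stated assumptions: the local image path street.jpg, the noise level, and the simple class-comparison check are illustrative choices, not a formal robustness test.
import cv2
import numpy as np
from ultralytics import YOLO
# Load a lightweight detection model (same checkpoint used later on this page)
model = YOLO("yolo11n.pt")
# Hypothetical local test image; substitute any image path
image = cv2.imread("street.jpg")
# Build a perturbed copy by adding mild Gaussian noise
noise = np.random.normal(0, 15, image.shape)
noisy_image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
# Run inference on both versions and compare the predicted classes
clean_classes = sorted(model(image)[0].boxes.cls.tolist())
noisy_classes = sorted(model(noisy_image)[0].boxes.cls.tolist())
if clean_classes != noisy_classes:
    print("Predictions changed under noise; investigate robustness before deployment.")
else:
    print("Predictions were stable under this perturbation.")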
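The monitoring pillar can likewise be approximated with very little code. The sketch below compares the mean confidence of recent production predictions against a baseline recorded at validation time; the baseline value and tolerance are made-up numbers for illustration, and real drift detection would typically examine full input or output distributions rather than a single statistic.
import numpy as np
# Illustrative baseline recorded during validation (assumed values)
BASELINE_MEAN_CONF = 0.82
TOLERANCE = 0.10  # alert if mean confidence drops by more than this
def check_confidence_drift(recent_confidences):
    """Return True if recent confidences suggest the input distribution has drifted."""
    live_mean = float(np.mean(recent_confidences))
    drifted = (BASELINE_MEAN_CONF - live_mean) > TOLERANCE
    if drifted:
        print(f"Possible data drift: mean confidence fell to {live_mean:.2f}")
    return drifted
# Example: confidence scores collected from recent production predictions
check_confidence_drift([0.55, 0.61, 0.58, 0.70, 0.52])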
Real-World Applications
AI safety is not just theoretical; it is a critical requirement for deploying AI in the automotive and healthcare sectors.
- Autonomous Driving: Self-driving vehicles rely on object detection models to identify pedestrians, other vehicles, and obstacles. Safety protocols here involve redundancy (using LiDAR and radar alongside cameras) and "uncertainty estimation," where the car slows down or requests human intervention if the AI is unsure about an object. Organizations like Waymo publish detailed safety methodologies to validate these perception systems.
- Medical Diagnostics: In medical image analysis, an AI assisting radiologists must maximize accuracy while minimizing false negatives. Safety mechanisms often include a "human-in-the-loop" workflow, where the AI only flags potential issues for doctor review rather than making a final diagnosis autonomously, ensuring patient safety is prioritized, as highlighted in AI in healthcare solutions. A minimal human-in-the-loop sketch follows this list.
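A human-in-the-loop pattern like the one described above can be sketched with a simple per-detection confidence check: results below a review threshold are flagged for a person instead of being acted on automatically. The threshold of 0.50 and the routing logic are illustrative assumptions, not a validated clinical or automotive protocol.
from ultralytics import YOLO
REVIEW_THRESHOLD = 0.50  # below this, defer to a human reviewer (illustrative value)
model = YOLO("yolo11n.pt")
results = model("https://ultralytics.com/images/bus.jpg")
for box in results[0].boxes:
    confidence = float(box.conf)
    label = model.names[int(box.cls)]
    if confidence < REVIEW_THRESHOLD:
        # Low certainty: route to human review instead of acting automatically
        print(f"FLAG FOR REVIEW: possible {label} (confidence {confidence:.2f})")
    else:
        print(f"Accepted: {label} (confidence {confidence:.2f})")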
Implementing Safety Thresholds in Code
One basic method to enhance safety in deployment is to implement strict confidence thresholds. By ignoring low-confidence predictions, developers can prevent an AI agent from acting on weak or noisy data.
The following example demonstrates how to filter predictions using the Ultralytics YOLO11 model, ensuring only high-certainty detections are processed.
from ultralytics import YOLO
# Load the YOLO11 model for object detection
model = YOLO("yolo11n.pt")
# Perform inference on an image with a strict confidence threshold
# This ensures the model only reports objects it is at least 70% sure about
results = model.predict("https://ultralytics.com/images/bus.jpg", conf=0.70)
# Process only the safe, high-confidence detections
for result in results:
    print(f"Detected {len(result.boxes)} objects exceeding safety threshold.")
AI Safety vs. AI Ethics
While often used interchangeably, these terms address different aspects of responsible AI development.
- AI Safety is primarily technical. It asks, "Will this system function as designed without crashing or causing physical accidents?" It deals with reliability, control, and error prevention, similar to safety engineering in civil aviation.
- AI Ethics is societal and moral. It asks, "Is this system fair, and should we build it?" It focuses on issues like algorithmic bias, data privacy, and the socio-economic impact of automation. For deeper insights, explore our glossary entry on AI Ethics.
Frameworks such as the NIST AI Risk Management Framework provide guidelines for organizations to address both safety and ethical risks. As models evolve towards Artificial General Intelligence (AGI), the collaboration between safety researchers at institutes like the Future of Life Institute and industry developers becomes increasingly vital to ensure technology remains beneficial to humanity.