Define AI confidence scores. Learn how models gauge prediction certainty, set thresholds for reliability, and distinguish confidence from accuracy.
In the realm of machine learning and artificial intelligence, a confidence score is a numerical value that represents the likelihood that a specific prediction made by a model is correct. Typically expressed as a probability between 0 and 1 (or a percentage from 0% to 100%), this score quantifies the model's certainty about its output. For instance, in an object detection task, the system might predict the presence of a "cat" with a confidence of 0.95, indicating a strong belief in the accuracy of that label. These scores are usually derived from the final layer of the model using activation functions such as the softmax function for multi-class problems or the sigmoid function for binary classification.
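As a minimal sketch of where such a score comes from, the snippet below applies softmax to made-up logits from a hypothetical 3-class classifier and reads the top class probability as the confidence:

import numpy as np

# Hypothetical raw outputs (logits) from a 3-class classifier's final layer
logits = np.array([2.1, 0.3, -1.2])

# Softmax converts logits into a probability distribution over the classes
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()

# The confidence score is the probability assigned to the predicted (top) class
confidence = probs.max()
print(f"Predicted class {probs.argmax()} with confidence {confidence:.2f}")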
Confidence scores are a fundamental component of the inference engine workflow. They allow developers to filter predictions based on a required level of certainty, a process known as thresholding. By setting a specific confidence threshold, you can effectively manage the trade-off between identifying every possible object (high recall) and ensuring that identified objects are correct (high precision).
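To illustrate this trade-off, here is a toy sketch with made-up detection scores and correctness flags. Raising the threshold discards more false positives (higher precision) but also drops some genuine objects (lower recall):

import numpy as np

# Hypothetical detections: each has a confidence score and a flag for
# whether it matched a ground-truth object (1 = correct, 0 = false positive)
scores  = np.array([0.95, 0.90, 0.80, 0.65, 0.55, 0.40, 0.30])
correct = np.array([1,    1,    1,    0,    1,    0,    0])
num_ground_truth = 5  # total objects actually present in the images

for threshold in (0.3, 0.5, 0.7):
    kept = scores >= threshold
    tp = correct[kept].sum()              # true positives above the threshold
    precision = tp / kept.sum()           # fraction of kept detections that are correct
    recall = tp / num_ground_truth        # fraction of real objects that were found
    print(f"conf>={threshold}: precision={precision:.2f}, recall={recall:.2f}")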
In practical model deployment, raw predictions often contain noise or low-probability detections. Techniques like non-maximum suppression (NMS) utilize confidence scores to eliminate redundant overlapping boxes, keeping only the highest-scoring detection for each object. This ensures that the final output presented to the user is clean and actionable.
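A minimal, self-contained sketch of greedy NMS (a simplified version of what detection libraries implement internally) shows how confidence drives the selection:

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    then drop any remaining box that overlaps it beyond iou_threshold."""
    order = scores.argsort()[::-1]  # indices sorted by confidence, descending
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # Intersection of the best box with each remaining box
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + area_rest - inter)
        # Keep only the boxes that do not overlap the best box too much
        order = order[1:][iou <= iou_threshold]
    return keep

# Two overlapping detections of the same object and one separate box (x1, y1, x2, y2)
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.95, 0.80, 0.90])
print(nms(boxes, scores))  # -> [0, 2]: the redundant 0.80 box is suppressed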
The following example demonstrates how to apply a confidence threshold during inference using Ultralytics YOLO11:
from ultralytics import YOLO
# Load a pretrained YOLO11 model
model = YOLO("yolo11n.pt")
# Run inference on an image with a confidence threshold of 0.6 (60%)
# This filters out any detections with a confidence score lower than 0.6
results = model.predict("https://ultralytics.com/images/bus.jpg", conf=0.6)
# Display the count of objects detected above the threshold
print(f"Detected {len(results[0].boxes)} objects with high confidence.")
The utility of confidence scores extends across virtually every industry deploying computer vision and AI solutions. For example, a medical imaging pipeline might route low-confidence predictions to a human expert for review, while a security system might only raise an alarm above a high threshold to suppress false positives.
It is vital for practitioners to distinguish confidence from the standard evaluation metrics used to benchmark models. Confidence is a per-prediction score produced at inference time, while metrics such as accuracy, precision, and mean average precision (mAP) are aggregate measures computed against ground-truth labels on an evaluation set. A model can be highly confident and still be wrong: confidence only tracks actual correctness when the model is well calibrated.
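A quick way to probe calibration is to bin validation predictions by confidence and compare the stated confidence to the observed accuracy in each bin; in a well-calibrated model the two roughly match. The sketch below uses made-up validation results for illustration:

import numpy as np

# Hypothetical validation results: each prediction's confidence and
# whether it was actually correct (1) or not (0)
confidences = np.array([0.95, 0.92, 0.88, 0.75, 0.71, 0.65, 0.55, 0.52])
correct     = np.array([1,    1,    0,    1,    0,    1,    0,    0])

# Compare mean stated confidence to observed accuracy within each bin
for low, high in [(0.5, 0.7), (0.7, 0.9), (0.9, 1.0)]:
    mask = (confidences >= low) & (confidences < high)
    if mask.any():
        print(f"[{low}, {high}): mean confidence {confidences[mask].mean():.2f}, "
              f"accuracy {correct[mask].mean():.2f}")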
If a model consistently yields low confidence for valid objects, it may indicate issues with the training data. Strategies to address this include applying data augmentation to expose the model to more varied lighting and orientations, and employing active learning to annotate and retrain on the specific "edge cases" where the model is currently uncertain. Ensuring diverse and high-quality datasets is essential for building robust systems that users can trust.
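As a sketch of that active learning loop (the file names here are hypothetical), you could mine an unlabeled image pool for cases where the model's best detection is still uncertain and queue those images for annotation:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Hypothetical pool of unlabeled images to mine for uncertain cases
image_paths = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]

uncertain = []
for path in image_paths:
    result = model.predict(path, verbose=False)[0]
    # Flag images where nothing was detected, or where even the best
    # detection is low-confidence, as candidates for annotation and retraining
    if len(result.boxes) == 0 or float(result.boxes.conf.max()) < 0.5:
        uncertain.append(path)

print(f"{len(uncertain)} images queued for annotation")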