Glossary

Confidence

Define AI confidence scores. Learn how models gauge prediction certainty, set thresholds for reliability, and distinguish confidence from accuracy.

In machine learning, the confidence score is a numerical value assigned to an individual prediction, indicating the model's certainty that the prediction is correct. Expressed as a percentage or a probability value between 0 and 1, it quantifies the model's "belief" in its own output for a single instance. For example, in an object detection task, a model like Ultralytics YOLO11 might identify a car in an image and assign a confidence score of 0.95 (or 95%), suggesting it is very sure about its finding. This score is a critical output that helps users filter, prioritize, and interpret the model's results in real-world scenarios.
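
As a rough illustration of how these scores surface in practice, the snippet below runs a pretrained YOLO11 model and prints the confidence attached to each detection. It is a minimal sketch that assumes the ultralytics Python package is installed and that the yolo11n.pt checkpoint and a local bus.jpg image are available; attribute names follow the package's current Results API and may differ between versions.

```python
from ultralytics import YOLO

# Load a pretrained YOLO11 detection model (checkpoint name is an assumption).
model = YOLO("yolo11n.pt")

# Run inference; every detection carries its own confidence score.
results = model("bus.jpg")

for box in results[0].boxes:
    class_name = model.names[int(box.cls)]
    confidence = float(box.conf)  # value between 0 and 1
    print(f"{class_name}: {confidence:.2f}")
```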

The confidence score is typically derived from the output of the final layer of a neural network (NN), often a softmax or sigmoid function. This value is instrumental in practical applications, where a confidence threshold is set to discard predictions that fall below a certain level of certainty. By adjusting this threshold, developers can balance the trade-off between capturing all relevant detections and minimizing false positives, a key consideration in model deployment.
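
To make the softmax-and-threshold idea concrete, here is a small sketch that converts hypothetical classifier logits into probabilities, takes the top probability as the confidence score, and discards the prediction if it falls below a cutoff. The logit values, class names, and 0.5 threshold are illustrative assumptions, not outputs of any specific model.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into probabilities that sum to 1."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical raw outputs from a classifier's final layer for one image.
logits = np.array([2.1, 0.3, -1.0])
classes = ["car", "truck", "bicycle"]

probs = softmax(logits)
confidence = probs.max()                 # the model's confidence in its top class
prediction = classes[int(probs.argmax())]

CONF_THRESHOLD = 0.5  # illustrative cutoff; tuned per application in practice
if confidence >= CONF_THRESHOLD:
    print(f"Keep prediction: {prediction} ({confidence:.2f})")
else:
    print(f"Discard prediction: confidence {confidence:.2f} is below the threshold")
```

Raising the threshold trades missed detections for fewer false positives; lowering it does the opposite, which is exactly the balance described above.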

Real-World Applications

Confidence scores are essential for making AI systems more reliable and actionable. They allow systems to gauge uncertainty and trigger different responses accordingly.

  • Autonomous Vehicles: In self-driving cars, confidence scores are vital for safety. An object detector might identify a pedestrian with 98% confidence, a clear signal for the vehicle to slow down or stop. Conversely, if it detects an object with only 30% confidence, the system might flag it as uncertain and use other sensors to verify its nature before taking action. This helps prevent accidents by focusing on high-certainty threats. For more details on this topic, you can read about the role of AI in self-driving cars.
  • Medical Image Analysis: When an AI model analyzes medical scans for signs of disease, such as detecting tumors in medical imaging, the confidence score is invaluable. A detection with 99% confidence can be immediately flagged for a radiologist's review. A finding with 60% confidence might be marked as "ambiguous" or "needs further review," ensuring that uncertain cases receive human scrutiny without overwhelming experts with false alarms. The FDA provides guidance on AI/ML in medical devices.

Confidence vs. Other Metrics

It's important not to confuse the confidence score of an individual prediction with overall model evaluation metrics. While related, they measure different aspects of performance:

  • Accuracy: Measures the overall percentage of correct predictions across the entire dataset. It provides a general sense of model performance but doesn't reflect the certainty of individual predictions. A model can have high accuracy but still make some predictions with low confidence.
  • Precision: Indicates the proportion of positive predictions that were actually correct. High precision means fewer false alarms. Confidence reflects the model's belief in its prediction, which might or might not align with correctness.
  • Recall (Sensitivity): Measures the proportion of actual positive instances that the model correctly identified. High recall means fewer missed detections. Confidence doesn't directly relate to how many actual positives were found.
  • F1-Score: The harmonic mean of Precision and Recall, providing a single metric that balances both. Confidence remains a prediction-level score, not an aggregate measure of model performance.
  • Mean Average Precision (mAP): A common metric in object detection that summarizes the precision-recall curve across different confidence thresholds and classes. While mAP calculation involves confidence thresholds, the confidence score itself applies to each individual detection.
  • Calibration: Refers to how well the confidence scores align with the actual probability of correctness. A well-calibrated model's predictions with 80% confidence should be correct about 80% of the time. Confidence scores from modern neural networks are not always inherently well-calibrated, as discussed in research on model calibration; the sketch after this list gives a rough numerical feel for the contrast between per-prediction confidence and aggregate metrics.
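
The following sketch computes accuracy, precision, recall, and F1 from a small set of hypothetical binary predictions, then compares mean confidence against observed accuracy as a crude calibration check. All labels and confidence values are made up purely for illustration.

```python
import numpy as np

# Hypothetical ground-truth labels, predicted labels, and per-prediction confidences.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
conf = np.array([0.92, 0.85, 0.77, 0.55, 0.90, 0.81, 0.60, 0.70])

# Aggregate metrics summarize performance over the whole set of predictions.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy = np.mean(y_pred == y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Crude calibration check: a well-calibrated model's mean confidence
# should roughly match the fraction of predictions that are correct.
print(f"mean confidence={conf.mean():.2f} vs. observed accuracy={accuracy:.2f}")
```

Note that each value in conf belongs to a single prediction, while accuracy, precision, recall, and F1 describe the model as a whole; that is the distinction the list above draws.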

In summary, confidence is a valuable output for assessing the certainty of individual AI predictions, enabling better filtering, prioritization, and decision-making in real-world applications. It complements, but is distinct from, metrics that evaluate the overall performance of a model, such as those you can track and analyze using tools like Ultralytics HUB.
