Learn how to evaluate classifier performance in AI/ML using ROC curves and AUC, and how to optimize TPR versus FPR in tasks such as fraud detection and medical diagnosis.
The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of binary classification models. In machine learning (ML), it visualizes the trade-off between a model's sensitivity and its specificity across all possible decision thresholds. Unlike single-value metrics such as accuracy, which can be misleading on an imbalanced dataset, the ROC curve provides a comprehensive view of how a classifier behaves as the criterion for identifying positive instances becomes more or less strict. This visualization is essential for engineers using supervised learning techniques to determine the optimal operating point for their specific use case.
To understand an ROC curve, it is necessary to look at the two parameters plotted against each other: the True Positive Rate (TPR) and the False Positive Rate (FPR). The TPR, also known as sensitivity or recall, is the fraction of actual positives the model correctly identifies, TP / (TP + FN), while the FPR is the fraction of actual negatives it incorrectly flags as positive, FP / (FP + TN).
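As a minimal sketch of how these rates are derived, the snippet below counts true and false positives and negatives for a single decision threshold; the helper name tpr_fpr and the toy labels and scores are hypothetical and used only for illustration.
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    # Classify as positive when the score meets or exceeds the threshold
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    y_true = np.asarray(y_true)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn)  # sensitivity: share of actual positives caught
    fpr = fp / (fp + tn)  # share of actual negatives incorrectly flagged
    return tpr, fpr

# Toy ground-truth labels and model scores (illustrative values only)
print(tpr_fpr([0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.7], threshold=0.5))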
The curve illustrates a dynamic relationship: as you lower the confidence threshold to capture more positive cases (increasing TPR), you invariably increase the risk of flagging negative cases incorrectly (increasing FPR). A perfect classifier would reach the top-left corner of the graph, indicating 100% sensitivity and 0% false alarms. A model that makes random guesses would appear as a diagonal line from bottom-left to top-right. The overall performance is often summarized by the Area Under the Curve (AUC), where a value of 1.0 represents a perfect classifier and 0.5 corresponds to random guessing.
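The sketch below, assuming scikit-learn and NumPy are available, illustrates those two extremes on synthetic data: purely random scores land near an AUC of 0.5, while scores that perfectly separate the classes reach 1.0.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)            # synthetic binary labels

random_scores = rng.random(10_000)                  # guessing: unrelated to the labels
perfect_scores = y_true + 0.1 * rng.random(10_000)  # every positive outscores every negative

print(f"Random classifier AUC:  {roc_auc_score(y_true, random_scores):.2f}")   # ~0.50
print(f"Perfect classifier AUC: {roc_auc_score(y_true, perfect_scores):.2f}")  # 1.00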
The decision of where to set the threshold on an ROC curve depends entirely on the relative cost of errors in a specific application. In domains such as fraud detection or medical diagnosis, a missed positive is typically far more costly than a false alarm, so the operating point is pushed toward higher sensitivity even at the expense of a higher FPR.
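One way to make that trade-off explicit is to assign estimated costs to each error type and pick the threshold that minimizes the expected total cost on a validation set. The sketch below is a toy illustration of this idea, assuming scikit-learn is installed; the cost values, labels, and probabilities are all hypothetical.
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical validation labels and predicted positive-class probabilities
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.2, 0.45, 0.4, 0.65, 0.3, 0.8, 0.9, 0.55, 0.7])

# Assumed business costs: a missed positive is ten times worse than a false alarm
COST_FN, COST_FP = 10.0, 1.0
n_pos, n_neg = y_true.sum(), (1 - y_true).sum()

fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Expected cost at each candidate threshold returned by roc_curve
cost = COST_FN * (1 - tpr) * n_pos + COST_FP * fpr * n_neg
best = np.argmin(cost)
print(f"Lowest-cost threshold: {thresholds[best]:.2f} (TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")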
To plot an ROC curve, you need the raw prediction probabilities rather than just the final class labels. The following example uses the state-of-the-art YOLO26 model to generate classification scores.
from ultralytics import YOLO
# Load a pretrained YOLO26 classification model
model = YOLO("yolo26n-cls.pt")
# Run inference to get probability distribution
results = model("bus.jpg")
# Access the probability score for the predicted class
# These continuous scores are required to calculate TPR/FPR at different thresholds
print(f"Top Class Index: {results[0].probs.top1}")
print(f"Confidence Score: {results[0].probs.top1conf:.4f}")
Once these probabilities are collected for a validation set, developers can use libraries like Scikit-learn to compute the curve points. For managing datasets and tracking these metrics over time, the Ultralytics Platform offers integrated tools for model evaluation and deployment.
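As a rough sketch of that step, assuming scikit-learn and Matplotlib are installed and that ground-truth labels and positive-class probabilities have already been gathered into arrays, the curve points and AUC can be computed as follows; the toy arrays are placeholders for real validation data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Placeholder arrays: replace with labels and probabilities from your validation set
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.85, 0.6, 0.7, 0.9, 0.1, 0.55, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()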
It is important to distinguish the ROC curve from other evaluation tools: