Discover the importance of the F1-Score in machine learning, and learn how to balance precision and recall for optimal model evaluation.
The F1-Score is a critical performance metric in machine learning that combines precision and recall into a single value by taking their harmonic mean. It is particularly useful for evaluating classification models where the dataset is imbalanced or where false positives and false negatives carry different costs. Unlike raw accuracy, which can be misleading when one class dominates the dataset, the F1-Score provides a more balanced view of a model's ability to identify relevant instances correctly while minimizing errors. Because the harmonic mean penalizes extreme values, a high score is only achieved when both precision and recall are reasonably high, making the F1-Score a staple metric in fields ranging from medical diagnostics to information retrieval.
In many real-world scenarios, simply knowing the percentage of correct predictions (accuracy) is insufficient. For example, in anomaly detection, normal cases far outnumber anomalies. A model that predicts "normal" for every single input might achieve 99% accuracy but would be useless for detecting actual issues. The F1-Score addresses this by balancing two competing metrics: precision, the fraction of predicted positives that are truly positive (TP / (TP + FP)), and recall, the fraction of actual positives that the model manages to find (TP / (TP + FN)).
Because there is often a trade-off—improving precision tends to lower recall and vice versa—the F1-Score acts as a unified metric to find an optimal balance point. This is crucial when tuning models using hyperparameter optimization to ensure robust performance across diverse conditions.
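To make the formula concrete, here is a minimal sketch showing how the F1-Score is derived as the harmonic mean of precision and recall; the TP/FP/FN counts are made-up example values.
# Minimal sketch: the F1-Score as the harmonic mean of precision and recall
# The confusion-matrix counts below are made-up example values
tp, fp, fn = 80, 10, 30
precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")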
The utility of the F1-Score extends across industries where the cost of an error is significant, such as medical diagnostics and information retrieval.
Modern computer vision frameworks simplify the calculation of these metrics. When training object detection models, the F1-Score is automatically computed during the validation phase. The Ultralytics Platform visualizes these metrics in real-time charts, allowing users to see how the F1-Score varies across different confidence thresholds.
Here is how you can access validation metrics, including components of the F1-Score, using the Python API:
from ultralytics import YOLO
# Load a pre-trained YOLO26 model
model = YOLO("yolo26n.pt")
# Validate the model on a dataset (metrics are computed automatically)
# This returns a metrics object containing precision, recall, and mAP
metrics = model.val(data="coco8.yaml")
# Print the Mean Average Precision (mAP50-95), which, like the F1-Score, summarizes the precision-recall trade-off (averaged over IoU thresholds 0.50-0.95)
print(f"mAP50-95: {metrics.box.map}")
# Access precision and recall arrays to manually inspect the balance
print(f"Precision: {metrics.box.p}")
print(f"Recall: {metrics.box.r}")
Understanding how the F1-Score differs from other evaluation criteria is essential for selecting the right tool for your project.
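To make the contrast with plain accuracy concrete, the small sketch below uses scikit-learn (an assumed choice of library) on a toy imbalanced labelling: a model that always predicts the majority class scores 95% accuracy yet earns an F1 of zero on the minority class it never detects.
from sklearn.metrics import accuracy_score, f1_score
# Toy imbalanced ground truth: 95 negatives and only 5 positives
y_true = [0] * 95 + [1] * 5
# A "lazy" model that predicts the majority class for every input
y_pred = [0] * 100
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")             # 0.95, looks impressive
print(f"F1-Score: {f1_score(y_true, y_pred, zero_division=0):.2f}")  # 0.00, exposes the failure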
If your model suffers from a low F1-Score, several strategies can help. Data augmentation can increase the variety of positive examples, helping the model generalize better. Employing transfer learning from robust foundation models allows the network to leverage pre-learned features. Additionally, adjusting the confidence threshold during inference can manually shift the balance between precision and recall to maximize the F1-Score for your specific use case.
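That last point, threshold adjustment, can be automated by sweeping candidate confidence thresholds on held-out predictions and keeping the one that maximizes F1. The sketch below does this with scikit-learn's precision_recall_curve over hypothetical labels and scores (both assumptions made for illustration); in an Ultralytics workflow the same information appears in the F1-confidence curve produced at validation time.
import numpy as np
from sklearn.metrics import precision_recall_curve
# Hypothetical held-out labels and model confidence scores
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.20, 0.75, 0.55, 0.30])
# Precision and recall at every candidate threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
f1 = 2 * precision * recall / (precision + recall + 1e-16)
# The final precision/recall pair has no matching threshold, so exclude it
best = np.argmax(f1[:-1])
print(f"Best confidence threshold: {thresholds[best]:.2f} (F1 = {f1[best]:.2f})")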