Non-Maximum Suppression (NMS)
Discover Non-Maximum Suppression (NMS) for object detection. Learn how it refines results, improves accuracy, and supports AI applications such as YOLO.
Non-Maximum Suppression (NMS) is a post-processing technique used in object detection to refine the raw predictions
made by a model. When an object detection model analyzes an image, it often generates multiple overlapping bounding
boxes for a single object, each with an associated confidence score. These redundant predictions occur because the
model may detect the same feature at slightly different scales or positions. NMS filters this output by keeping only
the highest-confidence bounding box for each object and discarding the lower-scoring boxes that overlap it, so that
the final output contains one detection per object.
How Non-Maximum Suppression Works
The NMS algorithm operates on a list of candidate bounding boxes and their corresponding confidence scores. The goal
is to select the best box for an object and suppress (remove) any other boxes that overlap significantly with it, as
these are likely duplicate detections of the same object. The process typically follows these steps:
- Filtering: Eliminate all bounding boxes with confidence scores below a specific threshold (e.g., 0.25) to remove
  weak predictions immediately.
- Sorting: Sort the remaining boxes in descending order based on their confidence scores.
- Selection: Pick the box with the highest confidence score as a valid detection.
- Comparison: Compare this selected box with all other remaining boxes using Intersection over Union (IoU), a
  metric that measures the overlap between two boxes.
- Suppression: If the IoU between the selected box and another box exceeds a predefined threshold (e.g., 0.45), the
  lower-scoring box is considered a duplicate and is removed.
- Iteration: Repeat the process with the next highest-scoring box that has not yet been suppressed or selected,
  until all boxes are processed.
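The steps above can be sketched in plain Python. This is a minimal illustration rather than a production
implementation: the function names iou and greedy_nms and the demo boxes are hypothetical, and the default
thresholds (0.25 and 0.45) are simply the example values quoted in the list.

```python
from typing import List, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_threshold: float = 0.25,  # example value from the text
    iou_threshold: float = 0.45,   # example value from the text
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Filtering: drop weak predictions.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # Sorting: highest confidence first.
    candidates.sort(key=lambda i: scores[i], reverse=True)

    keep: List[int] = []
    while candidates:  # Iteration: continue until every box is kept or suppressed.
        best = candidates.pop(0)  # Selection: top-scoring remaining box.
        keep.append(best)
        # Comparison and suppression: drop boxes overlapping the selected one too much.
        candidates = [
            i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold
        ]
    return keep


# Hypothetical demo data: two overlapping boxes, one distinct box, one weak box.
demo_boxes = [
    [100, 100, 200, 200],
    [105, 105, 195, 195],
    [300, 300, 400, 400],
    [10, 10, 50, 50],
]
demo_scores = [0.9, 0.8, 0.95, 0.1]
kept = greedy_nms(demo_boxes, demo_scores)
print(kept)  # [2, 0]
```

The weak box (score 0.1) is removed by the filtering step, and the second box is suppressed because its IoU with
the first is 0.81. The list-based loop favors readability over speed; library implementations typically
vectorize the comparison step.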
Real-World Applications
NMS is essential in scenarios where precision is paramount and duplicate detections can confuse downstream systems.
- Autonomous Driving: In self-driving car systems, cameras detect pedestrians, other vehicles, and traffic signs. A
  model might predict three slightly different boxes for a single pedestrian. NMS ensures the vehicle's planning
  system receives a single detection for that pedestrian, preventing erratic braking or path planning errors caused
  by "ghost" obstacles.
- Retail Inventory Management: When using computer vision to count products on a shelf, items are often packed
  closely together. Without NMS, a single soda can might be counted twice due to overlapping predictions, leading to
  inaccurate stock levels. NMS refines these detections to ensure the inventory count matches reality.
NMS Implementation with PyTorch
While many modern frameworks handle NMS internally, understanding the implementation helps in tuning parameters. The
following example demonstrates how to apply NMS using the nms function from torchvision, the computer vision
companion library to PyTorch:
import torch
import torchvision.ops as ops
# Example bounding boxes: [x1, y1, x2, y2]
boxes = torch.tensor(
    [
        [100, 100, 200, 200],  # Box A
        [105, 105, 195, 195],  # Box B (high overlap with A)
        [300, 300, 400, 400],  # Box C (distinct object)
    ],
    dtype=torch.float32,
)
# Confidence scores for each box
scores = torch.tensor([0.9, 0.8, 0.95], dtype=torch.float32)
# Apply NMS with an IoU threshold of 0.5.
# A box is suppressed if its IoU with a higher-scoring kept box exceeds 0.5.
keep_indices = ops.nms(boxes, scores, iou_threshold=0.5)
print(f"Indices to keep: {keep_indices.tolist()}")
# Prints "Indices to keep: [2, 0]": Box C (0.95) and Box A (0.9) are kept, in descending score order,
# while Box B (0.8) is suppressed because its IoU with Box A exceeds 0.5.
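To see why Box B is suppressed, the IoU between Box A and Box B can be worked out by hand. The sketch below
repeats that arithmetic in plain Python, without PyTorch; the variable names are illustrative only.

```python
# Box A and Box B from the example above, as (x1, y1, x2, y2).
box_a = (100.0, 100.0, 200.0, 200.0)
box_b = (105.0, 105.0, 195.0, 195.0)

# Overlap rectangle: the larger of the two mins and the smaller of the two maxes.
inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])  # 195 - 105 = 90
inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])  # 195 - 105 = 90
intersection = max(0.0, inter_w) * max(0.0, inter_h)         # 8100

area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])       # 100 * 100 = 10000
area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])       # 90 * 90 = 8100
union = area_a + area_b - intersection                       # 10000

iou_ab = intersection / union
print(f"IoU(A, B) = {iou_ab:.2f}")  # IoU(A, B) = 0.81
```

Because 0.81 is above the 0.5 threshold, Box B counts as a duplicate of Box A. Box C shares no area with either
box, so its IoU with both is 0 and it is kept.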
NMS vs. End-to-End Detection
Traditionally, NMS has been a mandatory "clean-up" step that sits outside the main neural network, adding
inference latency. However, the field is evolving
toward end-to-end architectures.
- Standard NMS: A heuristic process that requires manual tuning of the IoU threshold. If the threshold is too low,
  valid objects close to each other might be missed (low recall). If it is too high, duplicates remain (low
  precision).
- End-to-End Models: Next-generation models like YOLO26 are designed to be natively end-to-end. They learn to
  predict exactly one box per object during training, effectively internalizing the NMS process. This eliminates the
  need for external post-processing, resulting in faster inference speeds and simpler deployment pipelines on the
  Ultralytics Platform.
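The threshold trade-off described for standard NMS can be reproduced on a toy example. The sketch below is a
plain-Python illustration under assumed data: the three boxes, their scores, and the helper names iou and nms
are hypothetical and chosen so that one box is a genuine neighbor and another is a duplicate.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS; returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [
    [0, 0, 100, 100],   # 0: object 1
    [50, 0, 150, 100],  # 1: object 2, a real neighbor (IoU with box 0 is 1/3)
    [5, 5, 105, 105],   # 2: duplicate of object 1 (IoU with box 0 is about 0.82)
]
scores = [0.9, 0.8, 0.7]

results: Dict[float, List[int]] = {t: nms(boxes, scores, t) for t in (0.2, 0.5, 0.9)}
for threshold, kept in results.items():
    print(f"iou_threshold={threshold}: kept {kept}")
```

With a threshold of 0.2 only box 0 survives, so the neighboring object is lost (lower recall). At 0.5 boxes 0
and 1 are kept and the duplicate is removed. At 0.9 all three boxes survive, so the duplicate remains (lower
precision).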
Related Concepts
- Soft-NMS: A variation where overlapping boxes are not strictly removed but have their confidence scores reduced.
  This allows somewhat overlapping objects (like people in a crowd) to still be detected if their scores remain high
  enough after decay.
- Anchor Boxes: Predefined box shapes used by many detectors to estimate object size. NMS is applied to the final
  predictions refined from these anchors.
- Intersection over Union (IoU): The metric NMS uses to measure how much two boxes overlap. Its value is compared
  against the IoU threshold to decide whether a box is suppressed.
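The score decay used by Soft-NMS can be sketched as follows. This is an illustrative plain-Python version
assuming the Gaussian penalty, in which each overlapping box's score is multiplied by exp(-IoU^2 / sigma). The
function names, the demo boxes, and the values sigma=0.5 and score_threshold=0.25 are assumptions made for the
example, not settings taken from a particular library.

```python
import math
from typing import List, Sequence, Tuple

Box = Sequence[float]  # [x1, y1, x2, y2]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    sigma: float = 0.5,             # assumed decay width
    score_threshold: float = 0.25,  # assumed cut-off applied after decay
) -> List[Tuple[int, float]]:
    """Return (index, decayed score) pairs for the boxes that survive."""
    current = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = list(range(len(boxes)))
    kept: List[Tuple[int, float]] = []
    while remaining:
        best = max(remaining, key=lambda i: current[i])
        remaining.remove(best)
        if current[best] < score_threshold:
            break  # every remaining box scores even lower
        kept.append((best, current[best]))
        # Decay, rather than delete, the boxes that overlap the selected one.
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            current[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept


demo_boxes = [
    [0, 0, 100, 100],   # 0: object 1
    [50, 0, 150, 100],  # 1: neighboring object, moderate overlap with box 0
    [5, 5, 105, 105],   # 2: near-duplicate of box 0
]
demo_scores = [0.9, 0.8, 0.7]
kept_demo = soft_nms(demo_boxes, demo_scores)
print([(i, round(s, 3)) for i, s in kept_demo])
```

In this example the neighboring box survives with a reduced score, while the near-duplicate decays below the
cut-off and is dropped. A hard NMS threshold below 1/3 would have removed the neighbor outright.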