Discover the speed and efficiency of one-stage object detectors like YOLO, ideal for real-time applications such as robotics and surveillance.
One-stage object detectors are a powerful class of deep learning architectures designed to perform object detection tasks with exceptional speed and efficiency. Unlike traditional two-stage object detectors, which divide the detection process into separate steps for region proposal and subsequent classification, one-stage models analyze the entire image in a single pass. By framing detection as a direct regression problem, these networks simultaneously predict bounding box coordinates and class probabilities directly from input pixels. This streamlined approach significantly reduces computational overhead, making one-stage detectors the preferred choice for applications requiring real-time inference and deployment on resource-constrained edge AI devices.
The architecture of a one-stage detector typically centers around a convolutional neural network (CNN) that serves as a backbone for feature extraction. As an image passes through the network, the model generates a grid of feature maps that encode spatial and semantic information.
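As a minimal sketch of this pattern (using PyTorch, with illustrative layer sizes rather than any real production architecture), a backbone of strided convolutions downsamples the image into a grid of feature maps, and a small head regresses a fixed-size prediction vector for every grid cell in a single forward pass:
import torch
import torch.nn as nn

class TinyOneStageDetector(nn.Module):
    """Illustrative one-stage detector: backbone -> grid of per-cell predictions."""

    def __init__(self, num_classes=80):
        super().__init__()
        # Backbone: stacked strided convolutions downsample the image 8x
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Head: a 1x1 conv regresses, per grid cell,
        # 4 box values + 1 objectness score + num_classes class scores
        self.head = nn.Conv2d(64, 4 + 1 + num_classes, kernel_size=1)

    def forward(self, x):
        # Single pass: image in, dense grid of predictions out
        return self.head(self.backbone(x))

model = TinyOneStageDetector()
preds = model(torch.randn(1, 3, 640, 640))
print(preds.shape)  # torch.Size([1, 85, 80, 80]) -> 80x80 grid, 85 values per cell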
Early implementations, such as the Single Shot MultiBox Detector (SSD), relied on predefined anchor boxes at various scales to localize objects. However, modern advancements like Ultralytics YOLO11 and the state-of-the-art YOLO26 have largely shifted toward anchor-free designs. These newer architectures predict object centers and sizes directly, eliminating the need for complex hyperparameter tuning associated with anchors. The final output consists of coordinate vectors for localization and a confidence score that represents the model's certainty regarding the detected object.
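To make the anchor-free idea concrete, the sketch below decodes per-cell predictions into absolute image coordinates; the tensor layout (dx, dy, w, h, confidence) and the exp/sigmoid transforms are assumptions for illustration, not the decoding used by any specific YOLO release:
import torch

def decode_anchor_free(preds, stride=8):
    """Map per-cell (dx, dy, w, h, conf) predictions to absolute xyxy boxes.

    preds: tensor of shape (H, W, 5) -- an assumed layout for illustration.
    """
    h, w, _ = preds.shape
    # Grid of cell indices; each cell is responsible for objects centered in it
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Center = cell position + predicted offset, scaled back to pixel units
    cx = (xs + preds[..., 0].sigmoid()) * stride
    cy = (ys + preds[..., 1].sigmoid()) * stride
    # Width and height are predicted directly -- no anchor priors to tune
    bw = preds[..., 2].exp() * stride
    bh = preds[..., 3].exp() * stride
    conf = preds[..., 4].sigmoid()  # model's certainty about the detection
    return torch.stack(
        [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, conf], dim=-1
    )

boxes = decode_anchor_free(torch.randn(80, 80, 5))
print(boxes.shape)  # torch.Size([80, 80, 5]) -> xyxy box + confidence per cell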
Distinguishing between these two main categories, one-stage and two-stage detectors, helps in selecting the right tool for a specific task.
The efficiency of one-stage detectors has driven their widespread adoption across diverse industries where immediate responsiveness is critical, such as robotics and surveillance.
Implementing a one-stage detector is straightforward using modern high-level APIs. To ensure accurate results, models often predict multiple potential boxes, which are then filtered using techniques like Non-Maximum Suppression (NMS) based on Intersection over Union (IoU) thresholds, though newer end-to-end models like YOLO26 handle this natively.
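For models that still require this post-processing step, a minimal NumPy sketch of IoU-based NMS is shown below; it conveys the core idea rather than the optimized implementation shipped inside any library:
import numpy as np

def iou(box, boxes):
    """Intersection over Union between one box and an array of boxes (xyxy)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thres=0.45):
    """Keep the highest-scoring box, drop overlapping duplicates, repeat."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Discard remaining boxes that overlap the kept box beyond the threshold
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thres]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2]: the near-duplicate is suppressed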
The following Python example shows how to load the state-of-the-art YOLO26 model and run inference on an image:
from ultralytics import YOLO
# Load the YOLO26 model, the latest natively end-to-end one-stage detector
model = YOLO("yolo26n.pt")
# Run inference on an image URL to detect objects
results = model("https://ultralytics.com/images/bus.jpg")
# Display the first result with bounding boxes and labels
results[0].show()
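Beyond visualization, the returned Results objects expose the predictions programmatically; continuing the example above, the detected boxes, confidences, and class IDs can be read as tensors (attribute names as found in current Ultralytics releases):
# Inspect the detections from the result above
boxes = results[0].boxes
print(boxes.xyxy)  # bounding box corners in pixel coordinates
print(boxes.conf)  # confidence score per detection
print(boxes.cls)   # predicted class index per detection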
The evolution of one-stage detectors has focused on overcoming the "accuracy vs. speed" trade-off. Techniques such as Focal Loss were introduced to address class imbalance during training, ensuring that the model focuses on hard-to-classify examples rather than the abundant background. Furthermore, the integration of Feature Pyramid Networks (FPN) allows these models to detect objects at different scales effectively.
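For reference, a minimal binary focal loss in PyTorch follows the standard formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the alpha and gamma values below are the commonly cited defaults from the original paper, not values taken from any particular detector:
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so training focuses on
    hard, misclassified ones instead of the abundant background class."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma is the modulating factor that suppresses easy examples
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

loss = focal_loss(torch.randn(8, 80), torch.randint(0, 2, (8, 80)).float())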
Today, researchers and developers can easily train these advanced architectures on custom datasets using tools like the Ultralytics Platform, which simplifies the workflow from data annotation to model deployment. Whether for agriculture or healthcare, the accessibility of one-stage detectors is democratizing powerful computer vision capabilities.
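Fine-tuning on a custom dataset uses the same high-level API as inference; a minimal sketch follows, where the dataset YAML path and hyperparameter values are placeholders to adapt to your own project:
from ultralytics import YOLO

# Start from pretrained weights and fine-tune on a custom dataset
model = YOLO("yolo26n.pt")
model.train(data="path/to/custom_dataset.yaml", epochs=100, imgsz=640)

# Evaluate the fine-tuned model, then export it for deployment
metrics = model.val()
model.export(format="onnx")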