Object detection is a pivotal capability within computer vision (CV) that enables software systems to not only recognize what an image represents but also to locate specific instances of items within it. While standard classification assigns a single label to an entire visual input, object detection provides a more granular understanding by predicting a bounding box around each identified entity, accompanied by a specific class label and a confidence score. This technology acts as the sensory foundation for advanced artificial intelligence (AI), allowing machines to perceive, interpret, and interact with the complexity of the physical world. From automated quality control in factories to advanced surveillance, it transforms unstructured pixel data into actionable insights.
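To make the output format concrete, a single detection can be modeled as a box, a label, and a score. This is a minimal sketch, not any library's data structure: the `Detection` class, the `filter_by_confidence` helper, the `(x1, y1, x2, y2)` pixel format, the 0.5 cutoff, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    box: tuple         # (x1, y1, x2, y2) corner coordinates in pixels
    label: str         # predicted class name
    confidence: float  # score in [0, 1]


def filter_by_confidence(detections, min_confidence=0.5):
    """Keep only detections whose score reaches the cutoff."""
    return [d for d in detections if d.confidence >= min_confidence]


# Made-up detections for a single image.
detections = [
    Detection((48, 90, 210, 400), "person", 0.91),
    Detection((0, 120, 640, 380), "bus", 0.87),
    Detection((300, 50, 340, 95), "person", 0.22),
]

confident = filter_by_confidence(detections)
labels = [d.label for d in confident]
print(labels)
```

Unlike a classifier, which would return one label for the whole image, the detector returns one such record per object instance, so the same class can appear several times.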
Modern detectors primarily rely on deep learning (DL) architectures, specifically Convolutional Neural Networks (CNNs), to learn spatial hierarchies of features. A typical architecture consists of a backbone, such as ResNet or CSPNet, which extracts essential visual features from the input image. These features are then processed by a detection head that outputs the coordinates for bounding boxes and the probability of class membership.
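As a rough illustration of what a detection head's output might look like after decoding, the sketch below converts one raw prediction into corner coordinates and class probabilities. It rests on assumptions that vary between architectures: a center-format box `(cx, cy, w, h)`, a softmax over class logits (many detectors use per-class sigmoids instead), and made-up numbers and class names. The helper names `cxcywh_to_xyxy` and `softmax` are hypothetical.

```python
import math


def cxcywh_to_xyxy(cx, cy, w, h):
    """Convert a center-format box to corner format (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs for a single predicted object.
raw_box = (50.0, 50.0, 20.0, 40.0)  # cx, cy, w, h
raw_logits = [2.0, 0.5, 0.1]
class_names = ["person", "bus", "car"]

box = cxcywh_to_xyxy(*raw_box)
probs = softmax(raw_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(box, class_names[best])
```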
To achieve high performance, models are trained on massive labeled collections like the COCO dataset, which provides a standard for benchmarking. During inference, algorithms often generate multiple overlapping boxes for the same object. Techniques like Non-Maximum Suppression (NMS) are applied to filter these redundancies: the box with the highest confidence is kept, and any remaining box whose Intersection over Union (IoU) with it exceeds a chosen threshold is discarded. No ground truth is involved at this stage; IoU against ground-truth boxes is used during training and evaluation instead.
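The suppression step can be sketched in plain Python. This is a minimal greedy version for illustration, not the implementation used by any particular library: the `iou` and `nms` helper names, the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the sample boxes are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes on one object, plus a separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.75, 0.8]

kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of roughly 0.92, so it is suppressed, while the third box does not overlap either and survives. In practice NMS is usually applied per class, so that overlapping objects of different classes do not suppress each other.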
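Models are generally categorized into two types: two-stage detectors, such as the R-CNN family, which first propose candidate regions and then classify and refine them, and one-stage detectors, such as YOLO and SSD, which predict boxes and classes in a single pass over the image.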
It is crucial to differentiate object detection from similar computer vision tasks.
Object detection is the engine behind many transformative technologies across various industries.
The following code snippet demonstrates how to perform object detection using a pre-trained YOLO11 model with the ultralytics package. This simple workflow loads a model and runs inference on an image to identify objects like buses and people.
from ultralytics import YOLO
# Load a pretrained YOLO11 model (n-scale for speed)
model = YOLO("yolo11n.pt")
# Run inference on a remote image source
results = model("https://ultralytics.com/images/bus.jpg")
# Display the results with bounding boxes and labels
results[0].show()