Discover how instance segmentation refines object detection with pixel-level precision, enabling detailed object masks for AI applications.
Instance segmentation is a sophisticated computer vision (CV) technique that identifies, localizes, and delineates individual objects within an image at the pixel level. Unlike object detection, which approximates an object's location with a rectangular bounding box, instance segmentation generates a precise mask that outlines the exact shape of each distinct object. This granular level of detail allows systems to distinguish between multiple instances of the same class—such as separating two overlapping cars or individual people in a crowd—making it a critical component in advanced artificial intelligence (AI) applications.
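To fully understand instance segmentation, it is helpful to compare it with two related computer vision tasks: object detection, which localizes each object with a bounding box and a class label, and semantic segmentation, which assigns a class label to every pixel but does not separate individual instances of the same class.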
Instance segmentation effectively merges the localization capabilities of object detection with the pixel-level precision of semantic segmentation.
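A toy example may help make the distinction concrete. The sketch below uses a hand-written 4x6 grid (illustrative data, not model output) containing two touching objects of the same class. The names `INSTANCE_MAP`, `semantic_view`, `instance_masks`, and `bounding_box` are hypothetical helpers introduced only for this illustration.

```python
# Toy 4x6 "image": each cell stores an instance id (0 = background).
# Instance ids 1 and 2 are two touching objects of the same class, "car".
INSTANCE_MAP = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
CLASS_OF_INSTANCE = {1: "car", 2: "car"}


def semantic_view(instance_map, class_of_instance):
    """Semantic segmentation: a class label per pixel; instance identity is lost."""
    return [
        [class_of_instance.get(cell, "background") for cell in row]
        for row in instance_map
    ]


def instance_masks(instance_map):
    """Instance segmentation: one binary mask per object."""
    ids = sorted({cell for row in instance_map for cell in row if cell != 0})
    return {
        i: [[1 if cell == i else 0 for cell in row] for row in instance_map]
        for i in ids
    }


def bounding_box(mask):
    """Object detection view: tightest (x_min, y_min, x_max, y_max) around a mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


semantic = semantic_view(INSTANCE_MAP, CLASS_OF_INSTANCE)
masks = instance_masks(INSTANCE_MAP)
boxes = {i: bounding_box(m) for i, m in masks.items()}

car_pixels = sum(label == "car" for row in semantic for label in row)
print("semantic 'car' pixels (one merged region):", car_pixels)
for i, box in boxes.items():
    area = sum(map(sum, masks[i]))
    print(f"instance {i}: box={box}, mask area={area} px")
```

In this toy grid the semantic view reports a single merged "car" region, while the instance view keeps the two objects apart. The box for instance 2 spans 9 cells although its mask covers only 7, which is the approximation a bounding box introduces.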
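Instance segmentation models generally employ deep learning (DL) architectures, most often Convolutional Neural Networks (CNNs), to extract features from an image. The process typically involves two parallel steps: detecting each object (its class and bounding box) and predicting a pixel-level mask for each detected object.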
Early approaches like Mask R-CNN used a two-stage process, first generating region proposals and then refining them. More recent architectures, such as Ultralytics YOLO11, perform detection and segmentation in a single stage. This enables real-time inference, making it possible to segment objects in live video streams quickly and accurately.
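To illustrate how a detection and a mask relate, the following simplified sketch turns per-pixel mask probabilities into a binary instance mask by thresholding them and keeping only pixels inside the detected box. It is not the internal procedure of Mask R-CNN or YOLO11; the function `mask_from_probabilities`, the 0.5 threshold, and the sample values are assumptions chosen for illustration.

```python
def mask_from_probabilities(prob_map, box, threshold=0.5):
    """Binarize per-pixel mask probabilities, keeping only pixels inside the box.

    prob_map: 2D list of floats in [0, 1] (hypothetical mask-branch output).
    box: (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
    """
    x_min, y_min, x_max, y_max = box
    return [
        [
            1 if (x_min <= x <= x_max and y_min <= y <= y_max and p >= threshold) else 0
            for x, p in enumerate(row)
        ]
        for y, row in enumerate(prob_map)
    ]


# Hypothetical outputs for one detected object.
PROBS = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.7],
    [0.1, 0.7, 0.4, 0.9],
]
BOX = (1, 1, 2, 2)

mask = mask_from_probabilities(PROBS, BOX)
for row in mask:
    print(row)
```

High-probability pixels outside the box (the right-hand column here) are discarded, so the resulting mask belongs to exactly one detected instance.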
The precise boundary detection offered by instance segmentation is valuable across many industries.
Developers can easily implement instance segmentation using the ultralytics Python package. The library supports YOLO11 models pre-trained on the COCO dataset, which can detect and segment 80 common object categories out of the box.
Here is a concise example of how to load a model and run segmentation on an image:
```python
from ultralytics import YOLO

# Load a pre-trained YOLO11 instance segmentation model
model = YOLO("yolo11n-seg.pt")

# Run inference on an image
# The model predicts classes, boxes, and masks simultaneously
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Visualize the results with masks plotted
results[0].show()
```
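Beyond visualization, the predicted masks can be processed programmatically. The sketch below counts instances per class and estimates each outline's area with the shoelace formula. The helpers `polygon_area` and `summarize_instances` are hypothetical names introduced here, and the demo values are stand-in data so the example runs without a model; the commented lines show how the inputs could be gathered from a prediction, assuming the `names`, `boxes.cls`, and `masks.xy` attributes of the result object.

```python
from collections import Counter


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula (points as (x, y) pairs)."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def summarize_instances(names, class_ids, polygons):
    """Return (per-class instance counts, list of (class name, outline area))."""
    labels = [names[int(c)] for c in class_ids]
    counts = Counter(labels)
    areas = [(label, polygon_area(poly)) for label, poly in zip(labels, polygons)]
    return counts, areas


# With a real prediction, the inputs could be gathered like this
# (result.masks is None when nothing was segmented):
#   result = results[0]
#   if result.masks is not None:
#       counts, areas = summarize_instances(
#           result.names, result.boxes.cls.tolist(), result.masks.xy
#       )

# Stand-in data so the sketch runs without a model or network access.
demo_names = {0: "person", 5: "bus"}
demo_class_ids = [0.0, 0.0, 5.0]
demo_polygons = [
    [(0, 0), (2, 0), (2, 4), (0, 4)],  # 2 x 4 rectangle
    [(5, 0), (6, 0), (6, 3), (5, 3)],  # 1 x 3 rectangle
    [(0, 5), (10, 5), (10, 10)],       # right triangle
]
counts, areas = summarize_instances(demo_names, demo_class_ids, demo_polygons)
print(dict(counts))
print(areas)
```

Areas computed this way are in squared pixels of whatever coordinate system the polygons use, so they are only comparable between images of the same resolution.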
For users looking to apply this to their own data, the framework supports training on custom datasets, allowing the model to learn new classes specific to niche applications.
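Custom training data for segmentation is commonly stored as one text file per image, where each line holds a class index followed by the polygon's x and y coordinates normalized to the range 0 to 1. The sketch below converts a pixel-space polygon into such a line. It assumes that label layout; the helper `to_yolo_seg_line` and the sample polygon are hypothetical.

```python
def to_yolo_seg_line(class_index, polygon_px, image_width, image_height):
    """Convert a pixel-space polygon to one normalized label line.

    Assumed layout: "<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>",
    with every coordinate divided by the image width or height.
    """
    coords = []
    for x, y in polygon_px:
        coords.append(f"{x / image_width:.6f}")
        coords.append(f"{y / image_height:.6f}")
    return " ".join([str(class_index), *coords])


# Hypothetical triangle annotated on a 640x480 image, class index 0.
line = to_yolo_seg_line(0, [(64, 48), (320, 48), (320, 240)], 640, 480)
print(line)
```

Once labels and a dataset configuration file exist, training is typically started with a call such as `model.train(data="my-dataset.yaml", epochs=50)`, where the file name and epoch count are placeholders.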