Discover the power of image segmentation with Ultralytics YOLO. Explore pixel-level precision, types, applications, and real-world AI use cases.
Image segmentation is a core technique in computer vision (CV) that involves partitioning a digital image into multiple subgroups of pixels, commonly referred to as image segments. The primary objective is to simplify the representation of an image into something more meaningful and easier to analyze. Unlike object detection, which localizes objects within a rectangular bounding box, image segmentation provides a precise, pixel-level map of an object's shape. This process assigns a label to every pixel in an image, allowing artificial intelligence (AI) models to understand the exact boundaries and contours of entities within a scene.
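The difference is easy to see with a toy example. The NumPy sketch below is purely illustrative (it is not part of any Ultralytics API): it encodes a small binary mask by hand and derives the enclosing bounding box from it, showing how much geometric detail the rectangle discards.

```python
import numpy as np

# Toy 6x6 label map: 1 marks pixels belonging to the object, 0 is background.
mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# A bounding box only records the rectangle that encloses those pixels.
ys, xs = np.nonzero(mask)
x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()

print("Object area from the mask:", int(mask.sum()), "pixels")               # exact shape
print("Area of the bounding box:", (x2 - x1 + 1) * (y2 - y1 + 1), "pixels")  # coarse rectangle
```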
In many modern machine learning (ML) workflows, knowing the approximate location of an object is insufficient. Applications requiring interaction with the physical world—such as a robot gripping a package or a car navigating a winding road—demand a granular understanding of geometry. Image segmentation bridges this gap by converting raw visual data into a set of classified regions. This capability is powered by advanced deep learning (DL) architectures, particularly Convolutional Neural Networks (CNNs), which extract spatial features to differentiate between foreground objects and the background.
Understanding the specific segmentation task is crucial for selecting the right model architecture. The three primary categories are semantic segmentation, which assigns a class label to every pixel without distinguishing between individual objects; instance segmentation, which produces a separate mask for each object instance, even when instances share a class; and panoptic segmentation, which combines both approaches by giving every pixel a class label and, for countable objects, an instance identity.
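As a rough sketch (illustrative NumPy arrays only, not any particular framework's output format), the three tasks differ mainly in how their outputs are structured:

```python
import numpy as np

h, w = 480, 640  # example image size

# Semantic segmentation: one class ID per pixel, no notion of individual objects.
semantic_map = np.zeros((h, w), dtype=np.int64)     # e.g. 0=background, 1=car, 2=person

# Instance segmentation: a separate binary mask for each detected object.
instance_masks = np.zeros((3, h, w), dtype=bool)    # three objects, three masks

# Panoptic segmentation: every pixel gets a class ID plus an instance ID.
panoptic_map = np.zeros((h, w, 2), dtype=np.int64)  # channels: (class_id, instance_id)
```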
The ability to delineate precise boundaries makes segmentation indispensable across industries such as medical imaging, where tumors and organs must be outlined exactly; autonomous driving, where drivable surfaces and lane markings need pixel-level maps; agriculture, for distinguishing crops from weeds; and manufacturing, for spotting surface defects during quality inspection.
Modern frameworks have simplified the implementation of segmentation tasks. While older two-stage models like Mask R-CNN are accurate but comparatively slow, single-stage models have revolutionized the field by offering real-time inference. The Ultralytics YOLO11 model, for example, supports instance segmentation natively. Looking ahead, YOLO26 is being developed to further optimize these capabilities with end-to-end processing.
Developers can use standard libraries like OpenCV for pre-processing and visualization, while relying on PyTorch-based frameworks for the heavy lifting of model inference.
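A typical division of labor might look like the sketch below, where OpenCV handles reading, light pre-processing, and saving the visualization, while the Ultralytics model performs inference. The file names are placeholders and the blur step is only illustrative.

```python
import cv2
from ultralytics import YOLO

# OpenCV handles I/O and pre-processing (file names here are placeholders)
frame = cv2.imread("input.jpg")
frame = cv2.GaussianBlur(frame, (3, 3), 0)  # illustrative denoising step

# The PyTorch-based model does the heavy lifting of segmentation inference
model = YOLO("yolo11n-seg.pt")
results = model(frame)

# OpenCV handles visualization: plot() returns the annotated image as a NumPy array
annotated = results[0].plot()
cv2.imwrite("output_segmented.jpg", annotated)
```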
Here is a concise example of how to perform instance segmentation using a pre-trained YOLO11 model in Python:
```python
from ultralytics import YOLO

# Load a pre-trained YOLO11 segmentation model
model = YOLO("yolo11n-seg.pt")

# Run inference on an image (can be a local path or URL)
results = model("https://ultralytics.com/images/bus.jpg")

# Display the resulting image with segmentation masks overlaid
results[0].show()
```
This code snippet automatically handles the complex tasks of feature extraction, bounding box regression, and mask generation, allowing developers to focus on integrating the segmentation results into their larger applications.
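For downstream integration, the masks themselves can be pulled out of the Results object. The sketch below assumes the attribute names of the current Ultralytics Python API (`masks.data`, `boxes.cls`, `names`); check the version you have installed if they differ.

```python
# Continuing from the snippet above
result = results[0]

if result.masks is not None:
    masks = result.masks.data.cpu().numpy()   # (num_objects, H, W) binary masks
    classes = result.boxes.cls.cpu().numpy()  # class index for each mask

    for mask, cls_id in zip(masks, classes):
        label = result.names[int(cls_id)]
        area_px = int(mask.sum())              # pixel-level area of the object
        print(f"{label}: {area_px} pixels")
```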