See how panoptic segmentation unifies semantic segmentation and instance segmentation for precise, pixel-level scene understanding in AI applications.
Panoptic segmentation is a computer vision (CV) task that unifies two distinct forms of image analysis: semantic segmentation and instance segmentation. Traditional methods treat these tasks separately, either labeling background regions like "sky" or "grass" by class or detecting specific objects like "car" or "person". Panoptic segmentation combines them into a single framework. It assigns every pixel in an image a class label and, for pixels that belong to countable objects, an instance ID, providing a complete scene understanding that distinguishes between countable objects (referred to as "things") and amorphous background regions (referred to as "stuff"). Because every pixel is accounted for and classified, the result is closer to how humans perceive a scene than the output of isolated detection methods.
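To fully grasp panoptic segmentation, it is helpful to understand the dichotomy of visual information it processes. The task splits the visual world into two primary categories: "things", which are countable objects such as cars or people and which each receive their own instance identity, and "stuff", which covers amorphous regions such as sky or grass that are labeled by class only.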
This distinction is crucial for advanced artificial intelligence (AI) systems, allowing them to navigate environments while simultaneously interacting with specific objects.
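The things/stuff split can be made concrete with a toy panoptic map. The sketch below is an illustration only: the class names, the `(class, instance_id)` pixel encoding, and the `summarize` helper are assumptions chosen for readability, and plain Python lists stand in for the arrays a real pipeline would use.

```python
from collections import Counter

# Hypothetical class names, for illustration only.
THINGS = {"car", "person"}  # countable: each object gets its own instance id
STUFF = {"sky", "road"}     # amorphous: class label only, no instance id

# A tiny 3x4 "image" in which every pixel holds (class_name, instance_id).
# Stuff pixels use instance_id None; thing pixels are numbered per class.
panoptic_map = [
    [("sky", None), ("sky", None), ("sky", None), ("sky", None)],
    [("car", 1), ("car", 1), ("road", None), ("car", 2)],
    [("road", None), ("road", None), ("person", 1), ("car", 2)],
]


def summarize(pmap):
    """Count pixels per segment; a segment is a unique (class, instance_id) pair."""
    return Counter(pixel for row in pmap for pixel in row)


segments = summarize(panoptic_map)
for (cls, inst), area in sorted(
    segments.items(), key=lambda kv: (kv[0][0], kv[0][1] or 0)
):
    kind = "thing" if cls in THINGS else "stuff"
    label = f"{cls} #{inst}" if inst is not None else cls
    print(f"{kind:5s} {label:10s} {area} px")
```

Every pixel appears in exactly one segment, and the two cars stay separate even though they share a class. That separation is what a purely semantic map would lose.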
Modern panoptic segmentation architectures typically employ a deep learning (DL) backbone, such as a Convolutional Neural Network (CNN) or a Vision Transformer (ViT), to extract rich feature representations from an image. The network generally splits into two branches or "heads": a semantic head that predicts a class label for every pixel, and an instance head that detects individual objects and predicts a mask for each one.
A fusion module or post-processing step then resolves conflicts between these outputs—for example, deciding if a pixel belongs to a "person" instance or the "background" wall behind them—to produce a final, non-overlapping panoptic segmentation map.
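One simple way to picture this conflict resolution is a greedy merge: instance masks claim pixels in order of confidence, and whatever is left falls back to the semantic prediction. The sketch below assumes that heuristic. It is not the fusion step of any particular model, and the `fuse` function, its input layout, and the `"void"` label are hypothetical names introduced for this example.

```python
def fuse(semantic, instances, stuff_classes):
    """Merge a semantic map and instance masks into one panoptic map.

    semantic:  2D list of class names, one per pixel (semantic head output).
    instances: list of dicts with "cls", "score", and "mask" (2D list of 0/1).
    Returns a 2D list of (class_name, instance_id) pairs with no overlaps.
    Pixels that no instance and no stuff class claims become ("void", None).
    """
    h, w = len(semantic), len(semantic[0])
    out = [[None] * w for _ in range(h)]
    last_id = {}  # highest instance id handed out so far, per class

    # Higher-confidence instances claim contested pixels first.
    for inst in sorted(instances, key=lambda d: d["score"], reverse=True):
        inst_id = last_id.get(inst["cls"], 0) + 1
        claimed = False
        for y in range(h):
            for x in range(w):
                if inst["mask"][y][x] and out[y][x] is None:
                    out[y][x] = (inst["cls"], inst_id)
                    claimed = True
        if claimed:  # skip instances that were fully covered by earlier ones
            last_id[inst["cls"]] = inst_id

    # Remaining pixels take the semantic label, but only for stuff classes.
    for y in range(h):
        for x in range(w):
            if out[y][x] is None:
                cls = semantic[y][x]
                out[y][x] = (cls, None) if cls in stuff_classes else ("void", None)
    return out


# Toy inputs: the semantic head sees a wall behind one person blob, while
# the instance head proposes two overlapping person masks.
semantic = [
    ["wall", "wall", "person"],
    ["wall", "person", "person"],
]
instances = [
    {"cls": "person", "score": 0.9, "mask": [[0, 0, 1], [0, 1, 1]]},
    {"cls": "person", "score": 0.6, "mask": [[0, 1, 1], [0, 0, 0]]},
]
panoptic = fuse(semantic, instances, stuff_classes={"wall"})
for row in panoptic:
    print(row)
```

In this toy case the pixel that both person masks cover goes to the higher-scoring instance, and the pixel that the semantic head called "wall" but the second mask covers goes to the person instance. Learned fusion modules can make these decisions differently; the greedy ordering here is only the simplest option.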
The holistic nature of panoptic segmentation makes it indispensable for industries where safety and context are paramount.
Full panoptic training can be complex, but developers can cover the "things" half of the problem with high-precision instance segmentation using Ultralytics YOLO26, a model designed for real-time inference and edge deployment.
The following Python code shows how to load a pretrained segmentation model and run inference to isolate individual objects:
from ultralytics import YOLO
# Load the YOLO26 segmentation model
model = YOLO("yolo26n-seg.pt")
# Run inference on an image to segment individual instances
# The model detects countable objects ('things') and predicts a mask for each one
results = model("https://ultralytics.com/images/bus.jpg")
# Display the resulting image with overlaid segmentation masks
results[0].show()
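For teams looking to manage training data and automate the annotation process, Ultralytics provides a suite of tools for dataset management and model training. High-quality data annotation is essential for segmentation tasks, because a model needs accurate pixel-level labels to learn effectively.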
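Understanding the nuances between segmentation types is vital for selecting the right model for your project. Semantic segmentation labels every pixel by class but does not separate individual objects of the same class. Instance segmentation separates individual objects but leaves background regions unlabeled. Panoptic segmentation does both, so every pixel receives a class label and every countable object receives its own identity.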
For further exploration of the dataset formats used in these tasks, you can review the documentation for the COCO dataset, a standard benchmark for measuring segmentation performance.
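As a small taste of what such a format looks like, the COCO panoptic annotations store each image's segment map as an RGB PNG, where a pixel's color encodes a segment id as `R + 256 * G + 256^2 * B`, and a companion JSON entry lists the segments with their `category_id`. The sketch below shows only that decoding step. The helper names and the sample values in `segments_info` are made up for illustration and are not taken from the real dataset.

```python
def rgb_to_segment_id(r, g, b):
    """Decode a COCO-panoptic-style pixel color into a segment id."""
    return r + 256 * g + 256 * 256 * b


def segment_id_to_rgb(segment_id):
    """Inverse of rgb_to_segment_id, for writing a segment map back out."""
    return (
        segment_id % 256,
        (segment_id // 256) % 256,
        (segment_id // (256 * 256)) % 256,
    )


# Made-up annotation entries in the shape of the "segments_info" list:
# each segment has its own id plus the category it belongs to.
segments_info = [
    {"id": 522, "category_id": 1, "iscrowd": 0},
    {"id": 65536, "category_id": 2, "iscrowd": 0},
]
category_by_segment = {seg["id"]: seg["category_id"] for seg in segments_info}

pixel = (10, 2, 0)  # one RGB pixel read from the annotation PNG
segment_id = rgb_to_segment_id(*pixel)
print(segment_id, category_by_segment[segment_id])
```

Two pixels with the same decoded id belong to the same segment, so a single lookup recovers both the class and the instance a pixel belongs to.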