Yolo 深圳
深セン
今すぐ参加
用語集

データラベリング

機械学習におけるデータラベリングの重要な役割、そのプロセス、課題、そしてAI開発における現実世界の応用について解説します。

Data labeling is the fundamental process of identifying raw data—such as images, video frames, text, or audio—and adding informative tags or metadata to provide context. In the realm of machine learning (ML), algorithms cannot inherently understand the physical world; they require a "teacher" to guide them. This guidance comes in the form of labeled datasets used during supervised learning. The labels serve as the ground truth, representing the correct answers the model strives to predict. Whether training a simple classifier or a complex architecture like Ultralytics YOLO26, the accuracy, consistency, and quality of these labels are the primary determinants of a model's success.

Data Labeling vs. Data Annotation

While the terms are often used interchangeably in casual conversation, there is a subtle distinction worth noting. "Data labeling" generally refers to the broad act of assigning a category or tag to a piece of data (e.g., tagging an email as "spam"). In contrast, data annotation is often more specific to computer vision (CV), involving the precise delineation of objects using bounding boxes, polygons, or keypoints. However, within most ML operations (MLOps) workflows, both terms describe the creation of high-quality training data.

Key Types in Computer Vision

The method of labeling changes based on the task the model must perform. Common types include:

  • Image Classification: Assigning a single label to an entire image, such as identifying a weather condition as "cloudy" or "sunny."
  • Object Detection: Drawing 2D bounding boxes around distinct objects to teach the model what the object is and where it is located.
  • Instance Segmentation: Creating pixel-perfect masks or polygons around objects, which is essential for determining precise shapes and boundaries.
  • Pose Estimation: Marking specific keypoints on a subject, such as skeletal joints, to analyze movement or posture.

実際のアプリケーション

The utility of data labeling extends across virtually every industry employing AI.

  1. Autonomous Vehicles: Self-driving cars rely on massive datasets where every vehicle, pedestrian, traffic sign, and lane marker is meticulously labeled. This labeled data allows the perception system to navigate complex environments safely. Autonomous vehicle companies invest heavily in pixel-level labeling to ensure safety compliance.
  2. Precision Agriculture: In modern farming, AI in agriculture is used to detect crop diseases or monitor growth stages. Farmers use models trained on labeled images of "healthy" versus "diseased" leaves to automate treatment, reducing chemical usage and increasing yield.

The Labeling Workflow

Creating a labeled dataset is often the most time-consuming part of an AI project. The process typically involves a "Human-in-the-Loop" (HITL) approach, where human annotators verify labels to ensure high accuracy. Modern workflows leverage tools like the Ultralytics Platform, which simplifies dataset management and allows teams to collaborate on annotations. Advanced techniques like active learning can also be employed, where a model pre-labels the data, and humans only correct the low-confidence predictions, significantly speeding up the process.

The following example demonstrates how to use a pre-trained YOLO26 model to automatically generate labels (auto-labeling) for a new image, which can then be corrected by humans:

from ultralytics import YOLO

# Load the YOLO26n model (nano version)
model = YOLO("yolo26n.pt")

# Run inference on an image to detect objects
results = model("https://ultralytics.com/images/bus.jpg")

# Save the detection results to a text file in standard YOLO format
# This file can now be used as a starting point for data labeling
results[0].save_txt("bus_labels.txt")

Ultralytics コミュニティに参加する

AIの未来を共に切り開きましょう。グローバルなイノベーターと繋がり、協力し、成長を。

今すぐ参加