Data Labeling

Learn the fundamentals of data labeling for machine learning. Discover key types like object detection and how to accelerate workflows using Ultralytics YOLO26.

Data labeling is the fundamental process of identifying raw data (such as images, video frames, text, or audio) and adding informative tags or metadata to provide context. In machine learning (ML), algorithms cannot inherently understand the physical world; they need a "teacher" to guide them. That guidance comes in the form of labeled datasets used during supervised learning. The labels serve as the ground truth: the correct answers the model learns to predict. Whether the target is a simple classifier or a complex architecture like Ultralytics YOLO26, the accuracy, consistency, and quality of these labels are among the most important determinants of a model's success.
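A common way to check label consistency is to have two annotators label the same items and measure how often they agree beyond what chance alone would produce. The sketch below uses Cohen's kappa, a standard agreement statistic. This entry does not itself prescribe it. The function name `cohens_kappa` and the spam/ham example labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Illustrative helper, not part of any Ultralytics API.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator picked classes at their own base rates
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used the same single class for every item
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


annotator_1 = ["spam", "spam", "ham", "ham"]
annotator_2 = ["spam", "ham", "ham", "ham"]
print(cohens_kappa(annotator_1, annotator_2))
```

A value near 1 indicates strong agreement. A value near 0 means the annotators agree about as often as chance would predict, which usually points to ambiguous labeling guidelines.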

Data Labeling vs. Data Annotation

While the terms are often used interchangeably in casual conversation, there is a subtle distinction worth noting. "Data labeling" generally refers to the broad act of assigning a category or tag to a piece of data (e.g., tagging an email as "spam"). In contrast, data annotation is often more specific to computer vision (CV), involving the precise delineation of objects using bounding boxes, polygons, or keypoints. However, within most ML operations (MLOps) workflows, both terms describe the creation of high-quality training data.
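Two record types can make this distinction concrete. A label attaches only a category to an item, while a computer vision annotation also records geometry. The class names, field names, and coordinates below are hypothetical and chosen for illustration; they are not taken from any particular labeling tool.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """A plain label: one category for a whole item (e.g. an email)."""
    item_id: str
    category: str


@dataclass
class BoxAnnotation:
    """A CV annotation: a category plus the object's location, in pixels."""
    image_id: str
    category: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        # Pixel area of the box; zero flags a degenerate (empty or inverted) box
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


email_label = Label(item_id="email_001", category="spam")
example_box = BoxAnnotation("example.jpg", "bus", 10.0, 20.0, 110.0, 70.0)
print(email_label.category, example_box.area)
```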

Key Types in Computer Vision

The method of labeling changes based on the task the model must perform. Common types include:

Image Classification: Assigning a single label to an entire image, such as identifying a weather condition as "cloudy" or "sunny."
Object Detection: Drawing 2D bounding boxes around distinct objects and assigning each box a class, which teaches the model what each object is and where it is located.
Instance Segmentation: Outlining each individual object with a pixel-accurate mask or polygon, which is essential when a model must learn precise shapes and boundaries.
Pose Estimation: Marking specific keypoints on a subject, such as skeletal joints, to analyze movement or posture.
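For Ultralytics YOLO, a detection label is one text row per object in the form `class x_center y_center width height`, with coordinates normalized to the 0-1 range by the image width and height. A segmentation label replaces the box with normalized polygon vertices, `class x1 y1 x2 y2 ...`. The helpers below are a minimal sketch of converting pixel coordinates into those rows. They assume boxes arrive as corner coordinates, and the function names are hypothetical. Classification datasets need no label files because images are sorted into one folder per class. Pose labels append keypoint coordinates to the box, so pose is not shown here.

```python
def box_to_yolo_line(class_id: int, x_min: float, y_min: float,
                     x_max: float, y_max: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box into a normalized YOLO detection row."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def polygon_to_yolo_line(class_id: int, points: list[tuple[float, float]],
                         img_w: int, img_h: int) -> str:
    """Convert pixel-space polygon vertices into a normalized YOLO segmentation row."""
    coords = []
    for x, y in points:
        # Each vertex contributes an x/width and y/height pair
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return f"{class_id} " + " ".join(coords)


# A 640x480 image with one box (class 0) and one triangular outline (class 1)
print(box_to_yolo_line(0, 100, 200, 300, 400, img_w=640, img_h=480))
print(polygon_to_yolo_line(1, [(0, 0), (640, 0), (320, 480)], img_w=640, img_h=480))
```

Normalizing coordinates keeps a label valid when the image is resized for training, which is why the rows store fractions rather than pixels.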

Real-World Applications

The utility of data labeling extends across virtually every industry employing AI.

Autonomous Vehicles: Self-driving cars rely on massive datasets in which every vehicle, pedestrian, traffic sign, and lane marker is meticulously labeled. This labeled data lets the perception system interpret complex environments so that the vehicle can navigate them safely. Autonomous vehicle companies invest heavily in pixel-level labeling to help meet strict safety requirements.
Precision Agriculture: In modern farming, AI is used to detect crop diseases and monitor growth stages. Models trained on labeled images of "healthy" versus "diseased" leaves help farmers automate treatment, which can reduce chemical usage and increase yield.

The Labeling Workflow

Creating a labeled dataset is often the most time-consuming part of an AI project. The process typically follows a "Human-in-the-Loop" (HITL) approach, in which human annotators review and verify labels to ensure high accuracy. Modern workflows use tools like the Ultralytics Platform, which simplifies dataset management and lets teams collaborate on annotations. Model-assisted pre-labeling can speed up the process considerably, and it is often combined with active learning. In this approach, a model pre-labels the data and humans focus on correcting the low-confidence predictions rather than labeling every image from scratch.
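This pre-labeling loop can be sketched as a simple triage step. Images whose detections all clear a confidence threshold are accepted as draft labels, and the rest are queued for human review with the least confident first. That ordering is a basic form of the uncertainty sampling used in active learning. The `Prediction` record, the 0.5 threshold, the choice to score an image by its lowest object confidence, and the rule that images with no detections go to review are all illustrative assumptions. They do not describe Ultralytics Platform behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """Hypothetical record: one image and the confidence of each detected object."""
    image_id: str
    confidences: list[float] = field(default_factory=list)


def triage(predictions: list[Prediction],
           threshold: float = 0.5) -> tuple[list[str], list[str]]:
    """Split pre-labeled images into auto-accepted drafts and a human review queue."""
    accepted, needs_review = [], []
    for pred in predictions:
        # Accept only if the model found something and every detection is confident
        if pred.confidences and min(pred.confidences) >= threshold:
            accepted.append(pred.image_id)
        else:
            needs_review.append(pred)
    # Least-confident first; an image with no detections counts as confidence 0.0
    needs_review.sort(key=lambda p: min(p.confidences, default=0.0))
    return accepted, [p.image_id for p in needs_review]


batch = [
    Prediction("img_a.jpg", [0.91, 0.84]),
    Prediction("img_b.jpg", [0.95, 0.31]),
    Prediction("img_c.jpg", []),
    Prediction("img_d.jpg", [0.47]),
]
accepted, review_queue = triage(batch, threshold=0.5)
print("auto-accepted:", accepted)
print("review order:", review_queue)
```

The threshold is a project-specific trade-off. Raising it sends more images to annotators, but it lets fewer model mistakes slip into the dataset unreviewed.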

The following example demonstrates how to use a pre-trained YOLO26 model to automatically generate labels (auto-labeling) for a new image, which can then be corrected by humans:

```python
from ultralytics import YOLO

# Load the YOLO26n model (nano version)
model = YOLO("yolo26n.pt")

# Run inference on an image to detect objects
results = model("https://ultralytics.com/images/bus.jpg")

# Save the detection results to a text file in standard YOLO format
# (one "class x_center y_center width height" row per object, normalized to 0-1)
# This file can now be used as a starting point for data labeling
results[0].save_txt("bus_labels.txt")
```
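Before a reviewer corrects the auto-generated file, it can help to convert its normalized rows back into pixel coordinates, for example to draw the boxes in a review tool. The sketch below parses detection rows. It also accepts an optional trailing confidence column, which Ultralytics writes when `save_txt` is called with `save_conf=True`. The function names and the returned tuple layout are hypothetical. The image width and height must be those of the image the labels belong to.

```python
from pathlib import Path


def parse_yolo_detection(line: str, img_w: int, img_h: int):
    """Parse 'class xc yc w h [conf]' (normalized) into pixel corner coordinates.

    Returns (class_id, x_min, y_min, x_max, y_max, conf); conf is None if absent.
    """
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    conf = float(parts[5]) if len(parts) == 6 else None
    # Convert center/size to corners, then undo the 0-1 normalization
    x_min, x_max = (xc - w / 2) * img_w, (xc + w / 2) * img_w
    y_min, y_max = (yc - h / 2) * img_h, (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max, conf


def read_yolo_detections(path: str, img_w: int, img_h: int) -> list[tuple]:
    """Parse every non-empty row of a YOLO-format label file."""
    text = Path(path).read_text(encoding="utf-8")
    return [parse_yolo_detection(row, img_w, img_h) for row in text.splitlines() if row.strip()]


# A row for class 2 on a 640x480 image, with a confidence column
print(parse_yolo_detection("2 0.5 0.5 0.25 0.5 0.87", img_w=640, img_h=480))
```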
