Token

Learn how tokens serve as the fundamental units of information in AI. Explore their role in NLP, computer vision, and open-vocabulary detection with YOLO26.

In modern artificial intelligence, a token is the basic unit of information that a model processes. Before an algorithm can interpret a sentence, analyze a software script, or recognize objects in an image, the raw input must be broken down into these discrete, standardized elements. This segmentation is a key step in data preprocessing: it turns unstructured input into a sequence of units that can be mapped to numbers and processed efficiently by neural networks. While humans perceive language as a continuous stream of thoughts and images as seamless visual scenes, computational models need these granular building blocks to perform operations like pattern recognition and semantic analysis.

Token vs. Tokenization

To grasp the mechanics of machine learning, it is essential to distinguish between the data unit and the process used to create it. This differentiation prevents confusion when designing data pipelines and preparing training material on the Ultralytics Platform.

  • Tokenization: This is the algorithmic process (the verb) of splitting raw data into pieces. For text, this might involve using libraries like the Natural Language Toolkit (NLTK) to determine where one unit ends and another begins.
  • Token: This is the resulting output (the noun). It is the actual chunk of data—such as a word, a subword, or an image patch—that is eventually mapped to a numerical vector known as an embedding.
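The distinction above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library works: the regex splitting rule and the `tokenize` and `build_vocab` helpers are assumptions made for the example. The function is the tokenization step, the list it returns holds the tokens, and the integer IDs are what would later index into an embedding table.

```python
import re


def tokenize(text):
    """Tokenization (the process): split text into lowercase word and punctuation pieces."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


text = "Tokens are units. Tokenization creates tokens!"
tokens = tokenize(text)             # the tokens (the output)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]    # numeric IDs a model could map to embeddings

print(tokens)  # ['tokens', 'are', 'units', '.', 'tokenization', 'creates', 'tokens', '!']
print(ids)     # [0, 1, 2, 3, 4, 5, 0, 6]
```

Note that the repeated word "tokens" receives the same ID both times, which is what lets a model share one embedding across every occurrence.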

Tokens in Different AI Domains

The nature of a token varies significantly depending on the modality of the data being processed, particularly between textual and visual domains.

Text Tokens in NLP

In the field of Natural Language Processing (NLP), tokens are the inputs for Large Language Models (LLMs). Early approaches mapped each token to a whole word, but modern architectures use subword algorithms like Byte Pair Encoding (BPE). BPE starts from individual characters or bytes and repeatedly merges the most frequent adjacent pair, so common words stay intact while rare words are split into smaller, reusable pieces. This balances vocabulary size against coverage. For instance, the word "unhappiness" might be tokenized into "un", "happi", and "ness", although the exact split depends on the training corpus.
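To make the idea behind Byte Pair Encoding concrete, here is a toy sketch that learns merge rules from a tiny word list and then applies them to a new word. It is a simplified illustration under stated assumptions: the corpus, the number of merges, and the helper names `merge_pair`, `learn_merges`, and `segment` are all invented for the example, and production tokenizers add details (byte-level handling, word-boundary markers, special tokens) that are omitted here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge rules by repeatedly merging the most frequent pair."""
    corpus = Counter(tuple(word) for word in words)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def segment(word, merges):
    """Tokenize a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus; real vocabularies are learned from far larger text collections.
corpus = ["happy", "happiness", "unhappy", "sadness", "kindness", "unkind"]
merges = learn_merges(corpus, num_merges=12)
print(segment("unhappiness", merges))
```

The pieces printed for "unhappiness" depend entirely on the corpus and merge budget, which is why the same word can be split differently by different models.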

Visual Tokens in Computer Vision

The concept of tokenization has expanded into computer vision with the advent of the Vision Transformer (ViT). Unlike traditional convolutional networks, which process pixels with sliding filters, a ViT divides an image into a grid of fixed-size patches (e.g., 16x16 pixels). Each patch is flattened, linearly projected into an embedding, and treated as a distinct visual token. This approach lets the model use self-attention to relate distant parts of an image, much as the Transformer architecture, originally introduced by Google researchers for text, relates distant words in a sentence.
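The patch-splitting step can be sketched in plain Python as follows. This is only an illustration of the arithmetic: the helper `image_to_patch_tokens` and the 224x224 dummy image are assumptions for the example, and a real Vision Transformer would use tensor operations and then apply a learned linear projection and position embeddings to each flattened patch.

```python
def image_to_patch_tokens(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch into a single list of channel values
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            tokens.append(patch)
    return tokens


# Dummy 224 x 224 RGB image filled with zeros (stand-in for real pixel data)
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patch_tokens(image, patch_size=16)

print(len(tokens), len(tokens[0]))  # 196 768
```

With 16x16 patches, a 224x224 image yields a 14x14 grid, so the model sees a sequence of 196 visual tokens, each holding 16 * 16 * 3 = 768 raw values before projection.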

Real-World Applications

Tokens act as the bridge between human data and machine intelligence in countless applications.

  1. Open-Vocabulary Object Detection: Advanced models like YOLO-World use a multi-modal approach where text tokens interact with visual features. A user can input custom text prompts (e.g., "blue helmet"), which the model tokenizes and matches against objects in the image. This enables zero-shot learning, allowing detection of objects the model was not explicitly trained on.

  2. Generative AI: In text generation systems like chatbots, the model predicts a probability distribution over the next token in a sequence. By repeatedly selecting a likely next token, either the single most probable one or one sampled from the distribution, and appending it to the input, the system constructs coherent sentences and paragraphs, powering tools ranging from automated customer support to virtual assistants.
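The next-token loop described in the generative AI item can be illustrated with a toy example. Everything here is hypothetical: the `NEXT_TOKEN_PROBS` table stands in for a trained model and conditions only on the previous token, whereas a real language model conditions on the whole preceding context and often samples rather than always taking the top choice.

```python
# Hypothetical next-token probabilities, keyed by the previous token only
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "token": 0.3},
    "a": {"token": 1.0},
    "model": {"predicts": 0.8, "<end>": 0.2},
    "predicts": {"tokens": 0.9, "<end>": 0.1},
    "token": {"<end>": 1.0},
    "tokens": {"<end>": 1.0},
}


def generate_greedy(probs, max_tokens=10):
    """Build a sequence by always picking the most probable next token (greedy decoding)."""
    sequence = []
    current = "<start>"
    for _ in range(max_tokens):
        distribution = probs[current]
        current = max(distribution, key=distribution.get)
        if current == "<end>":
            break
        sequence.append(current)
    return sequence


print(generate_greedy(NEXT_TOKEN_PROBS))  # ['the', 'model', 'predicts', 'tokens']
```

Replacing the `max` call with a weighted random choice would turn this into sampling, which is one reason the same prompt can produce different outputs.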

Python Example: Using Text Tokens for Detection

The following code snippet demonstrates how the ultralytics package uses text tokens to guide object detection. While the state-of-the-art YOLO26 is recommended for high-speed, fixed-class inference, the YOLO-World architecture uniquely allows users to define classes as text tokens at runtime.

from ultralytics import YOLO

# Load a pre-trained YOLO-World model capable of understanding text tokens
model = YOLO("yolov8s-world.pt")

# Define custom classes; each text prompt is tokenized and embedded internally
# The model then matches these text embeddings against features in the image
model.set_classes(["bus", "backpack"])

# Run prediction on an image using the defined tokens
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Display the results showing only the tokenized classes
results[0].show()
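Conceptually, open-vocabulary matching comes down to comparing an embedding of each text prompt with an embedding of each image region. The sketch below is an assumption-laden toy, not the ultralytics implementation: the three-dimensional vectors and the `cosine_similarity` and `best_label` helpers are made up for illustration, whereas real models use learned embeddings with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_label(region_embedding, text_embeddings):
    """Return the prompt whose embedding is most similar to the region embedding."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(region_embedding, text_embeddings[label]),
    )


# Hypothetical embeddings for two text prompts and one detected image region
text_embeddings = {"bus": [0.9, 0.1, 0.0], "backpack": [0.1, 0.8, 0.3]}
region = [0.8, 0.2, 0.1]

print(best_label(region, text_embeddings))  # bus
```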

Understanding tokens is fundamental to working with generative AI and advanced analytics. Whether enabling a chatbot to converse fluently or helping a vision system distinguish between subtle object classes, tokens are the basic currency of machine intelligence, and models built with frameworks like PyTorch and TensorFlow ultimately operate on them.
