Learn how tokens, the building blocks of AI models, power NLP, computer vision, and tasks like sentiment analysis and object detection.
In artificial intelligence, a token is the fundamental, discrete unit of data that a model processes. Before an AI model can analyze text or an image, the raw data must be broken down into these manageable pieces. For a language model, a token could be a word, a part of a word (a subword), or a single character. For a computer vision (CV) model, a token can be a small, fixed-size patch of an image. This process of breaking down data is a critical first step in the data preprocessing pipeline, as it converts complex, unstructured data into a structured format that neural networks can understand.
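The difference in granularity is easiest to see with a toy example. The sketch below uses plain Python string operations (no particular library is assumed) to produce word-level and character-level tokens from the same sentence; subword tokens sit between these two extremes.

```python
# Minimal illustration of token granularity using plain Python string operations.
sentence = "AI models process tokens"

# Word-level tokens: split on whitespace.
word_tokens = sentence.split()
print(word_tokens)  # ['AI', 'models', 'process', 'tokens']

# Character-level tokens: every character becomes its own token.
char_tokens = list(sentence)
print(char_tokens[:8])  # ['A', 'I', ' ', 'm', 'o', 'd', 'e', 'l']
```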
It is essential to distinguish between a 'token' and 'tokenization'. Tokenization is the process of splitting raw data into smaller units, while a token is one of the individual units that process produces. In short, tokenization is the action, and a token is the result of that action.
Tokens are the building blocks for how AI models perceive and interpret data. Once data is tokenized, each token is typically mapped to a numerical vector representation called an embedding. These embeddings capture the semantic meaning and context, allowing models built with frameworks like PyTorch or TensorFlow to learn complex patterns.
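As a rough illustration of this token-to-embedding step, the PyTorch sketch below maps a tiny, made-up vocabulary of token IDs to trainable vectors with `nn.Embedding`; real models use vocabularies of tens of thousands of tokens and much larger embedding dimensions.

```python
import torch
import torch.nn as nn

# Toy vocabulary mapping each token to an integer ID (hypothetical example).
vocab = {"<unk>": 0, "token": 1, "##ization": 2, "ai": 3}

# Embedding layer: each token ID is mapped to a dense vector the model can learn.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab["token"], vocab["##ization"]])
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([2, 8]) -> one 8-dimensional vector per token
```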
Word and Subword Tokens: In Natural Language Processing (NLP), using entire words as tokens can lead to enormous vocabularies and problems with unknown words. Subword tokenization, using algorithms like Byte Pair Encoding (BPE) or WordPiece, is a common solution. It breaks down rare words into smaller, meaningful parts. For example, the word "tokenization" might become two tokens: "token" and "##ization". This approach, used by models like BERT and GPT-4, helps the model handle complex vocabulary and grammatical structures. You can explore modern implementations in libraries like Hugging Face Tokenizers.
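A minimal sketch of subword tokenization with the Hugging Face transformers library is shown below, assuming the pretrained bert-base-uncased WordPiece tokenizer; the exact splits depend on the vocabulary the tokenizer was trained with.

```python
from transformers import AutoTokenizer

# Load a pretrained WordPiece tokenizer (BERT); requires the `transformers` package
# and an internet connection (or a local cache) for the first download.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Subword tokenization: rare or long words are split into smaller pieces.
print(tokenizer.tokenize("tokenization"))  # e.g. ['token', '##ization']

# Each token also has an integer ID in the model's vocabulary.
print(tokenizer.encode("tokenization", add_special_tokens=False))
```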
Visual Tokens: The concept of tokens extends beyond text into computer vision. In models like the Vision Transformer (ViT), an image is divided into a grid of patches (e.g., 16x16 pixels). Each patch is flattened and treated as a "visual token." This allows powerful Transformer architectures, which excel at processing sequences using self-attention, to perform tasks like image classification and object detection. This token-based approach is also foundational for multi-modal models that understand both images and text, such as CLIP.
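To make the idea of visual tokens concrete, the PyTorch sketch below cuts a dummy 224x224 image into non-overlapping 16x16 patches and flattens each one into a vector, mirroring the patch-extraction step in ViT-style models (before the learned linear projection and positional embeddings are applied).

```python
import torch

# Dummy image tensor: (channels, height, width), e.g. a 224x224 RGB image.
image = torch.randn(3, 224, 224)
patch_size = 16

# Split the image into non-overlapping 16x16 patches along height and width.
patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
# patches shape: (3, 14, 14, 16, 16) -> a 14x14 grid of patches

# Flatten each patch into a single vector, giving one "visual token" per patch.
tokens = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3 * patch_size * patch_size)
print(tokens.shape)  # torch.Size([196, 768]) -> 196 visual tokens of length 768
```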
The use of tokens is fundamental to countless AI systems, from simple applications to complex, state-of-the-art models.
Machine Translation: Services like Google Translate rely heavily on tokens. When you input a sentence, it is first broken down into a sequence of text tokens. A sophisticated sequence-to-sequence model processes these tokens, understands their collective meaning, and generates a new sequence of tokens in the target language. These output tokens are then assembled back into a coherent translated sentence. This process enables real-time translation across dozens of languages.
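The same tokenize-translate-detokenize flow can be sketched with the Hugging Face transformers library; the snippet below assumes the publicly available Helsinki-NLP/opus-mt-en-de English-to-German model and is an illustration of the general approach, not the pipeline Google Translate itself uses.

```python
from transformers import MarianMTModel, MarianTokenizer

# English-to-German translation model from the Helsinki-NLP collection (assumed available).
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# 1. The input sentence is broken down into a sequence of token IDs.
inputs = tokenizer("Tokens are the building blocks of language models.", return_tensors="pt")

# 2. The sequence-to-sequence model generates a new sequence of tokens in the target language.
output_tokens = model.generate(**inputs)

# 3. The output tokens are assembled back into a readable sentence.
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```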
Autonomous Vehicles: In the field of autonomous vehicles, models must interpret complex visual scenes in real time. A model like Ultralytics YOLO11 processes camera feeds to perform tasks such as object tracking and instance segmentation. While classic CNN-based models like YOLO don't explicitly use "tokens" in the same way as Transformers, vision transformer variants designed for detection do. They break down the visual input into tokens (patches) to identify and locate pedestrians, other vehicles, and traffic signals with high accuracy. This tokenized understanding of the environment is crucial for safe navigation. Managing the entire workflow, from data collection to model deployment, can be streamlined using platforms like Ultralytics HUB.
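For the detection side of such a pipeline, the Ultralytics Python API can be used as sketched below; the image path is a placeholder, and a transformer-based detector would additionally convert its input into patch tokens internally before predicting boxes.

```python
from ultralytics import YOLO

# Load a pretrained YOLO11 detection model (weights are downloaded on first use).
model = YOLO("yolo11n.pt")

# Run inference on an image of a street scene (path is a placeholder).
results = model("street_scene.jpg")

# Each result holds the detected objects: class labels, confidences, and bounding boxes.
for result in results:
    for box in result.boxes:
        print(model.names[int(box.cls)], float(box.conf))
```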