
Token

Explore how tokens act as the atomic units of AI processing. Learn how the [Ultralytics Platform](https://platform.ultralytics.com) uses tokens for NLP and computer vision.

In the sophisticated architecture of modern artificial intelligence, a token represents the fundamental, atomic unit of information that a model processes. Before an algorithm can interpret a sentence, analyze a software script, or recognize objects in an image, the raw input data must be broken down into these discrete, standardized elements. This segmentation is a pivotal step in data preprocessing, transforming unstructured inputs into a numerical format that neural networks can efficiently compute. While humans perceive language as a continuous stream of thoughts or images as seamless visual scenes, computational models require these granular building blocks to perform operations like pattern recognition and semantic analysis.

Tokens vs. Tokenization

To grasp the mechanics of machine learning, it is essential to distinguish between the data unit and the process used to create it. This differentiation prevents confusion when designing data pipelines and preparing training material on the Ultralytics Platform.

  • Tokenization: This is the algorithmic process (the verb) of splitting raw data into pieces. For text, this might involve using libraries like the Natural Language Toolkit (NLTK) to determine where one unit ends and another begins.
  • Token: This is the resulting output (the noun). It is the actual chunk of data—such as a word, a subword, or an image patch—that is eventually mapped to a numerical vector known as an embedding.
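The distinction above can be illustrated with a minimal sketch: the function is the tokenization (the process), its return value is the list of tokens (the output), and each token is then mapped to an integer ID as the first step toward an embedding. The regex-based splitter and the toy vocabulary here are illustrative assumptions, not a production tokenizer.

```python
import re


def tokenize(text: str) -> list[str]:
    # Tokenization (the process): split raw text into words and punctuation
    return re.findall(r"\w+|[^\w\s]", text.lower())


# Tokens (the output): discrete units ready for numeric mapping
tokens = tokenize("Tokens power AI!")
print(tokens)  # ['tokens', 'power', 'ai', '!']

# Each token is then mapped to an integer ID, which an embedding
# layer later converts into a dense numerical vector
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
```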

Tokens Across AI Domains

The nature of a token varies significantly depending on the modality of the data being processed, particularly between textual and visual domains.

Text Tokens in NLP

In the field of Natural Language Processing (NLP), tokens are the inputs for Large Language Models (LLMs). Early approaches mapped strictly to whole words, but modern architectures utilize subword algorithms like Byte Pair Encoding (BPE). This method allows models to handle rare words by breaking them into meaningful syllables, balancing vocabulary size with semantic coverage. For instance, the word "unhappiness" might be tokenized into "un", "happi", and "ness".
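A greedy longest-match split over a fixed subword vocabulary captures the idea behind this example. Note this is a simplified stand-in for BPE, which actually learns its vocabulary by iteratively merging frequent character pairs; the toy vocabulary below is an assumption chosen to reproduce the "unhappiness" split.

```python
def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword split, a simplified stand-in for BPE."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest substring starting at position i first
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens


vocab = {"un", "happi", "ness", "happy"}
print(subword_tokenize("unhappiness", vocab))  # ['un', 'happi', 'ness']
```

Because rare words decompose into known subwords, the model's vocabulary stays compact while still covering virtually any input string.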

Visual Tokens in Computer Vision

The concept of tokenization has expanded into computer vision with the advent of the Vision Transformer (ViT). Unlike traditional convolutional networks that process pixels in sliding windows, Transformers divide an image into a grid of fixed-size patches (e.g., 16x16 pixels). Each patch is flattened and treated as a distinct visual token. This approach enables the model to use self-attention mechanisms to understand the relationship between distant parts of an image, similar to how Google Research originally applied transformers to text.
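The patch-splitting step can be sketched with NumPy. This is a minimal illustration of the ViT input pipeline, not the full model: a real ViT would additionally project each flattened patch through a learned linear layer and add position embeddings.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened fixed-size patches (visual tokens)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "dimensions must be divisible by patch size"
    # Reshape into a grid of patches, then flatten each patch into one vector
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


# A 224x224 RGB image yields (224/16)^2 = 196 visual tokens of length 16*16*3 = 768
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(img)
print(tokens.shape)  # (196, 768)
```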

Real-World Applications

Tokens serve as the bridge between human data and machine intelligence in numerous applications.

  1. Open-Vocabulary Object Detection: Advanced models such as YOLO use a multi-modal approach in which text tokens interact with visual features. Users can supply custom text prompts (e.g., "blue helmet"), which the model tokenizes and matches against objects in the image. This enables zero-shot learning, allowing the model to detect objects it was never explicitly trained on.
  2. Generative AI: In text-generation systems such as chatbots, the AI works by predicting the probability of the next token in a sequence. By repeatedly selecting the most likely subsequent token, the system composes coherent sentences and paragraphs, powering tools ranging from automated customer support to virtual assistants.
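The generation loop described in the second item can be sketched with a toy bigram model. The tiny probability table and greedy decoding below are illustrative assumptions; real LLMs use a neural network over a vocabulary of many thousands of tokens, and often sample rather than always taking the argmax.

```python
# A toy bigram model: probabilities of the next token given the current one
bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {},
    "ran": {},
}


def generate(start: str, max_tokens: int = 5) -> list[str]:
    """Repeatedly pick the most probable next token (greedy decoding)."""
    tokens = [start]
    for _ in range(max_tokens):
        nxt = bigram_probs.get(tokens[-1], {})
        if not nxt:  # no known continuation: stop generating
            break
        tokens.append(max(nxt, key=nxt.get))
    return tokens


print(generate("the"))  # ['the', 'cat', 'sat']
```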

Python: Using Text Tokens for Detection

The following code snippet demonstrates how the ultralytics package uses text tokens to guide object detection. While the state-of-the-art YOLO26 is recommended for high-speed, fixed-class inference, the YOLO-World architecture uniquely allows users to define classes as text tokens at runtime.

```python
from ultralytics import YOLO

# Load a pre-trained YOLO-World model capable of understanding text tokens
model = YOLO("yolov8s-world.pt")

# Define specific classes; these text strings are tokenized internally
# The model will look specifically for these "tokens" in the visual data
model.set_classes(["bus", "backpack"])

# Run prediction on an image using the defined tokens
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Display the results showing only the tokenized classes
results[0].show()
```

Understanding tokens is fundamental to navigating the landscape of generative AI and advanced analytics. Whether enabling a chatbot to converse fluently or helping a vision system distinguish between subtle object classes, tokens remain the essential currency of machine intelligence used by frameworks like PyTorch and TensorFlow.
