Transformer

Explore the Transformer architecture and self-attention mechanism. Learn how they power AI models like RT-DETR and Ultralytics YOLO26 for superior accuracy.

A Transformer is a deep learning architecture that relies on a mechanism called self-attention to process sequential input data, such as natural language or visual features. Introduced by Google researchers in the landmark 2017 paper _Attention Is All You Need_, the Transformer reshaped the field of artificial intelligence (AI) by removing the step-by-step processing constraint of earlier Recurrent Neural Networks (RNNs). Instead of reading a sequence one element at a time, a Transformer processes all positions at once, which allows massive parallelization and significantly faster training on modern hardware such as GPUs.

How Transformers Work

The core innovation of the Transformer is the self-attention mechanism, which lets the model weigh the importance of every part of the input relative to every other part. For instance, in a sentence about finance, the model can learn to associate the word "bank" with "money" rather than with "river", because the attention weights are computed from the surrounding context.
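As a rough illustration of the idea, the sketch below implements single-head scaled dot-product self-attention with NumPy. The sizes, the random projection matrices, and the names `self_attention`, `w_q`, `w_k`, and `w_v` are assumptions chosen for this example; a trained model learns these weights and typically runs several attention heads in parallel.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every position with every other position
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each row becomes a probability distribution over the sequence
    weights = softmax(scores, axis=-1)
    # Each output is a weighted mix of all value vectors
    return weights @ v, weights


# Illustrative sizes and random weights (assumptions, not learned values)
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)   # (4, 8)
print(weights.shape)  # (4, 4)
```

Row `i` of `weights` indicates how strongly position `i` attends to each position in the sequence, which is the "weighing of importance" described above.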

The original architecture consists of two main components, although many later models keep only one of them:

  • Encoder: Processes the input data into a rich numerical representation or embedding.
  • Decoder: Uses the encoder's output to generate the final result, such as a translated sentence or a predicted bounding box.
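The data flow between the two components can be sketched in a few lines of NumPy. This is a deliberately stripped-down illustration, not a working Transformer: it assumes random inputs and omits the learned projections, feed-forward layers, residual connections, normalization, and masking that real encoder and decoder blocks contain. The names `attention`, `memory`, and `decoded` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention without learned weights."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores, axis=-1) @ values


rng = np.random.default_rng(0)
d_model = 8

# Encoder side: 5 input tokens attend to each other (self-attention)
source = rng.normal(size=(5, d_model))
memory = attention(source, source, source)

# Decoder side: 3 output positions query the encoder output (cross-attention)
target = rng.normal(size=(3, d_model))
decoded = attention(target, memory, memory)

print(memory.shape)   # (5, 8)
print(decoded.shape)  # (3, 8)
```

The point of the sketch is the shape of the computation: the encoder produces one vector per input token, and each decoder position reads from all of those vectors when producing its own output.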

In computer vision (CV), models usually employ a variation called the Vision Transformer (ViT). Instead of processing text tokens, the image is split into fixed-size patches (e.g., 16×16 pixels). These patches are flattened and treated as a sequence, which lets the model capture "global context", meaning relationships between distant parts of an image, more readily than a standard Convolutional Neural Network (CNN), whose filters see only a local neighborhood at each layer.
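The patch-splitting step can be sketched as follows. The helper name `image_to_patches` and the 224×224 input size are assumptions for illustration; in an actual ViT each flattened patch is then passed through a learned linear projection and combined with a position embedding before it reaches the attention layers.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into a sequence of flattened square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, ordered left to right, top to bottom
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Placeholder image standing in for real pixel data
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

A 224×224 RGB image yields 14 × 14 = 196 patches, each flattened to 16 × 16 × 3 = 768 values, so the model sees a sequence of 196 "tokens".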

It is important to distinguish the Transformer architecture from related terms:

  • Attention Mechanism: This is the general concept of focusing on specific parts of data. The Transformer is a specific architecture built entirely around attention layers, whereas other models might use attention only as a small add-on.
  • Large Language Model (LLM): Terms like "GPT" refer to specific models trained on vast amounts of text. Almost all modern LLMs use the Transformer architecture as their underlying engine.

Real-World Applications

The versatility of Transformers has led to their adoption across various industries:

  1. Medical Imaging: In AI in Healthcare, Transformers are used for complex tasks like medical image analysis. Their ability to understand global spatial relationships helps in detecting subtle anomalies in high-resolution MRI or CT scans that local-feature-focused CNNs might miss.

  2. Autonomous Systems: For autonomous vehicles, understanding the trajectory of pedestrians and other vehicles is critical. Transformers are well suited to video understanding because attention can relate objects across frames, which helps the system track them over time and predict their future movements for safer navigation.

Object Detection with Transformers

While CNNs have traditionally dominated object detection, Transformer-based models like the Real-Time Detection Transformer (RT-DETR) have emerged as powerful alternatives. RT-DETR combines the speed of CNN backbones with the precision of Transformer decoding heads.

However, pure Transformer models can be computationally heavy. For many edge applications, highly optimized hybrid models like YOLO26, which integrate efficient attention mechanisms with rapid convolutional processing, can offer a better balance of speed and accuracy. You can manage the training and deployment of these models via the Ultralytics Platform, which streamlines the workflow from dataset annotation to model export.

Python Example: Using RT-DETR

The following example demonstrates how to perform inference using a Transformer-based model within the ultralytics package. This code loads a pre-trained RT-DETR model and detects objects in an image.

from ultralytics import RTDETR

# Load a pre-trained Real-Time Detection Transformer (RT-DETR) model
model = RTDETR("rtdetr-l.pt")

# Run inference on an image URL
# The model uses self-attention to identify objects with high accuracy
results = model("https://ultralytics.com/images/bus.jpg")

# Display the detection results with bounding boxes
results[0].show()

For further reading on the mathematical foundations, the PyTorch documentation on Transformer layers provides technical depth, while IBM's guide to Transformers offers a high-level business perspective.
