
Transformer

Discover how Transformer architectures are revolutionizing AI, powering breakthroughs in natural language processing (NLP), computer vision, and advanced machine learning tasks.

A Transformer is a deep learning architecture that relies on a mechanism called self-attention to process sequential input data, such as natural language or visual features. Originally introduced by Google researchers in the landmark paper Attention Is All You Need, the Transformer revolutionized the field of artificial intelligence (AI) by discarding the sequential processing limitations of earlier Recurrent Neural Networks (RNNs). Instead, Transformers analyze entire sequences of data simultaneously, allowing for massive parallelization and significantly faster training times on modern hardware like GPUs.

How Do Transformers Work?

The core innovation of the Transformer is the self-attention mechanism. This allows the model to weigh the importance of different parts of the input data relative to each other. For instance, in a sentence, the model can learn that the word "bank" relates more closely to "money" than to "river" based on the surrounding context.
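To make the idea concrete, the scaled dot-product attention at the heart of self-attention can be sketched in a few lines of NumPy. This is a minimal, single-head illustration without the learned query/key/value projections used in real Transformer layers; the toy embeddings are arbitrary.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


# Toy "sentence" of 3 tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same sequence
output, attn_weights = scaled_dot_product_attention(X, X, X)
```

Each row of `attn_weights` sums to 1 and expresses how strongly one token attends to every other token in the sequence; the output is the attention-weighted mixture of the value vectors.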

This architecture generally consists of two main components:

  • Encoder: Processes the input data into a rich numerical representation or embedding.
  • Decoder: Uses the encoder's output to generate the final result, such as a translated sentence or a predicted bounding box.

In the realm of computer vision (CV), models usually employ a variation called the Vision Transformer (ViT). Instead of processing text tokens, the image is split into fixed-size patches (e.g., 16x16 pixels). These patches are flattened and treated as a sequence, enabling the model to capture "global context"—understanding relationships between distant parts of an image—more effectively than a standard Convolutional Neural Network (CNN).
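The patch-splitting step described above can be sketched with plain NumPy. The toy 32x32 image and 16x16 patch size below are illustrative assumptions (production ViTs typically take 224x224 inputs), and a real model would follow this with a learned linear projection and positional embeddings.

```python
import numpy as np

# Toy "image": height 32, width 32, 3 channels
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)

patch = 16  # fixed patch size, as in the standard ViT
h, w, c = image.shape

# Carve the image into a (h/patch) x (w/patch) grid of patches,
# then flatten each patch into a single vector
patches = (
    image.reshape(h // patch, patch, w // patch, patch, c)
    .transpose(0, 2, 1, 3, 4)
    .reshape(-1, patch * patch * c)
)

# A 32x32 image yields a sequence of 4 patch "tokens" of length 16*16*3 = 768
```

The resulting `patches` array is the sequence the Transformer encoder consumes, exactly as it would consume a sequence of word embeddings.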

Transformers vs. Related Concepts

It is important to distinguish the Transformer architecture from related terms:

  • Attention Mechanism: This is the general concept of focusing on specific parts of data. The Transformer is a specific architecture built entirely around attention layers, whereas other models might use attention only as a small add-on.
  • Large Language Model (LLM): Terms like "GPT" refer to specific models trained on vast amounts of text. Almost all modern LLMs use the Transformer architecture as their underlying engine.

Real-World Applications

The versatility of Transformers has led to their adoption across various industries:

  1. Medical Imaging: In AI in Healthcare, Transformers are used for complex tasks like medical image analysis. Their ability to understand global spatial relationships helps in detecting subtle anomalies in high-resolution MRI or CT scans that local-feature-focused CNNs might miss.
  2. Autonomous Systems: For autonomous vehicles, understanding the trajectory of pedestrians and other vehicles is critical. Transformers excel at video understanding by tracking objects across time frames, predicting future movements to ensure safe navigation.

Object Detection with Transformers

While CNNs have traditionally dominated object detection, Transformer-based models like the Real-Time Detection Transformer (RT-DETR) have emerged as powerful alternatives. RT-DETR combines the speed of CNN backbones with the precision of Transformer decoding heads.

However, pure Transformer models can be computationally heavy. For many edge applications, highly optimized hybrid models like YOLO26—which integrate efficient attention mechanisms with rapid convolutional processing—offer a superior balance of speed and accuracy. You can manage the training and deployment of these models easily via the Ultralytics Platform, which streamlines the workflow from dataset annotation to model export.

Python Example: Using RT-DETR

The following example demonstrates how to perform inference using a Transformer-based model within the ultralytics package. This code loads a pre-trained RT-DETR model and detects objects in an image.

from ultralytics import RTDETR

# Load a pre-trained Real-Time Detection Transformer (RT-DETR) model
model = RTDETR("rtdetr-l.pt")

# Run inference on an image URL
# The model uses self-attention to identify objects with high accuracy
results = model("https://ultralytics.com/images/bus.jpg")

# Display the detection results with bounding boxes
results[0].show()

For further reading on the mathematical foundations, the PyTorch documentation on Transformer layers provides technical depth, while IBM's guide to Transformers offers a high-level business perspective.
