
Swin Transformer

Discover how the Swin Transformer architecture uses shifted windows for efficient computer vision, and explore workflows on the Ultralytics Platform.

Introduced by researchers at Microsoft in the 2021 paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", this deep learning (DL) architecture adapts the attention mechanism to high-resolution visual data. In natural language processing, word tokens serve as a basic unit of roughly fixed scale; visual elements, by contrast, vary substantially in scale, and an image contains far more pixels than a passage contains words. By building a hierarchical representation and restricting self-attention to fixed-size local windows, the model achieves computational complexity that is linear in image size, making it an efficient general-purpose backbone for a variety of computer vision (CV) tasks.

How Shifted Windows And Hierarchical Design Work

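The primary innovation lies in how the model structures feature extraction. It starts by dividing an input image into small, non-overlapping patches (4x4 pixels in the paper's default configuration) and treating each patch as a token. Unlike earlier vision transformers, which keep a single token resolution throughout the network, it progressively merges each 2x2 group of neighboring patches into one token in deeper layers, halving the spatial resolution and widening the channel dimension at each stage. This hierarchical approach produces feature maps at several scales, from fine visual details to large objects, much like the feature pyramids of convolutional backbones.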
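The merging schedule can be made concrete with a little arithmetic. The sketch below is not taken from any library; `swin_stage_shapes` is a hypothetical helper, and its defaults (224x224 input, 4x4 patches, 96 embedding channels, four stages) are assumed from the Tiny configuration described in the paper. It only tracks the shape of the token grid and does no actual feature extraction.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (height, width, channels) of the token grid after each stage.

    Hypothetical helper for illustration; defaults assume the Swin-T setup.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    side = image_size // patch_size  # tokens per side after patch partition
    channels = embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            # Patch merging: every 2x2 group of neighbouring tokens becomes
            # one token, so resolution halves and channel width doubles.
            side //= 2
            channels *= 2
        shapes.append((side, side, channels))
    return shapes


for index, (h, w, c) in enumerate(swin_stage_shapes(), start=1):
    print(f"stage {index}: {h}x{w} tokens, {c} channels")
```

Under these assumptions the grid shrinks from 56x56 tokens in the first stage to 7x7 in the last, which is the same set of strides (4, 8, 16, 32) that detection and segmentation heads typically expect from a backbone.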

To maintain computational efficiency, self-attention is computed only within local, non-overlapping windows rather than across the entire image. To let information flow across window boundaries, the window grid is "shifted" by half a window between successive layers. Tokens that sat on opposite sides of a boundary in one layer therefore share a window in the next, so the effective receptive field grows with depth without the heavy computational burden associated with global attention.
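A minimal sketch of the idea, in plain Python: the hypothetical helper `window_ids` labels every token of a small grid with the index of the window it attends in. The shifted layout is modelled here as a cyclic shift of the grid followed by regular partitioning, which is an assumption about how to illustrate the scheme rather than a copy of any implementation; a real model also masks attention between tokens that only became neighbours through the wrap-around, which this sketch omits.

```python
def window_ids(size, window, shift=0):
    """Label each token of a size x size grid with its attention-window index.

    Hypothetical helper: `shift` cyclically displaces the grid before the
    regular partition, so shift=0 gives the unshifted layout.
    """
    if size % window != 0:
        raise ValueError("size must be divisible by window")
    windows_per_side = size // window
    ids = []
    for r in range(size):
        row = []
        for c in range(size):
            rr = (r - shift) % size  # position after the cyclic shift
            cc = (c - shift) % size
            row.append((rr // window) * windows_per_side + cc // window)
        ids.append(row)
    return ids


size, window = 8, 4
regular = window_ids(size, window)
shifted = window_ids(size, window, shift=window // 2)

# Tokens (3, 3) and (4, 4) are diagonal neighbours across a window border.
print(regular[3][3] == regular[4][4])  # False: separate windows, no attention
print(shifted[3][3] == shifted[4][4])  # True: same window in the next layer
```

Alternating the two layouts from layer to layer is what lets information cross window borders while each attention call still touches only `window * window` tokens.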

Swin Transformer Vs. Vision Transformer (ViT)

When comparing modern architectures, it is important to distinguish this model from the standard Vision Transformer (ViT). The original ViT treats an image as a sequence of fixed-size patches and computes global attention across all of them simultaneously, producing a single-resolution feature map. While highly accurate, this makes the cost of attention quadratic in the number of patches, so processing time and memory requirements grow rapidly as image resolution increases.

In contrast, the window-based design of the Swin architecture keeps complexity linear in the number of patches, because each window contains a fixed number of tokens regardless of image size. This makes it far more practical for dense prediction tasks that require high-resolution inputs and outputs. At the time of its publication, it achieved state-of-the-art results on benchmarks such as COCO test-dev for object detection and ADE20K for semantic segmentation.
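The difference in scaling can be checked numerically. The sketch below evaluates the two cost estimates given in the Swin paper for a feature map of h x w tokens with C channels and window size M (softmax cost omitted): 4hwC^2 + 2(hw)^2 C for global attention and 4hwC^2 + 2M^2 hwC for windowed attention. The function names are hypothetical, and the token grids and channel width are illustrative values, not measurements.

```python
def global_msa_flops(h, w, c):
    """Estimated cost of global multi-head self-attention over h*w tokens."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h, w, c, m=7):
    """Estimated cost of self-attention restricted to m x m windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


channels = 96  # illustrative channel width
for side in (56, 112, 224):
    g = global_msa_flops(side, side, channels)
    l = window_msa_flops(side, side, channels)
    print(f"{side}x{side} tokens: global {g:.2e}, windowed {l:.2e}, ratio {g / l:.1f}x")
```

Doubling the side of the token grid multiplies the windowed estimate by exactly four, matching the growth in token count, while the second term of the global estimate grows sixteen-fold.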

Real-World Applications In Modern AI

Because of its flexibility and efficiency, the architecture, whose reference implementation is available in the official Microsoft Research GitHub repository, has been adapted across complex, high-stakes industries.

Integration With PyTorch And Ultralytics

For developers building custom neural networks, the architecture is readily available in PyTorch and covered in the official PyTorch documentation. The torchvision library includes pre-trained versions, such as the lightweight Tiny variant with weights trained on ImageNet.

import torch
from torchvision.models import Swin_T_Weights, swin_t

# Load a pre-trained Tiny variant with ImageNet weights
weights = Swin_T_Weights.IMAGENET1K_V1
model = swin_t(weights=weights)
model.eval()

# Run a single batch containing a 3-channel, 224x224 dummy image tensor
dummy_image = torch.randn(1, 3, 224, 224)
with torch.no_grad():  # inference only, so skip gradient tracking
    output = model(dummy_image)

# The output shape is [1, 1000], representing the 1000 ImageNet classes
print(f"Prediction tensor shape: {output.shape}")

While transformer-based backbones offer excellent multi-scale representation, many applications run on edge AI devices, where latency and model size matter as much as accuracy. For instance, Ultralytics YOLO26 provides a natively end-to-end architecture that is smaller, faster, and highly accurate out of the box, excelling in real-time edge environments. Whether utilizing transformer-heavy architectures or fast convolutional models, developers can manage their entire workflow, from data annotation to training, via the Ultralytics Platform. This cloud toolchain also covers model deployment and continuous model monitoring.
