
Distributed Training

Learn how distributed training scales AI by splitting workloads across GPUs. Explore data parallelism, use [YOLO26](https://docs.ultralytics.com/models/yolo26/) for faster results, and discover the [Ultralytics Platform](https://platform.ultralytics.com) for seamless model deployment.

Distributed training is a method in machine learning where the workload of training a model is split across multiple processors or machines. This approach is essential for handling large-scale datasets and complex neural network architectures that would otherwise take an impractical amount of time to train on a single device. By leveraging the combined computational power of multiple Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), distributed training significantly accelerates the development cycle, allowing researchers and engineers to iterate faster and achieve higher accuracy in their models.

How Distributed Training Works

The core idea behind distributed training is parallelization. Instead of processing data sequentially on one chip, the task is divided into smaller chunks that are processed simultaneously. There are two primary strategies for achieving this:

  • Data Parallelism: This is the most common approach for tasks like object detection. In this setup, a copy of the entire model is placed on every device. The global training data is split into smaller batches, and each device processes a different batch at the same time. After each step, the gradients (updates to the model) are synchronized across all devices so that the model weights remain consistent; a minimal sketch of this synchronization step follows this list.
  • Model Parallelism: When a neural network (NN) is too large to fit into the memory of a single GPU, the model itself is split across multiple devices. Different layers or components of the model reside on different chips, and data flows between them. This is often necessary for training massive foundation models and Large Language Models (LLMs).
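
To make the data-parallel pattern concrete, here is a minimal sketch in plain PyTorch (CPU only, with two model replicas standing in for two GPUs; all names are illustrative, not a real training loop). Each replica computes gradients on its own shard of the global batch, and the gradients are then averaged so the replicas stay in sync. Production frameworks hand this averaging off to an all-reduce collective rather than looping in Python.

import copy

import torch
from torch import nn

# Two replicas of the same tiny model stand in for two devices
base = nn.Linear(4, 2)
replicas = [copy.deepcopy(base) for _ in range(2)]

# A global batch of 8 samples is split into one shard per replica
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)
shards = zip(inputs.chunk(2), targets.chunk(2))

# Each replica computes gradients on its own shard
for replica, (x, y) in zip(replicas, shards):
    loss = nn.functional.mse_loss(replica(x), y)
    loss.backward()

# Synchronize: average the gradients across replicas (the job all-reduce does in practice)
for params in zip(*(r.parameters() for r in replicas)):
    mean_grad = torch.stack([p.grad for p in params]).mean(dim=0)
    for p in params:
        p.grad = mean_grad.clone()

# Every replica now holds the same averaged gradient, so an identical
# optimizer step on each one keeps the model weights consistent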

Real-World Use Cases

Distributed training has transformed industries by making it possible to solve problems that were previously computationally infeasible.

  • Autonomous Driving: Developing safe autonomous vehicles requires analyzing petabytes of video and sensor data. Automotive engineers use large distributed clusters to train vision models for real-time semantic segmentation and lane detection. This massive scale ensures that the AI in automotive systems can react reliably to diverse road conditions.
  • Medical Imaging: In the healthcare sector, analyzing high-resolution 3D scans like MRIs requires significant memory and processing power. Distributed training enables researchers to build high-performance diagnostic tools for tumor detection and other critical tasks. By using frameworks such as NVIDIA MONAI, hospitals can train models on diverse datasets without hitting memory bottlenecks, improving outcomes for AI in healthcare.

Utilizing Distributed Training with Ultralytics

The ultralytics library makes it straightforward to implement Distributed Data Parallel (DDP) training. You can scale training of state-of-the-art YOLO26 models across multiple GPUs simply by specifying the device indices in your training arguments.

from ultralytics import YOLO

# Load a pre-trained YOLO26 model
model = YOLO("yolo26n.pt")

# Train the model using two GPUs (device 0 and 1)
# The library automatically handles the DDP communication backend
results = model.train(data="coco8.yaml", epochs=100, device=[0, 1])

Related Concepts and Comparisons

It is helpful to distinguish distributed training from similar terms in the machine learning ecosystem to understand their specific roles:

  • Vs. Federated Learning: While both involve multiple devices, their goals differ. Distributed training usually centralizes data in a high-performance cluster to maximize speed. In contrast, federated learning keeps data decentralized on user devices (like smartphones) to prioritize data privacy, updating the global model without raw data ever leaving the source.
  • Vs. High-Performance Computing (HPC): HPC is a broad field that includes supercomputing for scientific simulations like weather forecasting. Distributed training is a specific application of HPC applied to optimization algorithms in deep learning. It often relies on specialized communication libraries like NVIDIA NCCL to minimize latency between GPUs.
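
As a rough illustration of that last point, the snippet below sketches how a communication backend such as NCCL is typically exercised through torch.distributed: each process contributes a tensor, and an all-reduce averages it across ranks. The script name, the two-process launch command, and the fallback to the gloo backend on CPU-only machines are assumptions made to keep the example runnable, not part of any specific library's workflow.

import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
# Uses NCCL on a multi-GPU host; falls back to gloo on CPU-only machines
dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
rank = dist.get_rank()

# Each process contributes its own "gradient" tensor
grad = torch.full((3,), float(rank))
if torch.cuda.is_available():
    grad = grad.cuda(rank)

# all_reduce sums the tensors across all processes in place;
# dividing by the world size yields the averaged gradient every rank uses
dist.all_reduce(grad, op=dist.ReduceOp.SUM)
grad /= dist.get_world_size()

print(f"rank {rank}: averaged gradient = {grad.tolist()}")
dist.destroy_process_group()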

Scaling with Cloud Platforms

Managing the infrastructure for distributed training can be complex. Modern platforms simplify this by offering managed environments. For example, the Ultralytics Platform allows users to manage datasets and initiate training runs that can be deployed to cloud environments or local clusters. This integration streamlines the workflow from data annotation to final model deployment, ensuring that scaling up to multiple GPUs is as seamless as possible. Similarly, cloud providers like Google Cloud Vertex AI and Amazon SageMaker provide robust infrastructure for running distributed training jobs at enterprise scale.
