Distributed Training
Accelerate AI training with distributed training! Learn how to cut training time, scale models, and optimize resources for complex ML projects.
Distributed training is a method in machine learning where the workload of training a model is split across multiple
processors or machines. This approach is essential for handling large-scale datasets and complex neural network
architectures that would otherwise take an impractical amount of time to train on a single device. By leveraging the
combined computational power of multiple
Graphics Processing Units (GPUs) or
Tensor Processing Units (TPUs), distributed training significantly accelerates the development cycle, allowing
researchers and engineers to iterate faster and achieve higher
accuracy in their models.
How Distributed Training Works
The core idea behind distributed training is parallelization. Instead of processing data sequentially on one chip, the
task is divided into smaller chunks that are processed simultaneously. There are two primary strategies for achieving
this:
- Data Parallelism: This is the most common approach for tasks like object detection. In this setup, a copy of the entire model is placed on every device. The global training data is split into smaller batches, and each device processes a different batch at the same time. After each step, the gradients (updates to the model) are synchronized across all devices to ensure the model weights remain consistent.
- Model Parallelism: When a neural network (NN) is too large to fit into the memory of a single GPU, the model itself is split across multiple devices. Different layers or components of the model reside on different chips, and data flows between them. This is often necessary for training massive foundation models and Large Language Models (LLMs).
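The gradient synchronization at the heart of data parallelism can be sketched in plain Python. This is a toy single-process simulation (no real GPUs or communication), using a hypothetical one-parameter linear model to show that averaging per-shard gradients reproduces the full-batch gradient:

```python
# Toy illustration of data parallelism: each "device" holds a full copy of a
# one-parameter linear model (y_hat = w * x) and computes gradients on its
# own shard of the batch. Averaging the per-shard gradients (the all-reduce
# step) reproduces the single-device full-batch gradient, which is why all
# replicas stay consistent after each update.


def grad_mse(w, xs, ys):
    """Gradient of mean squared error for y_hat = w * x on one shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# "Device 0" and "device 1" each process half of the global batch
# (equal-sized shards, so the average of shard means is the global mean).
g0 = grad_mse(w, xs[:2], ys[:2])
g1 = grad_mse(w, xs[2:], ys[2:])

# Synchronization step: average gradients so every replica applies the
# same update to its copy of the weights.
g_synced = (g0 + g1) / 2

# Matches what a single device would compute on the whole batch.
g_full = grad_mse(w, xs, ys)
assert abs(g_synced - g_full) < 1e-12
```

Frameworks such as PyTorch DDP perform this averaging automatically during the backward pass; the principle is the same.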
Real-World Applications
Distributed training has transformed industries by making it possible to solve problems that were previously
computationally infeasible.
- Autonomous Driving: Developing safe autonomous vehicles requires analyzing petabytes of video and sensor data. Automotive engineers use large distributed clusters to train vision models for real-time semantic segmentation and lane detection. This massive scale ensures that the AI in automotive systems can react reliably to diverse road conditions.
- Medical Imaging: In the healthcare sector, analyzing high-resolution 3D scans like MRIs requires significant memory and processing power. Distributed training enables researchers to build high-performance diagnostic tools for tumor detection and other critical tasks. By using frameworks such as NVIDIA MONAI, hospitals can train models on diverse datasets without hitting memory bottlenecks, improving AI in healthcare outcomes.
Utilizing Distributed Training with Ultralytics
The ultralytics library makes it straightforward to implement Distributed Data Parallel (DDP) training.
You can scale your training of state-of-the-art
YOLO26 models across multiple GPUs by simply specifying the
device indices in your training arguments.
```python
from ultralytics import YOLO

# Load a pre-trained YOLO26 model
model = YOLO("yolo26n.pt")

# Train the model using two GPUs (device 0 and 1)
# The library automatically handles the DDP communication backend
results = model.train(data="coco8.yaml", epochs=100, device=[0, 1])
```
Related Concepts and Comparisons
It is helpful to distinguish distributed training from similar terms in the machine learning ecosystem to understand
their specific roles:
- Vs. Federated Learning: While both involve multiple devices, their goals differ. Distributed training usually centralizes data in a high-performance cluster to maximize speed. In contrast, federated learning keeps data decentralized on user devices (like smartphones) to prioritize data privacy, updating the global model without raw data ever leaving the source.
- Vs. High-Performance Computing (HPC): HPC is a broad field that includes supercomputing for scientific simulations like weather forecasting. Distributed training is a specific application of HPC applied to optimization algorithms in deep learning. It often relies on specialized communication libraries like NVIDIA NCCL to minimize latency between GPUs.
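The central collective behind gradient synchronization is all-reduce, which libraries like NCCL typically implement as a bandwidth-efficient ring. The following is a toy single-process sketch of the standard ring all-reduce algorithm (no real inter-GPU communication happens here; the "workers" are just Python lists):

```python
# Toy simulation of ring all-reduce. Each worker starts with its own
# gradient vector; after the collective, every worker holds the
# element-wise sum. The vector is split into one chunk per worker, and
# chunks travel around the ring: n-1 scatter-reduce steps (accumulate),
# then n-1 all-gather steps (distribute the completed chunks).


def ring_allreduce(grads):
    n = len(grads)                   # number of workers in the ring
    dim = len(grads[0])
    assert dim % n == 0, "toy version: vector length must divide evenly"
    chunk = dim // n
    bufs = [list(g) for g in grads]  # each worker's local buffer

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Scatter-reduce: after n-1 steps, worker i owns the fully reduced
    # chunk (i + 1) % n.
    for s in range(n - 1):
        snap = [b[:] for b in bufs]  # sends within a step are simultaneous
        for i in range(n):
            dst, c = (i + 1) % n, (i - s) % n
            for k in span(c):
                bufs[dst][k] += snap[i][k]

    # All-gather: each worker passes its completed chunk around the ring
    # until every worker holds the full sum.
    for s in range(n - 1):
        snap = [b[:] for b in bufs]
        for i in range(n):
            dst, c = (i + 1) % n, (i + 1 - s) % n
            for k in span(c):
                bufs[dst][k] = snap[i][k]
    return bufs


# Three workers, each holding a 3-element gradient.
workers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
reduced = ring_allreduce(workers)
# Every worker ends up with the same summed gradient.
assert all(b == [12.0, 15.0, 18.0] for b in reduced)
```

The ring layout is attractive because each worker only ever talks to its neighbor, so total bandwidth per worker stays constant as the cluster grows.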
Scaling with Cloud Platforms
Managing the infrastructure for distributed training can be complex. Modern platforms simplify this by offering
managed environments. For example, the Ultralytics Platform allows
users to manage datasets and initiate training runs that can be deployed to cloud environments or local clusters. This
integration streamlines the workflow from
data annotation to final
model deployment, ensuring that scaling up to
multiple GPUs is as seamless as possible. Similarly, cloud providers like
Google Cloud Vertex AI and
Amazon SageMaker provide robust infrastructure for running distributed
training jobs at enterprise scale.