
GPU (Graphics Processing Unit)

Discover how GPUs revolutionize AI and machine learning by accelerating deep learning, optimizing workflows, and enabling real-world applications.

A Graphics Processing Unit (GPU) is a specialized electronic circuit originally designed to accelerate the manipulation and creation of images in a frame buffer for display output. While their roots lie in rendering computer graphics for gaming and professional visualization, GPUs have evolved into the fundamental engine of modern Artificial Intelligence (AI). Unlike a standard processor that uses a few powerful cores to handle tasks sequentially, a GPU architecture is composed of thousands of smaller, efficient cores designed to handle multiple tasks simultaneously. This capability, known as parallel computing, makes them exceptionally efficient for the massive matrix and vector operations that underpin Deep Learning (DL) and complex Neural Networks (NN).

Accelerating AI Workloads

The primary reason GPUs are indispensable for Machine Learning (ML) is their ability to perform high-speed matrix multiplications. Deep learning frameworks like PyTorch and TensorFlow are specifically optimized to leverage this hardware acceleration. This results in significantly reduced times for model training, often transforming what would be weeks of computation on a standard processor into hours on a GPU. The computational throughput of these devices is typically measured in FLOPS (Floating Point Operations Per Second), a critical metric for gauging the capability of hardware to handle the rigorous demands of state-of-the-art models like YOLO26.
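The sketch below illustrates this acceleration in PyTorch: the same matrix multiplication is issued on the CPU and, where a CUDA-capable GPU is present, on the GPU. The tensor sizes and the explicit synchronization call are illustrative assumptions, not a rigorous benchmark.

import torch

# Minimal sketch: the same matrix multiplication on the CPU and, if available, on a GPU.
# Tensor sizes are arbitrary; this is not a rigorous benchmark.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# CPU execution: a few powerful cores work through the product
c_cpu = a @ b

# GPU execution: thousands of smaller cores compute the product in parallel
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish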

Hardware Differences: GPU vs. CPU vs. TPU

To understand the hardware landscape, it is helpful to distinguish the GPU from other processing units:

  • CPU (Central Processing Unit): The general-purpose "brain" of a computer. CPUs excel at sequential processing and complex logic branching but are less efficient for the massive parallelism required by large-scale AI training.
  • GPU (Graphics Processing Unit): The industry standard for training and inference. Leading manufacturers like NVIDIA utilize software ecosystems such as CUDA to allow developers to program the GPU directly for general-purpose computing.
  • TPU (Tensor Processing Unit): An Application-Specific Integrated Circuit (ASIC) developed specifically for neural network machine learning. While highly efficient for specific tensor operations, they are less versatile than GPUs for broader computing tasks.
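As a minimal sketch of how the CUDA ecosystem surfaces in everyday code, frameworks built on it expose simple device queries. The snippet below (PyTorch, assuming a CUDA-enabled build is installed) detects whether an NVIDIA GPU is available and falls back to the CPU otherwise.

import torch

# Minimal sketch of device detection, assuming a CUDA-enabled PyTorch build.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

if device == "cuda":
    # Report the name of the first visible NVIDIA GPU
    print(torch.cuda.get_device_name(0))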

Real-World Applications

The implementation of high-performance GPUs has fueled innovations across diverse industries:

  • Autonomous Vehicles: Self-driving cars must process gigabytes of data from cameras, radar, and LiDAR sensors every second. GPUs enable real-time inference, allowing the vehicle's onboard computer to run Object Detection models that identify pedestrians, traffic signs, and obstacles instantaneously.
  • Medical Image Analysis: In healthcare, GPUs accelerate the processing of high-resolution scans such as MRIs and CTs. They enable sophisticated Image Segmentation algorithms to precisely delineate tumors or organs, assisting radiologists in making faster and more accurate diagnoses without relying solely on manual inspection.
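As a hedged illustration of the real-time inference these applications depend on, the sketch below runs a YOLO26 model on the first available GPU with the ultralytics package; the image path is a placeholder to replace with your own source.

from ultralytics import YOLO

# Minimal inference sketch; "path/to/image.jpg" is a placeholder source.
model = YOLO("yolo26n.pt")

# Run object detection on the first available GPU (device=0) for low-latency predictions
results = model.predict("path/to/image.jpg", device=0)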

Training with GPUs

When using the ultralytics package, utilizing a GPU is straightforward and highly recommended for efficient workflows. The library supports automatic device detection, but users can also explicitly specify the device.

The following example demonstrates how to train a YOLO26 model on the first available GPU:

from ultralytics import YOLO

# Load the YOLO26n model
model = YOLO("yolo26n.pt")

# Train the model on the first available GPU (device=0)
# This significantly accelerates training compared to CPU usage
results = model.train(data="coco8.yaml", epochs=5, imgsz=640, device=0)

Deployment and Optimization

Beyond training, GPUs play a crucial role in Model Deployment. To maximize efficiency during inference, models are often converted to optimized formats like TensorRT, which restructures the neural network to align perfectly with the specific GPU architecture, reducing latency. For developers who do not have access to high-end local hardware, the Ultralytics Platform offers cloud-based solutions to manage datasets and train models on powerful remote GPU clusters. This accessibility drives innovation in Edge AI, allowing complex Computer Vision (CV) tasks to be deployed on smaller, power-efficient devices in the field.
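As a minimal sketch of this optimization step, the ultralytics package can export a model to a TensorRT engine; this assumes an NVIDIA GPU with TensorRT installed on the machine performing the export.

from ultralytics import YOLO

# Minimal export sketch: the "engine" format targets TensorRT
# (requires an NVIDIA GPU with TensorRT installed)
model = YOLO("yolo26n.pt")
model.export(format="engine")  # produces a TensorRT engine optimized for the local GPU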
