Optimize AI models with pruning to reduce complexity, boost efficiency, and deploy to edge devices faster without degrading performance.
Pruning is a strategic model optimization technique used to reduce the size and computational complexity of neural networks by removing unnecessary parameters. Much like a gardener trims dead or overgrown branches to help a tree thrive, pruning algorithms identify and eliminate redundant weights and biases that contribute little to a model's predictive power. The primary objective is to create a compressed, "sparse" model that maintains high accuracy while consuming significantly less memory and energy. This reduction is essential for improving inference latency, allowing advanced architectures to run efficiently on resource-constrained hardware like mobile phones and embedded devices.
Modern deep learning models are often over-parameterized, meaning they contain far more connections than necessary to solve a specific task. Pruning exploits this by removing connections that have values close to zero, under the assumption that they have a negligible impact on the output. After parameters are removed, the model typically undergoes a process of fine-tuning, where it is retrained briefly to adjust the remaining weights and recover any lost performance. This concept is closely related to the Lottery Ticket Hypothesis, which suggests that large networks contain smaller, highly efficient subnetworks capable of reaching similar accuracy.
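Because fine-tuning is central to recovering accuracy after pruning, here is a minimal sketch of that step; the model, train_loader, and the classification loss are assumptions for illustration, not part of the original text.

import torch

def fine_tune(model, train_loader, epochs=2, lr=1e-4):
    # Briefly retrain so the surviving weights compensate for the
    # removed connections; while torch.nn.utils.prune masks are still
    # attached, the zeroed weights stay zero throughout this loop
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model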
There are two broad categories of pruning strategy:

- Unstructured pruning removes individual weights anywhere in the network, typically those with the smallest magnitudes. It yields fine-grained sparsity, but realizing actual speedups usually requires sparse-aware hardware or libraries.
- Structured pruning removes whole units such as filters, channels, or layers, shrinking the dense tensor shapes themselves so that standard hardware benefits directly (see the sketch below).
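To complement the unstructured example later in this section, the following minimal sketch shows structured pruning with PyTorch's torch.nn.utils.prune utilities; the layer dimensions and the 50% amount are illustrative assumptions rather than values from the original text.

import torch
import torch.nn.utils.prune as prune

# Initialize a convolutional layer (assumed, illustrative shape)
conv = torch.nn.Conv2d(in_channels=16, out_channels=64, kernel_size=3)

# Structured pruning: zero out the 50% of output channels (dim=0)
# with the lowest L2 norm, rather than scattered individual weights
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

# Count output channels that were removed entirely
zeroed = int((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())
print(f"Channels zeroed: {zeroed} of {conv.weight.shape[0]}")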
Pruning is essential for enabling edge AI across industries where hardware resources are constrained.
While state-of-the-art models like YOLO26 are designed for efficiency, developers can apply pruning to further optimize layers using libraries like PyTorch. The following example demonstrates how to apply unstructured pruning to a convolutional layer.
import torch
import torch.nn.utils.prune as prune
# Initialize a standard convolutional layer
layer = torch.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
# Apply L1 unstructured pruning to remove 30% of weights with the lowest magnitude
prune.l1_unstructured(layer, name="weight", amount=0.3)
# Verify sparsity (percentage of zero parameters)
sparsity = 100.0 * float(torch.sum(layer.weight == 0)) / layer.weight.nelement()
print(f"Sparsity achieved: {sparsity:.2f}%")
To optimize models effectively for deployment, it helps to distinguish pruning from other strategies:

- Pruning removes parameters outright, changing the network's structure.
- Model quantization keeps every parameter but lowers its numerical precision, for example from 32-bit floats to 8-bit integers (sketched below).
- Knowledge distillation trains a smaller student model to mimic a larger teacher instead of modifying the original network.

These techniques are complementary and are often combined in a single deployment pipeline.
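To make the contrast with pruning concrete, this minimal sketch applies post-training dynamic quantization in PyTorch; the two-layer model is a hypothetical stand-in used only for illustration.

import torch

# A hypothetical toy model standing in for a real network
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Dynamic quantization: Linear weights are stored as int8, while
# activations are quantized on the fly at inference time; no
# parameters are removed, unlike pruning
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)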
For comprehensive lifecycle management, including training, annotation, and deploying optimized models, users can leverage the Ultralytics platform. It streamlines the workflow from dataset management all the way to exporting models into hardware-friendly formats such as ONNX or TensorRT.
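As a closing illustration, the snippet below sketches such an export with the Ultralytics Python API; the yolo11n.pt checkpoint is an assumed example, so substitute whichever trained model you intend to deploy.

from ultralytics import YOLO

# Load a trained model (yolo11n.pt is an assumed example checkpoint)
model = YOLO("yolo11n.pt")

# Export to ONNX, a hardware-friendly interchange format
model.export(format="onnx")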