
Pruning

Optimize AI models with pruning: reduce complexity, boost efficiency, and deploy faster on edge devices without sacrificing performance.

Pruning is a strategic model optimization technique used to reduce the size and computational complexity of neural networks by removing unnecessary parameters. Much like a gardener trims dead or overgrown branches to help a tree thrive, pruning algorithms identify and eliminate redundant weights and biases that contribute little to a model's predictive power. The primary objective is to create a compressed, "sparse" model that maintains high accuracy while consuming significantly less memory and energy. This reduction is essential for improving inference latency, allowing advanced architectures to run efficiently on resource-constrained hardware like mobile phones and embedded devices.

Mechanisms and Methodology

Modern deep learning models are often over-parameterized, meaning they contain far more connections than necessary to solve a specific task. Pruning exploits this by removing connections that have values close to zero, under the assumption that they have a negligible impact on the output. After parameters are removed, the model typically undergoes a process of fine-tuning, where it is retrained briefly to adjust the remaining weights and recover any lost performance. This concept is closely related to the Lottery Ticket Hypothesis, which suggests that large networks contain smaller, highly efficient subnetworks capable of reaching similar accuracy.
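The prune-then-fine-tune cycle described above can be sketched in PyTorch. This is a minimal illustration, not a production recipe: the tiny model, random data, and the choice of three rounds at 20% per round are all placeholder assumptions.

```python
import torch
import torch.nn.utils.prune as prune

# A tiny model standing in for an over-parameterized network (illustrative only)
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Dummy data standing in for a real training set
x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))

# Iterative magnitude pruning: remove a fraction of the smallest weights,
# then fine-tune briefly so the surviving weights recover lost accuracy
for _round in range(3):
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            # Each call prunes 20% of the weights that are still unpruned
            prune.l1_unstructured(module, name="weight", amount=0.2)
    for _step in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# Three rounds at 20% each leave roughly 1 - 0.8**3 ≈ 49% of weights zeroed
sparsity = float((model[0].weight == 0).float().mean())
print(f"Sparsity: {sparsity:.2f}")
```

Note that PyTorch stacks repeated pruning calls into a `PruningContainer`, so each round removes 20% of the weights that survived the previous rounds; the masked weights stay zero through fine-tuning.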

Pruning strategies fall into two main categories:

  • Unstructured Pruning: This method removes individual weights based on their magnitude, regardless of their location. While it effectively reduces the total parameter count, it creates irregular sparse matrices that standard CPUs and GPUs may struggle to process efficiently without specialized software.
  • Structured Pruning: This approach removes entire geometric structures, such as neurons, channels, or layers within a convolutional neural network (CNN). By preserving the matrix structure, structured pruning is highly compatible with standard hardware accelerators, often resulting in immediate speedups for real-time inference.
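As a concrete sketch of the structured variant, PyTorch's `prune.ln_structured` can zero out entire output channels of a convolutional layer by their norm. The layer dimensions below are illustrative assumptions.

```python
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Remove the 25% of output channels (dim=0) with the smallest L2 norm
prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)

# Whole filters are now zero: count output channels whose weights sum to zero
zero_channels = int((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())
print(zero_channels)  # 8 of the 32 channels are zeroed
```

Because entire filters are removed, the zeroed channels can later be physically dropped from the tensor, which is what makes structured pruning friendly to standard hardware.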

Real-World Applications

Pruning is indispensable for enabling Edge AI in industries where hardware resources are limited. Smartphones are a prime example: their constrained processing power makes full-size models impractical, so pruned models are often what makes on-device AI feasible. Other applications include:

  1. Autonomous Drones: Unmanned aerial vehicles used for search and rescue rely on computer vision to navigate complex environments. Pruned object detection models allow these devices to process video feeds locally in real-time, avoiding the latency issues associated with cloud communication.
  2. Mobile Healthcare: Handheld medical devices for ultrasound analysis utilize pruned models to detect anomalies directly on the device. This ensures patient data privacy and enables sophisticated diagnostics in remote areas without internet access.

Implementation Example

While state-of-the-art models like YOLO26 are designed for efficiency, developers can apply pruning to further optimize layers using libraries like PyTorch. The following example demonstrates how to apply unstructured pruning to a convolutional layer.

import torch
import torch.nn.utils.prune as prune

# Initialize a standard convolutional layer
layer = torch.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Apply L1 unstructured pruning to remove 30% of weights with the lowest magnitude
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Verify sparsity (percentage of zero parameters)
sparsity = 100.0 * float(torch.sum(layer.weight == 0)) / layer.weight.nelement()
print(f"Sparsity achieved: {sparsity:.2f}%")
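One follow-up step worth knowing: the pruning above is a reparameterization (the layer keeps a `weight_orig` tensor and a `weight_mask`), and `prune.remove` folds the mask in permanently, which is typically done before exporting the model. The sketch below repeats the pruning call so it stands alone.

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
prune.l1_unstructured(layer, name="weight", amount=0.3)

# While pruned, the layer stores "weight_orig" and "weight_mask" internally
assert hasattr(layer, "weight_orig")

# Fold the mask into the weight tensor and drop the reparameterization,
# so the layer behaves like an ordinary module (e.g. for ONNX export)
prune.remove(layer, "weight")
assert not hasattr(layer, "weight_orig")

# The zeros persist in the plain weight tensor
sparsity = float((layer.weight == 0).float().mean())
print(f"{sparsity:.2f}")  # prints 0.30
```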

Pruning and Related Optimization Techniques

To optimize a model effectively for deployment, it is helpful to distinguish pruning from other strategies:

  • Model Quantization: Unlike pruning, which removes connections, quantization reduces the precision of the weights (e.g., converting 32-bit floating-point numbers to 8-bit integers). Both techniques can be used together to maximize efficiency on embedded systems.
  • Knowledge Distillation: This involves training a smaller "student" model to mimic a larger "teacher" model's behavior. Pruning modifies the original model directly, whereas distillation trains a new, compact architecture.
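Since pruning and quantization are complementary, they can be chained: prune first, make the zeros permanent, then reduce weight precision. The sketch below uses PyTorch's dynamic quantization on `Linear` layers; the model shape and the 50% pruning ratio are illustrative assumptions.

```python
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Step 1: prune each Linear layer, then bake the zeros into the weights
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Step 2: quantize the remaining float32 weights down to int8
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])
```

Note that dynamic quantization alone does not exploit the sparsity pattern; combining the two mainly shrinks storage and memory traffic, while sparsity-aware runtimes are needed to turn the zeros into compute savings.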

For comprehensive lifecycle management, including training, annotation, and deployment of optimized models, users can rely on Ultralytics. This simplifies the workflow from dataset management to exporting models in hardware-friendly formats such as ONNX or TensorRT.
