Discover the vanishing gradient problem in deep learning, its impact on neural networks, and effective solutions such as ReLU, ResNets, and more.
The Vanishing Gradient problem is a significant challenge encountered during the training of deep artificial neural networks. It occurs when the gradients—the values that dictate how much the network's parameters should change—become incredibly small as they propagate backward from the output layer to the input layers. Because these gradients are essential for updating the model weights, their disappearance means the earlier layers of the network stop learning. This phenomenon effectively prevents the model from capturing complex patterns in the data, limiting the depth and performance of deep learning architectures.
To understand why this happens, it is helpful to look at the process of backpropagation. During training, the network calculates the error between its prediction and the actual target using a loss function. This error is then sent backward through the layers to adjust the weights. This adjustment relies on the chain rule of calculus, which involves multiplying the derivatives of activation functions layer by layer.
If a network uses activation functions like the sigmoid or the hyperbolic tangent (tanh), the derivatives are bounded: the sigmoid's derivative never exceeds 0.25, and tanh's never exceeds 1. When many of these small factors are multiplied together in a deep network with dozens or hundreds of layers, the product approaches zero. You can visualize this like a game of "telephone": a message whispered along a long line of people fades a little with each retelling, and by the time it reaches the people at the far end of the line, who stand for the earliest layers of the network, the message has become inaudible and they no longer know what to say.
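To make the arithmetic concrete, here is a minimal sketch in plain Python. The 50-layer depth and the x = 0 operating point are illustrative assumptions, not values from any particular model; they simply show how quickly a product of sigmoid derivatives collapses.

import math

def sigmoid_derivative(x: float) -> float:
    # The sigmoid's derivative is s(x) * (1 - s(x)), which peaks at 0.25 when x = 0.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Assume, optimistically, that every layer operates at x = 0, where the derivative is largest.
factor = sigmoid_derivative(0.0)  # 0.25

for depth in (5, 10, 20, 50):
    # The chain rule contributes one such factor per layer.
    print(f"{depth:>2} layers -> gradient scale ~ {factor ** depth:.2e}")

Even under this best-case assumption, a 50-layer stack scales the gradient by roughly 8e-31, far too small to produce meaningful weight updates in the early layers.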
The AI field has developed several powerful strategies to mitigate vanishing gradients, enabling the creation of robust models such as those from Ultralytics.
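Two of the most common remedies, ReLU-style activations and residual (skip) connections, can be sketched in a few lines of PyTorch. This is only an illustrative toy, not how any Ultralytics model is actually built; the 64-unit width and 50-block depth are arbitrary assumptions.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # A minimal residual block: output = input + F(input).
    # The skip connection gives the gradient a direct path around F,
    # so early layers still receive a usable signal in deep stacks.
    def __init__(self, dim: int = 64):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),  # derivative is exactly 1 for positive inputs, so it does not shrink gradients
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.transform(x)  # the addition passes gradients through unchanged

# Stack 50 blocks and check that the input still receives a healthy gradient.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(50)])
x = torch.randn(8, 64, requires_grad=True)
model(x).sum().backward()
print(x.grad.abs().mean())  # a non-vanishing magnitude, thanks to the skip connections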
Although they stem from the same underlying mechanism of repeated multiplication, vanishing gradients are the mirror image of exploding gradients. In the exploding case the repeated factors are larger than 1, so the gradients grow out of control and the weight updates become unstable or turn into NaN (Not a Number) values. This is often fixed by gradient clipping, which caps the magnitude of the gradients before the weights are updated.
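Gradient clipping itself is usually a one-liner. The sketch below uses PyTorch's torch.nn.utils.clip_grad_norm_ with a throwaway linear model and random data, both purely hypothetical, to show where the call sits in a training step.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Rescale the gradients so their combined norm never exceeds 1.0,
# preventing a single oversized update from destabilizing the weights.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()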
Overcoming vanishing gradients has been a prerequisite for the success of modern AI applications.
Modern frameworks and models abstract away many of these complexities. When you train a model such as YOLO26, the architecture automatically includes components like SiLU activation and batch normalization that keep the gradients from vanishing.
from ultralytics import YOLO
# Load the YOLO26 model (latest generation, Jan 2026)
# This architecture includes residual connections and modern activations
# that inherently prevent vanishing gradients.
model = YOLO("yolo26n.pt")
# Train the model on a dataset
# The optimization process remains stable due to the robust architecture
results = model.train(data="coco8.yaml", epochs=10)