Learn how to use a [dropout layer](https://www.ultralytics.com/glossary/dropout-layer) to prevent overfitting. Discover how to train [YOLO26](https://docs.ultralytics.com/models/yolo26/) for more robust AI models.
A dropout layer is a fundamental regularization technique used in neural networks (NN) to combat the pervasive problem of overfitting. When a model is trained on a limited set of samples, it tends to memorize the noise and fine details of the training data rather than learning the underlying general patterns. This memorization produces excellent results during development but fails on novel, unseen inputs. Dropout addresses the issue by randomly deactivating (or "dropping") a fraction of a layer's neurons during training. This simple yet effective strategy, introduced in the seminal paper by Srivastava et al., has significantly improved the stability and performance of deep learning (DL) architectures.
The mechanism behind a dropout layer is intuitively similar to removing players from a sports team during practice to force the remaining players to work harder and not rely on a single star athlete. During the model training phase, the layer generates a probabilistic mask of zeros and ones. If the dropout rate is set to 0.5, approximately 50% of the neurons are temporarily ignored during that specific forward and backward pass. This process forces the remaining active neurons to learn robust features independently, preventing the network from relying too heavily on any single neuron—a phenomenon known in machine learning (ML) as feature co-adaptation.
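To make the masking mechanism concrete, here is a minimal sketch of "inverted" dropout written with PyTorch tensors. The function name `dropout_forward` and the choice of inverted scaling (rescaling the surviving activations during training) are illustrative assumptions, not part of any specific library's API.

```python
import torch


def dropout_forward(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Illustrative inverted dropout: zero out activations with probability p during training."""
    if not training or p == 0.0:
        return x  # at inference time the layer is a no-op
    # Probabilistic mask of zeros and ones: each neuron is kept with probability (1 - p)
    mask = (torch.rand_like(x) > p).float()
    # Scale the survivors by 1 / (1 - p) so the expected activation matches inference
    return x * mask / (1.0 - p)


activations = torch.randn(4, 8)
print(dropout_forward(activations, p=0.5, training=True))
```

With `p=0.5`, roughly half of the values in each forward pass come out as zero, and the remaining neurons must carry the signal on their own, which is exactly the effect that discourages feature co-adaptation.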
During real-time inference (the test phase), the dropout layer is typically disabled. All neurons remain active so the model can use its full predictive capacity. To keep the total activation consistent with what the network saw during training, frameworks scale the activations or weights accordingly. Modern libraries such as PyTorch handle this scaling automatically, letting developers focus on architecture design rather than the arithmetic.
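As a quick sketch of this train/inference switch, the snippet below toggles PyTorch's standard `nn.Dropout` module between modes; the tensor shape is arbitrary.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()  # training mode: ~50% of activations are zeroed, survivors scaled by 2x
print(drop(x))

drop.eval()   # evaluation mode: dropout is disabled, input passes through unchanged
print(drop(x))
```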
With the ultralytics package, applying dropout to a cutting-edge model such as YOLO26 is as simple as adjusting a training argument. This is particularly useful when working with smaller datasets, where the risk of overfitting is higher. By introducing randomness, you encourage the model to generalize better across diverse environments.
```python
from ultralytics import YOLO

# Load the latest YOLO26 model (recommended for new projects)
model = YOLO("yolo26n.pt")

# Train the model with a custom dropout rate of 0.1 (10%)
# This encourages the model to learn more generalized features
results = model.train(data="coco8.yaml", epochs=50, dropout=0.1)
```
Across artificial intelligence (AI), dropout is indispensable whenever a model has far more parameters than the available data can reliably constrain.
While dropout is highly effective, it is often used alongside other techniques. It is distinct from data augmentation, which modifies the input images (e.g., flipping or rotating) rather than the network architecture itself. Similarly, it differs from batch normalization, which normalizes layer inputs to stabilize learning but does not explicitly deactivate neurons.
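As a rough sketch of where each technique sits, the hypothetical pipeline below applies augmentation to the input images with torchvision transforms, while batch normalization and dropout live inside the network itself; the layer sizes and rates are arbitrary choices for illustration.

```python
import torch.nn as nn
from torchvision import transforms

# Augmentation acts on the input images, not on the network architecture
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# BatchNorm normalizes layer inputs; Dropout explicitly deactivates neurons
head = nn.Sequential(
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(256, 10),
)
```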
For complex projects, managing these hyperparameters can be challenging. The Ultralytics Platform simplifies this by providing tools to visualize training metrics, helping users determine if their dropout rates are effectively reducing validation loss. Whether you are building a custom image classification system or a sophisticated segmentation pipeline, understanding dropout is key to building resilient AI systems.