Activation Function
Learn the role of activation functions in neural networks, their types, and their practical applications in AI and machine learning.
An activation function is a fundamental component of a neural network (NN) that determines the output of a neuron given a set of inputs. Often described as the "gatekeeper," it decides whether a neuron should be active, meaning it contributes to the network's prediction, or inactive. Without these mathematical operations, a neural network would behave like a simple linear regression model, unable to grasp complex patterns regardless of its depth. By introducing non-linearity, activation functions enable deep learning (DL) models to learn intricate structures, such as the curves in handwritten digits or subtle anomalies in medical image analysis.
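The linear-collapse point can be made concrete with a short PyTorch sketch (the layer sizes and random inputs below are arbitrary choices for illustration): two stacked linear layers with no activation between them compute exactly the same function as a single linear layer, while inserting a ReLU breaks that equivalence.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3)  # a small batch of 4 samples with 3 features

# Two stacked linear layers with no activation in between
stacked = nn.Sequential(nn.Linear(3, 8), nn.Linear(8, 2))

# They collapse into one linear map: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
w1, b1 = stacked[0].weight, stacked[0].bias
w2, b2 = stacked[1].weight, stacked[1].bias
collapsed = x @ (w2 @ w1).T + (w2 @ b1 + b2)
print(torch.allclose(stacked(x), collapsed, atol=1e-6))  # True: no added expressive power

# Adding a non-linearity between the layers breaks this equivalence
nonlinear = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))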
Core Role and Common Types
The primary role of an activation function is to map input signals to a desired output range and introduce complexity
into the feature maps generated by the network.
Developers select specific functions based on the layer's position and the objectives of the
model training process.
- ReLU (Rectified Linear Unit): Currently the most widely used function for hidden layers. It outputs the input directly if it is positive and zero otherwise. This simplicity accelerates computation and helps mitigate the vanishing gradient problem, a frequent challenge when training deep architectures.
- Sigmoid: This function "squashes" input values into a range between 0 and 1. It is frequently employed in the final layer for binary classification tasks, such as determining if an email is spam, as the output can be interpreted as a probability score.
- Softmax: The core function for multi-class classification problems. It converts a vector of values into a probability distribution in which all values sum to 1, the standard approach for image classification tasks such as those in ImageNet.
- SiLU (Sigmoid Linear Unit): A smooth, non-monotonic function often used in state-of-the-art architectures like YOLO26. SiLU allows for better gradient flow than ReLU in very deep models, contributing to higher accuracy. (Softmax and SiLU are demonstrated in the sketch after this list.)
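As a minimal sketch (using PyTorch's built-in nn.Softmax and nn.SiLU modules with arbitrary input values), the snippet below shows Softmax turning raw scores into a distribution that sums to 1, and how SiLU handles negative inputs more smoothly than ReLU:
import torch
import torch.nn as nn

# Softmax converts raw class scores (logits) into a probability distribution
logits = torch.tensor([2.0, 1.0, 0.1])
probs = nn.Softmax(dim=0)(logits)
print(probs)        # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())  # tensor(1.) -- the values always sum to 1

# SiLU (x * sigmoid(x)) is smooth and non-monotonic: small negative inputs
# yield small negative outputs instead of the hard zero that ReLU produces
x = torch.tensor([-2.0, -0.5, 0.0, 2.0])
print(nn.SiLU()(x))  # tensor([-0.2384, -0.1888,  0.0000,  1.7616])
print(nn.ReLU()(x))  # tensor([0., 0., 0., 2.])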
Real-World AI Applications
The choice of activation function directly impacts the performance and
inference latency of AI systems deployed in daily
operations.
- Retail Object Detection: In automated checkout systems, object detection models identify products on a conveyor belt. Hidden layers use efficient functions like ReLU or SiLU to process visual features rapidly. The output layer determines the class (e.g., "apple," "cereal") and the bounding box coordinates, enabling the system to tally the bill automatically. This is critical for AI in retail to ensure speed and customer satisfaction.
- Sentiment Analysis: In natural language processing (NLP), models analyze customer reviews to gauge satisfaction. A network might process text data and use a Sigmoid function in the final layer to output a sentiment score between 0 (negative) and 1 (positive), helping businesses understand customer feedback at scale using machine learning (ML). A rough sketch of such an output layer follows this list.
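As a rough sketch of the sentiment example (the 768-dimensional input, the layer sizes, and the 0.5 decision threshold are illustrative assumptions, not a specific production model), a Sigmoid output layer turns the network's final score into a value between 0 and 1:
import torch
import torch.nn as nn

# Hypothetical classifier head: assumes each review has already been encoded
# into a 768-dimensional feature vector by an upstream text encoder.
sentiment_head = nn.Sequential(
    nn.Linear(768, 128),
    nn.ReLU(),     # non-linearity in the hidden layer
    nn.Linear(128, 1),
    nn.Sigmoid(),  # squashes the final score into the (0, 1) range
)

features = torch.randn(4, 768)     # a batch of 4 encoded reviews (random stand-ins)
scores = sentiment_head(features)  # each score can be read as P(positive)
print(scores.shape)                # torch.Size([4, 1])
print((scores > 0.5).squeeze(1))   # simple threshold: True = positive sentiment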
Implementation Example
You can see how different activation functions transform data using the PyTorch library. The snippet below demonstrates the difference between ReLU (which zeroes out negative values) and Sigmoid (which squashes values into a range between 0 and 1).
import torch
import torch.nn as nn
# Input data: negative, zero, and positive values
data = torch.tensor([-2.0, 0.0, 2.0])
# Apply ReLU: Negatives become 0, positives stay unchanged
relu_output = nn.ReLU()(data)
print(f"ReLU: {relu_output}")
# Output: tensor([0., 0., 2.])
# Apply Sigmoid: Squashes values between 0 and 1
sigmoid_output = nn.Sigmoid()(data)
print(f"Sigmoid: {sigmoid_output}")
# Output: tensor([0.1192, 0.5000, 0.8808])
Distinguishing Related Concepts
It is essential to distinguish activation functions from other mathematical components in the learning pipeline.
- Activation Function vs. Loss Function: The activation function operates during the forward pass to shape a neuron's output. A loss function (such as mean squared error) computes the error between the predicted values and the actual target values at the end of the forward pass.
- Activation Function vs. Optimization Algorithm: While the activation function defines the output structure, the optimizer (like Adam or Stochastic Gradient Descent) decides how to update the model weights to minimize the error calculated by the loss function. The training-step sketch after this list shows where each component acts.
- Activation Function vs. Transfer Learning: Activation functions are fixed mathematical operations within the network's layers. Transfer learning is a technique where a pre-trained model is adapted for a new task, often preserving the activation functions of the original architecture while fine-tuning the weights on a custom dataset via the Ultralytics Platform.
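The minimal training step below (with dummy data and arbitrary layer sizes) shows where each of these components acts: the activation functions shape the forward pass, the loss function measures the error, and the optimizer updates the weights:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                                     # loss function: measures the error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer: updates the weights

x = torch.randn(16, 3)                    # dummy batch of 16 samples
y = torch.randint(0, 2, (16, 1)).float()  # dummy binary labels

pred = model(x)          # forward pass: activations shape each layer's output
loss = loss_fn(pred, y)  # loss function compares predictions with targets
loss.backward()          # backward pass: computes gradients of the loss
optimizer.step()         # optimizer adjusts the weights to reduce the loss
optimizer.zero_grad()    # clear gradients before the next step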
For a deeper dive into how these functions fit into larger systems, explore the
PyTorch documentation on non-linear activations
or read about how
computer vision tasks
rely on them for feature extraction.