Softmax
Discover how Softmax transforms scores into probabilities for classification tasks in AI, powering the success of image recognition and NLP.
Softmax is a mathematical function pivotal to the field of artificial intelligence, specifically serving as the final step in many classification algorithms. It transforms a vector of raw numbers, often called logits, into a vector of probabilities. This transformation ensures that the output values are all positive and sum to exactly one, effectively creating a valid probability distribution. Because of this property, Softmax is the standard activation function used in the output layer of neural networks designed for multi-class classification, where the system must choose a single category from more than two mutually exclusive options.
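Formally, for a vector of logits $z = (z_1, \dots, z_K)$, the function is defined as $\mathrm{Softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, which guarantees that every output is positive and that the outputs sum to one.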
The Mechanism of Softmax
In a typical deep learning (DL) workflow, the layers of a network perform complex matrix multiplications and additions. The output of the final layer, before activation, consists of raw scores known as logits. These values can range from negative infinity to positive infinity, making them difficult to interpret directly as confidence levels.
Softmax addresses this by performing two main operations:
- Exponentiation: It calculates the exponential of each input number. This step ensures that all values are non-negative (since $e^x$ is always positive) and penalizes values that are significantly lower than the maximum, while highlighting the largest scores.
- Normalization: It sums these exponentiated values and divides each individual exponential by this total sum. This normalization process scales the numbers so they represent parts of a whole, allowing developers to interpret them as percentage confidence scores; both steps are illustrated in the sketch after this list.
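The following is a minimal NumPy sketch of these two steps (the softmax helper and the sample logits are illustrative, not taken from a specific library). Subtracting the maximum logit before exponentiating is a common trick that prevents numerical overflow without changing the result.

import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution."""
    shifted = logits - np.max(logits)  # Stability trick: avoids overflow in exp
    exps = np.exp(shifted)             # Step 1: exponentiation (all values > 0)
    return exps / np.sum(exps)         # Step 2: normalization (values sum to 1)

logits = np.array([2.0, 1.0, 0.1])  # Hypothetical raw scores from a model
probs = softmax(logits)
print(probs)        # approx. [0.659 0.242 0.099]
print(probs.sum())  # 1.0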
Real-World Applications
The ability to output clear probabilities makes Softmax indispensable across various industries and machine learning (ML) tasks.
- Image Classification: In computer vision, models use Softmax to categorize images. For instance, when the Ultralytics YOLO26 classification model analyzes a photo, it might produce scores for classes like "Golden Retriever," "German Shepherd," and "Poodle." Softmax converts these scores into probabilities (e.g., 0.85, 0.10, 0.05), indicating a high confidence that the image contains a Golden Retriever. This is crucial for applications ranging from automated photo organization to medical diagnosis in AI in Healthcare.
- Natural Language Processing (NLP): Softmax is the engine behind text generation in Large Language Models (LLMs). When a model like a Transformer generates a sentence, it predicts the next word (token) by calculating a score for every word in its vocabulary. Softmax turns these scores into probabilities, allowing the model to select the most likely next word, enabling fluid machine translation and conversational AI.
- Reinforcement Learning: Agents in reinforcement learning often use Softmax to select actions. Instead of always choosing the action with the highest value, an agent might use the probabilities to explore different strategies, balancing exploration and exploitation in environments like robotic control or game playing, as the sampling sketch after this list shows.
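As a brief illustration of that exploration strategy, the following sketch samples actions in proportion to their Softmax probabilities (the action values and temperature here are hypothetical, not from any specific framework). A temperature below 1 sharpens the distribution toward exploitation, while a higher temperature flattens it toward exploration.

import numpy as np

rng = np.random.default_rng(seed=0)

def softmax(x: np.ndarray) -> np.ndarray:
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

action_values = np.array([1.2, 1.0, 0.3])  # Hypothetical estimated action values
temperature = 0.5                          # Controls the exploration/exploitation balance

probs = softmax(action_values / temperature)

# Sample an action according to the probabilities instead of always taking the best one
action = rng.choice(len(action_values), p=probs)
print(f"Action probabilities: {probs}, chosen action: {action}")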
Python Code Example
The following example demonstrates how to load a pre-trained YOLO26 classification model and access the probability scores generated via Softmax.
from ultralytics import YOLO

# Load a pre-trained YOLO26 classification model
model = YOLO("yolo26n-cls.pt")

# Run inference on a sample image
results = model("https://ultralytics.com/images/bus.jpg")

# The model applies Softmax internally. Access the top prediction:
# the 'probs' attribute contains the probability distribution.
top_prob = results[0].probs.top1conf.item()
top_class = results[0].names[results[0].probs.top1]

print(f"Predicted Class: {top_class}")
print(f"Confidence (Softmax Output): {top_prob:.4f}")
Distinguishing Softmax from Related Concepts
While Softmax is dominant in multi-class scenarios, it is important to distinguish it from other mathematical functions used in model training and architecture design:
- Sigmoid: The Sigmoid function also scales values between 0 and 1, but it treats each output independently. This makes Sigmoid ideal for binary classification (yes/no) or multi-label classification where classes are not mutually exclusive (e.g., an image can contain both a "Person" and a "Backpack"). Softmax forces the probabilities to sum to one, making the classes compete with each other.
- ReLU (Rectified Linear Unit): ReLU is used primarily in the hidden layers of a network to introduce non-linearity. Unlike Softmax, ReLU does not bound outputs to a specific range (it simply outputs zero for negative inputs and the input itself for positive ones) and does not generate a probability distribution.
- Argmax: While Softmax provides the probabilities for all classes, the Argmax function is often used in conjunction to select the single index with the highest probability. Softmax provides the "soft" confidence, while Argmax provides the "hard" final decision, as the comparison sketch after this list demonstrates.
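The following NumPy sketch contrasts these functions on the same illustrative scores:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])

sigmoid = 1 / (1 + np.exp(-scores))              # Independent values; do not sum to 1
softmax = np.exp(scores) / np.exp(scores).sum()  # Competing probabilities; sum to 1
relu = np.maximum(scores, 0)                     # Unbounded above; not a distribution

print(sigmoid, sigmoid.sum())  # approx. [0.881 0.731 0.525], sum approx. 2.137
print(softmax, softmax.sum())  # approx. [0.659 0.242 0.099], sum 1.0
print(relu)                    # [2.  1.  0.1]
print(np.argmax(softmax))      # 0 -- the "hard" decision from Argmax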
Advanced Integration
In modern ML pipelines, Softmax is often computed implicitly within loss functions. For example, Cross-Entropy Loss combines Softmax and negative log-likelihood into a single mathematical step to improve numerical stability during training. Platforms like the Ultralytics Platform handle these complexities automatically, allowing users to train robust models without manually implementing these mathematical operations.
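As a minimal PyTorch sketch of this pattern (the logits and targets here are hypothetical), torch.nn.CrossEntropyLoss takes raw logits directly and applies the log-Softmax internally, so no explicit Softmax layer is added before the loss:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # Combines log-Softmax and negative log-likelihood

logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])  # Hypothetical raw scores (2 samples, 3 classes)
targets = torch.tensor([0, 1])                             # Ground-truth class indices

# Passing raw logits (not Softmax outputs) keeps the computation numerically stable
loss = criterion(logits, targets)
print(loss.item())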