Sigmoid

Discover the power of the Sigmoid function in AI. Learn how it enables non-linearity, aids binary classification, and drives ML advancements!

The Sigmoid function is a fundamental activation function widely used in machine learning (ML) and deep learning (DL). Mathematically it is a logistic function, characterized by its distinct "S"-shaped curve, known as a sigmoid curve. The primary role of Sigmoid is to transform any real-valued input into a value between 0 and 1. This squashing property makes it exceptionally useful for models that need to predict probabilities, as the output can be interpreted directly as the likelihood of a specific event occurring. By introducing non-linearity into a neural network (NN), the Sigmoid function allows models to learn complex data patterns that go beyond what a simple linear model can capture.
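
In code, the logistic formula σ(x) = 1 / (1 + e^(-x)) takes only a couple of lines. The minimal sketch below uses the Python standard library, and the sample inputs are chosen purely for illustration.

import math

def sigmoid(x: float) -> float:
    # Logistic function: squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5, the midpoint of the S-shaped curve
print(sigmoid(6.0))   # ~0.9975, large positive inputs saturate toward 1
print(sigmoid(-6.0))  # ~0.0025, large negative inputs saturate toward 0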

Core Applications in Artificial Intelligence

The Sigmoid function plays a critical role in specific network architectures and tasks, particularly where outputs need to be interpreted as independent probabilities. While newer functions have replaced it in hidden layers for deep networks, it remains a standard in output layers for several key applications.

  • Binary Classification: In tasks where the objective is to categorize inputs into one of two mutually exclusive classes—such as determining if an email is "spam" or "not spam"—the Sigmoid function is the ideal choice for the final layer. It outputs a single scalar value between 0 and 1, representing the probability of the positive class. For example, in medical image analysis, a model might output 0.95, indicating a 95% confidence that a detected anomaly is malignant.
  • Multi-Label Classification: Unlike multi-class tasks where an input belongs to only one category, multi-label tasks allow an input to carry multiple tags simultaneously. For instance, an object detection model like Ultralytics YOLO11 may need to detect a "person," "bicycle," and "helmet" in a single image. Here, Sigmoid is applied independently to each output node, allowing the model to predict the presence or absence of each class without forcing the probabilities to sum to one, as illustrated in the sketch after this list.
  • Recurrent Neural Network (RNN) Gating: Sigmoid is a crucial component in the gating mechanisms of advanced sequence models like Long Short-Term Memory (LSTM) networks. Within these architectures, "forget gates" and "input gates" use Sigmoid to output values between 0 (completely forget/block) and 1 (completely remember/pass), effectively regulating the flow of information over time. This mechanism is explained in depth in classic research on LSTMs.
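
To make the multi-label case concrete, here is a minimal sketch that applies Sigmoid independently to one logit per class and keeps every label whose probability clears a threshold. The class names, logit values, and 0.5 cut-off are illustrative assumptions rather than the output of any particular model.

import torch

# Hypothetical per-class logits for one image
class_names = ["person", "bicycle", "helmet"]
logits = torch.tensor([2.2, -0.4, 1.1])

# Sigmoid is applied element-wise, so each probability is independent
probabilities = torch.sigmoid(logits)

# Keep every label whose probability exceeds an illustrative 0.5 threshold
predicted = [name for name, p in zip(class_names, probabilities) if p > 0.5]
print(dict(zip(class_names, probabilities.tolist())))
print(predicted)  # e.g. ['person', 'helmet'] — the probabilities need not sum to 1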

Comparison with Related Activation Functions

To effectively design neural architectures, it is important to distinguish Sigmoid from other activation functions, as each serves a distinct purpose.

  • Softmax: While both functions relate to probability, Softmax is used for multi-class classification where classes are mutually exclusive. Softmax ensures that the outputs across all classes sum to exactly 1, creating a probability distribution. In contrast, Sigmoid treats each output independently, making it suitable for binary or multi-label tasks; the sketch after this list shows the difference numerically.
  • ReLU (Rectified Linear Unit): ReLU is the preferred activation function for hidden layers in modern deep networks. Unlike Sigmoid, which saturates at 0 and 1 causing the vanishing gradient problem during backpropagation, ReLU allows gradients to flow more freely for positive inputs. This accelerates training and convergence, as noted in Stanford CS231n course notes.
  • Tanh (Hyperbolic Tangent): The Tanh function is similar to Sigmoid but maps inputs to a range of -1 to 1. Because its output is zero-centered, Tanh is often preferred over Sigmoid in the hidden layers of older architectures and certain RNNs, as it helps with data centering for subsequent layers.
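
The contrast between the two probability treatments is easy to check numerically. The short sketch below passes the same illustrative logits through both Softmax and Sigmoid and compares the resulting sums.

import torch

# Illustrative logits for a three-class problem
logits = torch.tensor([2.0, 1.0, 0.1])

softmax_probs = torch.softmax(logits, dim=0)
sigmoid_probs = torch.sigmoid(logits)

print(softmax_probs, softmax_probs.sum())  # probabilities sum to exactly 1
print(sigmoid_probs, sigmoid_probs.sum())  # each value is independent; the sum is unconstrained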

Implementation Example

The following Python snippet demonstrates how to apply the Sigmoid function using PyTorch. This is a common operation used to convert raw model outputs (logits) into interpretable probabilities.

import torch
import torch.nn as nn

# Raw outputs (logits) from a model for a binary or multi-label task
logits = torch.tensor([0.1, -2.5, 4.0])

# Apply the Sigmoid activation function
sigmoid = nn.Sigmoid()
probabilities = sigmoid(logits)

# Output values are squashed between 0 and 1
print(probabilities)
# Output: tensor([0.5250, 0.0759, 0.9820])
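
In practice, these probabilities are usually turned into hard decisions by thresholding. Continuing from the snippet above, the 0.5 cut-off shown here is a common default used for illustration, not a fixed rule.

# Convert probabilities to binary predictions using an illustrative 0.5 threshold
predictions = (probabilities > 0.5).int()
print(predictions)
# Output: tensor([1, 0, 1], dtype=torch.int32)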

Understanding when to use Sigmoid is key to building effective AI systems. While it has limitations in deep hidden layers due to gradient saturation, its ability to model independent probabilities keeps it relevant in loss function calculations and final output layers for a wide variety of tasks.
