SiLU (Sigmoid Linear Unit)

Discover how the SiLU (Swish) activation function boosts deep learning performance in AI tasks such as object detection and NLP.

The Sigmoid Linear Unit, commonly referred to as SiLU, is a highly effective activation function used in modern deep learning architectures to introduce non-linearity into neural networks. By determining how neurons process and pass information through the layers of a model, SiLU enables systems to learn complex patterns in data, functioning as a smoother and more sophisticated alternative to traditional step functions. Often associated with the term "Swish" from initial research on automated activation search, SiLU has become a standard in high-performance computer vision models, including the state-of-the-art YOLO26 architecture.

How SiLU Works

At its core, the SiLU function operates by multiplying an input value by its own Sigmoid transformation. Unlike simple threshold functions that abruptly switch a neuron between "on" and "off," SiLU provides a smooth curve that allows for more nuanced signal processing. This mathematical structure creates distinct characteristics that benefit the model training process:

  • Smoothness: The curve is continuous and differentiable at every point. This property supports optimization algorithms such as gradient descent by providing a consistent landscape for adjusting model weights, which often leads to faster convergence during training.
  • Non-Monotonicity: Unlike standard linear units, SiLU is non-monotonic, meaning its output can decrease even as the input increases in certain negative ranges. This allows the network to capture complex features and retain negative values that might otherwise be discarded, helping to prevent the vanishing gradient problem in deep networks.
  • Self-Gating: SiLU acts as its own gate, scaling how much of the input signal passes through based on the magnitude of the input itself. This mimics the gating mechanism found in Long Short-Term Memory (LSTM) networks, but in a computationally efficient form well suited to Convolutional Neural Networks (CNNs); the short sketch after this list makes the gating explicit.
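
All of the behaviour described above follows from a single formula: SiLU multiplies the input by its own sigmoid, silu(x) = x · σ(x) = x / (1 + e⁻ˣ). The following NumPy sketch is illustrative only (the helper names sigmoid, silu, and silu_grad are not part of any library); it evaluates the function and its analytic derivative at a few points so the smooth, non-zero gradient for negative inputs is visible:

import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    """SiLU / Swish: the input is gated by its own sigmoid."""
    return x * sigmoid(x)

def silu_grad(x):
    """Analytic derivative of SiLU: sigma(x) * (1 + x * (1 - sigma(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

x = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
print("silu(x)      :", np.round(silu(x), 4))
print("d/dx silu(x) :", np.round(silu_grad(x), 4))

For negative inputs the gradient is small but non-zero, and the function dips to its minimum near x ≈ -1.28 before rising again, which is the non-monotonic behaviour noted above.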

Real-World Applications

SiLU is an essential component of many state-of-the-art AI solutions where precision and efficiency are paramount.

  • Autonomous Vehicle Perception: In the safety-critical domain of autonomous vehicles, perception systems must identify pedestrians, traffic signs, and obstacles instantly. Models utilizing SiLU in their backbones can maintain high inference speeds while accurately performing object detection in varying lighting conditions, ensuring the vehicle reacts safely to its environment.
  • Medical Imaging Diagnostics: In medical image analysis, neural networks need to discern subtle texture differences in MRI or CT scans. The gradient-preserving nature of SiLU helps these networks learn the fine-grained details necessary for early tumor detection, significantly improving the reliability of automated diagnostic tools used by radiologists.

Comparison with Related Concepts

To fully understand SiLU, it helps to distinguish it from other activation functions covered in the Ultralytics glossary.

  • SiLU vs. ReLU (Rectified Linear Unit): ReLU is famous for its speed and simplicity, outputting zero for all negative inputs. While efficient, this can lead to "dead neurons" that stop learning. SiLU avoids this by allowing a small, non-linear gradient to flow through negative values, which often results in better accuracy for deep architectures trained on the Ultralytics Platform.
  • SiLU vs. GELU (Gaussian Error Linear Unit): These two functions are visually and functionally similar. GELU is the standard for Transformer models like BERT and GPT, while SiLU is frequently preferred for computer vision (CV) tasks and CNN-based object detectors.
  • SiLU vs. Sigmoid: Although SiLU uses the Sigmoid function internally, the two serve different roles. Sigmoid is typically used in the final output layer to express probabilities for binary classification, whereas SiLU is used in hidden layers to facilitate feature extraction, as the short sketch after this list illustrates.
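
To make that last distinction concrete, the following minimal PyTorch sketch (the layer sizes and the toy model are illustrative assumptions, not taken from any Ultralytics architecture) uses SiLU in the hidden layers for feature extraction and reserves Sigmoid for the output layer, where it turns the final logit into a probability:

import torch
import torch.nn as nn

# Hypothetical binary classifier: SiLU activates the hidden layers,
# while Sigmoid converts the final logit into a probability.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.SiLU(),      # hidden layer: smooth, self-gated non-linearity
    nn.Linear(32, 32),
    nn.SiLU(),      # hidden layer: feature extraction
    nn.Linear(32, 1),
    nn.Sigmoid(),   # output layer: probability for binary classification
)

x = torch.randn(4, 16)   # batch of 4 random feature vectors
probs = model(x)
print(probs.shape)       # torch.Size([4, 1]); every value lies in (0, 1)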

Implementation Example

You can visualize how different activation functions transform data using the PyTorch library. The following code snippet demonstrates the difference between ReLU (which zeroes out negatives) and SiLU (which allows smooth negative flow).

import torch
import torch.nn as nn

# Input data: negative, zero, and positive values
data = torch.tensor([-2.0, 0.0, 2.0])

# Apply ReLU: Negatives become 0, positives stay unchanged
relu_out = nn.ReLU()(data)
print(f"ReLU: {relu_out}")
# Output: tensor([0., 0., 2.])

# Apply SiLU: Smooth curve, small negative value retained
silu_out = nn.SiLU()(data)
print(f"SiLU: {silu_out}")
# Output: tensor([-0.2384,  0.0000,  1.7616])

By retaining information in negative values and providing a smooth gradient, SiLU plays a pivotal role in the success of modern neural networks. Its adoption in architectures like YOLO26 underscores its importance in achieving state-of-the-art performance across diverse computer vision tasks.
