
Residual Networks (ResNet)

Explore Residual Networks (ResNet) and learn how skip connections solve the vanishing gradient problem. Discover ResNet-50 variants and deep learning applications.

Residual Networks, widely known as ResNets, are a specific type of artificial neural network (ANN) architecture designed to enable the training of extremely deep networks. Introduced by researchers at Microsoft in 2015, ResNet solved a critical bottleneck in deep learning known as the vanishing gradient problem. In traditional networks, stacking more layers often led to performance saturation or degradation because the signal required to update model weights would fade away as it propagated backward through the layers. ResNet introduced "skip connections" (or residual connections), which allow data to bypass one or more layers and flow directly to subsequent processing stages. This innovation proved that deeper networks could be trained effectively, leading to significant breakthroughs in computer vision (CV) and becoming a foundational concept for modern architectures.

The Core Concept: Residual Learning

The defining feature of a ResNet is the "residual block." In a standard convolutional neural network (CNN), each layer attempts to learn a direct mapping from input to output. As networks grow deeper, learning this direct mapping becomes increasingly difficult.

ResNet changes this approach by formulating the learning objective differently. Instead of hoping each stack of layers learns the entire underlying mapping, the residual block forces the layers to learn the "residual"—or the difference—between the input and the desired output. The original input is then added back to the learned residual through a skip connection. This structural change implies that if an identity mapping (passing the input unchanged) is optimal, the network can easily learn to push the residuals to zero. This makes deep learning (DL) models much easier to optimize, allowing them to scale from dozens to hundreds or even thousands of layers.
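
To make the idea concrete, the sketch below shows a minimal residual block in PyTorch. The class name BasicResidualBlock and the two-convolution layout are illustrative assumptions chosen for clarity; they mirror the basic block used in shallower ResNets rather than any specific library implementation.

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal sketch of a residual block: output = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm and ReLU
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # Skip connection: add the unchanged input back to the learned residual
        return self.relu(residual + x)

block = BasicResidualBlock(64)
out = block(torch.randn(1, 64, 56, 56))  # shape preserved: [1, 64, 56, 56]

If the best mapping for this block is the identity, the network only needs to drive the convolutional weights toward zero, which is far easier to optimize than learning an identity function from scratch.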

Key Architecture Variants

Since its inception, several variations of ResNet have become standard benchmarks in the AI community.

  • ResNet-50: A 50-layer version that utilizes a "bottleneck" design. This design uses 1x1 convolutions to reduce and then restore dimensions, making the network computationally efficient while maintaining high accuracy (see the sketch after this list).
  • ResNet-101 and ResNet-152: Deeper variants with 101 and 152 layers, respectively. These are often used when computational resources allow for higher complexity to capture more intricate feature maps.
  • ResNeXt: An evolution of ResNet that introduces a "cardinality" dimension, splitting the residual block into multiple parallel paths, which improves efficiency and performance.
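
For intuition on the bottleneck design used in ResNet-50 and deeper variants, the following sketch shows how a 1x1 convolution first shrinks the channel dimension, a 3x3 convolution operates on the cheaper reduced representation, and a second 1x1 convolution restores it. The class name BottleneckSketch and the reduction factor are hypothetical simplifications, not the torchvision implementation.

import torch
import torch.nn as nn

class BottleneckSketch(nn.Module):
    """Illustrative bottleneck block: 1x1 reduce -> 3x3 -> 1x1 restore, plus skip."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction  # reduced width for the inexpensive 3x3 conv
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.conv3x3 = nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False)
        self.restore = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv3x3(out))
        out = self.restore(out)
        return self.relu(out + x)  # skip connection around the bottleneck

block = BottleneckSketch(256)
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])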

Real-World Applications

The robustness of ResNet architectures has made them a go-to choice for a wide array of visual tasks.

  • Medical Image Analysis: In healthcare, identifying subtle anomalies in high-resolution scans is critical. ResNet-based models are frequently employed for tasks such as tumor detection in medical imaging, where the depth of the network helps discern fine-grained patterns in MRI or CT data.
  • Autonomous Vehicles: Self-driving cars require reliable feature extraction from camera feeds to identify pedestrians, signs, and obstacles. ResNets often serve as the backbone for object detection systems in AI in automotive applications, providing the rich visual features needed for safe navigation.

ResNet vs. Other Architectures

It is helpful to distinguish ResNet from other popular architectures to understand its specific utility.

  • ResNet vs. VGG: VGG (Visual Geometry Group) networks are also deep CNNs but lack residual connections. Consequently, they are much harder to train at depths comparable to ResNet and are generally more computationally expensive due to their large fully connected layers.
  • ResNet vs. Inception: Inception networks focus on width, using filters of multiple sizes within the same layer to capture features at different scales. ResNet focuses on depth. Modern architectures like Inception-ResNet combine both concepts.
  • ResNet vs. Vision Transformer (ViT): While ViTs use self-attention mechanisms to process images globally, ResNets rely on local convolutions. However, ResNets remain a strong baseline and are often faster for smaller datasets or real-time inference.

Implementation Example

Modern deep learning libraries like PyTorch make it simple to access pre-trained ResNet models. These models are invaluable for transfer learning, where a model trained on a large dataset like ImageNet is fine-tuned for a specific task.

The following Python snippet demonstrates how to load a pre-trained ResNet-50 model using torchvision (part of the PyTorch ecosystem) and perform a simple forward pass. While users of the Ultralytics platform might often use YOLO26 for detection, understanding underlying backbone concepts like ResNet is crucial for advanced customization.

import torch
import torchvision.models as models

# Load a pre-trained ResNet-50 model
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet50.eval()  # Set model to evaluation mode

# Create a dummy input tensor (batch_size, channels, height, width)
input_tensor = torch.randn(1, 3, 224, 224)

# Perform a forward pass to get predictions
with torch.no_grad():
    output = resnet50(input_tensor)

print(f"Output shape: {output.shape}")  # Expect [1, 1000] for ImageNet classes

Significance in Modern AI

Although newer architectures like YOLO26 employ highly optimized structures for maximum speed and accuracy, the principles of residual learning remain ubiquitous. The concept of skip connections is now a standard component in many advanced networks, including transformers used in natural language processing (NLP) and the latest object detection models. By enabling information to flow more freely through the network, ResNet paved the way for the deep, complex models that power today's artificial intelligence.
