
Residual Networks (ResNet)

Discover how ResNets revolutionize deep learning by solving vanishing gradients, enabling ultra-deep networks for image analysis, NLP, and more.

Residual Networks, commonly referred to as ResNets, represent a breakthrough architecture in deep learning that solved a fundamental problem in training very deep neural networks. Before their introduction by researchers at Microsoft Research in 2015, adding more layers to a neural network (NN) often led to a decrease in accuracy due to the vanishing gradient problem, where gradient signals fade as they propagate back through many layers. ResNet introduced a simple structural change called "skip connections" or "shortcut connections," which allow information to bypass certain layers and flow directly to subsequent ones. This innovation enabled the training of networks with hundreds of layers, significantly advancing the capabilities of computer vision (CV) systems.

The Problem ResNets Solve

In traditional deep learning (DL) models, layers are stacked sequentially. As networks become deeper to capture more complex features, they become harder to train. This difficulty arises because the gradients—signals used to update model weights during training—can become infinitesimally small as they propagate back through many layers, a phenomenon known as the vanishing gradient problem.
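As a toy numerical illustration (a simplified sketch, not the analysis from the ResNet paper): the gradient reaching an early layer is roughly a product of per-layer derivatives, so many factors below one shrink it exponentially. A residual block instead contributes a factor of (1 + branch derivative), so the identity path keeps the gradient alive even when the learned branch contributes almost nothing.

```python
depth = 50
plain_layer_grad = 0.8  # assume each plain layer attenuates the gradient by 0.8
branch_grad = 0.0       # a residual branch that has learned essentially nothing

# Plain network: gradient factor is a product of per-layer derivatives.
plain = plain_layer_grad**depth
# Residual network: each block contributes (1 + branch derivative).
residual = (1 + branch_grad) ** depth

print(f"plain:    {plain:.2e}")     # ~1.43e-05, effectively vanished
print(f"residual: {residual:.2e}")  # 1.00e+00, identity path intact
```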

ResNet addresses this by restructuring the network into residual blocks. Instead of learning a direct mapping from input to output, each block learns the difference (or residual) between the input and the desired output. The skip connection adds the original input directly to the output of the block. This simple addition creates a direct path for gradients to flow backward during backpropagation, ensuring that even very deep networks can learn effectively without performance degradation. For a deeper theoretical understanding, you can explore the original paper, Deep Residual Learning for Image Recognition.
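The block structure described above can be sketched in PyTorch (a minimal, illustrative version of the paper's basic block; the channel count and input shape are arbitrary choices for the demo):

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """A ResNet-style basic block: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                          # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))       # F(x), the learned residual
        return self.relu(out + identity)      # add the input back, then activate


block = BasicBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56]) - shape is preserved
```

Because the addition requires matching shapes, real ResNets insert a 1x1 convolution on the shortcut whenever a block changes the channel count or spatial resolution.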

Key Components and Architecture

The success of ResNet lies in its modular design, which has influenced many modern architectures.

  • Residual Blocks: The fundamental building block containing a skip connection. A block typically consists of two or three convolutional layers, as in a convolutional neural network (CNN), each followed by batch normalization and a ReLU activation function.
  • Identity Mapping: The skip connection performs an identity mapping, meaning it passes the input signal unchanged. This ensures that in the worst-case scenario, a layer can simply pass information through without distorting it, preserving the network's performance.
  • Bottleneck Design: In deeper variants like ResNet-50 or ResNet-101, a "bottleneck" design is used to improve efficiency. This involves using 1x1 convolutions to reduce dimension before expensive 3x3 convolutions, effectively lowering computational cost while maintaining high accuracy.
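The savings from the bottleneck design follow from simple arithmetic. The sketch below counts convolution weights only (ignoring biases and batch normalization), using the 256-channel / 64-channel bottleneck widths from ResNet-50's first stage:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution, ignoring bias terms."""
    return c_in * c_out * k * k


# Bottleneck: 1x1 reduce 256->64, 3x3 at 64 channels, 1x1 expand 64->256
bottleneck = conv_params(256, 64, 1) + conv_params(64, 64, 3) + conv_params(64, 256, 1)

# Basic block alternative: two 3x3 convolutions at the full 256 channels
basic = 2 * conv_params(256, 256, 3)

print(bottleneck)                  # 69632
print(basic)                       # 1179648
print(f"{basic / bottleneck:.1f}x")  # ~16.9x fewer weights with the bottleneck
```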

Real-World Applications

The robustness of ResNet has made it a standard choice for various high-impact applications.

  • Medical Image Analysis: In healthcare, distinguishing between healthy tissue and anomalies like tumors requires detecting subtle textures. ResNet models are frequently used as backbones for systems that analyze MRI or CT scans. For instance, they help in tumor detection, where the depth of the network allows it to learn intricate biological patterns that shallower networks might miss.
  • Autonomous Vehicles: Self-driving cars rely on real-time object detection to identify pedestrians, traffic lights, and other vehicles. ResNet often serves as the backbone for detection frameworks, processing raw camera feeds to extract rich feature maps that subsequent layers use to localize objects, ensuring safety in AI in automotive applications.

ResNet in Modern AI Workflows

While newer architectures like the Vision Transformer (ViT) have gained popularity, ResNet remains a go-to baseline due to its balance of speed and accuracy. It is widely used in transfer learning, where a model pre-trained on a massive dataset like ImageNet is fine-tuned for a specific task with limited data.

Modern object detectors, including the state-of-the-art YOLO26, often incorporate concepts evolved from ResNet, such as residual connections within their backbones, to facilitate efficient feature extraction across multiple scales.

Implementation Example

You can load a ResNet model for image classification with the ultralytics Python package. The example below assumes a ResNet-50 classification checkpoint saved locally as resnet50.pt (for instance, one trained or exported with the package); it is not one of the official downloadable model assets.

from ultralytics import YOLO

# Load a ResNet-50 classification checkpoint
# (assumes a local resnet50.pt file; not an official Ultralytics asset)
model = YOLO("resnet50.pt")

# Perform inference on an image
results = model("https://ultralytics.com/images/bus.jpg")

# Print the top predicted class
print(f"Prediction: {results[0].names[results[0].probs.top1]}")

ResNet vs. VGG and Plain Networks

It is helpful to distinguish ResNet from other architectures to understand its unique contribution.

  • ResNet vs. Plain Networks: A "plain" network stacks layers directly without skip connections. Beyond a certain depth (e.g., going from 20 to 56 layers), a plain network's training error actually increases, the degradation problem identified in the ResNet paper. Residual connections remove this degradation, allowing networks with over 100 layers to train to lower error than much shallower plain counterparts.
  • ResNet vs. VGG: The VGG network popularized the use of small 3x3 convolution filters but is computationally expensive and parameter-heavy. ResNet achieves better performance with far fewer parameters at much greater depth, making it more efficient for latency-sensitive inference applications.

For a broader look at how these models fit into the landscape of computer vision, you can explore our guide on object detection architectures or learn how to train your own models on custom datasets.
