Recurrent Neural Network (RNN)

Experience the power of Recurrent Neural Networks (RNNs) for sequential data, from NLP to time series analysis. Explore the key concepts and applications today!

A Recurrent Neural Network (RNN) is a type of artificial neural network specifically designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or the spoken word. Unlike traditional feedforward networks that assume all inputs (and outputs) are independent of each other, RNNs retain a form of memory. This internal memory allows them to process inputs with an understanding of previous information, making them uniquely suited for tasks where context and temporal order are critical. This architecture mimics how humans process information—reading a sentence, for instance, requires remembering previous words to understand the current one.

How RNNs Work

The core innovation of an RNN is its loop structure. In a standard feedforward network, information flows in only one direction: from input to output. In contrast, an RNN has a feedback loop that allows information to persist. As the network processes a sequence, it maintains a "hidden state"—a vector that acts as the network's short-term memory. At each time step, the RNN takes the current input and the previous hidden state to produce an output and update the hidden state for the next step.
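
To make the loop concrete, the sketch below implements a single-layer RNN recurrence by hand. The tanh update matches the default formulation of PyTorch's nn.RNN, but the dimensions are arbitrary and the weights are random placeholders rather than learned parameters:

import torch

# Arbitrary example dimensions: 10 input features, 20 hidden units
input_size, hidden_size = 10, 20

# Random placeholder weights stand in for learned parameters
W_xh = torch.randn(hidden_size, input_size)   # input-to-hidden
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden (the feedback loop)
b_h = torch.zeros(hidden_size)

h = torch.zeros(hidden_size)            # initial hidden state (the "memory")
sequence = torch.randn(5, input_size)   # a dummy sequence of 5 time steps

for x_t in sequence:
    # The new hidden state depends on the current input AND the previous state
    h = torch.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # torch.Size([20]), the memory after the full sequence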

This sequential processing capability is essential for Natural Language Processing (NLP) and time series analysis. However, standard RNNs often struggle with long sequences due to the vanishing gradient problem, where the network forgets earlier inputs as the sequence grows. This limitation led to the development of more advanced variants like Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which introduce mechanisms to better regulate information flow over longer periods.
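
In PyTorch, LSTM and GRU layers are near drop-in replacements for the basic RNN layer. Below is a minimal sketch with arbitrary example dimensions; note that the LSTM carries an extra cell state tensor that helps preserve long-range information:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(1, 5, 10)  # (batch, sequence length, features)

# The LSTM returns a hidden state hn plus a cell state cn
output, (hn, cn) = lstm(x)
print(output.shape, hn.shape, cn.shape)  # [1, 5, 20] [1, 1, 20] [1, 1, 20]

# The GRU uses gates internally but keeps a single hidden state
output, hn = gru(x)
print(output.shape, hn.shape)  # [1, 5, 20] [1, 1, 20]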

Real-World Applications

Recurrent Neural Networks have transformed many industries by enabling machines to understand sequential data. Here are two prominent examples:

  1. Machine Translation: Services like Google Translate originally relied heavily on RNN-based architectures (specifically sequence-to-sequence models) to convert text from one language to another. The network reads the entire input sentence (e.g., in English) and builds a context vector, which it then uses to generate the translated output (e.g., in French) word by word, ensuring grammatical consistency.
  2. Predictive Typing and Autocorrect: When you type on a smartphone, the keyboard suggests the next likely word. This is often powered by a language model trained with RNNs. The model analyzes the sequence of words you have already typed to predict the most probable next word, improving typing speed and accuracy; the sketch after this list illustrates the idea. Similar logic applies to speech recognition systems that transcribe spoken audio into text.
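
The sketch below shows the data flow behind such next-word prediction: embed the typed words, run them through an RNN, and map the final hidden state to scores over a vocabulary. The 1,000-word vocabulary and all dimensions here are hypothetical, and the weights are untrained, so the prediction itself is meaningless; only the wiring is the point:

import torch
import torch.nn as nn

# Hypothetical toy sizes, for illustration only
vocab_size, embed_dim, hidden_size = 1000, 32, 64

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
to_vocab = nn.Linear(hidden_size, vocab_size)  # score for each candidate word

# One sequence of 4 random word IDs standing in for typed words
typed_words = torch.randint(0, vocab_size, (1, 4))

output, hn = rnn(embedding(typed_words))
logits = to_vocab(output[:, -1, :])   # state after the last typed word
next_word_id = logits.argmax(dim=-1)  # most probable next word (untrained)
print(next_word_id)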

RNNs vs. CNNs and Transformers

It is helpful to distinguish RNNs from other major architectures. A Convolutional Neural Network (CNN) is primarily designed for spatial data, such as images, processing pixel grids to identify shapes and objects. For instance, Ultralytics YOLO26 uses a powerful CNN backbone for real-time object detection. While a CNN excels at "what is in this image?", an RNN excels at "what happens next in this video?".

More recently, the Transformer architecture has largely superseded RNNs for many complex NLP tasks. Transformers use an attention mechanism to process entire sequences in parallel rather than sequentially. However, RNNs remain highly efficient for low-latency, resource-constrained streaming applications and are often easier to deploy on edge devices for lightweight time-series forecasting.
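
One reason RNNs remain attractive for streaming is that the hidden state can be carried across incoming chunks, so each new chunk is processed in time proportional to its own length without revisiting old data. A minimal sketch of this pattern, with arbitrary chunk sizes:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

h = None  # no history at the start of the stream
stream = [torch.randn(1, 3, 10) for _ in range(4)]  # dummy 3-step chunks

for chunk in stream:
    # Feeding the previous hidden state back in preserves context
    # across chunks without reprocessing earlier ones
    output, h = rnn(chunk, h)

print(h.shape)  # torch.Size([1, 1, 20]); summarizes the whole stream so far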

PyTorch Implementation Example

While modern computer vision tasks often rely on CNNs, hybrid models may use RNNs to analyze temporal features extracted from video frames. Below is a simple, runnable example using PyTorch to create a basic RNN layer that processes a sequence of data.

import torch
import torch.nn as nn

# Define a basic RNN layer
# input_size: number of features in the input (e.g., 10 features per time step)
# hidden_size: number of features in the hidden state (memory)
# batch_first: input shape will be (batch, seq, feature)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

# Create a dummy input: Batch size 1, Sequence length 5, Features 10
input_seq = torch.randn(1, 5, 10)

# Forward pass through the RNN
# output contains the hidden state for every time step
# hn contains the final hidden state
output, hn = rnn(input_seq)

print(f"Output shape: {output.shape}")  # Expected: torch.Size([1, 5, 20])
print(f"Final hidden state shape: {hn.shape}")  # Expected: torch.Size([1, 1, 20])

Challenges and Future Outlook

Despite their utility, RNNs face computational hurdles. Sequential processing inhibits parallelization, making training slower compared to Transformers on GPUs. Furthermore, managing the exploding gradient problem requires careful hyperparameter tuning and techniques like gradient clipping.
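
Gradient clipping, for instance, rescales gradients whose norm exceeds a threshold before the optimizer step. Below is a minimal sketch of one training iteration with placeholder data; the max_norm value of 1.0 is an arbitrary choice, not a recommendation:

import torch
import torch.nn as nn

model = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(1, 5, 10)       # placeholder input sequence
target = torch.randn(1, 5, 20)  # placeholder regression target

output, _ = model(x)
loss = nn.functional.mse_loss(output, target)

optimizer.zero_grad()
loss.backward()
# Cap the global gradient norm at 1.0 so a single exploding step
# cannot destabilize training
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()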

Nevertheless, RNNs remain a fundamental concept in Deep Learning (DL). They are integral to understanding the evolution of Artificial Intelligence (AI) and are still widely used in simple anomaly detection systems for IoT sensors. For developers building complex pipelines—such as combining vision models with sequence predictors—managing datasets and training workflows is crucial. The Ultralytics Platform simplifies this process, offering tools to manage data and deploy models efficiently across various environments.
