Learn how reinforcement learning enables agents to optimize their actions through trial and error to maximize rewards. Explore its concepts, applications, and benefits!
Reinforcement Learning (RL) is a goal-oriented subset of machine learning (ML) where an autonomous system, known as an agent, learns to make decisions by performing actions and receiving feedback from its environment. Unlike supervised learning, which relies on static datasets labeled with the correct answers, RL algorithms learn through a dynamic process of trial and error. The agent interacts with a simulation or the real world, observing the consequences of its actions to determine which strategies yield the highest long-term rewards. This approach closely mirrors the psychological concept of operant conditioning, in which behavior is shaped over time by rewards and penalties.
To understand how RL functions, it is helpful to visualize it as a continuous cycle of interaction. This framework is often mathematically formalized as a Markov Decision Process (MDP), which structures decision-making in situations where outcomes are partly random and partly controlled by the decision-maker.
The primary components of this learning loop include:

- Agent: the learner and decision-maker that selects actions.
- Environment: the world the agent interacts with and receives feedback from.
- State: the agent's current observation of the environment.
- Action: a choice the agent makes that changes the state.
- Reward: the feedback signal indicating how good the last action was.
- Policy: the agent's strategy for mapping states to actions.
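To make the loop concrete, the sketch below trains tabular Q-learning on a toy 1-D corridor environment. Everything here (the corridor, the hyperparameters, the function names) is illustrative and not part of the article, but the update rule is the standard temporal-difference formula.

```python
import random

# Toy environment: a 1-D corridor of 5 cells; the agent starts at cell 0
# and earns a reward of 1.0 for reaching the rightmost cell (the goal).
N_STATES = 5
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


random.seed(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: value of each (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate


def greedy(state):
    """Best known action for a state, breaking ties randomly."""
    best = max(q[state])
    return random.choice([a for a in ACTIONS if q[state][a] == best])


for _ in range(200):  # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best known action, occasionally explore
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        next_state, reward, done = step(state, action)
        # Temporal-difference update toward reward + discounted future value
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state

# After training, the greedy policy moves right in every non-terminal state
policy = [greedy(s) for s in range(N_STATES)]
print(policy)
```

Note that the agent is never told that "move right" is correct; it discovers this purely from the reward signal, which is the defining trait of RL.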
Reinforcement learning has moved beyond theoretical research into practical, high-impact deployments across various industries.
In many modern applications, the "state" an agent observes is visual. High-performance models like YOLO26 act as the perception layer for RL agents, converting raw images into structured data. This processed information—such as the location and class of objects—becomes the state that the RL policy uses to choose an action.
The following example shows the ultralytics package processing an environment frame and producing a state representation (for example, an object count) for a theoretical reinforcement learning loop.
```python
from ultralytics import YOLO

# Load the YOLO26 model to serve as the agent's vision system
model = YOLO("yolo26n.pt")

# Simulate the agent observing the environment (an image frame)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Process the frame to extract the current 'state'
results = model(observation_frame)

# The agent uses detection data to inform its next action
# For example, an autonomous delivery robot might stop if it sees people
num_objects = len(results[0].boxes)
print(f"Agent Observation: {num_objects} objects detected. Calculating next move...")
```
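To illustrate how such a perceived state could drive behavior, the hypothetical policy below maps detection counts to discrete actions. The function name, thresholds, and action labels are all made up for the sketch; a trained RL policy would replace these hand-written rules, but the interface is the same: state in, action out.

```python
def choose_action(num_people: int, num_vehicles: int) -> str:
    """Hypothetical rule-based policy mapping a perceived state to an action."""
    if num_people > 0:
        return "stop"  # yield to pedestrians
    if num_vehicles > 2:
        return "slow_down"  # congested scene, reduce speed
    return "proceed"  # clear path


# Example states a perception layer might produce from detection results
print(choose_action(num_people=3, num_vehicles=1))  # stop
print(choose_action(num_people=0, num_vehicles=0))  # proceed
```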
It is important to distinguish Reinforcement Learning from other machine learning paradigms:

- Supervised learning trains on labeled examples with known correct answers, whereas RL receives only a reward signal and must discover good behavior on its own.
- Unsupervised learning finds structure in unlabeled data without any feedback, whereas RL actively interacts with an environment and is guided by rewards.
- Unlike both, RL must balance exploration (trying new actions) against exploitation (repeating actions known to work).
As computational power increases, techniques like Reinforcement Learning from Human Feedback (RLHF) are further refining how agents learn, aligning their objectives more closely with complex human values and safety standards. Researchers often use standardized environments like Gymnasium to benchmark and improve these algorithms. For teams looking to manage the datasets required for the perception layers of these agents, the Ultralytics Platform offers comprehensive tools for annotation and model management.
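Environments in Gymnasium expose a common reset()/step() interface that decouples the agent from the task. The self-contained sketch below imitates that convention with a toy environment rather than importing the library itself; the class name, dynamics, and fixed policy are illustrative assumptions.

```python
import random


class CoinFlipEnv:
    """Toy environment following the Gymnasium-style reset()/step() convention.

    Each step, the agent guesses a hidden coin (0 or 1) and is rewarded
    with 1.0 for a correct guess.
    """

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.coin = self.rng.randint(0, 1)
        observation, info = 0, {}
        return observation, info

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.coin = self.rng.randint(0, 1)  # draw a fresh coin for the next step
        observation, terminated, truncated, info = 0, False, False, {}
        return observation, reward, terminated, truncated, info


# The standard interaction loop: reset once, then step repeatedly
env = CoinFlipEnv()
obs, info = env.reset(seed=42)
total = 0.0
for _ in range(100):
    action = 1  # a fixed, untrained policy; an RL agent would learn this mapping
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
print(f"Average reward of fixed policy: {total / 100:.2f}")
```

Because any agent that speaks this reset()/step() protocol can be benchmarked on any environment that implements it, standardized interfaces like this are what make RL algorithms easy to compare.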