Reinforcement Learning (RL) is a subset of machine learning (ML) focused on teaching an autonomous AI agent how to make optimal decisions through trial and error. Unlike other learning paradigms that rely on static datasets, RL involves an agent interacting with a dynamic environment to achieve a specific goal. The agent receives feedback in the form of rewards or penalties based on its actions, gradually refining its strategy to maximize the cumulative reward over time. This process mirrors the concept of operant conditioning in behavioral psychology, where behaviors are reinforced by consequences.
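The "cumulative reward over time" has a standard formalization: the discounted return, which weights future rewards by a discount factor γ between 0 and 1 so that near-term rewards count more than distant ones:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
```

The agent's objective is to choose actions that maximize the expected value of this return.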
The framework of Reinforcement Learning is often mathematically described as a Markov Decision Process (MDP). To understand how this cycle works, it is helpful to break down the primary components involved in the learning loop:

- **Agent**: the learner and decision-maker.
- **Environment**: the world the agent interacts with.
- **State**: the agent's current observation of the environment.
- **Action**: a choice the agent makes that changes the environment.
- **Reward**: the feedback signal the agent seeks to maximize.
- **Policy**: the agent's strategy for mapping states to actions.
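To make the loop concrete, the following self-contained sketch runs tabular Q-learning in a toy, hypothetical environment: a five-cell corridor where the agent earns a reward only upon reaching the rightmost cell. The environment, reward values, and hyperparameters are all invented for illustration; real RL work would typically use a library such as Gymnasium.

```python
import random

# Hypothetical toy environment: a five-cell corridor. The agent starts in
# cell 0 and receives a reward of +1 only upon reaching cell 4 (the goal).
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right

def step(state, action_idx):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

# Tabular Q-values: q[s][a] estimates the cumulative reward of taking
# action a in state s and acting greedily afterwards.
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

for episode in range(200):
    state = 0
    done = False
    while not done:
        # Epsilon-greedy selection: mostly exploit, sometimes explore
        if random.random() < epsilon or q[state][0] == q[state][1]:
            action_idx = random.randrange(2)
        else:
            action_idx = max(range(2), key=lambda a: q[state][a])
        next_state, reward, done = step(state, action_idx)
        # Q-learning update: nudge the estimate toward the observed reward
        # plus the discounted value of the best action in the next state
        q[state][action_idx] += alpha * (
            reward + gamma * max(q[next_state]) - q[state][action_idx]
        )
        state = next_state

# After training, the greedy policy moves right in every non-goal cell
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

Each iteration of the inner loop is one pass through the agent-environment cycle described above: observe a state, choose an action, receive a reward, and update the policy.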
RL has moved beyond theoretical research and now powers complex, real-world systems across industries, from robotics and autonomous driving to game-playing agents and recommendation systems.
It is important to distinguish RL from other machine learning approaches, as their training methodologies differ significantly: supervised learning trains on labeled examples and unsupervised learning finds structure in unlabeled data, whereas RL learns from the consequences of its own actions, with no ground-truth answers provided in advance.
In many applications, the "state" an agent observes is visual. High-performance vision models like YOLO11 are frequently used as the perception layer for RL agents. The vision model processes the scene to detect objects, and this structured information is passed to the RL agent to decide the next action.
The following example demonstrates how to use a YOLO model to generate the state (detected objects) that could be fed into an RL decision-making loop.
from ultralytics import YOLO

# Load the YOLO11 model to serve as the perception system
model = YOLO("yolo11n.pt")

# The agent observes the environment (an image frame)
# In a real RL loop, this frame comes from a simulation or camera
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Process the frame to get the current 'state' (detected objects)
results = model(observation_frame)

# The detections (boxes, classes) act as the state for the RL agent
for result in results:
    print(f"Detected {len(result.boxes)} objects for the agent to analyze.")
    # This state data would next be passed to the RL policy network
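Bridging perception and decision-making usually means flattening a variable number of detections into a fixed-size state vector that a policy network can consume. The sketch below is a minimal, hypothetical example: it assumes each detection is a (class_id, x_center, y_center, width, height, confidence) tuple with coordinates already normalized to [0, 1], and it pads or truncates to a fixed number of slots so the policy always sees the same input shape.

```python
MAX_DETECTIONS = 4   # fixed slot count expected by the policy network
FEATURES = 6         # (class_id, x_center, y_center, width, height, confidence)

def detections_to_state(detections):
    """Flatten a variable-length detection list into a fixed-size state vector.

    Extra detections are dropped, keeping the highest-confidence ones;
    unused slots are zero-padded.
    """
    ordered = sorted(detections, key=lambda d: d[5], reverse=True)
    state = []
    for i in range(MAX_DETECTIONS):
        state.extend(ordered[i] if i < len(ordered) else (0,) * FEATURES)
    return state

# Two hypothetical detections: a person (class 0) and a bus (class 5)
detections = [
    (0, 0.41, 0.60, 0.10, 0.45, 0.88),  # person
    (5, 0.50, 0.50, 0.78, 0.52, 0.93),  # bus
]
state = detections_to_state(detections)
print(len(state))  # always MAX_DETECTIONS * FEATURES = 24
```

Fixing the slot count and ordering by confidence are design choices made here for simplicity; other encodings (e.g., keeping only the single most relevant object) can work just as well depending on the task.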
To explore how these concepts scale, researchers often utilize environments like OpenAI Gym (now Gymnasium) to standardize the testing of RL algorithms. As computational power grows, techniques like Reinforcement Learning from Human Feedback (RLHF) are further refining how agents align with human values.