Explore how Deep Reinforcement Learning combines neural networks with reward-based logic. Learn to build DRL agents using [YOLO26](https://docs.ultralytics.com/models/yolo26/) for advanced perception.
Deep Reinforcement Learning (DRL) is a subfield of artificial intelligence (AI) that combines the decision-making framework of reinforcement learning with the perceptual power of deep learning (DL). Classical reinforcement learning often relies on tabular methods that store a value for every situation-action pair, and these methods struggle when the environment is complex or visual because the number of possible states becomes too large to enumerate. DRL overcomes this by using neural networks to interpret high-dimensional input data, such as video frames or sensor readings, enabling machines to learn effective strategies directly from raw experience without explicit human instruction.
In a DRL system, an AI agent interacts with an environment in discrete time steps. At each step, the agent observes the current "state," selects an action based on a policy, and receives a reward signal indicating the success or failure of that action. The primary goal is to maximize the cumulative reward over time.
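The observe-act-reward cycle described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real DRL setup: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names invented for this example, the policy is random rather than learned, and the discount factor `gamma` is an assumption added here to show one common way of weighting future rewards.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is just the current cell index

    def step(self, action):
        # Action 0 moves left, action 1 moves right; position is clamped to the corridor
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small step penalty, bonus at the goal
        return self.position, reward, done


def random_policy(state, rng):
    """Placeholder policy: ignores the state and picks an action uniformly."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, gamma=0.99, max_steps=100):
    """Run one episode and return (cumulative reward, discounted return)."""
    state = env.reset()
    total_reward = 0.0
    discounted_return = 0.0
    for t in range(max_steps):
        action = policy(state, rng)              # select an action from the policy
        state, reward, done = env.step(action)   # environment returns next state and reward
        total_reward += reward
        discounted_return += (gamma ** t) * reward
        if done:
            break
    return total_reward, discounted_return


rng = random.Random(0)
total, discounted = run_episode(CorridorEnv(), random_policy, rng)
print(f"Cumulative reward: {total:.2f}, discounted return: {discounted:.2f}")
```

A learning algorithm would replace `random_policy` with a policy that is updated from the collected rewards so that the cumulative reward increases over many episodes.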
The "deep" component refers to the use of deep neural networks to approximate the policy (the strategy for acting) or the value function (the estimated future reward). This allows the agent to process unstructured data, utilizing computer vision (CV) to "see" the environment much like a human does. This capability is powered by frameworks like PyTorch or TensorFlow, which facilitate the training of these complex networks.
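To make the idea of a network approximating the policy concrete, here is a small sketch of a two-layer policy network that maps a state vector to action probabilities. It is written in plain Python so it runs without dependencies; a real agent would typically build and train this with a framework such as PyTorch or TensorFlow. The layer sizes, the helper names (`init_layer`, `dense`, `softmax`, `policy_forward`), and the example state are all illustrative assumptions, and the weights are random and untrained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Affine transform: one output per row of weights."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores into a probability distribution over actions."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_forward(state, hidden_layer, output_layer):
    """Map a state vector to action probabilities with a two-layer network."""
    hidden = [max(0.0, h) for h in dense(state, hidden_layer)]  # ReLU activation
    return softmax(dense(hidden, output_layer))


rng = random.Random(0)
STATE_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 8, 2  # illustrative sizes
hidden_layer = init_layer(STATE_DIM, HIDDEN_DIM, rng)
output_layer = init_layer(HIDDEN_DIM, NUM_ACTIONS, rng)

state = [0.1, -0.3, 0.7, 0.0]  # made-up observation
action_probs = policy_forward(state, hidden_layer, output_layer)
# Sample an action in proportion to the probabilities the network assigns
action = rng.choices(range(NUM_ACTIONS), weights=action_probs)[0]
print(f"Action probabilities: {action_probs}, sampled action: {action}")
```

Training would adjust the weights so that actions leading to higher cumulative reward receive higher probability. A value-function network has the same shape but outputs estimated future reward instead of action probabilities.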
DRL has moved beyond theoretical research into practical, high-impact applications across various industries.
For many DRL applications, the "state" is visual. High-speed models act as the eyes of the agent, converting raw imagery into structured data that the policy network can act upon. The following example illustrates how the YOLO26 model serves as the perception layer for an agent, extracting observations (e.g., obstacle counts) from the environment.
```python
from ultralytics import YOLO

# Load YOLO26n to serve as the perception layer for a DRL agent
model = YOLO("yolo26n.pt")

# Simulate an observation from the environment (e.g., a robot's camera feed)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Perform inference to extract the state (detected objects)
results = model(observation_frame)

# The detection count serves as a simplified state feature for the agent's policy
print(f"State Observation: {len(results[0].boxes)} objects detected.")
```
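A policy network usually expects a fixed-length numeric input, so a single object count is often too coarse a state. The sketch below shows one possible way to turn a list of detections into a fixed-length state vector. The function `encode_state`, its parameters, and the sample detections are hypothetical and not part of the Ultralytics API; in practice the box coordinates and confidences could be read from `results[0].boxes.xyxy` and `results[0].boxes.conf`.

```python
def encode_state(detections, frame_width, frame_height, max_objects=3):
    """Flatten up to max_objects boxes into a fixed-length, normalized vector.

    detections: list of (x1, y1, x2, y2, confidence) tuples in pixels.
    """
    # Keep the most confident detections so the vector length stays fixed
    top = sorted(detections, key=lambda d: d[4], reverse=True)[:max_objects]
    state = []
    for x1, y1, x2, y2, conf in top:
        state.extend([
            (x1 + x2) / 2 / frame_width,   # normalized center x
            (y1 + y2) / 2 / frame_height,  # normalized center y
            (x2 - x1) / frame_width,       # normalized width
            (y2 - y1) / frame_height,      # normalized height
            conf,                          # detection confidence
        ])
    # Zero-pad when fewer than max_objects objects were detected
    state.extend([0.0] * (5 * max_objects - len(state)))
    return state


# Made-up detections for a 640x480 frame
detections = [(100, 120, 300, 360, 0.91), (400, 200, 480, 440, 0.78)]
state = encode_state(detections, frame_width=640, frame_height=480)
print(len(state), state[:5])
```

Normalizing by the frame size keeps the features in a similar range regardless of camera resolution, and zero-padding keeps the input dimension constant when the number of detected objects changes between frames.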
It is helpful to differentiate Deep Reinforcement Learning from similar terms to understand its unique position in the AI landscape.
Developers who need to manage the datasets behind the perception layers of DRL systems can use the Ultralytics Platform, which simplifies annotation and cloud training workflows. Researchers also often use standardized environments such as Gymnasium to benchmark their DRL algorithms against established baselines.