Deep Reinforcement Learning

Explore how Deep Reinforcement Learning combines neural networks with reward-based logic. Learn to build DRL agents using [YOLO26](https://docs.ultralytics.com/models/yolo26/) for advanced perception.

Deep Reinforcement Learning (DRL) is an advanced subset of artificial intelligence (AI) that combines the decision-making capabilities of reinforcement learning with the perceptual power of deep learning (DL). While traditional reinforcement learning relies on tabular methods to map situations to actions, these methods struggle when the environment is complex or visual. DRL overcomes this by using neural networks to interpret high-dimensional input data, such as video frames or sensor readings, enabling machines to learn effective strategies directly from raw experience without explicit human instruction.

The Core Mechanism of DRL

In a DRL system, an AI agent interacts with an environment in discrete time steps. At each step, the agent observes the current "state," selects an action based on a policy, and receives a reward signal indicating the success or failure of that action. The primary goal is to maximize the cumulative reward over time.
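The observe, act, reward cycle described above can be sketched in a few lines of plain Python. The `CorridorEnv` class, its reward values, and the `always_right` policy are hypothetical stand-ins invented for illustration; a real project would more likely use an established environment library. The discount factor `gamma` is also an assumption, included because cumulative reward is commonly computed with discounting.

```python
class CorridorEnv:
    """Hypothetical toy environment: start at cell 0, reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is simply the agent's cell index

    def step(self, action):
        # Action 1 moves right, anything else moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small step penalty, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, gamma=0.99, max_steps=100):
    """Roll out one episode and return the discounted cumulative reward."""
    state = env.reset()
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # select an action from the policy
        state, reward, done = env.step(action)  # observe next state and reward
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def always_right(state):
    """Trivial hand-written policy used only to exercise the loop."""
    return 1


episode_return = run_episode(CorridorEnv(), always_right, gamma=1.0)
print(f"Cumulative reward: {episode_return:.2f}")
```

A learning agent would replace `always_right` with a policy whose parameters are updated from the rewards it collects.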

The "deep" component refers to the use of deep neural networks to approximate the policy (the strategy for acting) or the value function (the estimated future reward). This allows the agent to process unstructured data, utilizing computer vision (CV) to "see" the environment much like a human does. This capability is powered by frameworks like PyTorch or TensorFlow, which facilitate the training of these complex networks.
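To make the idea of approximating a value function with a network concrete, here is a minimal sketch of a two-layer network that maps a state vector to one estimated value per action. It is written in plain Python so that it runs without dependencies; in practice the same structure would be built and trained with PyTorch or TensorFlow. The layer sizes, the example state, and the epsilon-greedy selection rule are illustrative assumptions, and the weights are random rather than trained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs, relu=False):
    """Apply one dense layer, optionally followed by a ReLU activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in outputs] if relu else outputs


def q_values(network, state):
    """Forward pass: state vector in, one estimated value per action out."""
    hidden = dense(network[0], state, relu=True)
    return dense(network[1], hidden)


def select_action(network, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best action."""
    q = q_values(network, state)
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


rng = random.Random(0)
network = [init_layer(4, 8, rng), init_layer(8, 2, rng)]  # 4 inputs, 8 hidden units, 2 actions
state = [0.1, -0.3, 0.5, 0.0]  # hypothetical 4-dimensional observation

print("Q-values:", q_values(network, state))
print("Greedy action:", select_action(network, state, epsilon=0.0, rng=rng))
```

Training would adjust the weights so that these outputs approach the rewards the agent actually observes, which is the step that deep learning frameworks automate through backpropagation.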

Real-World Applications

DRL has moved beyond theoretical research into practical, high-impact applications across various industries:

  • Advanced Robotics: In the field of AI in robotics, DRL enables machines to master complex motor skills that are difficult to hard-code. Robots can learn to grasp irregular objects or traverse uneven terrain by refining their movements within physics engines like NVIDIA Isaac Sim. This often involves training on synthetic data before deploying the policy to physical hardware.
  • Autonomous Driving: Autonomous vehicles leverage DRL to make real-time decisions in unpredictable traffic scenarios. While object detection models identify pedestrians and signs, DRL algorithms use that information to determine safe driving policies for lane merging, intersection navigation, and speed control, all within the tight inference latency budget that safe operation demands.

Vision as a State Observer

For many DRL applications, the "state" is visual. High-speed models act as the eyes of the agent, converting raw imagery into structured data that the policy network can act upon. The following example illustrates how the YOLO26 model serves as the perception layer for an agent, extracting observations (e.g., obstacle counts) from the environment.

```python
from ultralytics import YOLO

# Load YOLO26n to serve as the perception layer for a DRL agent
model = YOLO("yolo26n.pt")

# Simulate an observation from the environment (e.g., a robot's camera feed)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Perform inference to extract the state (detected objects)
results = model(observation_frame)

# The detection count serves as a simplified state feature for the agent's policy
print(f"State Observation: {len(results[0].boxes)} objects detected.")
```
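A single detection count is a deliberately simplified state. A policy network usually expects a fixed-length numeric vector, so one possible next step is to summarize the detections per class. The sketch below assumes the detections have already been unpacked into plain `(class_name, confidence, x_center, y_center)` tuples with normalized coordinates. The `build_state` helper, the tracked class list, and the `-1.0` placeholder for absent classes are hypothetical choices, not part of the Ultralytics API.

```python
def build_state(detections, tracked_classes, conf_threshold=0.25):
    """Summarize detections as a fixed-length state vector.

    For each tracked class the vector holds three values: the number of
    detections at or above the confidence threshold, then the x and y center
    of the most confident one (or -1.0, -1.0 if the class is absent).
    """
    state = []
    for name in tracked_classes:
        matches = [d for d in detections if d[0] == name and d[1] >= conf_threshold]
        if matches:
            best = max(matches, key=lambda d: d[1])
            state.extend([float(len(matches)), best[2], best[3]])
        else:
            state.extend([0.0, -1.0, -1.0])
    return state


# Hypothetical detections: (class_name, confidence, x_center, y_center)
detections = [
    ("person", 0.91, 0.20, 0.60),
    ("person", 0.55, 0.70, 0.58),
    ("bus", 0.88, 0.50, 0.45),
    ("person", 0.10, 0.90, 0.50),  # below the threshold, ignored
]

state = build_state(detections, tracked_classes=["person", "bus", "car"])
print(state)
```

Because the vector length depends only on the number of tracked classes, it stays constant from frame to frame, which is what a policy network with a fixed input size needs.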

Distinguishing DRL from Related Concepts

It is helpful to distinguish Deep Reinforcement Learning from similar terms to understand its unique position in the AI landscape:

  • Reinforcement Learning (RL): Standard RL is the foundational concept but typically relies on lookup tables (like Q-tables) which become impractical for large state spaces. DRL solves this by using deep learning to approximate functions, enabling it to handle complex inputs like images.
  • Reinforcement Learning from Human Feedback (RLHF): While DRL typically optimizes for a mathematically defined reward function (e.g., points in a game), RLHF refines models—specifically Large Language Models (LLMs)—using subjective human preferences to align AI behavior with human values, a technique popularized by research groups like OpenAI.
  • Unsupervised Learning: Unsupervised methods search for hidden patterns in data without explicit feedback. In contrast, DRL is goal-oriented and driven by a reward signal that actively guides the agent toward a specific objective, as explained in the foundational texts by Sutton and Barto.
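For contrast with DRL, the lookup-table approach mentioned in the first bullet can be sketched as tabular Q-learning on a tiny problem. The five-cell corridor, its rewards, and the hyperparameters below are assumptions chosen for illustration. The table has one row per state, which works for five states but would be infeasible if each state were a camera image.

```python
import random

N_STATES = 5      # corridor cells; the last cell is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical corridor dynamics: step penalty, bonus at the goal."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def train_q_table(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next-state value unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print("Greedy action per non-terminal state:", greedy_policy)
```

DRL keeps the same update idea but replaces the table `q` with a neural network, so that similar states share what has been learned instead of each needing its own row.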

Developers looking to manage the datasets required for the perception layers of DRL systems can utilize the Ultralytics Platform, which simplifies annotation and cloud training workflows. Additionally, researchers often use standardized environments such as Gymnasium to benchmark their DRL algorithms against established baselines.
