Deep Reinforcement Learning (DRL)

Explore how Deep Reinforcement Learning combines neural networks with reward-based logic. Learn to build DRL agents using [YOLO26](https://docs.ultralytics.com/models/yolo26/) for advanced perception.

Deep Reinforcement Learning (DRL) is an advanced subset of artificial intelligence (AI) that combines the decision-making capabilities of reinforcement learning with the perceptual power of deep learning (DL). While traditional reinforcement learning relies on tabular methods to map situations to actions, these methods struggle when the environment is complex or visual. DRL overcomes this by using neural networks to interpret high-dimensional input data, such as video frames or sensor readings, enabling machines to learn effective strategies directly from raw experience without explicit human instruction.

The Core Mechanism of DRL

In a DRL system, an AI agent interacts with an environment in discrete time steps. At each step, the agent observes the current "state," selects an action based on a policy, and receives a reward signal indicating the success or failure of that action. The primary goal is to maximize the cumulative reward over time.
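
This agent-environment loop can be sketched in a few lines. The snippet below is a minimal illustration using the Gymnasium library (referenced later in this article for benchmarking); the random action is a placeholder for a learned policy.

import gymnasium as gym

# Create a classic control benchmark environment
env = gym.make("CartPole-v1")

# The agent-environment loop: observe a state, act, receive a reward
state, info = env.reset()
total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()  # placeholder for a learned policy
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        state, info = env.reset()

print(f"Cumulative reward: {total_reward}")
env.close()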

The "deep" component refers to the use of deep neural networks to approximate the policy (the strategy for acting) or the value function (the estimated future reward). This allows the agent to process unstructured data, utilizing computer vision (CV) to "see" the environment much like a human does. This capability is powered by frameworks like PyTorch or TensorFlow, which facilitate the training of these complex networks.
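
To make "approximating the policy" concrete, the sketch below defines a small PyTorch network that maps a toy 4-dimensional state to a probability distribution over two actions. The layer sizes and dimensions are illustrative assumptions, not a prescribed architecture.

import torch
import torch.nn as nn

# A tiny policy network: state in, action probabilities out
policy = nn.Sequential(
    nn.Linear(4, 64),  # 4-dimensional state (illustrative)
    nn.ReLU(),
    nn.Linear(64, 2),  # 2 possible actions (illustrative)
    nn.Softmax(dim=-1),
)

# Forward a dummy observation and sample an action from the distribution
state = torch.rand(1, 4)
action_probs = policy(state)
action = torch.multinomial(action_probs, num_samples=1).item()
print(f"Probabilities: {action_probs.tolist()}, sampled action: {action}")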

Real-World Applications

DRL has moved beyond theoretical research into practical, high-impact applications across various industries:

  • Advanced Robotics: In the field of AI in robotics, DRL enables machines to master complex motor skills that are difficult to hard-code. Robots can learn to grasp irregular objects or traverse uneven terrain by refining their movements within physics engines like NVIDIA Isaac Sim. This often involves training on synthetic data before deploying the policy to physical hardware.
  • Autonomous Driving: Autonomous vehicles leverage DRL to make real-time decisions in unpredictable traffic scenarios. While object detection models identify pedestrians and signs, DRL algorithms utilize that information to determine safe driving policies for lane merging, intersection navigation, and speed control, effectively managing the inference latency required for safety.

Vision as the State Observer

For many DRL applications, the "state" is visual. High-speed models act as the eyes of the agent, converting raw imagery into structured data that the policy network can act upon. The following example illustrates how the YOLO26 model serves as the perception layer for an agent, extracting observations (e.g., obstacle counts) from the environment.

from ultralytics import YOLO

# Load YOLO26n to serve as the perception layer for a DRL agent
model = YOLO("yolo26n.pt")

# Simulate an observation from the environment (e.g., a robot's camera feed)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Perform inference to extract the state (detected objects)
results = model(observation_frame)

# The detection count serves as a simplified state feature for the agent's policy
print(f"State Observation: {len(results[0].boxes)} objects detected.")
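
To close the perception-action loop, this state feature would be consumed by the agent's policy. The continuation below is a deliberately simplified, hypothetical rule standing in for a trained policy network; the threshold and action names are illustrative assumptions, not part of any API.

# Continuing the example above: a hand-written stand-in for a learned policy
num_obstacles = len(results[0].boxes)
action = "slow_down" if num_obstacles > 3 else "proceed"  # illustrative rule
print(f"Policy action: {action}")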

Distinguishing DRL from Related Concepts

It is useful to distinguish Deep Reinforcement Learning from similar terms to understand its unique position in the AI landscape:

  • Reinforcement Learning (RL): Standard RL is the foundational concept but typically relies on lookup tables (such as Q-tables), which become impractical for large state spaces; a sketch of the tabular update follows this list. DRL solves this by using deep learning to approximate these functions, enabling it to handle complex inputs like images.
  • Reinforcement Learning from Human Feedback (RLHF): While DRL typically optimizes for a mathematically defined reward function (e.g., points in a game), RLHF refines models—specifically Large Language Models (LLMs)—using subjective human preferences to align AI behavior with human values, a technique popularized by research groups like OpenAI.
  • Unsupervised Learning: Unsupervised methods look for hidden patterns in data without explicit feedback. In contrast, DRL is goal-directed, driven by a reward signal that actively guides the agent toward a specific objective, as discussed in the foundational literature by Sutton and Barto.
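
The tabular bottleneck noted in the first bullet can be made concrete. Classic Q-learning stores one value per (state, action) pair and applies the update Q(s, a) ← Q(s, a) + α[r + γ max Q(s', a') − Q(s, a)]. The sketch below uses illustrative values for the learning rate and discount factor; a table this small works for a toy grid world, but a row per possible camera frame is infeasible, which is exactly the gap DRL fills.

import numpy as np

# Tabular Q-learning: one entry per (state, action) pair
n_states, n_actions = 16, 4  # tractable for a tiny grid world, not for images
q_table = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.99  # illustrative learning rate and discount factor
state, action, reward, next_state = 0, 2, 1.0, 1  # one hypothetical transition

# The classic Q-learning update rule
q_table[state, action] += alpha * (
    reward + gamma * q_table[next_state].max() - q_table[state, action]
)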

Developers looking to manage the datasets required for the perception layers of DRL systems can utilize the Ultralytics Platform, which simplifies annotation and cloud training workflows. Additionally, researchers often use standardized environments such as Gymnasium to benchmark their DRL algorithms against established baselines.
