Deep Reinforcement Learning

Explore how Deep Reinforcement Learning combines neural networks with reward-based logic. Learn to build DRL agents using [YOLO26](https://docs.ultralytics.com/models/yolo26/) for advanced perception.

Deep Reinforcement Learning (DRL) is an advanced subset of artificial intelligence (AI) that combines the decision-making capabilities of reinforcement learning with the perceptual power of deep learning (DL). While traditional reinforcement learning relies on tabular methods to map situations to actions, these methods struggle when the environment is complex or visual. DRL overcomes this by using neural networks to interpret high-dimensional input data, such as video frames or sensor readings, enabling machines to learn effective strategies directly from raw experience without explicit human instruction.

The Core Mechanism of DRL

In a DRL system, an AI agent interacts with an environment in discrete time steps. At each step, the agent observes the current "state," selects an action based on a policy, and receives a reward signal indicating the success or failure of that action. The primary goal is to maximize the cumulative reward over time.
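
The observe, act, and reward cycle described above can be sketched in a few lines of plain Python. Everything here is hypothetical and chosen only for illustration: `CorridorEnv` is a made-up toy environment, `random_policy` is a placeholder policy, and the discount factor `gamma` is a common convention for weighting future rewards rather than something this article specifies.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is simply the agent's cell index

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return self.position, reward, done


def random_policy(state, rng):
    """Placeholder policy: ignores the state and picks an action uniformly."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, gamma=0.99, max_steps=100):
    """Run one episode and return the discounted cumulative reward."""
    state = env.reset()
    discounted_return = 0.0
    for t in range(max_steps):
        action = policy(state, rng)             # select an action from the policy
        state, reward, done = env.step(action)  # environment returns next state and reward
        discounted_return += (gamma ** t) * reward
        if done:
            break
    return discounted_return


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"Discounted return: {run_episode(CorridorEnv(), random_policy, rng):.3f}")
```

Training a DRL agent amounts to replacing `random_policy` with a learned policy that makes the value returned by `run_episode` as large as possible.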

The "deep" component refers to the use of deep neural networks to approximate the policy (the strategy for acting) or the value function (the estimated future reward). This allows the agent to process unstructured data, utilizing computer vision (CV) to "see" the environment much like a human does. This capability is powered by frameworks like PyTorch or TensorFlow, which facilitate the training of these complex networks.
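
To make the idea of approximating a value function concrete, here is a dependency-free sketch of a tiny network that maps a state vector to one estimated value per action. It is an illustration only: `TinyQNetwork`, its layer sizes, and the sample state are all made up, the weights are random and untrained, and a real system would build and train the network with a framework such as PyTorch or TensorFlow.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs):
    """Affine transform: one output per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


class TinyQNetwork:
    """Hypothetical two-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(state_dim, hidden_dim, rng)
        self.output = init_layer(hidden_dim, n_actions, rng)

    def q_values(self, state):
        # Forward pass: state -> hidden features -> one value estimate per action.
        return dense(self.output, relu(dense(self.hidden, state)))

    def greedy_action(self, state):
        # Act greedily: pick the action with the highest estimated value.
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    net = TinyQNetwork(state_dim=4, hidden_dim=8, n_actions=2)
    state = [0.1, -0.2, 0.05, 0.3]  # made-up sensor readings
    print("Q-values:", net.q_values(state))
    print("Greedy action:", net.greedy_action(state))
```

Because the network computes values from the state's features rather than looking them up, it can produce estimates for states it has never seen, which is what makes high-dimensional inputs tractable.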

Real-World Applications

DRL has moved beyond theoretical research into practical, high-impact applications across various industries:

  • Advanced Robotics: In the field of AI in robotics, DRL enables machines to master complex motor skills that are difficult to hard-code. Robots can learn to grasp irregular objects or traverse uneven terrain by refining their movements within physics engines like NVIDIA Isaac Sim. This often involves training on synthetic data before deploying the policy to physical hardware.
  • Autonomous Driving: Autonomous vehicles leverage DRL to make real-time decisions in unpredictable traffic scenarios. While object detection models identify pedestrians and signs, DRL algorithms use that information to determine safe driving policies for lane merging, intersection navigation, and speed control, all within the low inference latency that safe operation requires.

Vision as the State Observer

For many DRL applications, the "state" is visual. High-speed models act as the eyes of the agent, converting raw imagery into structured data that the policy network can act upon. The following example illustrates how the YOLO26 model serves as the perception layer for an agent, extracting observations (e.g., obstacle counts) from the environment.

from ultralytics import YOLO

# Load YOLO26n to serve as the perception layer for a DRL agent
model = YOLO("yolo26n.pt")

# Simulate an observation from the environment (e.g., a robot's camera feed)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Perform inference to extract the state (detected objects)
results = model(observation_frame)

# The detection count serves as a simplified state feature for the agent's policy
print(f"State Observation: {len(results[0].boxes)} objects detected.")
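
A policy network usually expects a fixed-length numeric input rather than a raw object count. The sketch below shows one possible way to turn bounding boxes into such a state vector. The function `detections_to_state`, the three-zone layout, and the `max_objects` cap are illustrative assumptions, not part of the Ultralytics API; the function works on plain lists of pixel-space boxes.

```python
def detections_to_state(boxes_xyxy, frame_width, frame_height, max_objects=10):
    """Convert pixel-space boxes into a fixed-length, normalized state vector.

    Returns [count, left_area, center_area, right_area], each in [0, 1].
    """
    zone_area = [0.0, 0.0, 0.0]
    frame_area = float(frame_width * frame_height)
    for x1, y1, x2, y2 in boxes_xyxy:
        center_x = (x1 + x2) / 2.0
        # Assign the box to a vertical strip: 0 = left, 1 = center, 2 = right.
        zone = max(0, min(2, int(3 * center_x / frame_width)))
        # Accumulate the fraction of the frame that the box covers.
        zone_area[zone] += max(0.0, x2 - x1) * max(0.0, y2 - y1) / frame_area
    count = min(len(boxes_xyxy), max_objects) / max_objects
    return [count] + [min(1.0, area) for area in zone_area]


if __name__ == "__main__":
    # Made-up boxes standing in for detector output on a 300x100 frame.
    sample_boxes = [(0, 0, 100, 100), (250, 0, 300, 100)]
    print(detections_to_state(sample_boxes, frame_width=300, frame_height=100))
```

With the example above, the inputs could plausibly come from `results[0].boxes.xyxy.tolist()` for the boxes and `results[0].orig_shape` (height, width) for the frame size, assuming those attributes behave as in current Ultralytics releases.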

Distinguishing DRL from Related Concepts

Distinguishing Deep Reinforcement Learning from similar terms helps clarify its unique position:

  • Reinforcement Learning (RL): Standard RL is the foundational concept but typically relies on lookup tables (like Q-tables) which become impractical for large state spaces. DRL solves this by using deep learning to approximate functions, enabling it to handle complex inputs like images.
  • Reinforcement Learning from Human Feedback (RLHF): While DRL typically optimizes for a mathematically defined reward function (e.g., points in a game), RLHF refines models, most notably Large Language Models (LLMs), using subjective human preferences to align AI behavior with human values, a technique popularized by research groups like OpenAI.
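  • Unsupervised Learning: Unsupervised methods search for hidden patterns in data without explicit feedback. In contrast, DRL is goal-oriented: it is driven by a reward signal that actively guides the agent toward a specific objective, as discussed in Sutton and Barto's foundational text.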
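
For contrast with DRL, the lookup-table approach mentioned in the first item can be sketched as tabular Q-learning on a tiny problem. The corridor dynamics in `step`, the hyperparameters, and the function names are all hypothetical choices made for this illustration.

```python
import random
from collections import defaultdict


def step(position, action, length=5):
    """Hypothetical corridor dynamics: action 1 moves right, action 0 moves left."""
    position = max(0, min(length - 1, position + (1 if action == 1 else -1)))
    done = position == length - 1
    return position, (1.0 if done else -0.1), done


def train_q_table(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values with tabular Q-learning and return the table."""
    rng = random.Random(seed)
    q_table = defaultdict(lambda: [0.0, 0.0])  # one row of action values per state
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: mostly exploit the table, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice([0, 1])
            else:
                action = max((0, 1), key=lambda a: q_table[state][a])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q_table[next_state]))
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


if __name__ == "__main__":
    for state, values in sorted(train_q_table().items()):
        print(state, [round(v, 3) for v in values])
```

The table needs one row per distinct state, which is manageable for a five-cell corridor but not for camera frames, where nearly every observation is unique. That is the gap a neural network approximator fills.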

Developers looking to manage the datasets required for the perception layers of DRL systems can utilize the Ultralytics Platform, which simplifies annotation and cloud training workflows. Additionally, researchers often use standardized environments such as Gymnasium to benchmark their DRL algorithms against established baselines.
