Deep Reinforcement Learning

Explore how Deep Reinforcement Learning combines neural networks with reward-based logic. Learn to build DRL agents using [YOLO26](https://docs.ultralytics.com/models/yolo26/) for advanced perception.

Deep Reinforcement Learning (DRL) is an advanced subset of artificial intelligence (AI) that combines the decision-making capabilities of reinforcement learning with the perceptual power of deep learning (DL). Traditional reinforcement learning often relies on tabular methods to map situations to actions, and these methods struggle when the environment is complex or visual. DRL overcomes this by using neural networks to interpret high-dimensional input data, such as video frames or sensor readings, enabling machines to learn effective strategies directly from raw experience without explicit human instruction.

The Core Mechanism of DRL

In a DRL system, an AI agent interacts with an environment in discrete time steps. At each step, the agent observes the current "state," selects an action based on a policy, and receives a reward signal indicating the success or failure of that action. The primary goal is to maximize the cumulative reward over time.
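
The observe, act, and receive-reward cycle described above can be sketched in a few lines of plain Python. Everything here is illustrative: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names, the reward values are arbitrary, and the discount factor `gamma` is a common convention for weighting future rewards rather than something this article specifies.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach a goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is just the current cell index

    def step(self, action):
        # Action 0 moves left, action 1 moves right; position is clamped to the corridor
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small penalty per step, bonus at the goal
        return self.position, reward, done


def random_policy(state, rng):
    """Placeholder policy that ignores the state; a DRL agent would use a network here."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, gamma=0.99, max_steps=100):
    """Run one episode and return the discounted cumulative reward."""
    state = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state, rng)             # select an action from the policy
        state, reward, done = env.step(action)  # environment returns next state and reward
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return discounted_return


rng = random.Random(0)
episode_return = run_episode(CorridorEnv(), random_policy, rng)
print(f"Discounted return: {episode_return:.3f}")
```

Training amounts to replacing `random_policy` with something that improves from the rewards it collects, so that the return of later episodes is higher than that of earlier ones.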

The "deep" component refers to the use of deep neural networks to approximate the policy (the strategy for acting) or the value function (the estimated future reward). This allows the agent to process unstructured data, utilizing computer vision (CV) to "see" the environment much like a human does. This capability is powered by frameworks like PyTorch or TensorFlow, which facilitate the training of these complex networks.
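
To make the idea of approximating a value function with a network concrete, here is a minimal sketch of a two-layer network that maps a state vector to one estimated value per action. It is written in plain Python only to keep the example dependency-free; a real agent would build this in PyTorch or TensorFlow and train the weights. `TinyQNetwork`, the layer sizes, and the epsilon-greedy action rule are assumptions chosen for illustration, and the weights below are random and untrained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a weight matrix (n_out rows of n_in weights) and a zero bias vector."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


class TinyQNetwork:
    """Hypothetical two-layer network mapping a state vector to one value per action."""

    def __init__(self, state_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.layer1 = init_layer(state_dim, hidden_dim, rng)
        self.layer2 = init_layer(hidden_dim, n_actions, rng)

    def q_values(self, state):
        hidden = relu(dense(state, *self.layer1))
        return dense(hidden, *self.layer2)


def select_action(network, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    q = network.q_values(state)
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


network = TinyQNetwork(state_dim=4, hidden_dim=8, n_actions=3)
state = [0.2, -0.5, 1.0, 0.0]
action = select_action(network, state, epsilon=0.1, rng=random.Random(1))
print(f"Q-values: {network.q_values(state)}, chosen action: {action}")
```

The point of the design is that the network accepts any state vector of the right size, including ones it has never seen, whereas a lookup table needs an explicit entry for every state.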

Real-World Applications

DRL has moved beyond theoretical research into practical, high-impact applications across various industries:

  • Advanced Robotics: In the field of AI in robotics, DRL enables machines to master complex motor skills that are difficult to hard-code. Robots can learn to grasp irregular objects or traverse uneven terrain by refining their movements within physics engines like NVIDIA Isaac Sim. This often involves training on synthetic data before deploying the policy to physical hardware.
  • Autonomous Driving: Autonomous vehicles leverage DRL to make real-time decisions in unpredictable traffic scenarios. While object detection models identify pedestrians and signs, DRL algorithms use that information to determine safe driving policies for lane merging, intersection navigation, and speed control. Because these decisions are safety-critical, both perception and policy must run within a tight inference latency budget.

Vision as a State Observer

For many DRL applications, the "state" is visual. High-speed models act as the eyes of the agent, converting raw imagery into structured data that the policy network can act upon. The following example illustrates how the YOLO26 model serves as the perception layer for an agent, extracting observations (e.g., obstacle counts) from the environment.

from ultralytics import YOLO

# Load YOLO26n to serve as the perception layer for a DRL agent
model = YOLO("yolo26n.pt")

# Simulate an observation from the environment (e.g., a robot's camera feed)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Perform inference to extract the state (detected objects)
results = model(observation_frame)

# The detection count serves as a simplified state feature for the agent's policy
print(f"State Observation: {len(results[0].boxes)} objects detected.")
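
A single object count is a very coarse state. A policy network usually expects a fixed-length numeric vector, so one possible next step is to pack the strongest detections into such a vector. The sketch below is an assumption-laden illustration: `boxes_to_state` is a hypothetical helper, and the `(x1, y1, x2, y2, confidence)` tuple layout with normalised coordinates is a convention chosen for this example. With Ultralytics results, comparable values should be obtainable from `results[0].boxes.xyxyn` and `results[0].boxes.conf`, but the code here works on plain tuples so it runs without a model.

```python
def boxes_to_state(boxes, max_objects=3, min_confidence=0.25):
    """Turn a variable-length detection list into a fixed-length state vector.

    Each box is assumed to be (x1, y1, x2, y2, confidence) with coordinates
    normalised to [0, 1]. This tuple layout is a hypothetical convention for
    the sketch, not a format defined by the detector.
    """
    kept = [b for b in boxes if b[4] >= min_confidence]
    kept.sort(key=lambda b: b[4], reverse=True)  # most confident first

    state = [float(len(kept))]  # feature 0: number of confident detections
    for x1, y1, x2, y2, _ in kept[:max_objects]:
        # Per object: centre x, centre y, and box area
        state.extend([(x1 + x2) / 2, (y1 + y2) / 2, (x2 - x1) * (y2 - y1)])

    # Pad with zeros so the policy network always sees the same input size
    expected_length = 1 + 3 * max_objects
    state.extend([0.0] * (expected_length - len(state)))
    return state


detections = [
    (0.10, 0.20, 0.30, 0.60, 0.90),
    (0.50, 0.50, 0.70, 0.90, 0.40),
    (0.00, 0.00, 0.05, 0.05, 0.10),  # below the confidence threshold, dropped
]
state = boxes_to_state(detections)
print(state)
```

Padding to a fixed length matters because the number of detected objects changes from frame to frame, while the input layer of a policy network does not.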

Distinguishing DRL from Related Concepts

It is helpful to distinguish Deep Reinforcement Learning from similar terms to understand its unique position in AI:

  • Reinforcement Learning (RL): Standard RL is the foundational concept but typically relies on lookup tables (like Q-tables) which become impractical for large state spaces. DRL solves this by using deep learning to approximate functions, enabling it to handle complex inputs like images.
  • Reinforcement Learning from Human Feedback (RLHF): While DRL typically optimizes for a mathematically defined reward function (e.g., points in a game), RLHF refines models—specifically Large Language Models (LLMs)—using subjective human preferences to align AI behavior with human values, a technique popularized by research groups like OpenAI.
  • Unsupervised Learning: Unsupervised methods look for hidden patterns in data without explicit feedback. In contrast, DRL is goal-oriented and driven by a reward signal that actively guides the agent toward a specific objective, as discussed in the foundational texts by Sutton and Barto.
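
The contrast with tabular RL drawn in the list above is easier to see with the table in hand. The following sketch runs standard tabular Q-learning on a hypothetical five-cell corridor where the agent is rewarded for reaching the rightmost cell. The environment, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, done,
                      n_actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place."""
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])


def train_corridor(length=5, episodes=200, epsilon=0.2, seed=0):
    """Learn to walk right along a hypothetical corridor of `length` cells."""
    rng = random.Random(seed)
    q_table = defaultdict(float)  # one entry per (state, action) pair
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = max(range(2), key=lambda a: q_table[(state, a)])  # exploit
            # Action 0 moves left, action 1 moves right, clamped to the corridor
            next_state = max(0, min(length - 1, state + (1 if action == 1 else -1)))
            done = next_state == length - 1
            reward = 1.0 if done else -0.1
            q_learning_update(q_table, state, action, reward, next_state, done, n_actions=2)
            state = next_state
            if done:
                break
    return q_table


q_table = train_corridor()
greedy_policy = [max(range(2), key=lambda a: q_table[(s, a)]) for s in range(4)]
print(f"Greedy action per cell: {greedy_policy}")
print(f"Table entries: {len(q_table)}")
```

The table needs one entry for every state and action pair the agent encounters. That is manageable for a handful of corridor cells but not for image observations, where nearly every frame is a distinct state. DRL replaces the table with a network that generalises across similar inputs.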

Developers looking to manage the datasets required for the perception layers of DRL systems can utilize the Ultralytics Platform, which simplifies annotation and cloud training workflows. Additionally, researchers often use standardized environments such as Gymnasium to benchmark their DRL algorithms against established baselines.
