
Reinforcement Learning

Discover reinforcement learning, where agents optimize their actions through trial and error to maximize rewards. Explore its concepts, applications, and benefits!

Reinforcement Learning (RL) is a goal-oriented subset of machine learning (ML) where an autonomous system, known as an agent, learns to make decisions by performing actions and receiving feedback from its environment. Unlike supervised learning, which relies on static datasets labeled with the correct answers, RL algorithms learn through a dynamic process of trial and error. The agent interacts with a simulation or the real world, observing the consequences of its actions to determine which strategies yield the highest long-term rewards. This approach closely mimics operant conditioning in psychology, where behavior is shaped over time by rewarding desirable actions and penalizing undesirable ones.

Core Concepts of the RL Loop

To understand how RL functions, it is helpful to visualize it as a continuous cycle of interaction. This framework is often mathematically formalized as a Markov Decision Process (MDP), which structures decision-making in situations where outcomes are partly random and partly controlled by the decision-maker.
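
For readers who want the formal view, the MDP objective can be written as the expected discounted return; the notation below is the standard textbook formulation rather than anything introduced elsewhere on this page:

J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right], \qquad 0 \le \gamma < 1

where r_t is the reward received at step t and the discount factor \gamma controls how strongly future rewards count relative to immediate ones.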

The primary components of this learning loop include the following; a minimal code sketch that ties them together appears after the list:

  • AI Agent: The entity responsible for learning and making decisions. It perceives the environment and takes actions to maximize its cumulative reward.
  • Environment: The external world in which the agent operates. This could be a complex video game, a financial market simulation, or a physical warehouse in an AI in logistics deployment.
  • State: A snapshot or representation of the current situation. In visual applications, this often involves processing camera feeds using computer vision (CV) to detect objects and obstacles.
  • Action: The specific move or choice the agent makes. The complete set of all possible moves is referred to as the action space.
  • Reward: A numerical signal sent from the environment to the agent after an action. A well-designed reward function assigns positive values for beneficial actions and penalties for detrimental ones.
  • Policy: The strategy or rule set the agent uses to determine the next action based on the current state. Algorithms like Q-learning define how this policy is updated and optimized.
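
To make these pieces concrete, the minimal sketch below runs the full loop with tabular Q-learning. Treat it as an illustration rather than a reference implementation: it assumes the third-party gymnasium package and its built-in FrozenLake-v1 environment, and the hyperparameters (alpha, gamma, epsilon, episode count) are arbitrary example values.

import gymnasium as gym  # assumption: the gymnasium package is installed
import numpy as np

# Environment: a small discrete grid world that ships with Gymnasium
env = gym.make("FrozenLake-v1", is_slippery=False)

# Q-table holding one value estimate per (state, action) pair
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy policy: occasionally explore, otherwise exploit current estimates
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Q-learning update: nudge the estimate toward reward plus discounted future value
        q_table[state, action] += alpha * (reward + gamma * np.max(q_table[next_state]) - q_table[state, action])
        state = next_state

print("Greedy action learned for the start state:", int(np.argmax(q_table[0])))

After enough episodes, reading the greedy action out of the table gives the learned policy; swapping the grid-world state for the output of a vision model is exactly the integration described below.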

Real-World Applications

Reinforcement learning has moved beyond theoretical research into practical, high-impact deployments across various industries.

  • Advanced Robotics: In the field of AI in robotics, RL enables machines to master complex motor skills that are difficult to hard-code. Robots can learn to grasp irregular objects or navigate uneven terrain by training within physics engines like NVIDIA Isaac Sim before deploying to the real world.
  • Autonomous Systems: Autonomous vehicles utilize RL to make real-time decisions in unpredictable traffic scenarios. While object detection models identify pedestrians and signs, RL algorithms help determine safe driving policies for lane merging and intersection navigation.
  • Strategic Optimization: RL gained global attention when systems like Google DeepMind's AlphaGo defeated human world champions in complex board games. Beyond gaming, these agents optimize industrial logistics, such as controlling cooling systems in data centers to reduce energy consumption.

Integrating Vision with RL

In many modern applications, the "state" an agent observes is visual. High-performance models like YOLO26 act as the perception layer for RL agents, converting raw images into structured data. This processed information—such as the location and class of objects—becomes the state that the RL policy uses to choose an action.

The following example demonstrates how to use the ultralytics package to process a frame from the environment, producing a state representation (for example, the number of detected objects) for a theoretical RL loop.

from ultralytics import YOLO

# Load the YOLO26 model to serve as the agent's vision system
model = YOLO("yolo26n.pt")

# Simulate the agent observing the environment (an image frame)
observation_frame = "https://ultralytics.com/images/bus.jpg"

# Process the frame to extract the current 'state'
results = model(observation_frame)

# The agent uses detection data to inform its next action
# For example, an autonomous delivery robot might stop if it sees people
num_objects = len(results[0].boxes)
print(f"Agent Observation: {num_objects} objects detected. Calculating next move...")

Differentiating From Related Terms

It is important to distinguish Reinforcement Learning from other machine learning paradigms:

  • vs. Supervised Learning: Supervised learning requires a knowledgeable external supervisor to provide labeled training data (e.g., "this image contains a cat"). In contrast, RL learns from the consequences of its own actions without explicit labels, discovering optimal paths through exploration.
  • vs. Unsupervised Learning: Unsupervised learning focuses on finding hidden structures or patterns within unlabeled data (like clustering customers). RL differs because it is explicitly goal-oriented, focusing on maximizing a reward signal rather than just describing data structure.

As computational power increases, techniques like Reinforcement Learning from Human Feedback (RLHF) are further refining how agents learn, aligning their objectives more closely with complex human values and safety standards. Researchers often use standardized environments like Gymnasium to benchmark and improve these algorithms. For teams looking to manage the datasets required for the perception layers of these agents, the Ultralytics Platform offers comprehensive tools for annotation and model management.
