AI Agent

Learn what an AI agent is and how these autonomous systems power modern automation. Explore their perception-reasoning-action loop and their role in computer vision and robotics.

An AI Agent is an autonomous system capable of perceiving its environment, reasoning through complex logic to make decisions, and taking specific actions to achieve defined goals. Unlike a static machine learning model, which passively processes input to produce an output, an agent operates dynamically within a continuous workflow. These systems form the "active" layer of artificial intelligence, bridging the gap between digital predictions and real-world execution. By utilizing memory and adaptive learning, agents can handle tasks ranging from software automation to physical navigation without constant human intervention.

The Perception-Reasoning-Action Loop

An AI agent works through a repeating cycle often described as the perception-reasoning-action loop (sometimes shortened to the perception-action loop). This architecture lets the agent interact meaningfully with its surroundings.

  1. Perception (Sensing): The agent gathers information from the world. In computer vision applications, the agent uses cameras as "eyes." It employs high-speed models like YOLO26 to perform object detection or segmentation, converting raw pixels into structured data.
  2. Reasoning (Thinking): The agent processes the perceived data against its objectives. This stage often integrates Large Language Models (LLMs) for semantic understanding or reinforcement learning algorithms to optimize decision-making strategies. Advanced agents can plan multiple steps ahead, much like a chess player anticipating future moves.
  3. Action (Executing): Based on its reasoning, the agent executes a task. This could be a digital action, such as querying a database or sending an alert, or a physical action in robotics, such as a robotic arm picking a specific item from a conveyor belt.
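
The three stages above can be sketched as a small loop. This is only an illustrative outline, not a reference implementation: the `Detection` schema, the `reason` and `act` helpers, and the canned frames are all hypothetical stand-ins, and a real agent would replace `perceive` with a call to a vision model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """Structured output of the perception stage (hypothetical schema)."""
    label: str
    confidence: float


def reason(detections: List[Detection], goal_label: str, threshold: float = 0.5) -> str:
    """Reasoning: compare what was perceived against the agent's objective."""
    for det in detections:
        if det.label == goal_label and det.confidence >= threshold:
            return "pick"
    return "wait"


def act(decision: str, log: List[str]) -> None:
    """Action: recorded in a log here; a real agent might drive an actuator."""
    log.append(decision)


def run_agent(perceive: Callable[[], List[Detection]], goal_label: str, steps: int) -> List[str]:
    """Run the perceive -> reason -> act cycle a fixed number of times."""
    log: List[str] = []
    for _ in range(steps):
        detections = perceive()                    # 1. Perception
        decision = reason(detections, goal_label)  # 2. Reasoning
        act(decision, log)                         # 3. Action
    return log


# Stand-in sensor: replays canned frames instead of reading a camera.
frames = iter([
    [Detection("box", 0.91)],
    [Detection("bottle", 0.34)],
    [Detection("bottle", 0.88), Detection("box", 0.70)],
])
actions = run_agent(lambda: next(frames), goal_label="bottle", steps=3)
print(actions)
```

Keeping the three stages as separate functions means the perception source can be swapped (canned frames, a camera, a detector) without touching the decision logic.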

AI Agent vs. AI Model

It is important to distinguish between an agent and a model, as they serve different roles in the technology stack.

  • AI Model: A model is a mathematical engine, such as a neural network, trained to recognize patterns. It is a tool that provides predictions (e.g., "This is a car") but does not inherently act on them.
  • AI Agent: An agent is the encompassing system that uses models as tools. It possesses agency—the capacity to initiate change. For instance, while a model identifies a red light, the agent decides to apply the brakes.
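
The distinction can be made concrete with a toy version of the red-light example. Everything here is assumed for illustration: `traffic_light_model` is a hand-written rule standing in for a trained network, and `DrivingAgent` is a hypothetical wrapper, not an actual driving stack.

```python
from typing import Callable, Tuple

RGB = Tuple[int, int, int]


def traffic_light_model(pixel_rgb: RGB) -> str:
    """Toy 'model': maps an input to a prediction and does nothing else."""
    r, g, _b = pixel_rgb
    if r > 200 and g < 100:
        return "red"
    if g > 200 and r < 100:
        return "green"
    return "unknown"


class DrivingAgent:
    """Toy 'agent': uses the model as a tool and changes state based on it."""

    def __init__(self, model: Callable[[RGB], str]) -> None:
        self.model = model
        self.brake_applied = False

    def step(self, pixel_rgb: RGB) -> str:
        prediction = self.model(pixel_rgb)  # the model only predicts
        if prediction == "red":
            self.brake_applied = True       # the agent initiates change
            return "brake"
        if prediction == "green":
            self.brake_applied = False
            return "proceed"
        return "slow"                       # cautious default when unsure


agent = DrivingAgent(traffic_light_model)
print(traffic_light_model((255, 0, 0)))  # prediction only
print(agent.step((255, 0, 0)))           # prediction turned into an action
```

The model returns a label and has no side effects. The agent owns state (`brake_applied`) and decides what to do with the label.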

Real-World Applications

AI agents are transforming industries by automating workflows that require cognitive flexibility.

  • Smart Manufacturing: In industrial automation, visual agents monitor production lines. If a defect is identified by a quality control system, the agent can autonomously halt machinery and log the incident, preventing waste.
  • Autonomous Logistics: Warehouses utilize agentic robots for inventory management. These agents navigate dynamic environments using SLAM (Simultaneous Localization and Mapping) and vision models to locate, pick, and transport packages efficiently.
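
As a rough sketch of the Smart Manufacturing case, the halt-and-log behavior might look like the following. The `ProductionLine` class, the `defect_score` input and the 0.8 threshold are assumptions made for this example; a real deployment would talk to a PLC or line controller and take scores from a quality control model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductionLine:
    """Hypothetical stand-in for a machine controller."""
    running: bool = True
    incidents: List[str] = field(default_factory=list)

    def halt(self) -> None:
        self.running = False


def handle_inspection(line: ProductionLine, item_id: int,
                      defect_score: float, threshold: float = 0.8) -> bool:
    """Halt the line and log the incident if the score meets the threshold.

    Returns True if the line was halted by this inspection.
    """
    if defect_score >= threshold:
        line.halt()
        line.incidents.append(f"item {item_id}: defect score {defect_score:.2f}")
        return True
    return False


line = ProductionLine()
# (item_id, defect_score) pairs, as a quality control model might emit them
for item_id, score in [(1, 0.05), (2, 0.93), (3, 0.10)]:
    if not line.running:
        break  # nothing is inspected once the line has stopped
    handle_inspection(line, item_id, score)

print(line.running, line.incidents)
```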

Building a Simple Vision Agent

Developers can build basic agents by combining perception models with conditional logic. The following Python example demonstrates a simple "Security Agent" using the ultralytics package. The agent detects a person and decides whether to trigger an alert based on the model's confidence.

from ultralytics import YOLO

# Load the YOLO26 model (The Agent's Perception)
model = YOLO("yolo26n.pt")

# 1. Perceive: The agent analyzes an image
results = model("bus.jpg")

# 2. Reason & 3. Act: Decision logic based on perception
for result in results:
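    # Keep only the confidences of 'person' detections (class 0)
    person_conf = result.boxes.conf[result.boxes.cls == 0]
    # Alert only if at least one person is detected with high confidence
    if len(person_conf) > 0 and person_conf.max() > 0.5: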
        print("ACTION: Person detected! Initiating security protocol.")
    else:
        print("ACTION: Area clear. Continuing surveillance.")
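
The example above decides on a single image and keeps no state. One possible way to add the kind of memory mentioned in the definition is to alert only when a person has been seen in several consecutive frames. This is a sketch under assumptions: the `PersistentAlert` class, the three-frame window and the per-frame confidence values are hypothetical, and the confidences are supplied directly rather than taken from a model.

```python
from collections import deque


class PersistentAlert:
    """Alert only when a person is seen in `window` consecutive frames."""

    def __init__(self, window: int = 3, threshold: float = 0.5) -> None:
        self.history = deque(maxlen=window)  # short-term memory of recent frames
        self.threshold = threshold

    def update(self, person_conf: float) -> bool:
        """person_conf: highest 'person' confidence in the frame, 0.0 if none."""
        self.history.append(person_conf > self.threshold)
        # Fire only once the window is full and every remembered frame had a person
        return len(self.history) == self.history.maxlen and all(self.history)


memory_agent = PersistentAlert(window=3, threshold=0.5)
confidences = [0.9, 0.2, 0.8, 0.7, 0.95]  # made-up per-frame values
alerts = [memory_agent.update(c) for c in confidences]
print(alerts)
```

Requiring agreement across frames trades a short delay for fewer alerts triggered by a single spurious detection.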

Related Concepts

  • Edge AI: To react in real time, agents often run locally on hardware like the NVIDIA Jetson, minimizing latency by processing data at the source rather than in the cloud.
  • Artificial General Intelligence (AGI): While current agents are specialized (Narrow AI), AGI refers to hypothetical agents capable of performing any intellectual task that a human can do.
  • Generative AI: Modern agents frequently use GenAI to create dynamic responses or code, acting as assistants that can generate content as part of their workflow.

For those looking to train the underlying models for their agents, the Ultralytics Platform offers a streamlined environment for annotating datasets and managing training runs. Further reading on agent architectures can be found in research from organizations like Stanford HAI and DeepMind.
