Glossary

Object Detection Architectures

Discover object detection architectures, the backbone of AI systems for image understanding. Learn about their types, tools, and real-world applications.


Object detection architectures are the fundamental structures underpinning how artificial intelligence (AI) systems interpret visual information. These specialized neural networks are designed not just to classify objects within an image (identifying what is present) but also to precisely locate them, typically by drawing bounding boxes around each detected instance. For those familiar with basic machine learning (ML) concepts, understanding these architectures is crucial for leveraging the capabilities of modern computer vision (CV). They form the backbone of systems that enable machines to "see" and understand the world in a way similar to humans.

Core Components

Most object detection architectures consist of three key components working together. A backbone network, often a Convolutional Neural Network (CNN), performs initial feature extraction from the input image, identifying low-level patterns such as edges and textures in its early layers and progressively more complex features in deeper ones. A "neck" component often follows, aggregating features from different stages of the backbone to create richer representations suitable for detecting objects at various scales, a concept detailed in resources like the Feature Pyramid Network paper. Finally, the detection head uses these features to predict the class and location (bounding box coordinates) of objects. Performance is often measured with Intersection over Union (IoU), the ratio of the overlap between a predicted box and a ground-truth box to the area of their union, which assesses localization accuracy, and with mean Average Precision (mAP), which summarizes overall detection quality. Detailed explanations are available on sites like the COCO dataset evaluation page.
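As a concrete illustration of the IoU metric mentioned above, here is a minimal sketch in plain Python. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`; the function name `iou` and this box format are choices made for the example, not an API from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed format for this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(predicted, ground_truth))  # overlap area 1, union area 7
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and mAP is then computed from the resulting precision and recall values.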

Types of Architectures

Object detection architectures are broadly categorized by their approach:

Distinguishing Similar Terms

It is important to differentiate object detection architectures from related computer vision tasks:

  • Image Classification: Assigns a single label to an entire image (e.g., "cat," "dog"). It identifies what is in the image globally but not where specific objects are located. See the Ultralytics classification task documentation for examples.
  • Semantic Segmentation: Classifies each pixel in an image into a predefined category (e.g., all pixels belonging to cars are labeled "car"). It provides dense prediction but doesn't distinguish between different instances of the same object class.
  • Instance Segmentation: Goes a step further than semantic segmentation by classifying each pixel and differentiating between individual object instances (e.g., labeling "car 1," "car 2"). It combines object detection and semantic segmentation. Check the Ultralytics segmentation task documentation for more details.
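The difference between a semantic mask and an instance mask can be shown on a toy grid. The sketch below is only an illustration of the two output formats: it derives instance IDs from a semantic mask with 4-connected component labeling, which works only when objects of the same class do not touch. Real instance segmentation models predict instances directly; the function name `label_instances` and the tiny mask are hypothetical.

```python
from collections import deque


def label_instances(semantic_mask, target_class):
    """Assign a separate ID to each 4-connected region of target_class.

    Returns (instance_mask, instance_count); 0 in the instance mask means
    "not part of any instance of target_class".
    """
    rows, cols = len(semantic_mask), len(semantic_mask[0])
    instance_mask = [[0] * cols for _ in range(rows)]
    next_id = 0

    for r in range(rows):
        for c in range(cols):
            if semantic_mask[r][c] != target_class or instance_mask[r][c] != 0:
                continue
            # Unvisited pixel of the target class: start a new instance.
            next_id += 1
            instance_mask[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic_mask[ny][nx] == target_class
                            and instance_mask[ny][nx] == 0):
                        instance_mask[ny][nx] = next_id
                        queue.append((ny, nx))

    return instance_mask, next_id


# Semantic mask: every "car" pixel carries the same class ID (1); 0 is background.
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

# Instance mask: the two cars receive different IDs ("car 1", "car 2").
instances, count = label_instances(semantic, target_class=1)
for row in instances:
    print(row)
print("instances found:", count)
```

In the semantic mask both cars are indistinguishable, while the instance mask separates them, which is the extra information instance segmentation provides.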

Real-World Applications

Object detection architectures power numerous AI applications across diverse sectors:

Tools and Technologies

Developing and deploying models based on these architectures often involves specialized tools and frameworks:
