Computer Vision (CV)

Computer Vision (CV) is a field of artificial intelligence (AI) that trains computers to interpret and understand the visual world. Using digital images and video from cameras together with deep learning models, machines can identify and classify objects and then react to what they "see." The goal is to give computers a capability comparable to human vision, which requires processing and analyzing large amounts of visual data. The field has grown rapidly thanks to advances in deep learning and the availability of large datasets.

How Computer Vision Works

Computer vision works by applying machine learning (ML) algorithms to visual data. Instead of being explicitly programmed to recognize an object, a CV model learns to identify patterns from thousands or millions of labeled images. For instance, to train a model to recognize cats, you would feed it many images labeled as containing cats, along with images that do not, until it learns to distinguish the features of a cat on its own.
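
As a rough illustration of learning from labeled examples rather than hand-written rules, here is a minimal Python sketch. It swaps in a deliberately simple technique, a nearest-centroid classifier, in place of a deep network, and uses a hypothetical training set of 2x2 grayscale "images" labeled as vertical or horizontal stripes. The names `TRAINING_DATA`, `train`, and `predict` are illustrative only, not part of any library.

```python
# Hypothetical training set: 2x2 grayscale images, flattened row by row
# into [top-left, top-right, bottom-left, bottom-right], each with a label.
TRAINING_DATA = [
    ([255, 0, 255, 0], "vertical"),     # bright left column
    ([240, 10, 250, 20], "vertical"),
    ([255, 255, 0, 0], "horizontal"),   # bright top row
    ([230, 245, 15, 5], "horizontal"),
]


def train(examples: list[tuple[list[int], str]]) -> dict[str, list[float]]:
    """Learn one centroid (mean pixel vector) per label from labeled examples."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for pixels, label in examples:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], pixels: list[int]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label: str) -> float:
        return sum((c - p) ** 2 for c, p in zip(centroids[label], pixels))
    return min(centroids, key=distance)


model = train(TRAINING_DATA)
print(predict(model, [200, 30, 220, 40]))  # vertical
print(predict(model, [210, 220, 20, 10]))  # horizontal
```

No rule such as "a bright left column means vertical" appears in the code; the decision comes entirely from the averaged labeled examples. Real CV models learn far richer patterns from far more data, but the workflow of fitting to labeled images and then predicting on new ones is the same.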

Modern CV relies heavily on deep learning models, particularly Convolutional Neural Networks (CNNs). A CNN is a type of neural network that is highly effective at processing image data. It slides small filters (or kernels) across an image to create feature maps that highlight important characteristics such as edges, textures, and shapes. The filter values are learned during training, and stacking layers lets the network combine simple features like edges into more complex ones like object parts. These networks power many common computer vision tasks.
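
To make the idea of a filter producing a feature map concrete, the following dependency-free Python sketch slides a 3x3 kernel over a tiny grayscale image. The kernel is a hand-picked vertical-edge detector (a Prewitt-style kernel) chosen for illustration; in a real CNN the kernel values are learned during training. As in most deep learning libraries, the operation is technically cross-correlation (the kernel is not flipped). The function name `conv2d` and the sample data are assumptions for this example.

```python
Matrix = list[list[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like the "convolution" in CNN layers, this is cross-correlation: the kernel
    is applied as-is, without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map: Matrix = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (y, x).
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[y + i][x + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hand-picked vertical-edge kernel: responds to dark-to-bright changes from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# A 3x6 image: dark on the left (0), bright on the right (10).
image = [
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
]

print(conv2d(image, VERTICAL_EDGE))  # [[0.0, 30.0, 30.0, 0.0]]
```

The feature map is zero over the flat dark and flat bright regions and large where the dark-to-bright edge sits. A convolutional layer performs this kind of highlighting with many kernels at once, and later layers apply further kernels to the resulting feature maps.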

Computer Vision vs. Image Processing

While closely related, computer vision and image processing are not the same. Image processing, often used as a building block within CV, focuses on manipulating digital images to enhance them or extract useful information, through operations like sharpening, blurring, or filtering. Computer vision goes a step further by aiming to interpret and understand the content of the image. For example, image processing might be used to improve the quality of a photo, while computer vision would be used to identify the people, objects, and scene within that photo. You can learn more about the distinction in this detailed overview of digital image processing.
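
One way to see the distinction is by what each step returns. In the minimal Python sketch below, which assumes grayscale images stored as nested lists of 0 to 255 intensities, `box_blur` is an image processing operation (image in, image out), while `locate_bright_object` returns a description of the content, in this case a bounding box. The thresholding used there is a toy stand-in chosen for brevity and is far simpler than the learned models real CV systems use; both function names are hypothetical.

```python
Image = list[list[int]]


def box_blur(image: Image) -> Image:
    """Image processing: replace each pixel with the mean of its 3x3 neighborhood.

    Neighbors that fall outside the image are skipped, so border pixels average
    fewer values. The result is another image of the same size.
    """
    h, w = len(image), len(image[0])
    blurred: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            neighborhood = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(neighborhood) // len(neighborhood))  # integer mean
        blurred.append(row)
    return blurred


def locate_bright_object(image: Image, threshold: int = 128):
    """Toy interpretation step: bounding box (x_min, y_min, x_max, y_max) of all
    pixels brighter than `threshold`, or None if there are none."""
    coords = [
        (x, y)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value > threshold
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


photo = [
    [10, 10, 10, 10, 10],
    [10, 200, 220, 10, 10],
    [10, 210, 230, 10, 10],
    [10, 10, 10, 10, 10],
]

print(box_blur(photo)[0])           # [57, 76, 76, 45, 10]
print(locate_bright_object(photo))  # (1, 1, 2, 2)
```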

Key Tasks in Computer Vision

Computer vision encompasses several key tasks that allow machines to analyze and interpret visual data:

  • Object Detection: This involves identifying and locating objects within an image or video. A model like Ultralytics YOLO draws a bounding box around each detected object and assigns it a class label.
  • Image Classification: This task involves assigning a single label to an entire image from a predefined set of categories. For example, classifying an image as containing a "cat" or a "dog."
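  • Image Segmentation: Unlike object detection, which outputs boxes, segmentation classifies each pixel in an image, giving a much more detailed understanding of the image's content. Common variants are semantic segmentation, which assigns a class to every pixel, and instance segmentation, which also separates individual objects of the same class.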
  • Pose Estimation: This task determines the position and orientation of a person or object in space, typically by locating keypoints such as body joints. It is widely used in robotics, augmented reality, and human activity analysis.
  • Object Tracking: This task involves following one or more objects over time in a video sequence. It is crucial for applications like surveillance and autonomous navigation.
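
The tasks above differ mainly in the shape of their output. The Python sketch below is a hypothetical illustration, not the API of Ultralytics YOLO or any other library: it assumes a model has already produced class scores, and shows how classification reduces them to one label per image, how a detection pairs a class label with a bounding box, how semantic segmentation picks a label per pixel, and how a very simple tracker carries object IDs across video frames by greedy nearest-centroid matching. Pose estimation is omitted for brevity, and all names here are illustrative.

```python
import math
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    """Object detection output: a class label, a bounding box, and a confidence."""
    label: str
    box: Box
    confidence: float


def classify(scores: dict[str, float]) -> str:
    """Image classification: a single label for the whole image (highest score wins)."""
    return max(scores, key=scores.get)


def segment(pixel_scores: list[list[dict[str, float]]]) -> list[list[str]]:
    """Semantic segmentation: choose the highest-scoring class for every pixel."""
    return [[classify(scores) for scores in row] for row in pixel_scores]


def centroid(box: Box) -> tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def track(frames: list[list[Detection]], max_dist: float = 50.0) -> list[dict[int, Detection]]:
    """Object tracking: keep a stable ID per object across frames.

    Each detection is greedily matched to the nearest same-class object from the
    previous frame (centroid distance <= max_dist); unmatched detections get new IDs.
    """
    next_id = 0
    previous: dict[int, Detection] = {}
    history: list[dict[int, Detection]] = []
    for detections in frames:
        current: dict[int, Detection] = {}
        candidates = dict(previous)  # previous-frame objects not yet matched
        for det in detections:
            cx, cy = centroid(det.box)
            best_id, best_dist = None, max_dist
            for obj_id, prev in candidates.items():
                if prev.label != det.label:
                    continue
                px, py = centroid(prev.box)
                dist = math.hypot(cx - px, cy - py)
                if dist <= best_dist:
                    best_id, best_dist = obj_id, dist
            if best_id is None:
                best_id, next_id = next_id, next_id + 1  # start a new track
            else:
                del candidates[best_id]
            current[best_id] = det
        history.append(current)
        previous = current
    return history


print(classify({"cat": 0.92, "dog": 0.08}))  # cat

frame1 = [Detection("person", (0, 0, 10, 20), 0.9), Detection("car", (100, 100, 140, 120), 0.8)]
frame2 = [Detection("car", (105, 100, 145, 120), 0.85), Detection("person", (3, 0, 13, 20), 0.88)]
for ids in track([frame1, frame2]):
    print({obj_id: det.label for obj_id, det in ids.items()})
# {0: 'person', 1: 'car'}
# {1: 'car', 0: 'person'}
```

Practical trackers typically add motion prediction and keep unmatched tracks alive for several frames; this sketch simply drops any object that is not matched in the next frame.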

Real-World Applications

Computer vision applications are increasingly prevalent across sectors such as healthcare, self-driving cars, robotics, augmented reality, and surveillance.

Tools and Frameworks

Developing and deploying computer vision models is made easier by various tools and frameworks. Libraries like PyTorch and TensorFlow are foundational for building models, and open-source libraries like OpenCV provide a large collection of functions for real-time computer vision.

Platforms such as Ultralytics HUB streamline the entire lifecycle of a CV project, from managing datasets and training custom models to deployment. Standardized formats like ONNX help ensure interoperability, so a model trained in one framework can be deployed with other compatible tools and runtimes. As these technologies mature, they will continue to drive innovation across industries.
