Glossary

Computer Vision (CV)

Unlock AI's potential with Computer Vision! Explore its role in object detection, healthcare, self-driving cars, and beyond. Learn more now!

Computer Vision (CV) is a field of artificial intelligence (AI) that trains computers to interpret and understand the visual world. Using digital images from cameras, videos, and deep learning models, machines can accurately identify and classify objects and then react to what they "see." The goal is to enable computers to replicate human vision, a task that involves processing and analyzing vast amounts of visual data to make sense of it. As a field, it has grown rapidly thanks to advances in deep learning and the availability of large datasets.

How Computer Vision Works

Computer vision works by applying machine learning (ML) algorithms to visual data. Instead of being explicitly programmed to recognize an object, a CV model learns to identify patterns from thousands or millions of labeled images. For instance, to train a model to recognize cats, it would be fed countless images of cats until it can learn to distinguish the features of a cat on its own.

Modern CV heavily relies on deep learning models, particularly Convolutional Neural Networks (CNNs). A CNN is a type of neural network that is highly effective at processing image data. It works by applying filters (or kernels) to an image to create feature maps that highlight important characteristics like edges, textures, and shapes. These networks power many common computer vision tasks, enabling machines to analyze visual information with increasing accuracy.

Computer Vision vs. Image Processing

While closely related, computer vision and image processing are not the same. Image processing is a subset of CV that focuses on manipulating digital images to enhance them or extract useful information. It involves operations like sharpening, blurring, or filtering an image. In contrast, computer vision goes a step further by aiming to interpret and understand the content of the image. For example, image processing might be used to improve the quality of a photo, while computer vision would be used to identify the people, objects, and scene within that photo. You can learn more about the distinction in this detailed overview of digital image processing.

Key Tasks in Computer Vision

Computer vision encompasses several key tasks that allow machines to analyze and interpret visual data:

Object Detection: This involves identifying and locating objects within an image or video. A model like Ultralytics YOLO draws a bounding box around each detected object and assigns it a class label.
Image Classification: This task involves assigning a single label to an entire image from a predefined set of categories. For example, classifying an image as containing a "cat" or a "dog."
Image Segmentation: Unlike object detection, segmentation classifies each pixel in an image. It provides a much more detailed understanding of the image's content. Sub-tasks include instance segmentation and semantic segmentation.
Pose Estimation: This is used to determine the position and orientation of a person or object in space. It is widely used in robotics, augmented reality, and human activity analysis.
Object Tracking: This task involves following one or more objects over time in a video sequence. It is crucial for applications like surveillance and autonomous navigation.

Real-World Applications

Computer vision applications are increasingly prevalent across various sectors:

Autonomous Vehicles: CV is critical for self-driving cars, enabling them to perceive their surroundings, detect pedestrians and other vehicles, read traffic signs, and navigate safely. Companies like Waymo and Tesla heavily rely on CV systems. Explore more about AI in Automotive solutions.
Healthcare: In medical image analysis, CV helps radiologists detect anomalies like tumors or fractures in X-rays, CT scans, and MRIs. It's also used in robotic surgery and patient monitoring. Read more about its impact in Radiology: Artificial Intelligence. You can also discover how YOLO11 is used for tumor detection.
Security and Surveillance: CV powers automated monitoring systems for detecting intrusions, tracking individuals, and analyzing crowd behavior. See our guide on how to build a security alarm system.
Retail: Applications include inventory management via shelf monitoring, customer behavior analysis, and cashier-less checkout systems like those from Amazon Go.
Manufacturing: CV is used for quality control, defect detection, assembly line monitoring, and robotics automation. Learn about making smart manufacturing solutions with YOLO11.
Agriculture: This technology enables precision farming through crop monitoring, disease detection, weed identification, and automated harvesting. Read about real-time crop health monitoring.

Tools and Frameworks

Developing and deploying computer vision models is made easier by various tools and frameworks. Libraries like PyTorch (visit the PyTorch official site) and TensorFlow (visit the TensorFlow official site) are foundational for building models. Open-source libraries like OpenCV provide a vast collection of functions for real-time computer vision.

Platforms such as Ultralytics HUB streamline the entire lifecycle of a CV project, from managing datasets and training custom models to deployment. The use of standardized formats like ONNX also helps ensure interoperability between different frameworks. As these technologies mature, they will continue to drive innovation across industries.

Computer Vision (CV)

Flexible enterprise licensing solution to power your innovation

Train AI models in seconds with Ultralytics YOLO

Train YOLO models simply with Ultralytics HUB

How Computer Vision Works

Computer Vision vs. Image Processing

Key Tasks in Computer Vision

Real-World Applications

Tools and Frameworks

Read more in this category

Industrial Internet of things (IIoT) explained

Key highlights from Ultralytics at WAIC 2025 in Shanghai

How is tea made using technologies like Vision AI?

Join the Ultralytics community