
Computer Vision (CV)

Unlock the potential of AI with computer vision! Explore its applications in object detection, healthcare, self-driving cars, and more.

Computer Vision (CV) is a sophisticated field of Artificial Intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs. While human vision has the innate ability to perceive and understand surroundings instantly, computers must be trained to recognize patterns and interpret pixels. By leveraging Machine Learning (ML) and specifically Deep Learning (DL) algorithms, CV systems can take visual data, process it, and make recommendations or take actions based on that information.

How Computer Vision Works

At its core, a computer sees an image as an array of numerical values representing pixels: a grayscale image is a 2D grid of intensities, and a color image adds a third dimension for channels such as red, green, and blue. Modern CV relies heavily on Convolutional Neural Networks (CNNs), an architecture loosely inspired by the organization of the animal visual cortex. These networks learn to identify a hierarchy of features, from simple edges and textures in early layers to complex shapes and whole objects in deeper layers, through a process called feature extraction.
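To make the "array of numbers" idea concrete, the minimal sketch below (assuming only NumPy) slides a 3x3 kernel over a tiny grayscale image. The Sobel-style vertical-edge kernel is hand-written for illustration; in a CNN, kernel values are learned during training, and deeper layers combine responses like these into more complex features. `convolve2d` is a hypothetical helper, not a library function, and like CNN layers it actually computes cross-correlation (no kernel flip).

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2D cross-correlation, the operation CNN layers call 'convolution'."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=float)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the kernel with the image patch under it and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 6x6 grayscale "image": dark left half (0), bright right half (255)
image = np.zeros((6, 6), dtype=float)
image[:, 3:] = 255.0

# Hand-written vertical-edge detector (Sobel-x); a CNN would learn its own weights
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = convolve2d(image, sobel_x)
print(edges)  # each row is [0, 1020, 1020, 0]: strong response only where dark meets bright
```

The flat regions produce zero, and only the windows straddling the dark/bright boundary respond, which is the kind of low-level "edge" feature the first layers of a CNN tend to learn.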

To function effectively, these models require vast amounts of training data. For example, to recognize a car, a model typically needs thousands of labeled images of cars under varied lighting, viewing angles, and backgrounds. Tools like the Ultralytics Platform streamline this workflow, allowing users to annotate datasets, train models in the cloud, and deploy them efficiently.
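A "labeled image" is usually an image file paired with an annotation. As one example, assuming the plain-text YOLO detection label format (one line per object: class index, then box center x, center y, width, and height, each normalized by the image dimensions), the hypothetical helper below converts a pixel-space bounding box into such a label line. The class index and coordinates in the usage line are made up for illustration.

```python
def to_yolo_label(class_id: int, box_xyxy: tuple[float, float, float, float],
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO-style label line."""
    x1, y1, x2, y2 = box_xyxy
    # Reject boxes that are inverted or fall outside the image
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical class 2 spanning x=100..300 and y=200..400 in a 640x480 image
print(to_yolo_label(2, (100, 200, 300, 400), 640, 480))
# 2 0.312500 0.625000 0.312500 0.416667
```

Normalizing by image size keeps labels valid when images are resized during training, which is one reason formats like this store fractions rather than pixels.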

Core Tasks of Computer Vision

Computer vision is not a single function but a collection of distinct tasks, each solving a specific problem:

  • Image Classification: This task assigns a class label to an entire image, answering the question, "What is in this picture?" (e.g., distinguishing between a cat and a dog).
  • Object Detection: Going a step further, detection locates each distinct object within an image, draws a bounding box around it, and assigns it a class label. This is crucial for counting items or finding where specific objects appear.
  • Instance Segmentation: This provides a precise pixel-level mask for each detected object, separating individual instances of the same class. It is vital for applications requiring high precision, such as analyzing medical images.
  • Pose Estimation: This involves detecting specific keypoints on an object, such as the joints of a human body, to track movement and posture.
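The difference between detection and segmentation outputs can be seen by deriving one from the other. Assuming NumPy and a binary mask in which `True` marks an object's pixels, the hypothetical `mask_to_box` returns the tightest axis-aligned box around the mask. The box is the coarser, detection-style answer; the mask additionally records exactly which pixels inside that box belong to the object.

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Return the tightest (x1, y1, x2, y2) box around True pixels; x2 and y2 are exclusive."""
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of object pixels
    if xs.size == 0:
        raise ValueError("mask is empty")
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# 8x8 image containing a small L-shaped object
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 1] = True   # vertical stroke: rows 2-5, column 1
mask[5, 1:5] = True   # horizontal stroke: row 5, columns 1-4

x1, y1, x2, y2 = mask_to_box(mask)
box_area = (x2 - x1) * (y2 - y1)
print((x1, y1, x2, y2))           # (1, 2, 5, 6)
print(int(mask.sum()), box_area)  # 7 16: only 7 of the 16 boxed pixels are the object
```

For irregular shapes like this one, most of the box is background, which is why pixel-level masks matter in precision-sensitive applications such as medical imaging.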

Real-World Applications

The utility of computer vision spans virtually every industry, automating tasks that previously required human eyes.

  • Manufacturing and Quality Control: In industrial settings, CV is often referred to as Machine Vision. It is used to automate quality inspection, detecting minute defects in products on an assembly line, often faster and more consistently than human inspectors. For instance, AI in Manufacturing allows for real-time monitoring of equipment to prevent failures.
  • Autonomous Transportation: Self-driving cars rely heavily on CV to navigate safely. By processing input from cameras and LiDAR sensors, these vehicles perform 3D Object Detection to identify pedestrians, other vehicles, and traffic signs in real time. This is a critical component of achieving high levels of vehicle automation.
  • Healthcare and Diagnostics: Radiologists use CV to assist in identifying anomalies in X-rays, MRIs, and CT scans. AI in Healthcare helps in early disease detection, such as identifying tumors, by highlighting regions of interest that might be missed by the naked eye.

Computer Vision vs. Image Processing

It is important to distinguish CV from Image Processing, though they often work together.

  • Image Processing involves manipulating an image to enhance it or extract information (e.g., adjusting brightness, contrast, or applying filters like those in Adobe Photoshop). The output is usually another image.
  • Computer Vision takes an image as input and outputs information or an interpretation (e.g., "There are three people in this room"). CV uses image processing techniques to prepare images for analysis by Neural Networks.
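The distinction can be illustrated with a NumPy-only sketch. `adjust_brightness_contrast` is a pixel-wise linear transform (image in, image out), while the hypothetical `count_bright_regions` stands in for a CV system by returning information about the scene rather than a new image. A real CV pipeline would use a trained neural network instead of this thresholding-and-flood-fill toy; the example also shows how a processing step can prepare an image so that the analysis step succeeds.

```python
import numpy as np

def adjust_brightness_contrast(image: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    """Image processing: new_pixel = alpha * pixel + beta, clipped to [0, 255]. Image in, image out."""
    out = alpha * image.astype(float) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def count_bright_regions(image: np.ndarray, threshold: int = 128) -> int:
    """Toy 'interpretation': count 4-connected regions brighter than threshold. Image in, number out."""
    binary = image > threshold
    seen = np.zeros_like(binary, dtype=bool)
    h, w = binary.shape
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                count += 1  # found a new region; flood-fill it so it is counted once
                stack = [(r, c)]
                seen[r, c] = True
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
    return count

# Dim 10x10 scene (background 20) with three separate objects of value 100
scene = np.full((10, 10), 20, dtype=np.uint8)
scene[1:3, 1:3] = 100
scene[5:7, 5:8] = 100
scene[8:10, 0:2] = 100

print(count_bright_regions(scene))  # 0: objects are too dim before adjustment
brighter = adjust_brightness_contrast(scene, alpha=1.5, beta=10)
print(brighter.shape, brighter.dtype)  # (10, 10) uint8: the output is still an image
print(f"There are {count_bright_regions(brighter)} objects in this scene")  # 3
```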

Implementing Computer Vision with Python

Modern libraries have made implementing powerful CV models accessible. The example below demonstrates how to load the state-of-the-art YOLO26 model to detect objects in an image using the ultralytics package:

from ultralytics import YOLO

# Load the YOLO26n model (nano version for speed)
model = YOLO("yolo26n.pt")

# Run inference on a standard example image
# The model identifies objects and their locations
results = model("https://ultralytics.com/images/bus.jpg")

# Display the resulting image with bounding boxes
results[0].show()

This short script uses a pre-trained model to run inference in a single call, demonstrating the accessibility of modern AI tools. For developers looking to move beyond static images, CV also powers Video Understanding and real-time tracking systems used in security and sports analytics. By integrating with libraries like OpenCV, developers can build comprehensive applications that capture, process, and analyze the visual world.
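Building on the script above, the detections in `results[0]` can be reduced to the kind of structured answer described earlier ("There are three people in this room"). This sketch assumes the Ultralytics `Results` object exposes per-box class IDs and confidences as `boxes.cls` and `boxes.conf`, plus an ID-to-name dictionary as `names`; the helpers `count_by_class` and `describe`, the 0.5 threshold, and the sample values are hypothetical, chosen so the code runs offline without downloading a model.

```python
from collections import Counter

def count_by_class(class_ids, confidences, names, conf_threshold=0.5):
    """Count detections per class name, keeping only those at or above conf_threshold."""
    counts = Counter()
    for cls_id, conf in zip(class_ids, confidences):
        if conf >= conf_threshold:
            counts[names[int(cls_id)]] += 1  # int() handles float class IDs from tensors
    return dict(counts)

def describe(counts):
    """Turn per-class counts into a short sentence, e.g. 'Detected bus: 1, person: 2.'"""
    if not counts:
        return "No objects detected."
    parts = [f"{name}: {n}" for name, n in sorted(counts.items())]
    return "Detected " + ", ".join(parts) + "."

# With real model output (assumed attribute names), the inputs would be:
#   r = results[0]
#   counts = count_by_class(r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.names)
# Hypothetical stand-in values so the sketch runs without a model:
names = {0: "person", 5: "bus"}
class_ids = [0, 0, 5, 0, 0]
confidences = [0.91, 0.88, 0.87, 0.42, 0.35]

counts = count_by_class(class_ids, confidences, names)
print(counts)            # {'person': 2, 'bus': 1}
print(describe(counts))  # Detected bus: 1, person: 2.
```

Filtering by confidence before counting matters: the two low-confidence "person" boxes here are dropped, so the threshold directly changes the answer the application reports.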
