Convolution

Explore the fundamentals of convolution in computer vision. Learn how kernels and feature maps power [YOLO26](https://docs.ultralytics.com/models/yolo26/) for real-time AI.

Convolution is a fundamental mathematical operation that serves as the core building block of modern computer vision (CV) and deep learning (DL) systems. In image processing, convolution involves sliding a small filter, often called a kernel, across an input image to create a map of salient features. This process allows artificial intelligence (AI) models to learn patterns such as edges, textures, and shapes directly from data rather than from hand-crafted rules. Unlike traditional machine learning (ML), which often requires manual feature extraction, convolution enables networks to build a hierarchical understanding of visual data, starting from simple lines and progressing to complex objects like faces or vehicles.

How Convolution Works

The operation works by sliding a filter over the input data, multiplying each kernel weight by the input value beneath it (element-wise multiplication), and summing the products to produce a single value for each position. The resulting grid of values is known as a feature map.
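Written out for a single-channel input X and a k × k kernel K that moves with step size (stride) s, the feature map described above can be expressed as follows. The symbols, and the zero-padding amount p in the size formula, are introduced here for illustration; the sum is the unflipped (cross-correlation) form that deep learning frameworks actually compute.

```latex
Y[i, j] = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} X[s\,i + m,\; s\,j + n]\, K[m, n],
\qquad
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1
```

For example, a 3 × 3 kernel (k = 3) with stride s = 2 and padding p = 1 applied to a 640-pixel-high input gives H_out = ⌊639 / 2⌋ + 1 = 320, so the feature map has half the input resolution.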

  • The Kernel: This is a small matrix of numbers (weights) that responds to a specific feature. In a CNN these weights are learned during training, but kernels can also be designed by hand. The Sobel operator, for example, is a classic kernel used to detect vertical or horizontal edges.
  • Sliding Window: The kernel moves across the image using a defined step size called a "stride." This spatial filtering process preserves the relationship between pixels, which is crucial for understanding images.
  • Layer Hierarchy: In deep architectures like Convolutional Neural Networks (CNNs), the initial layers capture low-level details, while deeper layers combine these into high-level concepts.
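The kernel, sliding window, and stride described above can be sketched in a few lines. This is a minimal illustration that assumes NumPy, a single-channel image, and no padding; the `conv2d` helper is hypothetical and not part of any library. It applies a Sobel kernel to a toy image containing one vertical edge.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, no flip) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Patch of the image currently under the kernel (the "sliding window")
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Element-wise multiply, then sum to a single output value
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


# Sobel kernel that responds to vertical edges (left-to-right intensity changes)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# 6x6 toy image: dark left half (0), bright right half (1), i.e. one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = conv2d(image, sobel_x, stride=1)          # 4x4 feature map
edges_strided = conv2d(image, sobel_x, stride=2)  # 2x2 feature map
print(edges)
print(edges_strided)
```

The nonzero responses appear only in the columns whose window straddles the edge, and raising the stride from 1 to 2 shrinks the feature map from 4 × 4 to 2 × 2 while still capturing the edge.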

Convolution vs. Related Concepts

To fully grasp convolution, it is helpful to distinguish it from similar terms often encountered in neural network (NN) literature:

  • Cross-Correlation vs. Convolution: Mathematically, true convolution flips the kernel (rotates it by 180 degrees) before sliding it over the input. However, most deep learning frameworks, including the PyTorch library, implement cross-correlation (sliding without flipping) and still call it "convolution". Because the weights are learned during training, the network can simply learn the flipped version of any kernel it needs, so the distinction has no effect on what the model can learn.
  • Convolution vs. Attention: While convolution processes information locally (neighboring pixels), the attention mechanism allows a model to relate distant parts of an image simultaneously. Because standard self-attention compares every position with every other position, its cost grows quadratically with the number of image tokens, so modern architectures like YOLO26 often utilize highly optimized convolutional layers to maintain real-time inference speeds.
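The cross-correlation point can be checked directly. The sketch below assumes PyTorch is installed and uses a hypothetical `cross_correlate` helper. It compares `torch.nn.functional.conv2d` against a manual sliding window with and without a kernel flip, then shows that passing a pre-flipped kernel to `conv2d` reproduces true convolution, which is exactly what learned weights can absorb.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 5, 5)  # (batch, channels, height, width)
w = torch.randn(1, 1, 3, 3)  # (out_channels, in_channels, kH, kW)

# What PyTorch calls "convolution" (stride 1, no padding)
torch_out = F.conv2d(x, w)


def cross_correlate(img: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: slide a 2D kernel over a 2D image WITHOUT flipping it."""
    kh, kw = k.shape
    out = torch.zeros(img.shape[0] - kh + 1, img.shape[1] - kw + 1)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out


xc = cross_correlate(x[0, 0], w[0, 0])
# True (mathematical) convolution: flip the kernel in both spatial dimensions first
true_conv = cross_correlate(x[0, 0], torch.flip(w[0, 0], dims=(0, 1)))
# Giving conv2d a flipped kernel yields true convolution
flipped_out = F.conv2d(x, torch.flip(w, dims=(2, 3)))

print("conv2d matches cross-correlation:", torch.allclose(torch_out[0, 0], xc, atol=1e-5))
print("conv2d matches true convolution:", torch.allclose(torch_out[0, 0], true_conv, atol=1e-5))
print("flipped kernel gives true convolution:", torch.allclose(flipped_out[0, 0], true_conv, atol=1e-5))
```

For a random (non-symmetric) kernel, `conv2d` agrees with the unflipped sliding window but not with true convolution, while a flipped kernel recovers true convolution.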

Real-World Applications

The efficiency of convolution has enabled AI to revolutionize various industries by powering robust perception systems:

  1. Medical Diagnostics: In the field of AI in Healthcare, convolution helps analyze high-resolution MRI scans. By learning kernels that respond to anomalous structures, models can flag early signs of tumors or fractures, in some cases with accuracy that rivals human experts.
  2. Autonomous Driving: Self-driving vehicles rely on convolution for real-time object detection. As the vehicle moves, convolutional layers process the incoming video frames to identify pedestrians, lane markings, and traffic signs with minimal latency. This makes convolution a key component of AI for automotive safety.

Using Ultralytics with Python

You can inspect convolutional layers within state-of-the-art models using Python. The following example loads the YOLO26 model and checks that its first layer wraps a standard convolution implemented as torch.nn.Conv2d.

import torch.nn as nn
from ultralytics import YOLO

# Load the latest YOLO26 model
model = YOLO("yolo26n.pt")

# Access the first layer of the model's backbone
first_layer = model.model.model[0]

# Verify it is a Convolutional layer
if isinstance(first_layer.conv, nn.Conv2d):
    print("Success: The first layer is a standard convolution.")
    print(f"Kernel size: {first_layer.conv.kernel_size}")
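To look beyond the first layer, one option is to walk every submodule and collect each `nn.Conv2d`. The sketch below uses a small, hypothetical stand-in `nn.Sequential` (not YOLO26's actual architecture) so it runs without downloading weights; applying the same `summarize_convs` helper (also hypothetical) to `model.model` from the example above is an assumption that has not been verified here. A dummy 640 × 640 input shows how each stride-2 convolution halves the spatial resolution.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network: two stride-2 conv blocks (Conv -> BatchNorm -> SiLU)
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(16),
    nn.SiLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.SiLU(),
)


def summarize_convs(module: nn.Module):
    """Return (name, kernel_size, stride, n_params) for every nn.Conv2d inside `module`."""
    rows = []
    for name, layer in module.named_modules():
        if isinstance(layer, nn.Conv2d):
            n_params = sum(p.numel() for p in layer.parameters())
            rows.append((name, layer.kernel_size, layer.stride, n_params))
    return rows


rows = summarize_convs(net)
for row in rows:
    print(row)

# Pass a dummy image through the network: 640 -> 320 -> 160
net.eval()
with torch.no_grad():
    out = net(torch.zeros(1, 3, 640, 640))
print(out.shape)
```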

Why Convolution Matters for Edge AI

Convolutional operations are highly optimizable, making them ideal for Edge AI deployments where computational resources are limited. Because the same kernel is shared across the entire image (parameter sharing), a convolutional layer has far fewer parameters, and therefore needs far less memory, than a fully connected layer operating on the same image. Its parameter count depends only on the kernel size and the number of channels, not on the image resolution. This efficiency allows advanced models to run on smartphones and IoT devices.
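The effect of parameter sharing is easy to quantify. This short calculation compares a 3 × 3 convolution mapping 3 input channels to 16 output channels with a fully connected layer producing an output of the same size. The 64 × 64 image size and the helper names `conv_params` and `dense_params` are illustrative choices, not taken from any specific model.

```python
def conv_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameters of a k x k convolution: one kernel per (input, output) channel pair,
    shared across every spatial position, plus an optional bias per output channel."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def dense_params(in_ch: int, out_ch: int, h: int, w: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer mapping a flattened (in_ch, h, w) input
    to a flattened (out_ch, h, w) output: every input connects to every output."""
    n_in = in_ch * h * w
    n_out = out_ch * h * w
    return n_in * n_out + (n_out if bias else 0)


conv = conv_params(3, 16, 3)
dense = dense_params(3, 16, 64, 64)
print(f"3x3 conv, 3 -> 16 channels: {conv:,} parameters")
print(f"Fully connected, same 64x64 input/output: {dense:,} parameters")
print(f"Ratio: ~{dense / conv:,.0f}x")
```

The convolution needs 448 parameters regardless of image size, while the fully connected layer needs over 800 million for a 64 × 64 image, and that number grows with the square of the pixel count.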

For teams looking to leverage these operations for custom datasets, the Ultralytics Platform provides a seamless environment to annotate images and train convolution-based models without managing complex infrastructure. By using transfer learning, you can fine-tune pre-trained convolutional weights to recognize new objects with minimal training data.
