Convolution

Explore the fundamentals of convolution in computer vision and deep learning. Learn how kernels and feature maps power Ultralytics YOLO26 for real-time tasks.

Convolution is a fundamental mathematical operation and the core building block of modern computer vision (CV) and deep learning (DL) systems. In image processing, convolution slides a small filter, often called a kernel, across an input image to produce a map of the features that filter responds to. This lets artificial intelligence (AI) models learn to identify patterns such as edges, textures, and shapes from data rather than from hand-written rules. Unlike traditional machine learning (ML), which often requires manual feature extraction, convolution enables networks to build a hierarchical understanding of visual data, starting from simple lines and progressing to complex objects like faces or vehicles.

How Convolution Works

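The operation moves a filter across the input data. At each position it multiplies the filter and the underlying input values element by element and sums the products into a single value. The resulting grid of values is called a feature map.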

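Kernel: A small matrix of numbers (weights) that detects a specific feature. For example, the Sobel operator is a kernel designed to detect vertical or horizontal edges.
Sliding window: The kernel moves across the image in fixed steps whose size is called the stride. This spatial filtering preserves the relationships between neighboring pixels, which is essential for understanding an image.
Hierarchy: In deep architectures such as convolutional neural networks (CNNs), early layers capture low-level details, while deeper layers combine those details into high-level concepts.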
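The kernel, sliding window, and stride described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any framework implements the operation: the helper name `apply_kernel`, the toy 5x5 image, and the lack of padding are choices made only for this example. As in deep learning libraries, the kernel is applied without flipping.

```python
def apply_kernel(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map.

    `apply_kernel` is a hypothetical helper written for this example.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Element-wise multiply the kernel with the patch, then sum.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[top + di][left + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Sobel kernel that responds to vertical edges (left-to-right intensity changes).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 5x5 image: dark on the left (0), bright on the right (10).
image = [[0, 0, 0, 10, 10] for _ in range(5)]

feature_map = apply_kernel(image, SOBEL_X, stride=1)
for row in feature_map:
    print(row)
# [0, 40, 40]
# [0, 40, 40]
# [0, 40, 40]

# A larger stride skips positions and shrinks the feature map.
print(apply_kernel(image, SOBEL_X, stride=2))
# [[0, 40], [0, 40]]
```

The zero entries mark a flat region and the large values mark windows that straddle the dark-to-bright edge. Without padding, each output dimension is `(n - k) // stride + 1`, which is why the 5x5 input gives a 3x3 map at stride 1 and a 2x2 map at stride 2.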

Convolution vs. Related Concepts

To fully understand convolution, it helps to distinguish it from similar terms that often appear in the neural network (NN) literature:

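Cross-correlation vs. convolution: Mathematically, true convolution flips the kernel before applying it. Most deep learning frameworks, including the PyTorch library, actually implement cross-correlation (sliding without flipping) but label it "convolution". Because the weights are learned during training, the network simply learns the flipped values if it needs them, so the distinction does not affect performance.
Convolution vs. attention: Convolution processes information locally (neighboring pixels), whereas attention mechanisms let a model relate distant parts of an image at once. Modern architectures such as YOLO26 typically rely on highly optimized convolutional layers to maintain real-time inference speed, because attention layers can be computationally heavier.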
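The difference between cross-correlation and true convolution can be made concrete with a small sketch. The function names `cross_correlate2d`, `flip_kernel`, and `convolve2d` are hypothetical helpers written for this illustration (stride 1, no padding) and are not part of any library.

```python
def cross_correlate2d(image, kernel):
    """Slide the kernel over the image as-is (stride 1, no padding)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flip_kernel(kernel):
    """Rotate the kernel by 180 degrees (reverse rows, then columns)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """True convolution: flip the kernel, then cross-correlate."""
    return cross_correlate2d(image, flip_kernel(kernel))


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]

print(cross_correlate2d(image, kernel))  # [[-4, -4], [-4, -4]]
print(convolve2d(image, kernel))         # [[4, 4], [4, 4]]
```

The two results differ only because this kernel is not symmetric under a 180-degree rotation. A trained network would reach the same outputs either way by learning the kernel `[[-1, 0], [0, 1]]` instead, which is why frameworks skip the flip.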

Real-World Applications

The efficiency of convolution enables AI to transform a range of industries by powering robust perception systems:

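Medical diagnostics: In AI for healthcare, convolution helps analyze high-resolution MRI scans. Using kernels specialized to highlight anomalies, models can detect early signs of tumors or fractures with accuracy comparable to that of human experts.
Autonomous driving: Self-driving vehicles rely on convolution for real-time object detection. As the vehicle moves, convolutional layers process the video stream to identify pedestrians, lane lines, and traffic signs with minimal delay, a key component of safety for AI in automotive applications.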

Python Example with Ultralytics

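You can inspect the convolutional layers of a state-of-the-art model using Python. The following example loads a YOLO26 model and verifies that its initial layer uses a standard convolution operation, implemented via torch.nn.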

import torch.nn as nn
from ultralytics import YOLO

# Load the latest YOLO26 model
model = YOLO("yolo26n.pt")

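# Access the first layer of the model's backbone
first_layer = model.model.model[0]

# Verify that the layer wraps a standard nn.Conv2d (getattr avoids an
# AttributeError if the block has no `conv` attribute)
conv = getattr(first_layer, "conv", None)
if isinstance(conv, nn.Conv2d):
    print("Success: The first layer is a standard convolution.")
    print(f"Kernel size: {conv.kernel_size}")
else:
    print(f"First layer is {type(first_layer).__name__}, not a standard convolution.")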