
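Vector Database

Learn how vector databases are transforming AI by enabling efficient similarity search, semantic search, and anomaly detection for intelligent systems.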

A vector database is a specialized storage system designed to manage, index, and query high-dimensional vector data, commonly referred to as embeddings. Unlike a traditional relational database, which organizes structured data into rows and columns for exact keyword matching, a vector database is optimized for semantic retrieval. It enables intelligent systems to find data points that are conceptually similar rather than identical. This capability is fundamental to modern artificial intelligence (AI) infrastructure, allowing applications to process and understand unstructured data (such as images, audio, video, and text) by analyzing the mathematical relationships between them. These databases serve as the long-term memory for intelligent agents, facilitating tasks like visual search and personalized recommendations.

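How Vector Databases Work

The core functionality of a vector database is built on the concept of a vector space, in which data items are mapped to points in a multi-dimensional coordinate system. The process begins with feature extraction, where a deep learning (DL) model converts raw inputs into numerical vectors.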

  1. Ingestion: Data is processed by a neural network, such as the state-of-the-art YOLO26, to generate embeddings. These vectors compress the semantic meaning of the input into a dense list of floating-point numbers.
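  2. Indexing: To ensure low inference latency during retrieval, the database organizes these vectors using specialized algorithms. Techniques such as Hierarchical Navigable Small World (HNSW) or Inverted File Index (IVF) allow the system to navigate billions of vectors efficiently without scanning every entry.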
  3. Querying: When a user submits a search query (e.g., an image of a specific shoe style), the system converts the query into a vector and calculates its proximity to stored vectors using distance metrics like cosine similarity or Euclidean distance.
  4. Retrieval: The database returns the "nearest neighbors," which represent the most contextually relevant matches.
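
The indexing, querying, and retrieval steps can be illustrated with a minimal pure-Python sketch. It is not how any particular database implements them: `ToyIVFIndex`, the hand-picked centroids, and the three-dimensional catalog vectors are all hypothetical, and a real IVF index would typically learn its centroids (for example with k-means) over far larger, higher-dimensional data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b; 0.0 means identical points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, items, k):
    """Rank every (id, vector) pair by cosine similarity and keep the top k."""
    ranked = sorted(items, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

class ToyIVFIndex:
    """Toy inverted-file index: each vector lives in the bucket of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _nearest_buckets(self, vector, n):
        # Order bucket numbers by the distance from the vector to each centroid
        order = sorted(range(len(self.centroids)),
                       key=lambda i: euclidean_distance(vector, self.centroids[i]))
        return order[:n]

    def add(self, item_id, vector):
        bucket = self._nearest_buckets(vector, 1)[0]
        self.buckets[bucket].append((item_id, vector))

    def search(self, query, k=2, nprobe=1):
        """Scan only the nprobe buckets closest to the query, not every vector."""
        candidates = []
        for bucket in self._nearest_buckets(query, nprobe):
            candidates.extend(self.buckets[bucket])
        return brute_force_search(query, candidates, k)

# Hypothetical 3-dimensional embeddings (assumed values, for illustration only)
catalog = {
    "running_shoe": [0.9, 0.1, 0.0],
    "hiking_boot": [0.7, 0.3, 0.1],
    "coffee_mug": [0.0, 0.2, 0.9],
    "tea_cup": [0.1, 0.1, 0.8],
}

# Hand-picked centroids standing in for learned cluster centers
index = ToyIVFIndex(centroids=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
for item_id, vector in catalog.items():
    index.add(item_id, vector)

query = [0.8, 0.2, 0.0]  # e.g. the embedding of a photo of a shoe
exact = brute_force_search(query, list(catalog.items()), k=2)
approximate = index.search(query, k=2, nprobe=1)
print(exact, approximate)
```

With `nprobe=1` only one bucket is scanned, which is what makes the lookup cheaper than a full scan. The trade-off is that a true neighbor sitting in an unprobed bucket can be missed, so raising `nprobe` trades speed for recall.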

The following Python snippet shows how to generate embeddings using a standard ultralytics model, a necessary step before populating a vector database.

from ultralytics import YOLO

# Load a pre-trained YOLO26 classification model
model = YOLO("yolo26n-cls.pt")

# Generate feature embeddings for an image file
# The 'embed' method creates the vector representation needed for the database
results = model.embed("https://ultralytics.com/images/bus.jpg")

# Output the shape of the resulting embedding vector
print(f"Embedding vector shape: {results[0].shape}")
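
Before an embedding is stored, it is often converted to a plain list of floats and L2-normalized, so that a dot product between two stored vectors equals their cosine similarity. The sketch below assumes the embedding has already been turned into a Python list; `l2_normalize`, `make_record`, the record layout, and the sample values are hypothetical and not part of the ultralytics API.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

def make_record(item_id, vector, metadata):
    """Bundle an id, a normalized vector, and metadata into one storable record."""
    return {"id": item_id, "vector": l2_normalize(vector), "metadata": metadata}

# Assumed stand-in for a real embedding, which would have many more dimensions
raw_embedding = [3.0, 4.0, 0.0]
record = make_record(
    "bus.jpg",
    raw_embedding,
    {"source": "https://ultralytics.com/images/bus.jpg"},
)
print(record["id"], record["vector"])
```

Keeping the source URL in the metadata lets a search result be traced back to the original image.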

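Real-World Applications

Vector databases are the engine behind many advanced computer vision (CV) and natural language processing (NLP) applications in enterprise environments today.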

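Distinguishing Related Concepts

To implement these systems effectively, it helps to distinguish vector databases from related technologies in the machine learning operations (MLOps) landscape.

  • Vector Database vs. Vector Search: Vector search is the operation or algorithmic process of finding similar vectors (the "how"). A vector database is the robust infrastructure built to store the data, manage indexes, and run those searches at scale (the "where").
  • Vector Database vs. Feature Store: A feature store is a centralized repository for managing the features used in model training and inference, ensuring consistency. Although it handles feature data, it is not primarily optimized for similarity-based retrieval queries, which are the defining capability of a vector database.
  • Vector Database vs. Data Lake: A data lake stores vast amounts of raw data in its native format. A vector database stores processed mathematical representations (embeddings) that are optimized for similarity search.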
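
The difference between vector search and a vector database can be sketched in code: the search is a single function, while the database wraps it with storage, updates, deletes, and metadata filtering. `TinyVectorStore`, its method names, and the sample vectors are hypothetical and do not mirror any real product's API.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, candidates, k):
    """The operation (the "how"): rank (id, vector) pairs by similarity."""
    ranked = sorted(candidates, key=lambda pair: cosine_similarity(query, pair[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

class TinyVectorStore:
    """The infrastructure (the "where"): owns the records and runs searches over them."""

    def __init__(self):
        self._records = {}  # item_id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        self._records[item_id] = (vector, metadata or {})

    def delete(self, item_id):
        self._records.pop(item_id, None)

    def query(self, vector, k=3, where=None):
        """Search only records whose metadata matches every key in `where`."""
        where = where or {}
        candidates = [
            (item_id, vec)
            for item_id, (vec, meta) in self._records.items()
            if all(meta.get(key) == value for key, value in where.items())
        ]
        return vector_search(vector, candidates, k)

# Hypothetical 2-dimensional embeddings, for illustration only
store = TinyVectorStore()
store.upsert("shoe_1", [0.9, 0.1], {"category": "footwear"})
store.upsert("shoe_2", [0.8, 0.3], {"category": "footwear"})
store.upsert("mug_1", [0.85, 0.2], {"category": "kitchen"})

footwear_hits = store.query([0.9, 0.15], k=2, where={"category": "footwear"})
store.delete("shoe_1")
after_delete = store.query([0.9, 0.15], k=2, where={"category": "footwear"})
print(footwear_hits, after_delete)
```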

Integrating with Modern AI Workflows

Implementing a vector database often involves a pipeline where models like the efficient YOLO26 act as the embedding engine. These models process visual data at the edge or in the cloud, and the resulting vectors are pushed to solutions like Pinecone, Milvus, or Qdrant.
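
A pipeline of this kind usually embeds items and pushes them to the database in batches. The sketch below shows only that control flow under stated assumptions: `fake_embed` stands in for a real embedding model, and `stored.extend` stands in for a database client's batch-insert call. Neither reflects the actual API of Pinecone, Milvus, or Qdrant.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def run_pipeline(image_paths, embed_fn, upsert_fn, batch_size=2):
    """Embed each image and hand the resulting records to the database in batches."""
    total = 0
    for batch in batched(image_paths, batch_size):
        records = [{"id": path, "vector": embed_fn(path)} for path in batch]
        upsert_fn(records)  # one call per batch instead of one per image
        total += len(records)
    return total

def fake_embed(path):
    """Hypothetical stand-in for a model; returns a made-up 2-dimensional vector."""
    return [float(len(path)), 1.0]

stored = []  # stand-in for the remote vector database
count = run_pipeline(["a.jpg", "bb.jpg", "ccc.jpg"], fake_embed, stored.extend, batch_size=2)
print(count, len(stored))
```

Passing the embedding and upsert functions in as arguments keeps the batching logic independent of the model and the database, so either can be swapped out.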

For teams looking to streamline this entire lifecycle—from data curation and auto-annotation to model training and deployment—the Ultralytics Platform offers a comprehensive environment. By integrating model training with efficient deployment strategies, developers can ensure that the embeddings feeding their vector databases are accurate, resulting in higher quality search results and smarter AI agents.
