
Zero-Shot Learning

Explore Zero-Shot Learning: a cutting-edge AI approach that enables models to classify unseen data, transforming object detection, NLP, and more.

Zero-Shot Learning (ZSL) is a machine learning paradigm that enables artificial intelligence (AI) models to recognize, classify, or detect objects they never encountered during training. In traditional supervised learning, a model typically requires a large number of labeled examples for every category it needs to identify. ZSL removes this dependency by leveraging auxiliary information (typically text descriptions, semantic attributes, or embeddings) to bridge the gap between seen and unseen classes. This makes AI systems significantly more flexible and scalable, and better suited to dynamic environments where collecting exhaustive data for every possible object is impractical.

How Zero-Shot Learning Works

The core mechanism of ZSL involves transferring knowledge from familiar concepts to unfamiliar ones using a shared semantic space. Instead of learning to recognize a "zebra" solely by memorizing pixel patterns of black and white stripes, the model learns the relationship between visual features and semantic attributes (e.g., "horse-like shape," "striped pattern," "four legs") derived from natural language processing (NLP).
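
As a rough illustration of the attribute idea above, the sketch below classifies an image by comparing predicted attribute scores against per-class attribute signatures. Everything here is hypothetical: the attribute values, the class table, and the `predicted` vector (which stands in for the output of an attribute predictor trained only on seen classes) are made up for the example and do not come from any real model.

```python
# Hypothetical attribute signatures, in the order:
# [horse-like shape, striped pattern, four legs]
CLASS_ATTRIBUTES = {
    "horse": [1.0, 0.0, 1.0],  # seen during training
    "tiger": [0.0, 1.0, 1.0],  # seen during training
    "zebra": [1.0, 1.0, 1.0],  # unseen: known only through its attributes
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_by_attributes(predicted_attributes, class_attributes):
    """Return the class whose attribute signature is closest to the prediction."""
    return min(
        class_attributes,
        key=lambda name: squared_distance(predicted_attributes, class_attributes[name]),
    )


# Assumed output of an attribute predictor for a zebra photo (illustrative values)
predicted = [0.9, 0.8, 0.95]
label = classify_by_attributes(predicted, CLASS_ATTRIBUTES)
print(label)
```

The classifier never sees a zebra image in this setup; it only needs the zebra's attribute signature to be present in the class table at prediction time.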

This process often relies on multi-modal models that align image and text representations. For instance, foundational research like OpenAI's CLIP demonstrates how models can learn visual concepts from natural language supervision. When a ZSL model encounters an unseen object, it extracts the visual features and compares them against a dictionary of semantic vectors. If the visual features align with the semantic description of the new class, the model can correctly classify it, effectively performing a "zero-shot" prediction. This approach is fundamental to modern foundation models which generalize across vast arrays of tasks.
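
The comparison against a dictionary of semantic vectors can be sketched as a cosine-similarity lookup followed by a softmax. This is a minimal sketch under stated assumptions, not CLIP's actual API: the embedding values, the prompt strings, the `zero_shot_predict` helper, and the `temperature` value are all invented for illustration. In a real system the vectors would come from trained image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(image_embedding, text_embeddings, temperature=0.1):
    """Score an image embedding against every text embedding.

    Returns the best-matching label and a softmax distribution over labels.
    """
    sims = {
        label: cosine_similarity(image_embedding, vec)
        for label, vec in text_embeddings.items()
    }
    # Subtract the maximum before exponentiating for numerical stability
    max_sim = max(sims.values())
    exps = {label: math.exp((s - max_sim) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical text embeddings (the "dictionary of semantic vectors")
TEXT_EMBEDDINGS = {
    "a photo of a horse": [0.9, 0.1, 0.2],
    "a photo of a zebra": [0.7, 0.7, 0.1],
    "a photo of a bus": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an image whose class was never seen in training
image_embedding = [0.6, 0.8, 0.05]

best_label, probabilities = zero_shot_predict(image_embedding, TEXT_EMBEDDINGS)
print(best_label)
```

Adding a new class in this scheme means adding one more text embedding to the dictionary; no image of the class and no retraining are needed.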

Real-World Applications

Zero-shot learning is driving innovation across industries by enabling systems to generalize beyond their initial training data.

  1. Open-Vocabulary Object Detection: Modern architectures like YOLO-World utilize ZSL to detect objects based on user-defined text prompts. This allows for object detection in scenarios where defining a fixed list of classes beforehand is impossible, such as searching for specific items in vast video archives. Researchers at Google Research continue to push the boundaries of these open-vocabulary capabilities.
  2. Medical Diagnostics: In AI in healthcare, obtaining labeled data for rare diseases is often difficult and expensive. ZSL models can be trained on common conditions and descriptions of rare symptoms from medical literature found in databases like PubMed, enabling the system to flag potential rare anomalies in medical imaging without requiring a massive dataset of positive cases.
  3. Wildlife Conservation: For AI in agriculture and ecology, identifying endangered species that are rarely photographed is critical. ZSL allows conservationists to detect these animals using attribute-based descriptions defined in biological databases like the Encyclopedia of Life.

Zero-Shot Detection with Ultralytics

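The Ultralytics YOLO-World model is a clear example of zero-shot learning in practice. It lets users define custom classes dynamically at runtime, with no need to retrain the model. This is made possible by combining a powerful detection backbone with a text encoder that understands natural language.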

The following Python example demonstrates how to use YOLO-World, through the ultralytics package, to detect objects that were not explicitly part of a standard training set:

from ultralytics import YOLOWorld

# Load a pre-trained YOLO-World model capable of Zero-Shot Learning
model = YOLOWorld("yolov8s-world.pt")

# Define custom classes via text prompts (e.g., specific accessories)
# The model adjusts to detect these new classes without retraining
model.set_classes(["blue backpack", "red apple", "sunglasses"])

# Run inference on an image to detect the new zero-shot classes
results = model.predict("https://ultralytics.com/images/bus.jpg")

# Display the results
results[0].show()

Distinguishing Related Concepts

To fully understand ZSL, it helps to distinguish it from similar learning strategies used in computer vision (CV):

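  • Few-Shot Learning (FSL): Whereas ZSL requires no examples of the target class, FSL gives the model a very small support set (typically 1 to 5 examples) to adapt with. ZSL is generally considered more challenging because it relies entirely on semantic reasoning rather than visual examples.
  • One-Shot Learning: A subset of FSL in which the model learns from exactly one labeled example. ZSL differs fundamentally in that it operates without even a single image of the new class.
  • Transfer Learning: This broad term refers to transferring knowledge from one task to another. ZSL is a specific type of transfer learning that uses semantic attributes to carry knowledge over to unseen classes, without the conventional fine-tuning on new data.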
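
One way to picture the difference between the zero-shot and few-shot settings is to look at where the class "prototype" comes from. In the sketch below, the zero-shot prototypes are taken from (hypothetical) text embeddings, while the few-shot prototypes are averaged from a tiny support set of (hypothetical) image embeddings; a one-shot setup would simply use a support set with a single entry per class. All vectors and helper names are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def nearest_prototype(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine_similarity(query, prototypes[label]))


# Zero-shot: prototypes come from text embeddings; no images of the class are used
zero_shot_prototypes = {
    "zebra": [0.7, 0.7, 0.1],
    "horse": [0.9, 0.1, 0.2],
}

# Few-shot: prototypes are averaged from a small support set of image embeddings
support_set = {
    "zebra": [[0.6, 0.8, 0.0], [0.8, 0.6, 0.2]],
    "horse": [[1.0, 0.0, 0.2], [0.8, 0.2, 0.2]],
}
few_shot_prototypes = {label: mean_vector(vecs) for label, vecs in support_set.items()}

# Hypothetical embedding of a new query image
query = [0.65, 0.75, 0.05]
zero_shot_label = nearest_prototype(query, zero_shot_prototypes)
few_shot_label = nearest_prototype(query, few_shot_prototypes)
print(zero_shot_label, few_shot_label)
```

The matching step is identical in both cases; only the source of the prototypes changes, which is why the zero-shot setting depends so heavily on the quality of the semantic descriptions.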

Challenges and Future Outlook

While ZSL offers immense potential, it faces challenges such as the domain shift problem, where the semantic attributes learned during training do not perfectly map to the visual appearance of unseen classes. Additionally, ZSL models can suffer from bias, where prediction accuracy is significantly higher for seen classes compared to unseen ones.
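
The gap between seen and unseen classes can be made visible by reporting accuracy for each group separately. A commonly used summary is the harmonic mean of the two accuracies, which stays low unless both are high. The sketch below uses invented prediction pairs purely to show the arithmetic; the class names and results are not measurements from any real model.

```python
def accuracy(pairs):
    """Fraction of (predicted, actual) pairs where the prediction is correct."""
    correct = sum(1 for predicted, actual in pairs if predicted == actual)
    return correct / len(pairs)


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


# Hypothetical (predicted, actual) pairs for test images of seen classes
seen_results = [
    ("horse", "horse"),
    ("tiger", "tiger"),
    ("horse", "horse"),
    ("tiger", "horse"),
]

# Hypothetical pairs for test images of an unseen class; the model tends to
# fall back on labels it saw during training
unseen_results = [
    ("horse", "zebra"),
    ("zebra", "zebra"),
    ("tiger", "zebra"),
    ("horse", "zebra"),
]

seen_acc = accuracy(seen_results)
unseen_acc = accuracy(unseen_results)
h_score = harmonic_mean(seen_acc, unseen_acc)
print(seen_acc, unseen_acc, h_score)
```

Reporting a single overall accuracy would hide this imbalance, since strong results on seen classes can mask weak results on unseen ones.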

Research from organizations like Stanford University's AI Lab and the IEEE Computer Society continues to address these limitations. As computer vision tools become more robust, ZSL is expected to become a standard feature, reducing the reliance on massive data labeling efforts. For teams looking to manage datasets efficiently before deploying advanced models, the Ultralytics Platform offers comprehensive tools for annotation and dataset management.
