Discover how One-Shot Learning enables AI to recognize objects from a single example. Learn about Siamese networks, embeddings, and real-world apps using YOLO26.
One-Shot Learning is a specialized classification technique in machine learning (ML) designed to learn information about object categories from a single training example. Unlike traditional deep learning (DL) algorithms, which require massive datasets containing thousands of annotated images to generalize effectively, One-Shot Learning mimics the human cognitive ability to grasp a new concept instantly. For instance, a person can usually recognize a specific exotic bird after seeing it just once; this methodology attempts to replicate that efficiency in artificial intelligence (AI) systems. It is particularly valuable in scenarios where data labeling is expensive, data is scarce, or new categories must be added dynamically without retraining the entire model.
The core principle of One-Shot Learning involves shifting the objective from standard classification to similarity evaluation. Instead of training a neural network (NN) to output a specific class label (e.g., "dog" or "cat"), the model learns a distance function. A common architecture employed for this is the Siamese neural network, which consists of two identical sub-networks that share the same model weights.
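To make the weight-sharing idea concrete, here is a minimal PyTorch sketch of a Siamese network. The encoder layers and embedding size are illustrative assumptions, not a prescribed architecture: the key point is that both inputs pass through the same weights.

import torch
import torch.nn as nn

class SiameseNetwork(nn.Module):
    """A minimal Siamese network: one shared encoder applied to both inputs."""
    def __init__(self, embedding_dim=128):
        super().__init__()
        # Shared encoder; its weights are reused for both branches
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embedding_dim),
        )

    def forward(self, img_a, img_b):
        # Both images are encoded with the SAME weights, so their embeddings are comparable
        emb_a = self.encoder(img_a)
        emb_b = self.encoder(img_b)
        # Euclidean distance between the two embeddings; small distance suggests same class
        return torch.norm(emb_a - emb_b, dim=1)

net = SiameseNetwork()
a, b = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
print(net(a, b))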
During operation, the network performs feature extraction to convert input images into compact numerical vectors known as embeddings. The system then compares the embedding of a new query image against the embedding of the single reference "shot." If the mathematical distance—often calculated using Euclidean distance or cosine similarity—is below a certain threshold, the images are determined to belong to the same class. This allows the model to verify identity or classify objects based on their proximity in the learned feature space.
The following Python code demonstrates how to extract embeddings and calculate similarity using a YOLO26 classification model, available through the ultralytics package.
import numpy as np
from ultralytics import YOLO

# Load a pre-trained YOLO26 classification model for feature extraction
model = YOLO("yolo26n-cls.pt")

# Extract embeddings for a reference 'shot' and a query image
# embed() returns one feature tensor per input; move it to CPU NumPy for the math below
shot_vec = model.embed("reference_img.jpg")[0].cpu().numpy()
query_vec = model.embed("query_img.jpg")[0].cpu().numpy()

# Cosine similarity: a score near 1.0 indicates the images likely share a class
similarity = np.dot(shot_vec, query_vec) / (np.linalg.norm(shot_vec) * np.linalg.norm(query_vec))
print(f"Similarity Score: {similarity:.4f}")
It is important to differentiate One-Shot Learning from other data-efficient learning techniques, as they solve similar problems under different constraints: Zero-Shot Learning recognizes categories it has never seen by relying on auxiliary information such as text descriptions or attributes, Few-Shot Learning relaxes the constraint to a small handful of labeled examples per class, and One-Shot Learning sits between them with exactly one example per new category.
One-Shot Learning has opened new possibilities in domains where collecting vast amounts of training data is impractical.
The most ubiquitous application of One-Shot Learning is in biometric security. When setting up Face ID on a smartphone or enrolling in an employee access system, the device captures a single mathematical representation of the user's face. During daily use, the facial recognition system compares the live camera feed against this stored "one shot" to verify identity. This relies on robust embedding techniques, such as those discussed in the foundational FaceNet research, to ensure that changes in lighting or angle do not break the similarity match.
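As an illustration of this enroll-then-verify flow, the sketch below reuses the same embedding approach as the earlier example. The file names and threshold are hypothetical placeholders, not values from any actual biometric system.

import numpy as np
from ultralytics import YOLO

model = YOLO("yolo26n-cls.pt")

# Enrollment: capture the single reference embedding (the stored "one shot")
enrolled = model.embed("enrollment_photo.jpg")[0].cpu().numpy()

def verify(frame_path, threshold=0.85):
    """Return True if the live frame matches the enrolled identity.

    The threshold is a hypothetical starting point; calibrate it on real pairs.
    """
    live = model.embed(frame_path)[0].cpu().numpy()
    score = np.dot(enrolled, live) / (np.linalg.norm(enrolled) * np.linalg.norm(live))
    return score >= threshold

print(verify("live_frame.jpg"))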
In AI in manufacturing, creating a balanced dataset of "defective" parts is difficult because defects are rare and inconsistent. One-Shot Learning allows computer vision (CV) systems to learn the representation of a single "perfect" reference part. Any item on the assembly line that yields an embedding significantly distant from this reference is flagged for anomaly detection. This enables immediate quality assurance without needing thousands of images of broken parts, which can be managed and deployed via the Ultralytics Platform.
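A minimal sketch of this reference-comparison loop might look like the following, again using the classification model from above. The file names are placeholders, and the distance cutoff is an assumption that would be calibrated on known-good parts.

import numpy as np
from ultralytics import YOLO

model = YOLO("yolo26n-cls.pt")

# Embed the single "perfect" reference part once at setup time
golden = model.embed("golden_part.jpg")[0].cpu().numpy()

def flag_if_anomalous(part_image, max_distance=10.0):
    """Flag parts whose embedding drifts too far from the golden reference.

    max_distance is an assumed cutoff; calibrate it on known-good samples.
    """
    vec = model.embed(part_image)[0].cpu().numpy()
    return np.linalg.norm(golden - vec) > max_distance

for item in ("part_001.jpg", "part_002.jpg"):
    if flag_if_anomalous(item):
        print(f"Flagged for inspection: {item}")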
While powerful, One-Shot Learning is susceptible to noise; if the single reference image is blurry, obstructed, or unrepresentative, the model's ability to recognize that class degrades significantly. Researchers often employ meta-learning, or "learning to learn," to improve model stability and generalization. As architectures evolve, newer models like YOLO26 are incorporating more robust feature extractors that make one-shot inference faster and more accurate, paving the way for more adaptive and intelligent edge AI devices.