Contrastive Learning

Explore how contrastive learning enables AI to understand data by comparing samples. Learn about self-supervised features and train models on the Ultralytics Platform.

Contrastive learning is a machine learning paradigm that teaches models to understand data by comparing similar and dissimilar samples. Unlike traditional supervised learning, which relies heavily on manually labeled datasets, contrastive learning is most often applied in self-supervised learning settings where explicit labels are not required. The core idea is simple yet powerful: the model learns to pull representations of related items (positive pairs) closer together in a vector space while pushing unrelated items (negative pairs) farther apart. This process allows algorithms to build robust, generalizable features from vast amounts of unlabeled data, which is crucial for scaling artificial intelligence (AI) systems.
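
As a minimal, toy illustration of this pull-and-push objective, the PyTorch snippet below uses random vectors in place of real encoder outputs and simply measures how close each pair sits in the embedding space.

import torch
import torch.nn.functional as F

# Toy embeddings: in practice these come from a neural network encoder
anchor = torch.randn(1, 128)    # embedding of the anchor sample
positive = torch.randn(1, 128)  # embedding of a related sample (positive pair)
negative = torch.randn(1, 128)  # embedding of an unrelated sample (negative pair)

# Cosine similarity measures closeness in the vector space
sim_pos = F.cosine_similarity(anchor, positive)
sim_neg = F.cosine_similarity(anchor, negative)

# Training drives sim_pos upward and sim_neg downward
print(f"Anchor-positive similarity: {sim_pos.item():.3f}")
print(f"Anchor-negative similarity: {sim_neg.item():.3f}")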

The Mechanism of Contrastive Learning

At the heart of contrastive learning is the concept of learning by comparison. Instead of memorizing that a specific image is a "cat," the model learns that two different photos of a cat are more similar to each other than either is to a photo of a dog. This is typically achieved through data augmentation. An input image, often called the "anchor," is transformed into two different versions using techniques like cropping, flipping, or color jittering. These two versions form a positive pair. The model is then trained to minimize the distance between their embeddings while maximizing the distance to other random images (negative samples) in the batch.
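
A simplified sketch of how such a positive pair can be produced, assuming torchvision is installed; the image path and the tiny placeholder encoder are illustrative stand-ins for a real backbone such as a ResNet.

import torch
import torchvision.transforms as T
from PIL import Image

# Two random augmentations of the same image form a positive pair
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.ToTensor(),
])

image = Image.open("example.jpg")     # anchor image (placeholder path)
view_1 = augment(image).unsqueeze(0)  # first augmented view
view_2 = augment(image).unsqueeze(0)  # second augmented view

# Placeholder encoder; any vision backbone can fill this role
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(128))

z1 = encoder(view_1)  # embedding of view 1
z2 = encoder(view_2)  # embedding of view 2
# Training minimizes the distance between z1 and z2 while maximizing the
# distance to embeddings of other images in the batch (the negatives).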

This approach helps the neural network focus on high-level semantic features rather than low-level pixel details. For instance, whether a car is red or blue, or facing left or right, the underlying concept of "car" remains the same. By ignoring these superficial variations, the model develops a deeper understanding of the visual world, which significantly benefits downstream tasks like object detection and classification.

Real-World Applications

Contrastive learning has become a cornerstone for many state-of-the-art AI applications, particularly where labeled data is scarce or expensive to obtain.

  1. Zero-Shot Image Classification: Models like CLIP (Contrastive Language-Image Pre-training) use contrastive learning to align images and text in a shared feature space. By training on millions of image-text pairs, the model learns to associate visual concepts with natural language descriptions. This enables zero-shot learning, where the model can classify images into categories it has never seen during training simply by matching the image to a textual prompt, as shown in the first sketch after this list.
  2. Robust Pre-training for Medical Imaging: In healthcare, obtaining expert-labeled medical scans is costly and time-consuming. Researchers use contrastive learning to pre-train models on large databases of unlabeled X-rays or MRI scans. This unsupervised pre-training creates a powerful backbone that can be fine-tuned with a small number of labeled examples to detect diseases like pneumonia or tumors with high accuracy. This technique leverages transfer learning to improve diagnostic tools in AI in healthcare; the second sketch after this list shows what the fine-tuning step can look like.
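
The first sketch below shows one way to try this zero-shot matching with openly released CLIP weights through the Hugging Face transformers library; the checkpoint name, image path, and label prompts are illustrative assumptions rather than anything prescribed by the original work.

from transformers import CLIPModel, CLIPProcessor
from PIL import Image

# Load openly available CLIP weights (illustrative checkpoint name)
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bus"]

# Encode the image and the candidate text prompts into the shared feature space
inputs = clip_processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = clip_model(**inputs)

# Higher image-text similarity translates to a higher probability for that label
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))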

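The second sketch covers the fine-tuning step in spirit: starting from a pre-trained classification backbone and training it on a small labeled dataset with the ultralytics API. The dataset path is a placeholder and assumes the standard classification folder layout; this is a workflow sketch, not a validated medical pipeline.

from ultralytics import YOLO

# Start from a pre-trained classification backbone
model = YOLO("yolo26n-cls.pt")

# Fine-tune on a small labeled dataset (placeholder path; expects the standard
# train/val folder structure used for classification datasets)
model.train(data="path/to/small_labeled_dataset", epochs=20, imgsz=224)
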
Distinguishing Related Concepts

It is helpful to differentiate contrastive learning from similar techniques to understand its unique role in the machine learning (ML) landscape.

  • vs. Autoencoders: While both are unsupervised methods, autoencoders aim to reconstruct the input data pixel-by-pixel, compressing it into a bottleneck layer. Contrastive learning, on the other hand, does not try to recreate the image but focuses solely on learning discriminative representations that separate different concepts.
  • vs. Generative Adversarial Networks (GANs): GANs involve a generator creating fake data and a discriminator trying to detect it. Contrastive learning focuses on representation learning rather than data generation, making it more suitable for tasks like search, retrieval, and classification.
  • vs. Triplet Loss: Traditional triplet loss explicitly requires an anchor, a positive, and a negative sample. Modern contrastive methods, such as SimCLR or MoCo, generalize this by comparing an anchor against many negative samples simultaneously within a batch, often using a specific loss function like InfoNCE. The sketch after this list contrasts the two objectives on toy embeddings.
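
As a toy comparison of the two objectives, the sketch below computes PyTorch's built-in triplet margin loss alongside a hand-rolled InfoNCE over a batch of random, normalized embeddings; the batch size and temperature are arbitrary values chosen for illustration.

import torch
import torch.nn.functional as F

# Toy embeddings (batch of 8, 128-dimensional), L2-normalized
anchors = F.normalize(torch.randn(8, 128), dim=1)
positives = F.normalize(torch.randn(8, 128), dim=1)
negatives = F.normalize(torch.randn(8, 128), dim=1)

# Triplet loss: one anchor, one positive, one explicitly chosen negative
triplet = torch.nn.TripletMarginLoss(margin=0.2)
loss_triplet = triplet(anchors, positives, negatives)

# InfoNCE: every other positive in the batch acts as a negative for each anchor
temperature = 0.07
logits = anchors @ positives.T / temperature  # pairwise similarity matrix
targets = torch.arange(logits.size(0))        # matching pairs lie on the diagonal
loss_infonce = F.cross_entropy(logits, targets)

print(f"Triplet loss: {loss_triplet.item():.3f}, InfoNCE loss: {loss_infonce.item():.3f}")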

Practical Example with Embeddings

While training a contrastive model from scratch is resource-intensive, you can easily use pre-trained models to work with learned features. The following example loads a pre-trained classification model from the ultralytics package and runs inference; the prediction rests on learned feature representations (embeddings) that capture semantic content in a way akin to what contrastive pre-training produces, and a short follow-up sketch shows how to extract such an embedding directly.

from ultralytics import YOLO

# Load a pre-trained YOLO26 classification model
model = YOLO("yolo26n-cls.pt")

# Run inference on an image to get the results
# The 'embed' argument can be used in advanced workflows to extract feature layers
results = model("https://ultralytics.com/images/bus.jpg")

# Access the top predicted class probability
# This prediction is based on the learned feature representations
print(f"Top class: {results[0].names[results[0].probs.top1]}")
print(f"Confidence: {results[0].probs.top1conf:.4f}")

This ability to extract rich, meaningful features makes contrastive learning essential for building modern computer vision (CV) systems, enabling efficient image search and advanced analytics. For managing datasets and training custom models that benefit from these advanced architectures, the Ultralytics Platform provides a streamlined environment for deployment and monitoring.
