Glossary

Embeddings

Learn what embeddings are and how they power AI by capturing semantic relationships in data for NLP, recommendations, and computer vision.

Embeddings are a cornerstone of modern machine learning (ML), representing a powerful method for converting high-dimensional data like words, images, or even users into meaningful, dense, low-dimensional numerical vectors. The primary goal of an embedding is to capture the semantic relationships and underlying context of the original data. In this vector space, items with similar meanings or characteristics are positioned closer to each other. This allows AI models to perform complex reasoning and similarity tasks that would be impractical with raw, unstructured data.
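"Closer to each other" is usually measured with cosine similarity. The sketch below illustrates the idea with tiny hand-made vectors; the values are invented purely for illustration, and real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings (toy values, not from a real model).
king = np.array([0.90, 0.80, 0.10, 0.30])
queen = np.array([0.85, 0.82, 0.15, 0.28])
apple = np.array([0.10, 0.20, 0.90, 0.70])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(king, queen))  # high: semantically related concepts
print(cosine_similarity(king, apple))  # low: unrelated concepts
```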

How Embeddings Are Created

Embeddings are typically learned automatically by a deep learning model during the training process. A neural network, often built with frameworks like PyTorch or TensorFlow, is trained on a relevant task, such as predicting the next word in a sentence or classifying an image. One of the hidden layers within this network is then used as the embedding layer. As the model learns to perform its task, it adjusts the weights in this layer, effectively learning to map each input item to a vector that encapsulates its most important features. This process is a form of dimensionality reduction, compressing vast amounts of information into a compact and useful format.
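As a concrete illustration, here is a minimal PyTorch sketch of a text classifier whose first layer is an embedding layer. The vocabulary size, embedding dimension, and classification task are arbitrary choices for the example, not a prescribed recipe; the point is that training the classifier also trains the embedding matrix as a side effect.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 10_000, 128, 2  # illustrative sizes

class TextClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # The embedding layer: one learnable vector per token in the vocabulary.
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.fc = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vectors = self.embedding(token_ids)  # (batch, seq_len, EMBED_DIM)
        pooled = vectors.mean(dim=1)         # average over the sequence
        return self.fc(pooled)               # class logits

model = TextClassifier()
token_ids = torch.randint(0, VOCAB_SIZE, (4, 16))  # dummy batch of 4 sequences
logits = model(token_ids)
# After training, model.embedding.weight holds the learned vector for each token.
```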

Applications and Examples

Embeddings are fundamental to a wide range of AI applications, from natural language processing (NLP) to computer vision.

  • E-commerce Recommendation Engines: Recommendation systems use embeddings to represent both users and products. If a user frequently purchases or views items with similar embeddings (e.g., various types of running gear), the system can identify other products in that vector neighborhood (like energy gels or hydration packs) and recommend them. This is far more effective than simple keyword matching; the first sketch after this list shows the nearest-neighbor lookup at its core.
  • Semantic Search and Image Retrieval: Instead of relying on tags or metadata, semantic search systems use embeddings to find results based on conceptual meaning. A user can search for "summer vacation photos," and the system will retrieve images of beaches, mountains, and travel scenes, even if those exact words aren't in the image's description. This is powered by models like CLIP, which generate aligned embeddings for both text and images, enabling multi-modal capabilities. The same principle allows for visual search, a key feature in many modern applications; the second sketch after this list shows a CLIP-style text-image comparison. You can even build your own with our similarity search guide.
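The first sketch below shows the core of an embedding-based recommender under simplified assumptions: the product embeddings are random stand-ins for vectors a trained model would produce, and the user vector is just the average of a few viewed items.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical catalog: 1,000 products with 64-dim embeddings. Random vectors
# stand in for what a trained model would produce; rows are unit-normalized.
product_embeddings = rng.normal(size=(1000, 64))
product_embeddings /= np.linalg.norm(product_embeddings, axis=1, keepdims=True)

def recommend(user_vector: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k products closest to the user's taste vector."""
    user_vector = user_vector / np.linalg.norm(user_vector)
    scores = product_embeddings @ user_vector  # cosine similarity (unit vectors)
    return np.argsort(scores)[::-1][:k]

# A simple user vector: the average embedding of items the user viewed.
user_vector = product_embeddings[[10, 42, 77]].mean(axis=0)
print(recommend(user_vector))
```

In production, an approximate nearest-neighbor index (for example, FAISS) typically replaces this brute-force dot product so the lookup scales to millions of items.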
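The second sketch compares an image against text queries with CLIP's aligned embeddings. It uses the Hugging Face Transformers library as one possible implementation; the checkpoint is the public openai/clip-vit-base-patch32 model, and photo.jpg is a placeholder path to a local image.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (downloads weights on first run).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder: any local image file
texts = ["summer vacation photos", "office paperwork"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image's similarity to each text query;
# the higher score indicates the better conceptual match.
print(outputs.logits_per_image.softmax(dim=-1))
```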

Other applications include drug discovery, where molecules are embedded to predict interactions, and music streaming services that recommend songs with similar audio features.
