Retrieval Augmented Generation (RAG)
Discover how Retrieval Augmented Generation (RAG) enhances AI models by integrating real-time, reliable external data for accurate, up-to-date responses.
Retrieval Augmented Generation (RAG) is an advanced technique in artificial intelligence (AI) designed to enhance the quality and reliability of responses generated by Large Language Models (LLMs). It works by combining the generative capabilities of an LLM with an information retrieval system. Before generating a response, the RAG system first retrieves relevant information snippets from a pre-defined knowledge source (like a company's internal documents, a specific database, or the web). This retrieved context is then provided to the LLM along with the original user query, enabling the model to generate answers that are more accurate, up-to-date, and grounded in factual data, thereby mitigating issues like hallucinations. This approach improves upon standard LLMs by allowing them to access and utilize external, current information beyond their initial training data.
How Retrieval Augmented Generation Works
The RAG process typically involves two main stages:
- Retrieval: When a user provides a prompt or query, the system first searches a specified knowledge base for relevant information. This knowledge base could be a collection of documents, web pages, or entries in a vector database. The retrieval mechanism often uses techniques like semantic search to find text chunks that are contextually related to the query, not just keyword matches. These retrieved snippets serve as the contextual foundation for the next stage. This process often leverages embeddings to represent the meaning of both the query and the documents.
- Generation: The original query and the retrieved contextual snippets are combined into an augmented prompt, which is then fed to the LLM. Drawing on both the query and the provided context, the model generates a response that is not only relevant to the query but also informed by the retrieved, often more current or specific, information. The foundational work on RAG was detailed in the paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020). A minimal sketch of both stages follows this list.
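To make the two stages concrete, here is a minimal, self-contained sketch (not a production pipeline). The `embed()` function is a hypothetical stand-in for any sentence-embedding model, semantic search is implemented as cosine similarity over an in-memory document list, and the final LLM call is left as a placeholder for any generation API:

```python
import numpy as np

# Toy knowledge base; in practice these would be chunks of real documents.
documents = [
    "Our premium plan includes 24/7 phone support.",
    "Refunds are available within 30 days of purchase.",
    "The mobile app supports offline mode since version 2.3.",
]

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function. Replace with a real encoder,
    e.g. a sentence-transformers model or an embeddings API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # fixed per text within a run
    return rng.standard_normal(384)

def retrieve(query: str, doc_vectors: np.ndarray, k: int = 2) -> list[str]:
    """Retrieval stage: rank documents by cosine similarity to the query."""
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [documents[i] for i in top]

# Precompute document embeddings once (a vector database index in practice).
doc_vectors = np.stack([embed(d) for d in documents])

query = "How long do I have to request a refund?"
context = "\n".join(retrieve(query, doc_vectors))

# Generation stage: augment the prompt with the retrieved context.
augmented_prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)
# A call like llm(augmented_prompt) would go here (hypothetical LLM API).
print(augmented_prompt)
```

In a real system the random-projection `embed()` would be replaced by a trained embedding model, since meaningful retrieval depends entirely on the quality of those vectors.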
Benefits and Applications
RAG offers several advantages over using standard LLMs alone:
- Improved Accuracy and Reliability: By grounding responses in retrieved factual data, RAG significantly reduces the likelihood of the LLM generating incorrect or fabricated information (hallucinations), which in turn increases user trust in the system.
- Access to Current Information: LLMs are typically trained on static datasets, meaning their knowledge cutoff prevents them from knowing about events or data emerging after their training. RAG allows models to access and incorporate the latest information from external sources without needing constant retraining.
- Domain Specificity: RAG can be configured to retrieve information from specific, curated knowledge bases (e.g., internal company wikis, technical documentation, specific datasets). This enables LLMs to provide expert-level answers within specialized domains.
- Enhanced Transparency: Since the generated response is based on retrieved documents, it's often possible to cite the sources, giving users transparency and the ability to verify the information; a simple citation scheme is sketched after this list. This aligns with principles of explainable AI (XAI) and AI ethics.
- Cost-Effectiveness: Updating the knowledge base for RAG is generally much cheaper and faster than retraining or fine-tuning a large language model.
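One common way to realize that transparency is to attach provenance metadata to each retrieved chunk and surface it alongside the answer. The sketch below assumes each chunk carries a `source` field, which is a widespread convention rather than a fixed standard:

```python
# Hypothetical retrieved chunks: text plus provenance metadata.
retrieved = [
    {"text": "Refunds are available within 30 days.", "source": "refund_policy.md"},
    {"text": "Premium plans include phone support.", "source": "plans_faq.md"},
]

# Number the snippets so the LLM can cite them as [1], [2], ...
context = "\n".join(f"[{i+1}] {c['text']}" for i, c in enumerate(retrieved))
citations = "\n".join(f"[{i+1}] {c['source']}" for i, c in enumerate(retrieved))

prompt = (
    "Answer using the numbered context and cite snippets as [n].\n\n"
    f"{context}\n\nQuestion: What is the refund window?"
)
# After generation, show the user the source list so claims can be verified.
print(prompt)
print("\nSources:\n" + citations)
```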
Real-World Examples:
- Customer Support Chatbots: A company can use RAG to power a support chatbot. When a customer asks a question, the system retrieves relevant information from the company's product manuals, FAQs, and knowledge base articles. The LLM then uses this context to generate a precise and helpful answer, potentially integrating with platforms like Zendesk.
- Enterprise Search and Knowledge Management: Employees can query internal company documents stored in systems like SharePoint or other databases. RAG retrieves pertinent sections from potentially vast document repositories and synthesizes answers, helping employees find information quickly without manually sifting through documents.
RAG vs. Related Concepts
It's helpful to distinguish RAG from other methods used to enhance LLM performance:
- Fine-tuning: Fine-tuning adapts a pre-trained LLM to a specific task or domain by continuing the training process on a smaller, specialized dataset. Unlike RAG, fine-tuning modifies the model's internal weights. Fine-tuning is good for adapting style or learning specific tasks, while RAG is better for incorporating factual, up-to-date knowledge. Techniques like Parameter-Efficient Fine-Tuning (PEFT) offer variations on this approach.
- Prompt Engineering: This involves carefully crafting the input prompt given to an LLM to elicit the desired response. While RAG incorporates retrieved context into the prompt, prompt engineering focuses on structuring the user's query and instructions manually.
- Prompt Enrichment: Like RAG, prompt enrichment augments the prompt, but it typically adds context from user history or conversation flow, whereas RAG specifically retrieves external factual data from a knowledge base to ground the generation process.
Frameworks like LangChain and LlamaIndex provide tools to build RAG pipelines and other complex LLM applications; a brief example using LangChain appears below.
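As an illustration, here is a hedged sketch of a small RAG pipeline using LangChain's FAISS and OpenAI integrations. Import paths and package names have shifted across LangChain versions, so treat the exact imports and the model name as assumptions to check against the current documentation; an OpenAI API key is also required to run it:

```python
# Requires: pip install langchain langchain-openai langchain-community faiss-cpu
# Import paths reflect LangChain 0.1+ and may differ in other versions.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

texts = [
    "Refunds are available within 30 days of purchase.",
    "The premium plan includes 24/7 phone support.",
]

# Index the documents in an in-memory FAISS vector store.
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

query = "How long is the refund window?"
docs = retriever.invoke(query)  # retrieval stage
context = "\n".join(d.page_content for d in docs)

# Generation stage: pass the augmented prompt to a chat model.
llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {query}")
print(answer.content)
```

In a real deployment the raw texts would first be split into chunks and the vector store persisted, but the retrieve-then-generate shape of the pipeline stays the same.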
RAG represents a significant step towards creating more knowledgeable and reliable AI systems, bridging the gap between the vast generative power of LLMs and the need for factual accuracy and access to dynamic information. While primarily used with text, the core idea of augmenting generation with retrieved information is conceptually applicable to other domains. For instance, in computer vision (CV), one could imagine retrieving relevant visual examples or metadata to guide image generation or analysis, though this is still an emerging research area. Platforms like Ultralytics HUB help manage models and datasets, which are crucial components that could serve as knowledge sources in future multimodal RAG applications involving models like Ultralytics YOLO. Exploring available computer vision datasets can provide insights into the kind of structured information that might be useful for such systems.