Retrieval-Augmented Generation (RAG)

Discover how Retrieval-Augmented Generation (RAG) enhances AI models by integrating real-time, reliable external data for accurate, up-to-date responses.

Retrieval-Augmented Generation (RAG) is an advanced AI framework designed to improve the quality, accuracy, and relevance of responses generated by Large Language Models (LLMs). It works by connecting a generative model to an external, up-to-date knowledge base. This allows the model to "retrieve" relevant information before generating an answer, effectively grounding its output in verifiable facts and reducing the likelihood of hallucinations or outdated responses. RAG makes LLMs more reliable for knowledge-intensive tasks by giving them access to specialized or proprietary information they weren't trained on.

How Retrieval-Augmented Generation Works

The RAG process breaks down into two main stages: retrieval and generation. This two-stage approach combines the strengths of information-retrieval systems and generative models.

  1. Retrieval: When a user provides a prompt or asks a question, the RAG system first uses the prompt to search a knowledge source for relevant information. This source is typically a vector database containing embeddings of documents, articles, or other data. The retriever component identifies and pulls the most relevant snippets of text or data based on the user's query. An optional but powerful step is to use a reranker to refine these retrieved results, ensuring only the most contextually important information is passed on.
  2. Augmented Generation: The retrieved information is then combined with the original user prompt. This new, enriched prompt is fed into the generative AI model (the LLM). The model uses this added context to formulate a comprehensive, accurate, and relevant response. Frameworks such as LangChain and LlamaIndex are commonly used to build and manage these complex RAG pipelines.
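The two stages above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the three-dimensional "embeddings" and the documents are made up, and a real system would use an embedding model and a vector store (such as FAISS or a managed vector database) instead of the in-memory list shown here.

```python
import math

# Toy in-memory "vector database": each document is stored with a
# pre-computed embedding. The 3-d vectors below are invented for
# illustration; real embeddings come from an embedding model.
DOCS = [
    ("The warranty covers defects for 24 months.", [0.9, 0.1, 0.0]),
    ("Our office is open Monday to Friday.", [0.1, 0.8, 0.1]),
    ("Returns are accepted within 30 days.", [0.7, 0.2, 0.1]),
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, k=2):
    """Stage 1 (retrieval): rank documents by similarity to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def augment_prompt(question, query_embedding):
    """Stage 2 (augmented generation): prepend retrieved context
    to the user's question before sending it to the LLM."""
    context = "\n".join(retrieve(query_embedding))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# In a real pipeline the query embedding comes from the same model that
# embedded the documents; here it is hard-coded for the sketch.
prompt = augment_prompt("How long is the warranty?", [0.85, 0.1, 0.05])
print(prompt)
```

The enriched `prompt` is what gets passed to the generative model; frameworks like LangChain and LlamaIndex wrap exactly this retrieve-then-augment flow, adding chunking, reranking, and model integration on top.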

Applications and Examples

RAG is particularly useful in scenarios requiring factual accuracy and access to dynamic or specialized data.

  • Advanced Question-Answering Systems: A customer support chatbot can use RAG to access a company's entire knowledge base of product manuals, troubleshooting guides, and policy documents. When a customer asks, "What is the warranty policy for my product?", the system retrieves the latest warranty document and uses it to provide a precise, up-to-date answer, a significant improvement over generic responses.
  • Content Creation and Research: A financial analyst could use a RAG-powered tool to write a market summary. The tool could retrieve the latest financial reports, market news, and stock performance data from trusted sources like Bloomberg or Reuters. The LLM then synthesizes this information into a coherent report, complete with citations, vastly speeding up the research process.
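One way to support the citations mentioned above is to tag each retrieved snippet with its source before handing the context to the LLM, so the model can reference sources by number. The helper below is a hypothetical sketch; the source names and snippet texts are invented for illustration.

```python
def build_cited_prompt(question, snippets):
    """Attach numbered source tags so the LLM can cite retrieved documents.

    `snippets` is a list of (source_name, text) pairs produced by the
    retriever; the examples below are hypothetical.
    """
    numbered = [f"[{i + 1}] ({src}) {text}" for i, (src, text) in enumerate(snippets)]
    sources = "\n".join(numbered)
    return (
        "Answer the question using the numbered sources below and cite "
        "them like [1].\n\n"
        f"{sources}\n\nQuestion: {question}"
    )

prompt = build_cited_prompt(
    "How did the index perform this quarter?",
    [
        ("Q3 market report", "The index rose 4.2% in Q3."),
        ("Analyst briefing", "Gains were driven by the energy sector."),
    ],
)
print(prompt)
```

Because each claim in the generated answer can point back to a numbered source, the output is auditable, which is exactly what makes RAG attractive for research and support use cases.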

RAG in Computer Vision

While RAG is predominantly used in Natural Language Processing (NLP), its core concept is being explored for computer vision (CV) tasks. For instance, a system could retrieve relevant visual information to guide image generation or analysis. This could involve finding similar images from a large dataset to improve the performance of an object detection model like Ultralytics YOLO. Managing these complex models and datasets is streamlined with platforms like Ultralytics HUB, which could serve as a foundation for future multi-modal model applications that use RAG. You can explore a related implementation in our blog on enhancing AI with RAG and computer vision.
