Retrieval Augmented Generation (RAG)
Discover how Retrieval Augmented Generation (RAG) enhances AI models by integrating reliable, real-time external data for accurate, up-to-date responses.
Retrieval Augmented Generation (RAG) is an advanced technique in the field of artificial intelligence that optimizes the output of a Large Language Model (LLM) by referencing an authoritative knowledge base outside of its training data. Traditional generative models rely solely on static information learned during their initial training, which can lead to outdated answers or confident inaccuracies known as hallucinations. RAG bridges this gap by retrieving relevant, up-to-date information from external sources, such as company databases, current news, or technical manuals, and feeding it to the model as context before a response is generated. This process ensures that the AI's outputs are not only linguistically coherent but also factually accurate and grounded in specific data.
How Does a RAG System Work?
The architecture of a RAG system typically involves two main phases: retrieval and generation. This workflow allows developers to keep a foundation model current without the expense of frequent retraining.
- Retrieval: When a user submits a query, the system first performs a semantic search across a specialized storage system called a vector database. This database contains data that has been converted into numerical representations known as embeddings, allowing the system to find conceptually similar information rather than just matching keywords.
- Generation: The relevant documents or data snippets found during retrieval are combined with the user's original question. This enriched prompt is then sent to the generative model, which uses the provided context to synthesize an answer, ensuring the response relies on the retrieved facts. For a deeper dive into the mechanics, IBM provides a comprehensive guide on RAG workflows.
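The two phases above can be sketched end to end with a toy in-memory index. Everything here is invented for illustration: the documents, the bag-of-words "embedding," and the prompt template. A production system would use a learned embedding model and a dedicated vector database instead.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    # Toy bag-of-words embedding; real RAG uses a learned embedding model.
    tokens = tokenize(text)
    return [tokens.count(w) for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


# Tiny "vector database": documents stored alongside their embeddings.
documents = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 3 to 5 business days.",
    "Our headquarters are located in Austin.",
]
vocab = sorted({t for d in documents for t in tokenize(d)})
index = [(doc, embed(doc, vocab)) for doc in documents]

# Retrieval phase: embed the query and rank documents by similarity.
query = "What is the returns policy?"
q_vec = embed(query, vocab)
best_doc, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))

# Generation phase: enrich the prompt with the retrieved context.
# The actual LLM call is omitted here.
prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)
```

Because the query shares the terms "returns" and "policy" with the refund document, cosine similarity surfaces it ahead of the unrelated shipping and headquarters entries.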
Visual RAG: Integrating Computer Vision
While RAG is traditionally text-based, the rise of multi-modal learning has introduced "Visual RAG." In this scenario, computer vision models act as the retrieval mechanism. They analyze images or video streams to extract structured textual data, such as object names, counts, or activities, which is then fed into an LLM to answer questions about the visual scene.
For example, a developer can use YOLO26 to detect objects in an image and pass that list of objects to a text model to generate a descriptive report.
```python
from ultralytics import YOLO

# Load the YOLO26 model for state-of-the-art detection
model = YOLO("yolo26n.pt")

# Perform inference to 'retrieve' visual facts from an image
results = model("https://ultralytics.com/images/bus.jpg")

# Extract class names to build a text context for an LLM
detected_classes = [model.names[int(c)] for c in results[0].boxes.cls]
context_string = f"The scene contains: {', '.join(detected_classes)}."
print(context_string)
# Example output: "The scene contains: bus, person, person, person."
```
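The detected classes can then be handed off to a language model. The `build_prompt` helper below is hypothetical and shows only the prompt-assembly step; the actual LLM call is left out.

```python
def build_prompt(context: str, question: str) -> str:
    # Wrap the visual facts and the user's question into one enriched prompt.
    return (
        "You are given facts extracted from an image by an object detector.\n"
        f"Facts: {context}\n"
        f"Question: {question}\n"
        "Answer using only the facts above."
    )


context_string = "The scene contains: bus, person, person, person."
prompt = build_prompt(context_string, "How many people are in the scene?")
print(prompt)
```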
Real-World Applications
RAG is transforming industries by enabling AI agents to access proprietary or real-time data securely.
- Enterprise Knowledge Bases: Companies use RAG to build internal chatbots that answer employee questions about HR policies or technical documentation. By connecting an LLM to a live document repository, the system avoids providing obsolete policy information. For more on enterprise implementations, see Google Cloud's overview of RAG in Vertex AI.
- Clinical Decision Support: In AI in healthcare, RAG systems can retrieve patient history and recent medical research papers to assist doctors in diagnosis, ensuring the advice considers the very latest clinical studies.
- Smart Retail Assistants: Applications using AI in retail leverage RAG to check live inventory databases. If a customer asks a chatbot, "Do you have these running shoes in size 10?", the model retrieves real-time stock levels before answering, preventing frustration over out-of-stock items.
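The retail flow can be sketched minimally, assuming a made-up in-memory inventory in place of a live database:

```python
# Hypothetical live-inventory store keyed by (product, size).
inventory = {
    ("running shoes", "10"): 4,
    ("running shoes", "11"): 0,
}


def retrieve_stock(product: str, size: str) -> str:
    # Retrieval step: look up real-time stock before the model answers.
    count = inventory.get((product, size), 0)
    return f"{product} in size {size}: {count} in stock"


fact = retrieve_stock("running shoes", "10")
prompt = (
    f"Context: {fact}\n"
    "Customer question: Do you have these running shoes in size 10?\n"
    "Answer based only on the context."
)
print(prompt)
```

Because the stock count is fetched at query time, the chatbot's answer reflects the inventory as it is now, not as it was when the model was trained.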
RAG vs. Fine-Tuning
It is crucial to distinguish RAG from fine-tuning, as they solve different problems.
- RAG (Retrieval Augmented Generation): Best for accessing dynamic, frequently changing data (e.g., stock prices, news) or private data not present in the public training set. It focuses on providing new information at runtime.
- Fine-Tuning: Best for adapting the model's behavior, style, or terminology. It involves updating the model weights on a specific dataset. While fine-tuning helps a model learn a specific language pattern (like medical jargon), it does not grant access to real-time facts. See OpenAI's guide on fine-tuning vs. RAG for decision-making frameworks.
Related Concepts
- LangChain: A popular open-source framework specifically designed to simplify the creation of RAG applications by chaining together retrievers and LLMs.
- Knowledge Graph: A structured way of representing data that can be used as a retrieval source, offering more contextually rich relationships than simple vector similarity.
- Prompt Engineering: The art of crafting inputs to guide the model. RAG is essentially an automated form of prompt engineering where the "prompt" is enriched with retrieved data programmatically.
- Ultralytics Platform: While RAG handles the text generation side, platforms like this are essential for managing the data preprocessing and training of the vision models that feed visual data into multimodal RAG pipelines.
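To contrast the knowledge-graph approach above with vector similarity, a graph can serve facts directly as (subject, relation, object) triples. The triples below are invented examples for illustration.

```python
# Invented (subject, relation, object) triples acting as a structured
# retrieval source instead of a vector index.
triples = [
    ("YOLO26", "is_a", "real-time object detector"),
    ("YOLO26", "developed_by", "Ultralytics"),
    ("RAG", "combines", "retrieval and generation"),
]


def facts_about(entity):
    # Retrieve every triple whose subject matches the entity.
    return [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples if s == entity]


# The matched facts become the context string fed into the LLM prompt.
context = ". ".join(facts_about("YOLO26"))
print(context)
```

Unlike cosine similarity over embeddings, this retrieval is exact: the graph returns precisely the relationships stored for an entity, which can be preferable when provenance matters.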