
Retrieval Augmented Generation (RAG)

Learn how Retrieval Augmented Generation (RAG) enhances AI models by integrating real-time, trusted external data to produce accurate, up-to-date responses.

Retrieval Augmented Generation (RAG) is an advanced technique in the field of artificial intelligence that optimizes the output of a Large Language Model (LLM) by referencing an authoritative knowledge base outside of its training data. Traditional generative models rely solely on static information learned during their initial training, which can lead to outdated answers or confident inaccuracies known as hallucinations. RAG bridges this gap by retrieving relevant, up-to-date information from external sources—such as company databases, current news, or technical manuals—and feeding it to the model as context before a response is generated. This process ensures that the AI's outputs are not only linguistically coherent but also factually accurate and grounded in specific data.

How RAG Systems Work

The architecture of a RAG system typically involves two main phases: retrieval and generation. This workflow lets developers keep a foundation model's answers current without the expense of frequent retraining.

  1. Retrieval: When a user submits a query, the system first performs a semantic search across a specialized storage system called a vector database. This database contains data that has been converted into numerical representations known as embeddings, allowing the system to find conceptually similar information rather than just matching keywords.
  2. Generation: The relevant documents or data snippets found during retrieval are combined with the user's original question. This enriched prompt is then sent to the generative model. The model uses this provided context to synthesize an answer, ensuring the response relies on the retrieved facts. For a deeper dive into the mechanics, IBM provides a comprehensive guide on RAG workflows.
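The two phases above can be sketched in a few lines of plain Python. This is a toy illustration, not a production pipeline: the documents, hand-crafted embedding vectors, and function names are all invented for the example, whereas a real system would generate embeddings with a trained model and store them in a vector database.

```python
import math

# Toy "knowledge base": documents paired with hand-crafted embedding vectors.
# In practice these vectors come from an embedding model and live in a vector database.
DOCS = {
    "The 2024 refund policy allows returns within 30 days.": [0.9, 0.1, 0.0],
    "Our headquarters moved to Madrid in 2023.": [0.1, 0.8, 0.2],
    "GPU servers are restarted every Sunday at 02:00.": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_embedding, k=1):
    """Phase 1 (retrieval): rank documents by semantic similarity, not keywords."""
    ranked = sorted(DOCS, key=lambda d: cosine(DOCS[d], query_embedding), reverse=True)
    return ranked[:k]


def build_prompt(question, query_embedding):
    """Phase 2 (generation): enrich the user's question with retrieved context."""
    context = "\n".join(retrieve(query_embedding))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."


# A refund question, embedded close to the refund-policy document.
prompt = build_prompt("How long do I have to return an item?", [0.85, 0.15, 0.05])
print(prompt)
```

The enriched `prompt` would then be sent to the LLM, which grounds its answer in the retrieved context rather than in its static training data.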

Visual RAG: Integrating Computer Vision

While RAG is traditionally text-based, the rise of multi-modal learning has introduced "Visual RAG." In this scenario, computer vision models act as the retrieval mechanism. They analyze images or video streams to extract structured textual data—such as object names, counts, or activities—which is then fed into an LLM to answer questions about the visual scene.

For example, a developer can use YOLO26 to detect objects in an image and pass that list of objects to a text model to generate a descriptive report.

from ultralytics import YOLO

# Load the YOLO26 model for state-of-the-art detection
model = YOLO("yolo26n.pt")

# Perform inference to 'retrieve' visual facts from an image
results = model("https://ultralytics.com/images/bus.jpg")

# Extract class names to build a text context for an LLM
detected_classes = [model.names[int(c)] for c in results[0].boxes.cls]
context_string = f"The scene contains: {', '.join(detected_classes)}."

print(context_string)
# Output example: "The scene contains: bus, person, person, person."
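The retrieved context string can then be merged with a user's question into an enriched prompt, mirroring the augmentation step of text-based RAG. The sketch below starts from a hard-coded detection list (matching the example output above) so it runs standalone; the question and prompt format are illustrative.

```python
from collections import Counter

# Context 'retrieved' by the vision model (class names, one per detection).
detected_classes = ["bus", "person", "person", "person"]

# Summarize duplicate detections into counts for a cleaner LLM context.
counts = Counter(detected_classes)
context_string = "The scene contains: " + ", ".join(
    f"{n} {name}(s)" if n > 1 else f"1 {name}" for name, n in counts.items()
) + "."
# context_string == "The scene contains: 1 bus, 3 person(s)."

# Augment the user's question with the visual context before sending it to an LLM.
user_question = "Is anyone waiting at the bus stop?"
prompt = f"{context_string}\n\nQuestion: {user_question}"
print(prompt)
```

Aggregating repeated detections into counts keeps the prompt short and makes quantitative questions ("how many people?") answerable directly from the context.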

Real-World Applications

RAG is transforming industries by enabling AI agents to access proprietary or real-time data securely.

  • Enterprise Knowledge Bases: Companies use RAG to build internal chatbots that answer employee questions about HR policies or technical documentation. By connecting an LLM to a live document repository, the system avoids providing obsolete policy information. For more on enterprise implementations, see Google Cloud's overview of RAG in Vertex AI.
  • Clinical Decision Support: In AI in healthcare, RAG systems can retrieve patient history and recent medical research papers to assist doctors in diagnosis, ensuring the advice considers the very latest clinical studies.
  • Smart Retail Assistants: Applications using AI in retail leverage RAG to check live inventory databases. If a customer asks a chatbot, "Do you have these running shoes in size 10?", the model retrieves real-time stock levels before answering, preventing frustration over out-of-stock items.
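The retail scenario above can be reduced to a small sketch: retrieve the live stock level first, then ground the generated reply in that fact. The inventory table and function names here are hypothetical, standing in for a real database query and LLM call.

```python
# Hypothetical live inventory: (product, size) -> units in stock.
INVENTORY = {
    ("running shoes", 10): 4,
    ("running shoes", 11): 0,
}


def answer_stock_query(product, size):
    """Retrieve the real-time stock level, then answer based on that fact."""
    stock = INVENTORY.get((product, size), 0)  # retrieval step
    if stock > 0:  # generation step, grounded in the retrieved value
        return f"Yes, we have {stock} pairs of {product} in size {size}."
    return f"Sorry, {product} in size {size} are currently out of stock."


print(answer_stock_query("running shoes", 10))
```

Because the answer is assembled from the retrieved stock count at query time, the chatbot never promises items that sold out after the model was trained.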

RAG vs. Fine-Tuning

It is crucial to distinguish RAG from fine-tuning, as they solve different problems.

  • RAG (Retrieval Augmented Generation): Best for accessing dynamic, frequently changing data (e.g., stock prices, news) or private data not present in the public training set. It focuses on providing new information at runtime.
  • Fine-Tuning: Best for adapting the model's behavior, style, or terminology. It involves updating the model weights on a specific dataset. While fine-tuning helps a model learn a specific language pattern (like medical jargon), it does not grant access to real-time facts. See OpenAI's guide on fine-tuning vs. RAG for decision-making frameworks.

Related Concepts

  • LangChain: A popular open-source framework specifically designed to simplify the creation of RAG applications by chaining together retrievers and LLMs.
  • Knowledge Graph: A structured way of representing data that can be used as a retrieval source, offering more contextually rich relationships than simple vector similarity.
  • Prompt Engineering: The art of crafting inputs to guide the model. RAG is essentially an automated form of prompt engineering where the "prompt" is enriched with retrieved data programmatically.
  • Ultralytics Platform: While RAG handles the text generation side, platforms like this are essential for managing the data preprocessing and training of the vision models that feed visual data into multimodal RAG pipelines.
