Discover what causes hallucinations in Large Language Models (LLMs) and explore effective strategies to mitigate inaccuracies in AI-generated content.
In the context of Large Language Models (LLMs), a hallucination occurs when a generative model produces content that is confident and syntactically fluent but factually incorrect, nonsensical, or unfaithful to the source material. Unlike standard database retrieval errors, hallucinations in Generative AI are often plausible-sounding, making them difficult for users to detect without independent verification. This phenomenon stems from the fundamental design of these models, which prioritize text generation based on statistical probability rather than truth validation. Understanding hallucinations is critical for deploying safe AI systems, particularly in high-stakes industries like healthcare, finance, and legal services.
The primary cause of hallucination lies in the Transformer architecture and the training objectives used to build foundation models. These systems are trained to predict the next token in a sequence based on patterns learned from vast amounts of training data. They do not possess an inherent concept of "fact" or "fiction"; rather, they model the likelihood of words appearing together.
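To make this concrete, the minimal sketch below walks through a single next-token selection step. The tiny vocabulary and logit values are invented for illustration and do not come from any real model; the point is that the decoding step scores likelihood and fluency, never factual correctness.

```python
import numpy as np

# Hypothetical scores for the continuation of "The capital of France is ..."
# (values are made up for illustration, not taken from a real model).
vocab = ["Paris", "Berlin", "Lyon", "Madrid"]
logits = np.array([3.1, 2.8, 1.2, 0.5])

# Softmax turns raw scores into a probability distribution over tokens
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling picks a token by likelihood alone; nothing in this step checks
# whether the chosen token is true, so a plausible but wrong token can win
next_token = np.random.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```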
Several factors contribute to this behavior:

- Training data limitations: gaps, errors, or biases in the training corpus are reproduced at generation time.
- Probabilistic decoding: sampling strategies favor fluent, high-likelihood continuations rather than verified facts.
- Knowledge cutoffs: the model has no awareness of events, products, or research published after its training data was collected.
- Ambiguous prompts: vague or leading instructions encourage the model to "fill in the blanks" with invented detail.
Hallucinations can manifest in various forms, ranging from subtle inaccuracies to complete fabrications:

- Factual errors: confidently stated dates, statistics, or attributions that are simply wrong.
- Fabricated references: citations, papers, or URLs that do not exist.
- Unfaithful summaries: output that contradicts or embellishes the source document it was asked to condense.
- Invented code and APIs: a model may suggest a plausible-looking function (e.g., ultralytics.detect_everything()) based on standard naming conventions it has seen in its training data, even though that specific function was never implemented, as shown in the snippet below.
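To illustrate the last point, the hypothetical snippet below contrasts a hallucinated call with the documented Ultralytics entry point; the fabricated import is left commented out because it would fail at runtime.

```python
# Hallucinated API: the name looks plausible, but this function was never implemented.
# Uncommenting the line below raises an ImportError.
# from ultralytics import detect_everything

# Documented API: the real entry point for running detection
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # load a pretrained YOLO11 model
results = model("https://ultralytics.com/images/bus.jpg")  # run inference on an image
```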
Reducing hallucination is a major focus of AI Safety research. Several techniques are currently employed to ground models in reality:

- Retrieval-Augmented Generation (RAG): retrieving relevant, trusted documents at query time and injecting them into the prompt so the model answers from evidence rather than memory (see the sketch after this list).
- Reinforcement Learning from Human Feedback (RLHF): rewarding responses that human reviewers judge accurate and penalizing fabrications.
- Prompt engineering: instructing the model to cite sources, admit uncertainty, or answer only from supplied context.
- External grounding and tool use: verifying claims against structured data, search results, or perception models before presenting them to the user.
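The sketch below illustrates the RAG idea in miniature: retrieve supporting text first, then constrain the prompt to it. The tiny document store and the keyword-overlap retrieve() function are placeholders invented for this example; a production system would use a vector database, an embedding model, and a real LLM client.

```python
# Minimal RAG-style grounding sketch (illustrative placeholders only)
DOCUMENTS = [
    "YOLO11 was released by Ultralytics in 2024.",
    "Retrieval-Augmented Generation injects retrieved text into the prompt.",
]


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""

    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))

    return sorted(docs, key=overlap, reverse=True)[:k]


def build_grounded_prompt(question: str) -> str:
    """Prepend retrieved evidence so the model answers from it, not from memory."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    return (
        "Answer using only the context below. If the answer is not in the context, "
        f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )


print(build_grounded_prompt("When was YOLO11 released?"))
```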
One effective way to mitigate hallucinations in multi-modal workflows is to use a high-accuracy object detection model to verify the physical contents of an image before an LLM describes it. By feeding a verified list of objects into the LLM's context, you prevent it from inventing elements that aren't there.
The following example demonstrates how to use Ultralytics YOLO11 to generate a ground-truth list of objects, which can then serve as a factual constraint for a generative model.
```python
from ultralytics import YOLO

# Load the official YOLO11 model
model = YOLO("yolo11n.pt")

# Run inference on an image to get factual data
results = model("https://ultralytics.com/images/bus.jpg")

# Extract the class names of detected objects to ground the LLM
detected_objects = [model.names[int(c)] for c in results[0].boxes.cls]

# This list prevents the LLM from hallucinating objects not present
print(f"Verified Objects: {detected_objects}")
# Output: Verified Objects: ['bus', 'person', 'person', 'person', 'person']
```
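From here, the verified list can be injected directly into the generative model's prompt. The continuation below is a hypothetical sketch: the sample list mirrors the output of the detection step above, and the prompt wording (plus whatever LLM client eventually consumes it) is an assumption for illustration, not part of the Ultralytics API.

```python
# Hypothetical continuation: constrain the LLM with the verified detections.
# The sample list mirrors the detection output above; the prompt wording is an
# assumption for illustration only.
detected_objects = ["bus", "person", "person", "person", "person"]

grounded_prompt = (
    "Describe this image for a user. Mention only objects from this verified "
    f"list and do not invent anything else: {detected_objects}"
)
print(grounded_prompt)  # pass this string to the LLM of your choice
```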
It is important to distinguish hallucinations from other types of AI errors. A classification or detection model that mislabels an object makes a bounded, measurable mistake that shows up directly in accuracy metrics, and algorithmic bias reflects systematic skew inherited from the training data. A hallucination, by contrast, is open-ended fabrication: the model produces fluent content with no grounding in its input or in reality, which makes it harder to quantify and to detect automatically.
For further reading on evaluating generative models, the NIST AI Risk Management Framework provides a comprehensive look at reliability and safety standards. Additionally, researchers continue to develop fact-checking algorithms that automatically detect and flag hallucinatory content in real time.