Data Leakage

Explore what data leakage is in machine learning and learn how to prevent it. Discover best practices to keep your Ultralytics YOLO pipeline secure.

Data leakage in machine learning (ML) occurs when information that would not be available at prediction time, such as test data or the target itself, is inadvertently used to train a model. This pipeline flaw creates a misleading impression of exceptional performance during training and model testing, but the model then fails to generalize when it faces real-world, unseen data. Unlike in cybersecurity, where a data leak refers to unauthorized data exposure, data leakage in machine learning refers to training contamination and compromised predictive integrity.

How Data Leakage Occurs

To understand what data leakage is in machine learning, it helps to look at the two primary mechanisms through which this failure point manifests in modern pipelines:

  • Train-Test Contamination: This happens when information from the test data bleeds into training. A common cause is fitting data preprocessing steps (such as normalization statistics or mean values) on the entire dataset before splitting it, rather than fitting them on the training split only and then applying the same fitted transformation to the validation and test splits.
  • Target Leakage: This occurs when predictive features include information that will not logically be available at the time of inference. For instance, including a feature that is a direct consequence of the target variable inherently gives the model the answer key in advance.
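
The train-test contamination case above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `fit_standardizer` and `apply_standardizer` helpers and the toy values are hypothetical and are not part of any Ultralytics API.

```python
import statistics


def fit_standardizer(values):
    """Compute the mean and standard deviation from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def apply_standardizer(values, mean, std):
    """Scale values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Toy feature column (hypothetical data); the last two values are the held-out test split.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = data[:4], data[4:]

# Leaky: statistics are computed on the full dataset, so test values shape the training inputs.
leaky_mean, leaky_std = fit_standardizer(data)

# Safe: statistics come from the training split and are reused unchanged on the test split.
train_mean, train_std = fit_standardizer(train)
train_scaled = apply_standardizer(train, train_mean, train_std)
test_scaled = apply_standardizer(test, train_mean, train_std)

print(f"leaky mean={leaky_mean:.2f}, train-only mean={train_mean:.2f}")
```

The two means differ sharply here because the held-out values are outliers relative to the training split, which is exactly the information a model should not see before evaluation.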

Real-World Examples of Data Leakage

Understanding how to spot and prevent leakage is critical for building trustworthy AI. Here are two concrete examples of how this concept disrupts production deployments:

  • AI in Healthcare: If a medical facility trains an algorithm to detect lung disease using patient X-rays, but the positive scans all contain surgical markers placed by doctors after a diagnosis, target leakage occurs. The model simply learns to identify the surgical marker rather than the biological signs of the disease.
  • Computer Vision Video Analysis: In visual tasks like action recognition, randomly splitting adjacent video frames into both the training and validation sets causes massive train-test contamination. Because consecutive frames are nearly identical, the model can score well by memorizing the overlapping backgrounds instead of learning the complex human action, which violates standard model evaluation practices.
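
One way to avoid the video-frame problem above is to split by source video rather than by individual frame. The sketch below assumes a hypothetical file naming scheme of `<video_id>_<frame_index>.jpg`; the `split_by_video` helper is illustrative and would need adapting to the real dataset layout.

```python
import random
from collections import defaultdict


def split_by_video(frame_names, val_fraction=0.2, seed=0):
    """Split frames so that every frame of a video lands in the same subset."""
    groups = defaultdict(list)
    for name in frame_names:
        # Assumed naming scheme: "<video_id>_<frame_index>.jpg"
        video_id = name.rsplit("_", 1)[0]
        groups[video_id].append(name)

    # Shuffle video IDs, not frames, so adjacent frames are never separated.
    video_ids = sorted(groups)
    random.Random(seed).shuffle(video_ids)
    n_val = max(1, round(len(video_ids) * val_fraction))
    val_ids = set(video_ids[:n_val])

    train = [f for vid in video_ids if vid not in val_ids for f in groups[vid]]
    val = [f for vid in video_ids if vid in val_ids for f in groups[vid]]
    return train, val


# Hypothetical dataset: 5 clips with 10 frames each.
frames = [f"clip{v}_{i:04d}.jpg" for v in range(5) for i in range(10)]
train_frames, val_frames = split_by_video(frames)
print(len(train_frames), len(val_frames))
```

Grouping by video keeps near-identical frames on the same side of the split, so validation scores reflect performance on footage the model has not seen.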

Data Leakage Prevention and Protection

Data leakage protection relies on maintaining strict data hygiene and utilizing structured environments throughout the engineering lifecycle.

from ultralytics import YOLO

# Load the recommended Ultralytics YOLO26 model
model = YOLO("yolo26n.pt")

# Train the model using a dataset configuration file (dataset.yaml).
# The YAML file points 'train' and 'val' at separate directories. It does not
# deduplicate them, so the directories themselves must contain disjoint images
# to keep the learning and evaluation phases isolated.
results = model.train(data="dataset.yaml", epochs=50, imgsz=640)
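
Separate 'train' and 'val' directories only help if their contents are actually disjoint. As a rough sanity check, the sketch below hashes every file in two directories and reports byte-identical pairs. The `find_cross_split_duplicates` helper and the temporary directory layout are hypothetical, written for illustration with the standard library only.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_cross_split_duplicates(train_dir, val_dir):
    """Return (train_file, val_file) pairs whose contents are byte-identical."""
    train_hashes = {}
    for path in sorted(Path(train_dir).rglob("*")):
        if path.is_file():
            train_hashes.setdefault(file_digest(path), path)

    duplicates = []
    for path in sorted(Path(val_dir).rglob("*")):
        if path.is_file():
            match = train_hashes.get(file_digest(path))
            if match is not None:
                duplicates.append((match, path))
    return duplicates


# Demo on a throwaway directory tree with one deliberately leaked file.
with tempfile.TemporaryDirectory() as root:
    train_dir = Path(root, "images", "train")
    val_dir = Path(root, "images", "val")
    train_dir.mkdir(parents=True)
    val_dir.mkdir(parents=True)
    (train_dir / "a.jpg").write_bytes(b"frame-1")
    (train_dir / "b.jpg").write_bytes(b"frame-2")
    (val_dir / "c.jpg").write_bytes(b"frame-2")  # same bytes as train/b.jpg
    (val_dir / "d.jpg").write_bytes(b"frame-3")
    leaked = [(t.name, v.name) for t, v in find_cross_split_duplicates(train_dir, val_dir)]

print(leaked)
```

An exact hash only catches byte-identical copies. Resized, re-encoded, or adjacent-frame near-duplicates would need a similarity-based check, which is outside the scope of this sketch.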

Because terminology often overlaps between data science and cybersecurity, it is important to distinguish data leakage from closely related ideas.