
Half-Precision

Learn how half-precision (FP16) accelerates AI. Discover how to optimize Ultralytics YOLO26 for faster inference and reduced memory on GPUs and edge devices.

Half-precision, often denoted as FP16, is a floating-point data format that occupies 16 bits of computer memory, unlike the standard single-precision (FP32) format, which uses 32 bits. In the context of artificial intelligence and machine learning, half-precision is a critical optimization technique used to accelerate model training and inference while significantly reducing memory consumption. By storing numerical values, such as neural network model weights and gradients, using fewer bits, developers can fit larger models onto graphics processing units (GPUs) or run existing models much faster. This efficiency gain is essential for deploying modern, complex architectures like YOLO26 on resource-constrained devices without sacrificing substantial accuracy.

The Mechanics of Floating-Point Formats

To understand half-precision, it helps to contrast it with full precision. A standard 32-bit floating-point number (FP32) uses 1 sign bit, 8 exponent bits, and 23 mantissa bits, providing a very wide dynamic range and high numerical precision. FP16 uses 1 sign bit, 5 exponent bits, and 10 mantissa bits, so its largest finite value is 65,504 and it carries roughly three significant decimal digits. However, deep learning models are notoriously resilient to small numerical errors. Neural networks can often learn effectively even with the reduced dynamic range and granularity offered by the 16-bit format.
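The reduced range and granularity can be seen directly by rounding a few values to FP16. The sketch below is illustrative only and uses the Python standard library, whose `struct` format code `"e"` is IEEE 754 half-precision; the helper name `to_fp16` is an assumption made for this example and is not part of any library.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (hypothetical helper)."""
    try:
        # The "e" format code stores the value as a 16-bit half-precision float.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Magnitudes beyond the FP16 range are treated as overflowing to infinity.
        return math.copysign(math.inf, x)


# Largest finite FP16 value: (2 - 2**-10) * 2**15
FP16_MAX = 65504.0

print(to_fp16(0.1))      # nearest representable value, not exactly 0.1
print(to_fp16(2048.5))   # representable values are 2 apart in [2048, 4096)
print(to_fp16(70000.0))  # above FP16_MAX, so it overflows
print(to_fp16(1e-8))     # below the smallest FP16 subnormal (2**-24), so it underflows
```

The last two cases are the ones that matter most in practice: large activations can overflow and very small gradients can vanish, which is why training usually relies on mixed precision rather than pure FP16.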

Transitioning to half-precision cuts the memory bandwidth requirement in half. This allows for larger batch sizes during training, which can stabilize gradient updates and speed up the overall training process. Modern hardware accelerators, such as NVIDIA's Tensor Cores, are specifically optimized to perform matrix multiplications in FP16 at significantly higher speeds than FP32.

Key Benefits in AI Workflows

The adoption of half-precision offers several tangible advantages for AI practitioners:

  • Reduced Memory Footprint: Weights and activations stored in FP16 take roughly half the VRAM (Video RAM) of their FP32 equivalents, allowing developers to train larger networks or use higher-resolution training data on the same hardware.
  • Faster Inference: For real-time applications, such as autonomous vehicles or video analytics, FP16 can raise throughput (frames per second) and reduce inference latency, in favorable cases by up to about 2x on hardware with native FP16 support.
  • Energy Efficiency: Processing fewer bits requires less energy, which is crucial for edge AI devices and mobile phones where battery life is a constraint.
  • Mixed Precision Training: Many modern frameworks utilize mixed precision, where the model keeps a master copy of weights in FP32 for stability but performs heavy computations in FP16. This combines the speed of FP16 with the convergence stability of FP32.
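To illustrate why mixed precision keeps an FP32 master copy of the weights, the following sketch applies many small updates to a single weight. It is a simplified illustration under stated assumptions, not how any framework implements training: the helpers `to_fp16`, `to_fp32`, and `apply_updates` are hypothetical names, rounding is emulated with the standard library's `struct` module, and techniques such as loss scaling are left out.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest FP16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest FP32 value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def apply_updates(weight: float, update: float, steps: int, fp32_master: bool) -> float:
    """Add `update` to `weight` `steps` times and return the FP16 weight used for compute."""
    master = weight
    for _ in range(steps):
        if fp32_master:
            # Accumulate in FP32, as a mixed-precision master copy would.
            master = to_fp32(master + update)
        else:
            # Accumulate directly in FP16, rounding after every step.
            master = to_fp16(master + update)
    return to_fp16(master)


pure_fp16 = apply_updates(1.0, 1e-4, steps=100, fp32_master=False)
mixed = apply_updates(1.0, 1e-4, steps=100, fp32_master=True)
print(pure_fp16)  # each update is smaller than half the FP16 spacing near 1.0, so it is lost
print(mixed)      # the FP32 master accumulates the updates before the cast to FP16
```

Near 1.0, adjacent FP16 values are about 0.001 apart, so an update of 0.0001 rounds away every time unless it is accumulated in a wider format first.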

Real-World Applications

Half-precision is ubiquitous in production-grade AI systems. Here are two concrete examples:

  1. Real-Time Object Detection on Edge Devices: Consider a security camera system running Ultralytics YOLO26 to detect intruders. Deploying the model in FP16 allows it to run smoothly on an embedded chip like an NVIDIA Jetson or a Raspberry Pi AI Kit. The reduced computational load helps the system process video feeds in real time without lagging, which is vital for timely alerts.

  2. Large Language Model (LLM) Deployment: Generative AI models, such as GPT-4 or Llama variants, have billions of parameters. Loading these models in full precision (FP32) would require massive amounts of server memory that are often cost-prohibitive. By converting these models to FP16 (or even lower formats), cloud providers can serve foundation models to thousands of users simultaneously, making services like chatbots and automated content generation economically viable.
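The memory argument in the second example is simple arithmetic: bytes per parameter times parameter count. The sketch below works it through for a hypothetical 7-billion-parameter model. That figure is an assumption chosen for illustration, not a number from the text, and the estimate covers weights only, not activations or other runtime buffers.

```python
# Bytes needed to store one value in each format.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_parameters: int, dtype: str) -> float:
    """Estimate weight storage in decimal gigabytes (10**9 bytes)."""
    return num_parameters * BYTES_PER_VALUE[dtype] / 1e9


# Hypothetical model size, chosen only to make the arithmetic concrete.
NUM_PARAMETERS = 7_000_000_000

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMETERS, dtype):.1f} GB")
```

Under these assumptions the weights shrink from 28 GB in FP32 to 14 GB in FP16, which can be the difference between needing several accelerators and fitting on one.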

Half-Precision vs. Quantization

While both techniques aim to reduce model size, it is important to distinguish half-precision from model quantization.

  • Half-Precision (FP16): Reduces the bit-width from 32 to 16 but keeps the data as a floating-point number. It retains a reasonable dynamic range and is often the default choice for GPU training and inference.
  • Quantization (INT8): Converts floating-point numbers into integers (usually 8-bit). This offers even greater speed and memory savings but can sometimes lead to a more noticeable drop in accuracy if not done carefully (e.g., via quantization-aware training). FP16 is generally safer for preserving model performance, while INT8 is used for extreme optimization.
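The difference between the two approaches can be sketched by round-tripping the same values through each format. The example below uses a simple symmetric per-tensor INT8 scheme, which is only one of several quantization schemes and is an assumption of this sketch; the helper names and sample values are likewise hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest FP16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric INT8 quantization; assumes at least one nonzero value."""
    scale = max(abs(v) for v in values) / 127
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Illustrative weights spanning several orders of magnitude.
values = [0.003, -0.25, 0.5, 1.75, -6.0]

quantized, scale = quantize_int8(values)
int8_error = max(abs(v - r) for v, r in zip(values, dequantize(quantized, scale)))
fp16_error = max(abs(v - to_fp16(v)) for v in values)

print(quantized)   # the smallest weight collapses to 0 because one scale serves all values
print(int8_error)  # worst-case round-trip error for INT8
print(fp16_error)  # worst-case round-trip error for FP16
```

Because FP16 keeps an exponent per value, small and large weights are both represented with similar relative accuracy, whereas a single INT8 scale is dominated by the largest value. This is one reason careful calibration or quantization-aware training matters for INT8.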

Implementing Half-Precision with Ultralytics

The ultralytics library makes it straightforward to utilize half-precision. During prediction, the model can automatically switch to half-precision if the hardware supports it, or it can be explicitly requested.

Here is a Python example demonstrating how to load a YOLO26 model and perform inference using half-precision. Note that running with half=True typically requires a CUDA-enabled GPU.

import torch
from ultralytics import YOLO

# Check if CUDA (GPU) is available, as FP16 is primarily for GPU acceleration
device = "cuda" if torch.cuda.is_available() else "cpu"

# Request FP16 only when a GPU is present; fall back to full precision on CPU
use_half = device == "cuda"

# Load the latest YOLO26n model
model = YOLO("yolo26n.pt")

# Run inference on an image
# The 'half' argument tells the engine whether to use FP16
results = model.predict("https://ultralytics.com/images/bus.jpg", device=device, half=use_half)

# Print the device, the requested precision, and the per-stage timings in milliseconds
print(f"Inference device: {device}, FP16 requested: {use_half}, Speed: {results[0].speed}")

For users managing datasets and training pipelines, the Ultralytics Platform handles many of these optimizations automatically in the cloud, streamlining the transition from annotation to optimized model deployment.

Further Reading and Resources

To explore more about numerical formats and their impact on AI, consult the NVIDIA Deep Learning Performance Documentation regarding Tensor Cores. For a broader understanding of how these optimizations fit into the development lifecycle, read about machine learning operations (MLOps).

Additionally, those interested in the trade-offs between different optimization strategies might look into pruning, which removes connections rather than reducing bit precision, or explore the IEEE Standard for Floating-Point Arithmetic (IEEE 754) for the technical specifications of digital arithmetic. Understanding these fundamentals helps in making informed decisions when exporting models to formats like ONNX or TensorRT for production environments.
