Learn how Tensor Processing Units (TPUs) accelerate machine learning tasks such as training, inference, and object detection with unmatched efficiency.
A Tensor Processing Unit (TPU) is a specialized application-specific integrated circuit (ASIC) designed by Google specifically to accelerate machine learning (ML) workloads. Unlike general-purpose processors that handle a broad range of computing tasks, TPUs are engineered from the ground up to optimize the massive matrix operations fundamental to neural networks. This specific focus allows them to achieve exceptionally high throughput and energy efficiency, making them a cornerstone of modern artificial intelligence (AI) infrastructure, particularly within the Google Cloud ecosystem. They play a vital role in reducing the time required for both training complex models and running real-time inference at scale.
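To make "massive matrix operations" concrete: a single dense neural-network layer is essentially a matrix-vector product plus a bias, repeated millions of times. A minimal pure-Python illustration (not TPU code, just the math a TPU accelerates):

```python
def dense_layer(W, x, b):
    """Compute y = W @ x + b, the core operation of a dense layer."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]


# A 2x3 weight matrix applied to a 3-element input vector
W = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
x = [3.0, 4.0, 5.0]
b = [0.1, 0.2]

print(dense_layer(W, x, b))  # two output activations
```

A real model stacks thousands of such layers with far larger matrices, which is why dedicated matrix hardware pays off.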
The architecture of a TPU differs significantly from traditional processors. While a standard CPU (Central Processing Unit) excels at sequential tasks and complex logic, and a GPU (Graphics Processing Unit) uses parallel cores for graphics and general computing, a TPU utilizes a systolic array architecture. This design enables data to flow through thousands of multipliers simultaneously without accessing memory for every operation. By maximizing computational density and minimizing latency, TPUs are uniquely suited for the heavy linear algebra found in deep learning (DL) applications.
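As a rough mental model of that dataflow, here is a toy Python simulation of an output-stationary systolic matrix multiply (a teaching sketch, not actual TPU internals): each cell accumulates its own output element while operands "flow" past it with a time skew, so every input value is read from memory once and reused by many multiply-accumulate cells.

```python
def systolic_matmul(A, B):
    """Toy output-stationary systolic array computing C = A @ B.

    Cell (i, j) accumulates C[i][j] as row i of A streams rightward
    and column j of B streams downward through the grid.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # At time step t, cell (i, j) sees operand index k = t - i - j;
    # the skew models values arriving later at cells deeper in the grid.
    for t in range(n + m + p):
        for i in range(n):
            for j in range(p):
                k = t - i - j
                if 0 <= k < m:
                    C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

In silicon, all cells update in parallel on each clock tick; the nested Python loops only emulate that schedule sequentially.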
This specialized hardware is heavily optimized for frameworks like TensorFlow and increasingly supported by PyTorch, allowing developers to train massive foundation models or deploy efficient edge solutions without completely rewriting their codebases.
Understanding the hardware landscape is critical for optimizing machine learning operations (MLOps). TPUs are deployed in environments ranging from massive cloud clusters to tiny edge devices.
Developers can leverage TPU acceleration for Ultralytics models, particularly when using the Ultralytics Platform for cloud training or exporting models for edge deployment. The Edge TPU, for instance, requires models to be quantized and compiled specifically for its architecture.
The following example demonstrates how to export a YOLO26 model to the TFLite format, which is a prerequisite step before compiling for an Edge TPU:
```python
from ultralytics import YOLO

# Load the latest lightweight YOLO26 nano model
model = YOLO("yolo26n.pt")

# Export the model to TFLite format
# This creates a '.tflite' file suitable for mobile and edge deployment
# Set int8=True for quantization, which is often required for Edge TPU performance
model.export(format="tflite", int8=True)
```
Once exported, the model can be further compiled for the Edge TPU using the Edge TPU Compiler, allowing it to run efficiently on devices like the Raspberry Pi with a Coral USB Accelerator. For more details on deployment, exploring the TFLite integration documentation can be very helpful.
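Once a model has been compiled, it can be loaded on-device through `tflite_runtime` with the Edge TPU delegate. The sketch below assumes the Edge TPU runtime (`libedgetpu`) is installed and a compiled model file is present; the file name is illustrative, not the exact name the export step produces.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Hypothetical file name for an Edge TPU-compiled model
interpreter = Interpreter(
    model_path="yolo26n_int8_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed a dummy tensor matching the model's expected input shape and dtype
dummy = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy)
interpreter.invoke()

result = interpreter.get_tensor(output_details["index"])
print(result.shape)
```

Operations the compiler could not map to the Edge TPU fall back to the CPU, so keeping the model fully int8-quantized is key to realizing the accelerator's speedup.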