
TPU (Tensor Processing Unit)

Learn how Tensor Processing Units (TPUs) accelerate machine learning tasks such as training, inference, and object detection with unmatched efficiency.

A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) designed by Google to accelerate machine learning (ML) workloads. Unlike general-purpose processors that handle a broad range of computing tasks, TPUs are engineered from the ground up to optimize the massive matrix operations fundamental to neural networks. This narrow focus allows them to achieve exceptionally high throughput and energy efficiency, making them a cornerstone of modern artificial intelligence (AI) infrastructure, particularly within the Google Cloud ecosystem. They play a vital role in reducing the time required both to train complex models and to run real-time inference at scale.

Architecture and Functionality

The architecture of a TPU differs significantly from traditional processors. While a standard CPU (Central Processing Unit) excels at sequential tasks and complex logic, and a GPU (Graphics Processing Unit) uses parallel cores for graphics and general computing, a TPU utilizes a systolic array architecture. This design enables data to flow through thousands of multipliers simultaneously without accessing memory for every operation. By maximizing computational density and minimizing latency, TPUs are uniquely suited for the heavy linear algebra found in deep learning (DL) applications.
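
To make the idea concrete, here is a toy software simulation of an output-stationary systolic array (an illustrative sketch, not Google's actual silicon design): operands are skewed in time so that each processing element performs at most one multiply-accumulate per cycle, and partial sums stay inside the grid until the result is complete.

import numpy as np

def systolic_matmul(A, B):
    """Toy simulation of an output-stationary systolic array computing A @ B."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    # Operands are skewed so the pair A[i, k] / B[k, j] reaches
    # processing element (i, j) at cycle t = i + j + k.
    for t in range(M + N + K - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j
                if 0 <= k < K:  # a valid operand pair arrives this cycle
                    C[i, j] += A[i, k] * B[k, j]  # one MAC per PE per cycle
    return C

A, B = np.random.rand(3, 4), np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)

The hardware version replaces these Python loops with a physical grid of multiply-accumulate units, which is what keeps memory traffic per operation so low.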

This specialized hardware is heavily optimized for frameworks like TensorFlow and increasingly supported by PyTorch, allowing developers to train massive foundation models or deploy efficient edge solutions without completely rewriting their codebases.
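
As a rough illustration, the snippet below shows the typical TensorFlow pattern for placing a Keras model on a TPU. This is a minimal sketch that assumes a Cloud TPU runtime is attached (for example, in a Colab TPU session); the two-layer model is a placeholder.

import tensorflow as tf

# Locate and initialize the attached TPU runtime (tpu="" auto-detects
# in Colab-style environments; on TPU VMs, tpu="local" is typical).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model across the available TPU cores.
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential(
        [tf.keras.layers.Dense(128, activation="relu"), tf.keras.layers.Dense(10)]
    )
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")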

Distinguishing Processing Units

Understanding the hardware landscape is critical for optimizing machine learning operations (MLOps).

  • CPU: The general-purpose "brain" of a computer, ideal for sequential processing, data preprocessing, and handling complex logic. It is often used for data augmentation pipelines but is slower for heavy matrix math.
  • GPU: Originally built for image rendering, GPUs are the industry standard for model training due to their versatility and massive parallelism. They are excellent for training flexible models like Ultralytics YOLO26.
  • TPU: A purpose-built accelerator that trades flexibility for raw speed in tensor operations. It is designed to maximize FLOPS (floating-point operations per second) specifically for neural network calculations, often providing superior performance-per-watt for specific large-scale workloads.
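
In practice, training scripts often probe for the fastest available accelerator and fall back gracefully. The sketch below shows that pattern with an Ultralytics model; the coco8.yaml demo dataset and single epoch are used purely for illustration.

import torch
from ultralytics import YOLO

# Prefer a CUDA GPU when one is present; otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else "cpu"

model = YOLO("yolo26n.pt")
model.train(data="coco8.yaml", epochs=1, device=device)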

Real-World Applications

TPUs are deployed in various environments, from massive cloud clusters to tiny edge devices.

  1. Large Language Model Training: Google utilizes vast interconnected clusters, known as TPU Pods, to train immense large language models (LLMs) such as PaLM and Gemini. These systems can process petabytes of training data in a fraction of the time it would take traditional hardware, accelerating advancements in generative AI.
  2. Edge AI and IoT: The Coral Edge TPU brings this acceleration to low-power devices. It enables efficient computer vision (CV) applications, such as running object detection on a manufacturing line to identify defects locally. This allows for immediate decision-making without relying on cloud connectivity, preserving bandwidth and privacy.
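
For a sense of what such on-device inference looks like, the sketch below uses the pycoral library with a detection model already compiled for the Edge TPU. It assumes an SSD-style detector (which pycoral's detect adapter expects); the model and image filenames are placeholders.

from PIL import Image
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

# Load a model already compiled for the Edge TPU (placeholder filename).
interpreter = make_interpreter("detector_edgetpu.tflite")
interpreter.allocate_tensors()

# Resize the input frame to the model's expected resolution and run it.
image = Image.open("frame.jpg").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()

# Read back detections above a confidence threshold.
for obj in detect.get_objects(interpreter, score_threshold=0.4):
    print(obj.id, obj.score, obj.bbox)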

Using TPUs with Ultralytics

Developers can leverage TPU acceleration for Ultralytics models, particularly when using the Ultralytics Platform for cloud training or exporting models for edge deployment. The Edge TPU, for instance, requires models to be quantized and compiled specifically for its architecture.

The following example demonstrates how to export a YOLO26 model to the TFLite format, which is a prerequisite step before compiling for an Edge TPU:

from ultralytics import YOLO

# Load the latest lightweight YOLO26 nano model
model = YOLO("yolo26n.pt")

# Export the model to TFLite format
# This creates a '.tflite' file suitable for mobile and edge deployment
# Set int8=True for quantization, which is often required for Edge TPU performance
model.export(format="tflite", int8=True)

Once exported, the model can be compiled for the Edge TPU with the Edge TPU Compiler, allowing it to run efficiently on devices such as a Raspberry Pi paired with a Coral USB Accelerator. For more details on deployment, see the TFLite integration documentation.
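
As a convenience, Ultralytics also offers an edgetpu export target that drives the compiler step for you, assuming the Edge TPU Compiler is installed on a supported Linux system:

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Export straight to an Edge-TPU-compiled model; Ultralytics invokes the
# Edge TPU Compiler internally, so it must be installed and on the PATH.
model.export(format="edgetpu")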
