
TensorRT

TensorRT optimizes deep learning models for faster, more efficient inference on NVIDIA GPUs, enabling real-time performance for YOLO AI applications.

TensorRT is a high-performance deep learning inference software development kit (SDK) developed by NVIDIA. It is designed to optimize neural network models for deployment, delivering low inference latency and high throughput for deep learning applications. By acting as an optimization compiler, TensorRT takes trained networks from popular frameworks like PyTorch and TensorFlow and restructures them to execute efficiently on NVIDIA GPUs. This capability is crucial for running complex AI models in production environments where speed and efficiency are paramount.

How TensorRT Optimizes Models

The core function of TensorRT is to convert a trained neural network into an optimized "engine" specifically tuned for the target hardware. It achieves this through several advanced techniques:

  • Layer Fusion: The optimizer combines multiple layers of a neural network into a single kernel, reducing memory access overhead and improving execution speed.
  • Precision Calibration: TensorRT supports reduced precision modes, such as mixed precision (FP16) and integer quantization (INT8). By reducing the number of bits used to represent numbers—often with minimal accuracy loss—developers can significantly accelerate math operations and reduce memory usage. This is a form of model quantization.
  • Kernel Auto-Tuning: The software automatically selects the best data layouts and algorithms for the specific GPU architecture being used, ensuring maximum utilization of the hardware's parallel processing capabilities via CUDA.
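The idea behind precision calibration can be illustrated with a toy symmetric INT8 quantizer in plain Python. This is a conceptual sketch only, not the TensorRT API: TensorRT's real calibrator profiles layer activations on representative data and is far more sophisticated.

```python
# Toy symmetric INT8 quantization, illustrating the idea behind
# precision calibration (conceptual sketch, NOT the TensorRT API).

def quantize_int8(values):
    """Map floats to int8 codes using a scale derived from the observed range."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.30, -0.12, 0.05, -0.27, 0.002]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Each value now fits in 8 bits instead of 32; the rounding error
# is bounded by half the quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)   # largest-magnitude weight maps to +/-127
print(f"max abs error: {max_err:.5f}, step: {scale:.5f}")
```

Storing each number in 8 bits instead of 32 cuts memory traffic by 4x, which is exactly why INT8 mode accelerates inference on hardware with fast integer math.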

Real-World Applications

Because TensorRT can process huge volumes of data with minimal latency, it is widely adopted in industries that rely on computer vision and complex AI tasks where timing is critical.

  1. Autonomous Systems: In automotive AI, self-driving cars process video feeds from multiple cameras to instantly detect pedestrians, signs, and obstacles. With TensorRT, perception models such as object detection networks can analyze each frame in milliseconds, allowing vehicle control systems to make safety-critical decisions without delay.
  2. Industrial Automation: Modern factories use AI for automated optical inspection on the production line. High-speed cameras capture images of products on the assembly line, and TensorRT models identify defects and anomalies in real time. This keeps quality control in step with high-speed production, often running on edge AI devices, such as NVIDIA platforms, deployed directly on the factory floor.

Using TensorRT with Ultralytics YOLO

Integrating TensorRT into your workflow is straightforward with modern AI tools. The ultralytics package provides a seamless method to convert standard PyTorch models into TensorRT engines, allowing users to combine the state-of-the-art architecture of Ultralytics YOLO26 with the hardware acceleration of NVIDIA GPUs. For teams looking to manage their datasets and training pipelines before export, Ultralytics offers a comprehensive environment to prepare models for such high-performance deployment.

The following example shows how to export a YOLO26 model to a TensorRT engine file (.engine) and then use it for real-time inference:

from ultralytics import YOLO

# Load the latest stable YOLO26 model (nano size)
model = YOLO("yolo26n.pt")

# Export the model to TensorRT format (creates 'yolo26n.engine')
# This step optimizes the computational graph for your specific GPU
model.export(format="engine")

# Load the optimized TensorRT engine for high-speed inference
trt_model = YOLO("yolo26n.engine")

# Run inference on an image source
results = trt_model("https://ultralytics.com/images/bus.jpg")

TensorRT vs. ONNX vs. Training Frameworks

It is important to distinguish TensorRT from other terms often heard in the model deployment landscape:

  • Vs. PyTorch/TensorFlow: Frameworks like PyTorch are primarily designed for model training and research, offering flexibility and ease of debugging. TensorRT is an inference engine designed solely for executing trained models as fast as possible. It is not used for training.
  • Vs. ONNX: The ONNX (Open Neural Network Exchange) format acts as an intermediary bridge between frameworks. While ONNX provides interoperability (e.g., moving a model from PyTorch to another platform), TensorRT focuses on hardware-specific optimization. Often, a model is converted to ONNX first, and then parsed by TensorRT to generate the final engine.
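As an illustration of that two-step path, NVIDIA ships a command-line tool, trtexec, that parses an ONNX file and builds a serialized engine. A minimal sketch, assuming TensorRT is installed and a model.onnx file already exists:

```shell
# Parse an ONNX model and build a serialized TensorRT engine,
# enabling FP16 precision where the hardware supports it
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```

Within the Ultralytics workflow this step is handled for you: model.export(format="engine") converts the PyTorch model to ONNX internally before building the final engine.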

For developers aiming to maximize the performance of their AI agents or vision systems, understanding the transition from a training framework to an optimized runtime like TensorRT is a key step in professional MLOps.
