TensorRT

Optimize deep learning models with TensorRT for faster, more efficient inference on NVIDIA GPUs. Achieve real-time performance with YOLO and AI applications.

TensorRT is a high-performance deep learning inference software development kit (SDK) developed by NVIDIA. It is designed to optimize neural network models for deployment, delivering low inference latency and high throughput for deep learning applications. By acting as an optimization compiler, TensorRT takes trained networks from popular frameworks like PyTorch and TensorFlow and restructures them to execute efficiently on NVIDIA GPUs. This capability is crucial for running complex AI models in production environments where speed and efficiency are paramount.

How TensorRT Optimizes Models

The core function of TensorRT is to convert a trained neural network into an optimized "engine" specifically tuned for the target hardware. It achieves this through several advanced techniques:

  • Layer Fusion: The optimizer combines multiple layers of a neural network into a single kernel, reducing memory access overhead and improving execution speed.
  • Precision Calibration: TensorRT supports reduced-precision modes such as half precision (FP16) and 8-bit integer quantization (INT8). By reducing the number of bits used to represent numbers, often with minimal accuracy loss, developers can significantly accelerate math operations and reduce memory usage. This is a form of model quantization.
  • Kernel Auto-Tuning: The software automatically selects the best data layouts and kernel implementations for the specific GPU architecture being used, ensuring maximum utilization of the hardware's parallel processing capabilities via CUDA.
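
The first two techniques can be illustrated numerically. The following is a minimal pure-Python sketch, not TensorRT's internal implementation: `fold_batchnorm`, `quantize_int8`, and `dequantize` are hypothetical helper names, the fusion example assumes a single scalar channel followed by batch normalization, and the quantizer assumes a simple symmetric max-abs scale rather than a calibration pass over representative data.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding affine op (one scalar channel).

    Unfused: y = gamma * ((weight * x + bias) - mean) / sqrt(var + eps) + beta
    Fused:   y = fused_weight * x + fused_bias
    """
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Layer fusion: two operations collapse into one multiply-add.
fused_w, fused_b = fold_batchnorm(0.5, 0.1, gamma=1.5, beta=-0.2, mean=0.3, var=0.04)

# Quantization: each value is stored in 8 bits plus one shared scale.
codes, scale = quantize_int8([0.5, -1.27, 0.0, 1.0])
restored = dequantize(codes, scale)
```

The fused weights produce the same output as the two separate operations while touching memory once, and the quantized codes differ from the original values by at most half a quantization step.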

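Real-World Applications

Because it can process large volumes of data with very low latency, TensorRT is widely used in industries that rely on computer vision and complex AI tasks where timing is critical.

  1. Autonomous driving systems: In automotive AI, self-driving cars must process video streams from multiple cameras in real time to detect road signs and obstacles. With TensorRT, perception models such as object detection networks can analyze each frame within milliseconds, allowing the vehicle's control system to make safety-critical decisions with minimal delay.
  2. Industrial automation: Modern factories use AI in manufacturing for automated optical inspection. High-speed cameras capture images of products on the assembly line, and TensorRT-optimized models identify defects or anomalies in real time. This keeps quality control in step with high-speed production, and the models are often deployed directly on edge AI devices on the factory floor, such as NVIDIA platforms.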

Using TensorRT with Ultralytics YOLO

Integrating TensorRT into your workflow is straightforward with modern AI tools. The ultralytics package provides a seamless method to convert standard PyTorch models into TensorRT engines. This allows users to combine the state-of-the-art architecture of Ultralytics YOLO26 with the hardware acceleration of NVIDIA GPUs. For teams looking to manage their datasets and training pipelines before export, the Ultralytics Platform offers a comprehensive environment to prepare models for such high-performance deployment.

The following example demonstrates how to export a YOLO26 model to a TensorRT engine file (.engine) and use it for real-time inference:

from ultralytics import YOLO

# Load the latest stable YOLO26 model (nano size)
model = YOLO("yolo26n.pt")

# Export the model to TensorRT format (creates 'yolo26n.engine')
# This step optimizes the computational graph for your specific GPU
model.export(format="engine")

# Load the optimized TensorRT engine for high-speed inference
trt_model = YOLO("yolo26n.engine")

# Run inference on an image source
results = trt_model("https://ultralytics.com/images/bus.jpg")
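
To check whether the exported engine is actually faster on a given machine, it can help to time both models the same way. The helper below is a minimal sketch under stated assumptions: `measure_latency` is a hypothetical name, not part of the ultralytics package, and it times any zero-argument callable, so a stand-in workload is used here to keep the block runnable without a GPU.

```python
import statistics
import time


def measure_latency(fn, warmup=3, runs=20):
    """Time a zero-argument callable and return latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up calls are executed but not timed
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }


# Stand-in workload so the sketch runs anywhere; swap in a real model call.
stats = measure_latency(lambda: sum(range(10_000)), warmup=2, runs=5)
print(f"mean latency: {stats['mean_ms']:.3f} ms over {stats['runs']} runs")
```

With the models from the example above, the callable could be `lambda: trt_model(source)` for the engine and `lambda: model(source)` for the PyTorch model, where `source` is assumed to be a local image path so that download time is not included in the measurement.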

TensorRT vs. ONNX vs. Training Frameworks

It is important to distinguish TensorRT from other terms often heard in the model deployment landscape:

  • Vs. PyTorch/TensorFlow: Frameworks like PyTorch are primarily designed for model training and research, offering flexibility and ease of debugging. TensorRT is an inference engine designed solely for executing trained models as fast as possible. It is not used for training.
  • Vs. ONNX: The ONNX (Open Neural Network Exchange) format acts as an intermediary bridge between frameworks. While ONNX provides interoperability (e.g., moving a model from PyTorch to another platform), TensorRT focuses on hardware-specific optimization. Often, a model is converted to ONNX first, and then parsed by TensorRT to generate the final engine.
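
The ONNX-to-engine step described in the second point is often performed with NVIDIA's `trtexec` command-line tool, which ships with TensorRT. The sketch below only assembles the command rather than running it; `build_trtexec_command` is a hypothetical helper and the file names are placeholders, while `--onnx`, `--saveEngine`, and `--fp16` are existing `trtexec` options.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16=False):
    """Assemble a trtexec invocation that parses an ONNX file and saves an engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow half-precision kernels during the build
    return cmd


# Placeholder file names; replace with real paths.
cmd = build_trtexec_command("model.onnx", "model.engine", fp16=True)
print(shlex.join(cmd))
```

On a machine with TensorRT installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`. The engine that comes out is tuned for the GPU it was built on, which is why this step usually happens on the deployment hardware.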

For developers aiming to maximize the performance of their AI agents or vision systems, understanding the transition from a training framework to an optimized runtime like TensorRT is a key step in professional MLOps.
