Ultralytics Benchmarks

How YOLO26 performs on the Ultralytics Platform's NVIDIA GPUs — measured training throughput, auto-batch, memory, power, and cost-efficiency, so you can pick the right GPU for your budget and timeline.

GPU Training Throughput

YOLO26 detection trained on COCO at 640px with auto-batch across every NVIDIA GPU on the Platform.

Throughput (img/s) by GPU
YOLO26n on COCO at 640px, sorted by throughput (higher is better)

| GPU | Generation | VRAM | Throughput (img/s) | Auto-batch size | Peak VRAM | Peak power | Price ($/hr) | Images per dollar |
|---|---|---|---|---|---|---|---|---|
| H200 SXM | Hopper | 141 GB | 490.6 | 221 | 40.2 GB | 361 W | $4.39 | 402.3K |
| B300 | Blackwell | 288 GB | 474.6 | 411 | 73.4 GB | 457 W | $7.39 | 231.2K |
| H200 NVL | Hopper | 143 GB | 469.3 | 221 | 39.8 GB | 283 W | $3.39 | 498.4K |
| H100 NVL | Hopper | 94 GB | 432.1 | 147 | 26.8 GB | 274 W | $3.19 | 487.6K |
| H100 SXM | Hopper | 80 GB | 424.3 | 123 | 23.1 GB | 315 W | $3.29 | 464.3K |
| RTX PRO 6000 | Blackwell | 96 GB | 420.6 | 149 | 27.6 GB | 319 W | $2.09 | 724.5K |
| B200 | Blackwell | 180 GB | 404.9 | 281 | 50.4 GB | 419 W | $5.89 | 247.5K |
| RTX 5090 | Blackwell | 32 GB | 356 | 49 | 9.7 GB | 304 W | $0.99 | 1.3M |
| RTX 4090 | Ada | 24 GB | 306 | 35 | 13.5 GB | 235 W | $0.69 | 1.6M |
| H100 PCIe | Hopper | 80 GB | 302.4 | 123 | 22.6 GB | 197 W | $2.89 | 376.7K |
| RTX 6000 Ada | Ada | 48 GB | 294.8 | 73 | 13.9 GB | 254 W | $0.77 | 1.4M |
| A100 SXM | Ampere | 80 GB | 286.8 | 123 | 23.1 GB | 342 W | $1.49 | 692.9K |
| A100 PCIe | Ampere | 80 GB | 283.2 | 123 | 23.2 GB | 328 W | $1.39 | 733.5K |
| L40S | Ada | 48 GB | 265.1 | 70 | 13.3 GB | 258 W | $0.86 | 1.1M |
| L40 | Ada | 48 GB | 255.5 | 68 | 13.1 GB | 255 W | $0.99 | 929.1K |
| RTX PRO 4500 | Blackwell | 32 GB | 249.7 | 49 | 11.0 GB | 161 W | $0.64 | 1.4M |
| RTX A6000 | Ampere | 48 GB | 209.9 | 73 | 13.9 GB | 278 W | $0.49 | 1.5M |
| RTX 3090 | Ampere | 24 GB | 184.8 | 35 | 13.3 GB | 312 W | $0.46 | 1.4M |
| RTX A5000 | Ampere | 24 GB | 171.2 | 35 | 7.1 GB | 216 W | $0.27 | 2.3M |
| A40 | Ampere | 48 GB | 161.2 | 70 | 13.4 GB | 262 W | $0.44 | 1.3M |
| RTX A4500 | Ampere | 20 GB | 149.2 | 30 | 9.1 GB | 191 W | $0.25 | 2.1M |
| RTX 4000 Ada | Ada | 20 GB | 127.9 | 30 | 6.5 GB | 98 W | $0.26 | 1.8M |
| L4 | Ada | 24 GB | 116 | 33 | 10.2 GB | 78 W | $0.39 | 1.1M |
| RTX 2000 Ada | Ada | 16 GB | 88 | 22 | 4.7 GB | 60 W | $0.24 | 1.3M |

Training methodology

We use training throughput — images processed per second during training — as the yardstick; it correlates directly with time-to-solution. Every result is measured on Ultralytics Platform GPUs — the same NVIDIA hardware you rent for cloud training in one click, from entry-level workstation cards up to flagship data-center GPUs. We train YOLO26 at all five sizes (n/s/m/l/x) so you can match a model to your hardware budget.
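
Because throughput maps onto time-to-solution, a rough run-time and cost estimate can be derived from the reported numbers. The sketch below is illustrative only: `estimate_training` is a hypothetical helper, not part of the ultralytics package. It assumes the steady-state throughput holds for a full run and ignores warmup, validation, and data-loading stalls. The default of 118,287 images is the size of the COCO train2017 split.

```python
COCO_TRAIN_IMAGES = 118_287  # images in the COCO train2017 split


def estimate_training(throughput_img_s, price_per_hour, epochs,
                      images_per_epoch=COCO_TRAIN_IMAGES):
    """Return (hours, dollars) for a training run.

    Hypothetical helper: assumes constant steady-state throughput and
    ignores first-epoch warmup and validation passes.
    """
    if throughput_img_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = images_per_epoch * epochs / throughput_img_s
    hours = seconds / 3600
    return hours, hours * price_per_hour


# YOLO26n figures reported above: throughput (img/s) and hourly price.
for name, img_s, price in [("RTX 4090", 306.0, 0.69), ("H200 SXM", 490.6, 4.39)]:
    hours, cost = estimate_training(img_s, price, epochs=100)
    print(f"{name}: {hours:.1f} h, ${cost:.2f} for 100 epochs of COCO")
```

Under these assumptions the faster card finishes sooner but costs more per run, which is the trade-off the throughput and price figures are meant to expose.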

Settings. 2 epochs on 25% of COCO at 640px, AMP mixed precision, single GPU, with auto-batch (batch=-1) selecting the largest batch that fits in memory. We report the steady-state second epoch, which excludes first-epoch warmup (dataset caching, CUDA graph capture) and the end-of-run validation pass. Resolved batch size, peak VRAM (including the CUDA context), and peak board power are recorded directly from each GPU.
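
The settings above map onto arguments of the Ultralytics Python training API roughly as follows. This is a minimal sketch, not the Platform's actual benchmark harness: the function names are hypothetical, the weights filename `yolo26n.pt` is an assumption, and the sketch does not isolate second-epoch throughput or record VRAM and power.

```python
def benchmark_train_args(device=0):
    """Training arguments mirroring the described benchmark settings."""
    return {
        "data": "coco.yaml",  # COCO dataset config shipped with ultralytics
        "epochs": 2,          # the second epoch is the one reported
        "imgsz": 640,         # 640px input size
        "batch": -1,          # auto-batch
        "fraction": 0.25,     # train on 25% of the dataset
        "amp": True,          # AMP mixed precision
        "device": device,     # single GPU
    }


def run_benchmark(weights="yolo26n.pt", device=0):
    """Launch a training run; the weights filename is an assumption."""
    from ultralytics import YOLO  # imported lazily so the config is usable alone

    model = YOLO(weights)
    return model.train(**benchmark_train_args(device))


# To launch on GPU 0 (downloads COCO on first use):
# run_benchmark()
```

Keeping the arguments in a plain dictionary makes it straightforward to reuse the same settings across the five model sizes by changing only the weights filename.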

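Cost-efficiency. Images per dollar = throughput (img/s) × 3600 ÷ hourly price, using Ultralytics Platform on-demand pricing. This metric often reorders the ranking, because value cards process more images per dollar than flagship GPUs: for YOLO26n, the RTX A5000 reaches about 2.3M images per dollar at 171.2 img/s, while the fastest card, the H200 SXM, reaches about 402.3K at 490.6 img/s. Measured on ultralytics 8.4.68, torch 2.8, CUDA 12.8. See also the Train and Benchmark mode docs.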
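
The images-per-dollar formula can be checked against the reported figures with a few lines of Python. The helper names below are hypothetical and the compact K/M formatting is an assumption about how the published values were rounded; the inputs are YOLO26n throughput and hourly price for three of the GPUs.

```python
def images_per_dollar(throughput_img_s, price_per_hour):
    """Images processed per dollar: throughput x 3600 / hourly price."""
    return throughput_img_s * 3600 / price_per_hour


def format_compact(value):
    """Format a count as e.g. '402.3K' or '2.3M' (assumed display style)."""
    if value >= 1e6:
        return f"{value / 1e6:.1f}M"
    return f"{value / 1e3:.1f}K"


# (GPU, throughput in img/s, on-demand price in $/hr) for YOLO26n
gpus = [
    ("H200 SXM", 490.6, 4.39),
    ("RTX 4090", 306.0, 0.69),
    ("RTX A5000", 171.2, 0.27),
]

# Rank by cost-efficiency instead of raw throughput.
ranked = sorted(gpus, key=lambda g: images_per_dollar(g[1], g[2]), reverse=True)
for name, img_s, price in ranked:
    print(f"{name}: {format_compact(images_per_dollar(img_s, price))} images per dollar")
```

Sorting by this value places the RTX A5000 first and the H200 SXM last among the three, the reverse of their throughput order.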

Coming soon

Inference Benchmarks

Predict latency and throughput on CPU and GPU across export formats — PyTorch, ONNX, TensorRT, OpenVINO, CoreML, TF.js, and more — so you can pick the fastest path to deployment.
