The company reported that its GB200 NVL72 and newer GB300 NVL72 systems topped all benchmarks in MLPerf Training 6.0, a widely followed industry test of AI training performance.
The latest benchmark round introduced two new workloads based on mixture-of-experts models, a design increasingly used to improve efficiency when training large language models.
Nvidia said the GB300 NVL72 trained models up to 1.6 times faster than the previous-generation GB200 NVL72 and was the only platform to complete submissions across the entire benchmark suite.
The company also highlighted its ability to scale training workloads to thousands of graphics processing units (GPUs), including an 8,192-GPU run of the 671-billion-parameter DeepSeek-V3 model and a separate 5,120-GPU submission for Llama 3.1 405B.
Cloud computing partners reported similar large-scale deployments.
Microsoft Azure said it trained the Llama 3.1 405B model to the benchmark's target quality level in just over seven minutes using 8,192 GPUs, while CoreWeave recorded the fastest DeepSeek-V3 training result, 2.02 minutes, at the same 8,192-GPU scale.
Nvidia attributed part of the performance gains to networking technology that links 72 GPUs within a single rack, allowing them to operate as a unified pool of computing power and memory.
The company also emphasised system reliability, highlighting features designed to detect faults, reroute network traffic and resume training runs after interruptions.
Results submitted by 19 partner organisations included claims of significantly faster training times and improved efficiency for AI developers deploying large-scale models.