Joerg Hiller
Nov 13, 2025 02:50
NVIDIA’s Blackwell architecture achieves top performance across all MLPerf Training v5.1 benchmarks, highlighting advancements in AI training efficiency and precision.
NVIDIA’s Blackwell architecture has set new standards in AI training, achieving the fastest times across all MLPerf Training v5.1 benchmarks. This accomplishment marks a significant advancement in AI training capabilities, underscoring NVIDIA’s commitment to innovation in AI technology, according to NVIDIA’s official blog.
Benchmark Performance
MLPerf Training v5.1, a prominent industry benchmark suite, measures how quickly systems can train AI models to a target quality. NVIDIA’s Blackwell architecture, which powers both NVIDIA Blackwell and Blackwell Ultra GPUs, led every benchmark, delivering the fastest training times at both the maximum and smaller submitted scales. Notably, the architecture achieved a record-breaking 10-minute training time for the Llama 3.1 405B model using 5,120 Blackwell GPUs.
Technological Innovations
The success of the Blackwell architecture is attributed to several technological advancements. Key among these is the introduction of the NVFP4 data format, a 4-bit floating-point format that boosts training efficiency. By representing values with fewer bits, NVFP4 enables faster training while maintaining accuracy, a crucial factor in large-scale AI training.
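NVFP4 stores tensor elements in a compact 4-bit floating-point encoding (E2M1: two exponent bits, one mantissa bit) together with a shared scale factor per small block of values. The sketch below illustrates that idea in plain Python; the block size, scale choice, and rounding rule here are simplifying assumptions for illustration, not NVIDIA’s exact scheme.

```python
# Illustrative sketch of 4-bit floating-point (FP4 E2M1) block quantization,
# the element format underlying NVFP4. Scale handling and rounding are
# simplified assumptions, not NVIDIA's production implementation.

# The 8 non-negative magnitudes representable in E2M1.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block of floats to FP4: one shared scale + 4-bit elements."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the E2M1 maximum (6.0).
    scale = amax / 6.0 if amax > 0 else 1.0
    elements = []
    for x in block:
        target = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        q = min(E2M1_VALUES, key=lambda v: abs(v - target))
        elements.append(-q if x < 0 else q)
    return scale, elements

def dequantize_block(scale, elements):
    """Recover approximate float values from the scale and 4-bit elements."""
    return [scale * e for e in elements]

block = [0.02, -0.11, 0.35, 0.6]
scale, q = quantize_block(block)
approx = dequantize_block(scale, q)
```

The key trade-off this illustrates: each element needs only 4 bits plus a small amortized cost for the per-block scale, so memory traffic and math throughput improve, while the shared scale keeps quantization error proportional to the block’s largest magnitude.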
Further innovations include enhancements to GPU hardware, software libraries such as NVIDIA Transformer Engine, and numerical techniques that optimize the training process. Together, these improvements allow the architecture to handle complex AI models efficiently.
Blackwell Ultra’s Enhanced Capabilities
Blackwell Ultra GPUs, part of NVIDIA’s latest offerings, incorporate significant improvements over previous iterations. These include increased peak NVFP4 throughput and enhanced memory capacity, enabling more efficient handling of large language models (LLMs). The enhancements contribute to a substantial leap in performance, particularly in LLM training benchmarks.
Networking and Scale
NVIDIA’s advancements extend beyond GPUs to include networking improvements. The introduction of the NVIDIA Quantum-X800 networking platform, featuring 800 Gb/s connectivity, supports the scale required for large AI models. This infrastructure enables seamless communication between GPUs, facilitating faster training times and improved scalability.
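To see why link bandwidth matters at this scale, consider a back-of-envelope estimate of how long a single synchronization of the gradients would take over the interconnect. The model size, gradient precision, and ring all-reduce cost model below are illustrative assumptions, not NVIDIA’s measured numbers.

```python
# Hedged back-of-envelope: bandwidth-only time for one gradient all-reduce.
# Assumptions (not from NVIDIA): BF16 gradients, a simple ring all-reduce,
# and one 800 Gb/s link per GPU with no overlap or compression.

def ring_allreduce_seconds(payload_bytes, link_gbps, num_gpus):
    """Bandwidth-only ring all-reduce time: each GPU sends and receives
    2*(N-1)/N times the payload over its link."""
    link_bytes_per_s = link_gbps * 1e9 / 8
    traffic = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic / link_bytes_per_s

params = 405e9            # Llama 3.1 405B parameter count
grad_bytes = params * 2   # assuming BF16 gradients (2 bytes per value)
t = ring_allreduce_seconds(grad_bytes, link_gbps=800, num_gpus=5120)
# Even at 800 Gb/s, a naive full-gradient sync takes on the order of
# tens of seconds, which is why overlap, sharding, and fast fabrics
# are essential for a 10-minute total training time.
```

Under these assumptions the estimate lands around 16 seconds per naive full synchronization, which makes clear that raw link speed, communication/computation overlap, and hierarchical network topologies all have to work together at this scale.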
Future Implications
NVIDIA’s achievements with the Blackwell architecture highlight the potential for accelerated AI development and deployment. By continuously innovating across hardware and software, NVIDIA aims to reduce the cost of intelligence and pave the way for future breakthroughs in AI technology.
