Rongchai Wang
Sep 09, 2025 17:20
NVIDIA’s Blackwell Ultra architecture sets new performance records in MLPerf Inference v5.1, powered by its innovative design and full-stack optimizations.
NVIDIA’s latest innovation, the Blackwell Ultra architecture, has set new records in the MLPerf Inference v5.1 benchmarks, underscoring its strength in AI inference. According to NVIDIA, the GB300 NVL72 rack-scale system, powered by Blackwell Ultra, delivered up to 1.4x higher DeepSeek-R1 inference throughput than its predecessor, the GB200 NVL72.
Innovative Architectural Advancements
The Blackwell Ultra architecture builds on the original Blackwell design with significant enhancements: 1.5x more NVFP4 AI compute, 2x attention-layer acceleration, and up to 288 GB of HBM3e memory per GPU. These advances allowed NVIDIA to set performance records across all new data center benchmarks in the MLPerf Inference v5.1 suite, including DeepSeek-R1, Llama 3.1 405B Interactive, Llama 3.1 8B, and Whisper.
Full-Stack Co-Design and Optimization
The Blackwell Ultra architecture’s performance stems from NVIDIA’s full-stack co-design approach, which includes hardware acceleration for the NVFP4 data format. This 4-bit floating-point format delivers better accuracy than alternative FP4 formats while approaching that of higher-precision formats. NVIDIA’s TensorRT Model Optimizer played a key role in quantizing models such as DeepSeek-R1 and Llama 3.1 to NVFP4, improving performance while maintaining accuracy.
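As a rough illustration, here is a minimal sketch of what post-training quantization to NVFP4 could look like with TensorRT Model Optimizer’s PyTorch API. The config name `NVFP4_DEFAULT_CFG`, the example model ID, and the tiny calibration loop follow the library’s documented usage pattern but are assumptions here, not a reproduction of NVIDIA’s benchmark submission pipeline:

```python
# Illustrative sketch: post-training NVFP4 quantization with
# NVIDIA TensorRT Model Optimizer (nvidia-modelopt). Config name and
# calibration details are assumptions based on the library's docs.
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example model from the benchmarks
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(m):
    # Calibration pass: run a small set of representative prompts so the
    # quantizer can collect activation statistics before converting to FP4.
    for prompt in ["Explain KV caching.", "Summarize MLPerf Inference."]:
        inputs = tokenizer(prompt, return_tensors="pt")
        m(**inputs)

# Quantize weights and activations to NVFP4 using the default recipe.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

In practice, the calibration set would contain far more samples, and the quantized model would then be exported to a TensorRT engine for deployment.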
Record-Setting Performance Techniques
NVIDIA’s disaggregated serving technique, which separates the context (prefill) and generation (decode) phases of inference onto different GPUs, was pivotal in achieving record-setting performance on the Llama 3.1 405B Interactive benchmark. On GB200 NVL72 systems, this approach increased per-GPU performance by nearly 50% compared with traditional aggregated serving.
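Conceptually, disaggregation splits inference into a compute-bound prefill phase and a memory-bandwidth-bound decode phase, each handled by its own worker pool that can be sized and scheduled independently. The toy Python sketch below illustrates only the hand-off pattern; all names are hypothetical, and production systems move the KV cache between GPUs over high-bandwidth interconnects rather than an in-process queue:

```python
# Toy illustration of disaggregated serving: the prefill (context) phase
# and decode (generation) phase run in separate workers. All names here
# are hypothetical; real systems transfer the KV cache over NVLink or
# InfiniBand, not a local queue.
from dataclasses import dataclass
from queue import Queue

@dataclass
class PrefillResult:
    request_id: str
    kv_cache: list       # stand-in for per-layer key/value tensors
    first_token: str

def context_worker(prompt: str, request_id: str) -> PrefillResult:
    # Compute-bound phase: process the full prompt once, build the KV cache.
    kv_cache = [f"kv({tok})" for tok in prompt.split()]
    return PrefillResult(request_id, kv_cache, first_token="The")

def generation_worker(handoff: Queue) -> dict:
    # Bandwidth-bound phase: consume KV caches and decode token by token.
    outputs = {}
    while not handoff.empty():
        pr = handoff.get()
        tokens = [pr.first_token] + [f"tok{i}" for i in range(3)]  # fake decode loop
        outputs[pr.request_id] = " ".join(tokens)
    return outputs

handoff = Queue()
for i, prompt in enumerate(["what is mlperf", "explain nvfp4"]):
    handoff.put(context_worker(prompt, f"req-{i}"))
print(generation_worker(handoff))
```

Because the two phases stress different resources, separating them lets each GPU pool run at higher utilization than a single pool serving both phases together.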
Industry Collaboration and Market Impact
NVIDIA’s achievements in AI inference have been bolstered by collaborations with cloud service providers and server manufacturers, including Azure, Broadcom, Cisco, and Dell Technologies. These partnerships ensure that the cutting-edge performance of NVIDIA’s AI platform is accessible to a wide range of organizations, offering lower total cost of ownership (TCO) and enhanced return on investment for AI applications.
For a deeper understanding of NVIDIA’s technological advances, visit the NVIDIA Technical Blog for more insights on MLPerf Inference v5.1 and the Blackwell Ultra architecture.
Image source: Shutterstock