Joerg Hiller
Dec 16, 2025 17:17
NVIDIA adds Memory Locality Optimized Partition (MLOPart) support to CUDA MPS, letting latency-sensitive applications gain GPU memory performance without code changes.
NVIDIA has detailed a new approach to enhancing GPU memory performance through its CUDA Multi-Process Service (MPS), allowing developers to improve GPU utilization without altering existing codebases. The announcement highlights the ability of CUDA MPS to share GPU resources across multiple processes, improving efficiency and performance, according to NVIDIA.
Introducing MLOPart Technology
Central to this development is Memory Locality Optimized Partition (MLOPart), a feature of NVIDIA's CUDA MPS that improves latency. MLOPart lets multi-GPU-aware applications address MLOPart devices, partitions optimized for low-latency memory access. The feature is particularly significant for applications that are latency-sensitive rather than bandwidth-bound, a common scenario when working with large language models.
Benefits of MLOPart Devices
MLOPart devices appear as distinct CUDA devices with their own compute and memory resources, akin to NVIDIA's Multi-Instance GPU (MIG) technology. This allows more granular allocation of resources, which can benefit applications with specific performance requirements. For instance, NVIDIA's DGX B200 and B300 systems can support multiple MLOPart devices per GPU, increasing flexibility for performance tuning.
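Because each partition is enumerated as an ordinary CUDA device, existing multi-GPU code can pick them up through standard device enumeration. The sketch below uses the stock CUDA runtime API (`cudaGetDeviceCount`, `cudaGetDeviceProperties`); nothing here is MLOPart-specific, which is the point: under an MLOPart-enabled MPS client, the partitions simply appear in this list.

```c
// Sketch: standard CUDA device enumeration. Under an MLOPart-enabled MPS
// client, each MLOPart partition shows up as one of these devices, so a
// multi-GPU-aware application needs no source changes.
// (Illustrative; compile with nvcc and run on a system with the CUDA driver.)
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);  // MLOPart partitions are included in this count
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, %zu MiB\n",
               i, prop.name, prop.totalGlobalMem >> 20);
    }
    return 0;
}
```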
Deployment and Configuration
Deploying CUDA MPS with MLOPart is managed through MPS controller commands, which configure MPS servers to create MLOPart-enabled clients. This lets the environment be tailored per user. The MPS controller's device_query command reports the enumerated CUDA devices, which aids configuration and optimization.
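As a rough sketch of that workflow, the standard MPS control daemon and its pipe/log environment variables look like the following. The device_query command is taken from the article; its exact output format, and the controller command that enables MLOPart itself, may differ by release, so consult NVIDIA's MPS documentation for the authoritative syntax.

```shell
# Per-user MPS setup (standard MPS environment variables; no sudo needed).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/mps-log
nvidia-cuda-mps-control -d        # start the MPS control daemon

# Inspect the CUDA devices the controller enumerates (command named in
# the article; output format may vary by CUDA release).
echo "device_query" | nvidia-cuda-mps-control

# Shut the daemon down when finished.
echo "quit" | nvidia-cuda-mps-control
```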
Comparative Analysis with MIG
While both MLOPart and MIG partition GPU resources, they operate under different paradigms. MIG requires superuser privileges to configure and provides strict memory and performance isolation. MLOPart, as part of MPS, can be configured per user without superuser access, although it does not enforce the same level of isolation.
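The privilege difference is visible in how each is administered. The following is an illustrative comparison, assuming GPU index 0 and a standard MIG profile name; the MIG commands are the usual nvidia-smi workflow, while the MLOPart side goes through the per-user MPS controller.

```shell
# MIG: reconfiguring the GPU requires root and affects all users.
sudo nvidia-smi -i 0 -mig 1                 # enable MIG mode (root required)
sudo nvidia-smi mig -i 0 -cgi 1g.10gb -C    # create a GPU + compute instance

# MLOPart: configured per user through the MPS controller, no sudo needed.
nvidia-cuda-mps-control -d                  # user-scoped MPS control daemon
```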
Overall, NVIDIA’s CUDA MPS with MLOPart technology represents a significant advancement in GPU resource management, enabling developers to achieve enhanced performance without the need for extensive code modifications. This innovation is poised to benefit a wide range of applications, especially those requiring low-latency processing capabilities.
Image source: Shutterstock
