Pioneering Multinode Heterogeneous Inference: MangoBoost and AMD, in Collaboration with OEM Partners, Set Record Llama2-70B Performance in MLPerf Inference v5.1
September 09, 2025
by MangoBoost
Interested in trying our LLMBoost software on AMD GPU servers? Register for our virtual demo.
MangoBoost is proud to announce groundbreaking results in the MLPerf Inference v5.1 benchmark. This round showcases how LLMBoost™, our enterprise-grade GenAI platform, unlocks the full potential of heterogeneous and multi-node GPU clusters.
By combining software intelligence, optimized ROCm™ integration, and deep collaborations with AMD, Dell, and Supermicro, we continue to push the frontiers of performance, scalability, and cost efficiency for large-scale LLM inference.
The scale and innovation of MangoBoost’s submission have been recognized across the industry:
David Kanter – Founder & Head of MLPerf, MLCommons: “I am thrilled to see MangoBoost’s creativity in pushing the limits of LLM serving with the first-ever heterogeneous results for MLPerf Inference. This submission integrates multiple generations of AMD GPUs with MangoBoost's LLMBoost platform to deliver impressive performance for Llama2-70B at 160K tokens/s, underscoring the power of software and system architecture in inference serving.”
Meena Arunachalam – Fellow, AMD: “We are thrilled with our MLPerf Inference v5.1 co-submission with MangoBoost. Together, we set a new performance record for Llama2-70B using AMD Instinct™ MI355X GPUs, scaling to 4-node and 8-node configurations. MangoBoost also delivered the first heterogeneous multi-node submission showcasing the combined power of AMD Instinct MI300X and MI325X GPUs, along with strong multi-node results on Dell and Supermicro servers. With nine high-performance MLPerf inference submissions, MangoBoost’s LLMBoost GenAI, powered by AMD ROCm™, provides seamless scalability and easy deployment for enterprise AI workloads.”
Frank Han – Distinguished Member of Technical Staff, Dell: “Dell's collaboration with MangoBoost is built on a foundation that began with MLPerf Training v5.0. In our official MLPerf Inference v5.1 co-submission (Dell_MangoBoost), MangoBoost’s LLMBoost software demonstrated impressive linear scaling across multi-node Dell PowerEdge server configurations with AMD accelerators.”
LLMBoost™ delivers a turn-key, high-performance inference stack with broad compatibility and advanced optimizations.
MangoBoost’s success in MLPerf Inference v5.1 is built on deep, strategic collaborations with industry leaders such as AMD, Dell, and Supermicro (listed alphabetically), alongside validation on Gigabyte platforms. These partnerships ensure LLMBoost™ is optimized, trusted, and deployed across a wide range of hardware ecosystems to unlock the full performance potential of AMD Instinct™ GPUs.
Advancing the frontier of LLM inference serving requires more than just software innovation — it depends on tight hardware-software co-design and validation across diverse platforms. By working directly with leading technology partners, MangoBoost ensures that LLMBoost™ is not only optimized for the latest AMD Instinct™ GPUs, but also proven on a variety of server architectures from major vendors. These collaborations demonstrate that enterprises can trust LLMBoost™ to deliver consistent, high-performance results on their infrastructure of choice, whether in single-node systems, multi-node clusters, or heterogeneous GPU deployments.
MangoBoost Collaboration with AMD:
Through co-engineering efforts with AMD, MangoBoost gained early access to next-generation AMD Instinct™ GPUs such as the MI355X, enabling us to tightly integrate LLMBoost™ with the ROCm™ software stack. This collaboration resulted in the first-ever third-party MLPerf submission on AMD's flagship MI355X, achieving 648K tokens/s on 64 GPUs and delivering record-setting performance in the open division with 8-node MI355X clusters. Read more on AMD's perspective in their blog.
MangoBoost Collaboration with Dell:
In collaboration with Dell Technologies, MangoBoost validated LLMBoost™ on Dell PowerEdge servers, demonstrating reliable performance across enterprise-scale deployments. This collaboration included validation on multi-node MI300X clusters with near-linear scalability and further confirmed LLMBoost™’s enterprise-ready performance on Dell PowerEdge XE9680. Learn more from Dell’s blog.
MangoBoost Collaboration with Supermicro:
Working with Supermicro, MangoBoost validated LLMBoost™ across a wide range of single-node, multi-node, and heterogeneous GPU deployments. This included the first-ever heterogeneous GPU submission, combining MI300X nodes (Supermicro AS-8125GS-TNMR2) with MI325X nodes (Supermicro AS-8126GS-TNMR) and achieving near-linear scalability. The results demonstrated how LLMBoost™ can seamlessly orchestrate workloads across different GPU generations while maintaining consistent performance and efficiency.
The figure above summarizes the throughput (tokens/s) achieved by MangoBoost and our partners across homogeneous and heterogeneous configurations in MLPerf Inference v5.1.
Below are some of the key highlights from this round:
As a result of deep collaboration with AMD and full integration of the optimized ROCm software stack, our submission delivered 169K tokens/s in the closed division, outperforming the next-best NVIDIA result by 35% and exceeding the average of all submissions by 307% on the primary metric.
In the open division, our joint submission with AMD scaled an 8-node MI355X cluster to an unprecedented 648K tokens/s, highlighting the scalability and efficiency of LLMBoost™ on the latest GPU architectures.
For the first time in MLPerf history, MangoBoost submitted results using heterogeneous GPU configurations, combining MI300X and MI325X GPUs. This innovative setup achieved 169K tokens/s with near-perfect scaling, proving LLMBoost™'s ability to efficiently orchestrate workloads across multiple GPU generations.
This capability gives customers the flexibility to mix and match hardware, allowing them to integrate newer GPUs into their infrastructure without sacrificing performance, while optimizing for cost-efficiency.
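As a rough illustration of the idea (not MangoBoost's actual scheduler), the sketch below routes requests across mixed-generation GPU nodes in proportion to each node's throughput; the node names and tokens/s figures are invented for the example:

```python
import random
from collections import Counter

# Illustrative per-node capacities (tokens/s) for a mixed-generation cluster;
# real values would be measured per GPU generation (e.g. MI300X vs. MI325X).
NODE_CAPACITY = {
    "mi300x-node-0": 19_000,  # hypothetical figures, not MLPerf results
    "mi300x-node-1": 19_000,
    "mi325x-node-0": 23_000,
    "mi325x-node-1": 23_000,
}

def pick_node() -> str:
    """Route one request to a node with probability proportional to its
    throughput, so faster GPUs absorb proportionally more traffic."""
    nodes = list(NODE_CAPACITY)
    weights = [NODE_CAPACITY[n] for n in nodes]
    return random.choices(nodes, weights=weights, k=1)[0]

# Over many requests, the MI325X nodes should receive roughly 23/19 the
# traffic of the MI300X nodes, keeping every generation fully utilized.
print(Counter(pick_node() for _ in range(100_000)))
```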
MangoBoost’s submissions in MLPerf Inference v5.1 demonstrate exceptional scalability of LLMBoost™ across a wide range of multi-node configurations, from homogeneous clusters to large-scale heterogeneous deployments.
With intelligent load balancing, optimized communication libraries, and full-stack auto-configuration tuning, LLMBoost™ delivers up to 97% scaling efficiency and averages roughly 94% across all deployments.
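Scaling efficiency here means multi-node throughput divided by the node count times single-node throughput. A minimal worked example with illustrative numbers (not the submitted MLPerf figures):

```python
def scaling_efficiency(multi_node_tps: float,
                       single_node_tps: float,
                       num_nodes: int) -> float:
    """Achieved throughput as a fraction of ideal linear scaling."""
    return multi_node_tps / (num_nodes * single_node_tps)

# Illustrative numbers only, not the submitted MLPerf figures:
# one node at 84K tokens/s scaled to 8 nodes serving 648K tokens/s.
print(f"{scaling_efficiency(648_000, 84_000, 8):.1%}")  # 96.4%
```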
LLMBoost™ is engineered to deliver best-in-class performance across a wide range of workloads — from text-only deployments to highly complex vision-text (multi-modal) applications. Its ability to scale across different models and workloads ensures enterprise-grade reliability and efficiency for any GenAI deployment.
1. Text-Only Models: Superior Throughput and Cost Efficiency
For text-only LLM deployments, LLMBoost™ consistently outperforms competing solutions like vLLM and Ollama across multiple model families.
This unmatched performance translates into significant cost savings, enabling enterprises to scale inference workloads at a fraction of the cost of existing frameworks.
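To make the cost argument concrete, here is a back-of-the-envelope model: at a fixed GPU-hour price, doubling throughput halves the cost per generated token. All prices and throughputs below are placeholder assumptions, not measured results:

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec: float) -> float:
    """Dollar cost to generate one million tokens at a fixed GPU-hour price."""
    tokens_per_hour = tokens_per_sec * 3_600
    return gpu_hour_usd / tokens_per_hour * 1_000_000

# Placeholder price and throughputs -- not measured data.
baseline = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec=1_500)
optimized = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec=3_000)
print(f"baseline: ${baseline:.2f}/M tokens, optimized: ${optimized:.2f}/M tokens")
```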
2. High-Performance Vision-Text Multi-Modal Deployments
LLMBoost™ extends its performance and scalability leadership to multi-modal inference, powering complex vision-text applications such as image-grounded LLMs.
Together, these results showcase LLMBoost™ as the preferred choice for vision-text workloads, offering a high-performance, scalable, and cost-efficient solution for enterprises deploying multi-modal AI systems at scale.
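As a sketch of what a vision-text request can look like, the example below assumes an OpenAI-compatible endpoint, a common pattern for inference servers, though this post does not specify LLMBoost™'s API. The URL, API key, and model id are placeholders:

```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint; the URL, API key, and model id
# below are placeholders, not documented LLMBoost values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.2-11b-vision-instruct",  # hypothetical vision-text model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```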
Getting started with LLMBoost™ is as simple as selecting the model you want on HuggingFace and then running one command.
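A hypothetical quick-start sketch of that flow; the post does not show the exact command, so the `llmboost serve` CLI name and flags below are assumptions, not documented syntax:

```python
import subprocess

# Any HuggingFace model id; Llama-2-70B shown since it is the benchmark model.
MODEL = "meta-llama/Llama-2-70b-chat-hf"

# Launch the (assumed) LLMBoost serving command with the chosen model.
# The CLI name and flags are an assumed shape, not documented syntax.
subprocess.run(["llmboost", "serve", "--model", MODEL], check=True)
```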
Alongside LLMBoost™, MangoBoost accelerates infrastructure with DPU-powered hardware solutions.
To experience the record-setting performance of MLPerf Inference v5.1, register for our virtual demo.