MiniMax AI Reveals M2.7 Model Inference Speed on Various GPUs

MiniMax AI has released performance test results for its 230-billion-parameter model, M2.7, showing inference speeds across several GPU configurations. Using Unsloth’s UD-IQ3_XXS (80GB) quantized version, the model reached 71.52 tokens per second (tok/s) with a time-to-first-token (TTFT) of 1045 ms on four RTX 4090 GPUs (96GB of VRAM in total). Four RTX 5090 GPUs (128GB in total) were faster, at 120.54 tok/s with a TTFT of 725 ms. A single RTX PRO 6000 (96GB) came close to the four-card RTX 5090 setup, recording 118.74 tok/s with a TTFT of 765 ms. Tests were also run on DGX systems, but specific results were not disclosed.
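To put the reported figures side by side, the following minimal Python sketch computes each configuration's throughput relative to the four RTX 4090 setup and a rough end-to-end response time. The class and function names are hypothetical, and the latency formula (TTFT plus output tokens divided by tok/s) is a simplifying assumption that treats decode speed as constant; it is not part of the published test methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One reported measurement (figures taken from the article)."""
    hardware: str
    tokens_per_second: float
    ttft_ms: float

    def estimated_latency_s(self, output_tokens: int) -> float:
        # Assumption: total time = time to first token + steady decode time.
        return self.ttft_ms / 1000.0 + output_tokens / self.tokens_per_second


RESULTS = [
    BenchmarkResult("4x RTX 4090", 71.52, 1045.0),
    BenchmarkResult("4x RTX 5090", 120.54, 725.0),
    BenchmarkResult("1x RTX PRO 6000", 118.74, 765.0),
]


def speedup_over(baseline: BenchmarkResult, other: BenchmarkResult) -> float:
    """Ratio of decode throughput between two configurations."""
    return other.tokens_per_second / baseline.tokens_per_second


def summarize(output_tokens: int = 500) -> list[str]:
    """Build one summary line per configuration, using the first as baseline."""
    baseline = RESULTS[0]
    lines = []
    for result in RESULTS:
        lines.append(
            f"{result.hardware}: {speedup_over(baseline, result):.2f}x throughput, "
            f"~{result.estimated_latency_s(output_tokens):.1f} s "
            f"for {output_tokens} tokens"
        )
    return lines


if __name__ == "__main__":
    for line in summarize():
        print(line)
```

Under these assumptions, both the four RTX 5090 setup and the single RTX PRO 6000 deliver roughly 1.7 times the throughput of the four RTX 4090 setup. Real response times will also depend on prompt length and context size, which the published figures do not specify.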


Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.