DeepSeek V4 Matches NVIDIA Performance on Huawei Ascend, Dispels Delay Rumors
DeepSeek V4 has demonstrated performance parity on Huawei Ascend NPUs and NVIDIA GPUs, dispelling rumors that adapting the model to Ascend had been delayed. According to the V4 technical report, the Fine-Grained Expert Partitioning Scheme has been successfully implemented. It delivers speedups of 1.50x to 1.73x on standard inference workloads and up to 1.96x in latency-sensitive scenarios. The team has also open-sourced the CUDA version of its MegaMoE kernel as part of DeepGEMM. The report states that V4 runs at near-theoretical efficiency on both platforms, with no performance loss on either.
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult a qualified financial advisor before making any investment decisions.
