SonicMoE has announced a performance milestone: as of April 23 (UTC+8), it reaches peak throughput on NVIDIA Blackwell GPUs. Its forward-pass and backward-pass throughput, measured in TFLOPS, exceeds the DeepGEMM baseline by 54% and 35%, respectively, and its forward-pass throughput exceeds the official Triton example by 21%. SonicMoE also keeps its activation memory footprint minimal, comparable to that of dense models.
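
To put the reported percentages in context, the sketch below converts a relative throughput gain into a speedup ratio and into the implied reduction in time for the same workload. It is a minimal illustration, not part of the announcement: the function names are hypothetical, and the time figure assumes that both implementations are credited with the same number of floating-point operations, which the announcement does not state.

```python
def gain_to_ratio(gain_percent: float) -> float:
    """Convert 'X% higher throughput' into a throughput ratio (e.g. 54 -> 1.54)."""
    return 1.0 + gain_percent / 100.0


def time_reduction_percent(gain_percent: float) -> float:
    """Implied reduction in time for a fixed amount of work.

    Assumes time = work / throughput with identical work on both sides,
    so the time ratio is the inverse of the throughput ratio.
    """
    return (1.0 - 1.0 / gain_to_ratio(gain_percent)) * 100.0


# Percentages as reported in the announcement; the labels are ours.
reported_gains = {
    "forward vs DeepGEMM": 54.0,
    "backward vs DeepGEMM": 35.0,
    "forward vs Triton example": 21.0,
}

for label, gain in reported_gains.items():
    print(
        f"{label}: {gain_to_ratio(gain):.2f}x throughput, "
        f"about {time_reduction_percent(gain):.1f}% less time"
    )
```

Note that a 54% throughput gain is not a 54% cut in run time: under the stated assumption it corresponds to roughly a one-third reduction.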