Alibaba's Tongyi Qianwen has unveiled its latest flagship model, Qwen3.7-Max. In a 35-hour autonomous kernel optimization task on the Pingtouge Zhenwu M890 processor, the model made 1,158 tool calls and, without access to any chip architecture documentation, achieved a 10x geometric mean speedup in Triton operator performance. Over the run it went through five evolutionary stages, optimizing memory and processing strategies to fully utilize the processor's capabilities. By comparison, GLM 5.1 reached a 7.3x speedup and Kimi K2.6 reached 5.0x. The model's training involved decoupling tasks and employing cross-framework reinforcement learning to improve its generalization abilities. On benchmarks such as MCP-Mark and SpreadSheetBench, Qwen3.7-Max performed close to Claude-4.6-Opus-Max.
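For readers unfamiliar with the metric, a geometric mean speedup is the n-th root of the product of the per-operator speedups, so one unusually large gain does not dominate the headline figure. The sketch below illustrates the arithmetic only; the function name and the timings are hypothetical and are not taken from the reported benchmark.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Return the geometric mean of per-operator speedups.

    Each speedup is baseline time divided by optimized time.
    """
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty timing lists of equal length")

    log_sum = 0.0
    for base, opt in zip(baseline_ms, optimized_ms):
        if base <= 0 or opt <= 0:
            raise ValueError("timings must be positive")
        # Summing logs avoids overflow when many speedups are multiplied.
        log_sum += math.log(base / opt)

    return math.exp(log_sum / len(baseline_ms))


# Hypothetical timings in milliseconds for three operators (illustrative only).
baseline = [8.0, 20.0, 50.0]
optimized = [2.0, 2.0, 2.0]

# Per-operator speedups are 4x, 10x and 25x.
speedups = [b / o for b, o in zip(baseline, optimized)]
arithmetic_mean = sum(speedups) / len(speedups)
geo_mean = geometric_mean_speedup(baseline, optimized)

print(f"arithmetic mean: {arithmetic_mean:.1f}x")
print(f"geometric mean:  {geo_mean:.1f}x")
```

With these made-up numbers the arithmetic mean is 13x while the geometric mean is 10x, which shows why the geometric mean is the more conservative summary when speedups vary widely across operators.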