Chinese AI models now hold four of the top ten positions in the SWE-bench rankings. SWE-bench, a real-time benchmark for software engineering tasks, recently updated its leaderboard, with Claude Opus 4.6 leading at 65.3%. Zhipu AI's open-source model GLM-5 ranks third at 62.8%, the highest position achieved by an open-source model. The other Chinese models in the top ten are DeepSeek-V3.2, Alibaba's Qwen3.5-397B-A17B, and Step-3.5-Flash from StepFun (阶跃星辰).
This is a notable improvement for Chinese AI models, none of which previously ranked in the top ten. Li Zixuan, Global Head of Zhipu Z.ai, highlighted the progress, noting that Chinese models had been criticized in the past for "benchmaxing." The latest update removed the example demonstrations and the 80-step operation limit used previously, and added auxiliary evaluation tasks to make the benchmark more rigorous.
Chinese AI Models Secure Four Spots in SWE-bench Top 10
