V4 has achieved a perfect score of 120/120 on the Putnam-2025 math benchmark, tying for first place with Axiom. The result was obtained under the Frontier Regime, which uses a hybrid formal-informal reasoning approach: V4 generates candidate solutions through informal reasoning, self-verifies them, and then completes rigorous proofs using a formal agent in Lean.
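The generate, self-verify, formalize loop described above can be pictured roughly as follows. This is only an illustrative sketch, not V4's actual implementation: the names `Candidate`, `solve`, and the three `toy_*` callables are hypothetical stand-ins, and a real system would call a language model and a Lean proof checker where the stubs sit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    """An informal solution attempt for one problem (hypothetical type)."""
    problem: str
    informal_solution: str


def solve(
    problem: str,
    generate: Callable[[str], Iterable[str]],
    self_verify: Callable[[Candidate], bool],
    formalize: Callable[[Candidate], Optional[str]],
) -> Optional[str]:
    """Return the first formal proof obtained, or None if every candidate fails.

    generate    -- yields informal candidate solutions (assumed interface)
    self_verify -- the model's own check of a candidate (assumed interface)
    formalize   -- tries to turn a candidate into a checked Lean proof,
                   returning None on failure (assumed interface)
    """
    for text in generate(problem):
        candidate = Candidate(problem, text)
        if not self_verify(candidate):
            continue  # drop candidates that fail the informal check
        proof = formalize(candidate)
        if proof is not None:
            return proof  # a formally completed proof ends the search
    return None


# Toy stand-ins so the sketch runs end to end; they do no real reasoning.
def toy_generate(problem: str) -> Iterable[str]:
    return ["answer: 3", "answer: 4"]


def toy_self_verify(candidate: Candidate) -> bool:
    return candidate.informal_solution.endswith("4")


def toy_formalize(candidate: Candidate) -> Optional[str]:
    return "theorem toy : 2 + 2 = 4 := rfl"


if __name__ == "__main__":
    print(solve("What is 2 + 2?", toy_generate, toy_self_verify, toy_formalize))
```

The point of the structure is that informal reasoning only proposes; a candidate counts as solved only once the formal step returns a proof.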
In the Practical Regime, V4-Flash-Max scored 81.00 on the Putnam-200 Pass@8 benchmark, ahead of the 35.50 scored by Seed-2.0-Prover and the 26.50 reported for Gemini 3 Pro and Seed-1.5-Prover. The results highlight V4's capabilities in mathematical reasoning and problem-solving, in both typical deployment and large-scale computational scenarios.
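For readers unfamiliar with the metric, Pass@8 is commonly read as "a problem counts as solved if at least one of 8 attempts succeeds." The sketch below computes such a score under that assumption. The article does not say exactly how the reported figures were calculated, so the percentage scale and the function names here are illustrative assumptions, and the sample data is made up.

```python
from typing import Sequence


def solved_within_k(attempts: Sequence[bool], k: int) -> bool:
    """True if any of the first k attempts on a problem succeeded."""
    return any(attempts[:k])


def pass_at_k_percent(results: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of problems solved within k attempts.

    results -- one sequence of attempt outcomes per problem
               (assumed layout, not taken from the article)
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(solved_within_k(attempts, k) for attempts in results)
    return 100.0 * solved / len(results)


if __name__ == "__main__":
    # Made-up outcomes for four problems, eight attempts each.
    sample = [
        [False] * 7 + [True],   # solved on the last attempt
        [False] * 8,            # never solved
        [True] + [False] * 7,   # solved on the first attempt
        [False] * 8,            # never solved
    ]
    print(pass_at_k_percent(sample, k=8))
```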
V4 Achieves Perfect Score on Putnam-2025 Math Benchmark
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
