AI Models Struggle in Vals AI's New Financial Analyst Test
AI models, including GPT-5.5, have struggled to meet the demands of Vals AI's new Finance Agent v2 benchmark, which simulates the workflow of a junior financial analyst. On the test, which comprises 927 expert-reviewed questions, GPT-5.5 achieved the top accuracy of just 51.76%, slightly ahead of Claude Opus 4.7 and Claude Sonnet 4.6. The benchmark requires models to autonomously locate relevant information within lengthy financial reports and perform complex calculations, highlighting the challenges AI faces in high-precision financial analysis.
Despite improvements on basic retrieval tasks, the results indicate that AI is still far from replacing human analysts in finance. Under stricter scoring standards, every leading model scored below 40%, and the most challenging categories yielded scores as low as 23%. The test underscores the need for further advances before AI can meet the rigorous demands of financial analysis.
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
