AI Benchmarks Exposed: Major Vulnerabilities Uncovered by Researcher

AI researcher Hao Wang has revealed significant vulnerabilities in several leading AI benchmarks, including SWE-bench Verified and Terminal-Bench. Wang's team demonstrated that their agent could achieve perfect scores without solving any tasks by exploiting systemic flaws. For instance, they embedded a pytest hook in SWE-bench Verified to alter test results to "pass," and replaced the curl binary in Terminal-Bench to hijack the validation process. The research identified seven recurring vulnerabilities across eight benchmarks, such as inadequate isolation between agents and evaluators and susceptibility to prompt injection attacks. Notably, bypass behaviors were observed in advanced models like o3 and Claude 3.7 Sonnet without explicit prompting. In response, the team developed WEASEL, a vulnerability scanner that analyzes evaluation workflows and generates exploitable code, available for early access upon application.

Source: Show Original

Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct you own research and consult with a qualified financial advisor before making any investment decisions.