AI Models Struggle with Pokémon, Exposing Long-Term Reasoning Gaps

Top AI models, including Anthropic's Claude and Google's Gemini, have struggled to master the Pokémon video games, highlighting significant gaps in long-term reasoning and planning. Despite excelling at tasks like medical exams and coding, these AI systems falter in Pokémon's open-world environment, where continuous reasoning and memory are crucial.
Anthropic's Claude, even in its advanced Opus 4.5 version, has been unable to navigate the game consistently, often making basic errors and getting stuck for extended periods. In contrast, Google's Gemini 2.5 Pro successfully completed a challenging Pokémon game, aided by a robust toolset that compensated for its visual and reasoning limitations.
The Pokémon challenge underscores the broader difficulty AI faces with tasks that require sustained focus and adaptability, in contrast to its success in specialized domains like chess and Go. This ongoing struggle serves as a benchmark for evaluating AI's progress toward artificial general intelligence.
