The National University of Singapore (NUS) has launched GameWorld, a new benchmark aimed at standardizing the evaluation of multimodal large language models (MLLMs) as general agents in video games. GameWorld encompasses 34 browser games and 170 tasks, each with verifiable metrics to objectively assess outcomes.

This initiative addresses the limitations of inconsistent input interfaces and manual verification in current evaluations. The NUS team tested two agent interfaces: a "computer-use" agent that outputs keyboard and mouse commands, and a general multimodal agent using semantic parsing.

In a large-scale evaluation involving 18 model-interface combinations, results indicated that current AI agents still fall short of human-level gaming abilities. The study highlights challenges such as real-time interaction latency and sensitivity to contextual memory. The research paper and project code are available on Hugging Face and GitHub.
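To make the "computer-use" interface concrete: such an agent emits low-level keyboard and mouse commands rather than game-specific semantic actions. Below is a minimal, illustrative sketch of how a harness might parse a model's plain-text action script into structured commands. The `Action` format and the command names (`key`, `click`) are assumptions for illustration, not GameWorld's actual protocol.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    """One low-level input event (hypothetical format, not GameWorld's)."""
    kind: str          # "key" or "click"
    key: str = ""      # key name for "key" actions
    x: int = 0         # screen coordinates for mouse actions
    y: int = 0

def parse_agent_output(text: str) -> List[Action]:
    """Parse a model's plain-text action script, one command per line,
    e.g. 'key ArrowLeft' or 'click 120 340'. Unknown lines are skipped."""
    actions = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "key" and len(parts) == 2:
            actions.append(Action("key", key=parts[1]))
        elif parts[0] == "click" and len(parts) == 3:
            actions.append(Action("click", x=int(parts[1]), y=int(parts[2])))
    return actions

acts = parse_agent_output("key ArrowLeft\nclick 120 340")
print(len(acts), acts[0].key, acts[1].x)  # → 2 ArrowLeft 120
```

A real harness would then replay these events into the browser (e.g. via an automation driver) and score the outcome against the task's verifiable metric.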