On April 17, a team from the National University of Singapore (NUS) released GameWorld, a benchmark for testing multimodal AI agents across 34 browser games. The benchmark includes 170 tasks, each with verifiable metrics. Two agent interfaces were tested, and the results show that AI agents still lag well behind humans in gaming. The paper and project code are available on Hugging Face and GitHub.

ME News reports that on April 17 (UTC+8), according to monitoring by Beating, a team from the National University of Singapore (NUS) released GameWorld, a benchmark designed to standardize the evaluation of multimodal large language models (MLLMs) as general-purpose agents in video games. The study notes that video games offer an ideal closed-loop interactive testbed, but existing evaluations are often limited by inconsistent input interfaces and by verification that relies on manual heuristics. GameWorld includes 34 diverse browser games and 170 tasks, each with verifiable metrics computed from the game’s underlying state so that outcomes can be assessed objectively. The research team tested two agent interfaces: a “computer-use” agent that directly outputs keyboard and mouse commands, and a general multimodal agent that acts in a semantic action space, with its outputs parsed into game inputs. In a large-scale evaluation of 18 model-interface combinations, even the best-performing agents remained far below human-level gaming ability. The study also identifies significant challenges for game agents in real-time interaction latency, sensitivity to contextual memory, and action effectiveness. The paper and project code have been publicly released on Hugging Face and GitHub. (Source: BlockBeats)
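
To make the two ideas in the report concrete, the following is a minimal Python sketch of how a task metric might be computed from game state and how a semantic action might be parsed into keyboard input. It is an illustration only and is not taken from the GameWorld code: the names `Task`, `SEMANTIC_ACTIONS`, `to_low_level`, the `coins` state field, and the key names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a game state snapshot and a low-level input event.
GameState = Dict[str, Any]
LowLevelEvent = Tuple[str, str]  # (device, input), e.g. ("key", "ArrowLeft")


@dataclass
class Task:
    """A task whose outcome is checked against game state, not by eye."""
    name: str
    metric: Callable[[GameState], float]  # raw progress derived from state
    threshold: float = 1.0                # progress needed to count as success

    def evaluate(self, state: GameState) -> Dict[str, Any]:
        # Clamp progress to [0, 1] so tasks are comparable across games.
        progress = max(0.0, min(1.0, self.metric(state)))
        return {
            "task": self.name,
            "progress": progress,
            "success": progress >= self.threshold,
        }


# Assumed mapping from semantic actions to low-level events. A "computer-use"
# agent would emit the low-level events directly and skip this table.
SEMANTIC_ACTIONS: Dict[str, List[LowLevelEvent]] = {
    "move_left": [("key", "ArrowLeft")],
    "jump": [("key", "Space")],
}


def to_low_level(action: str) -> List[LowLevelEvent]:
    """Parse a semantic action name into low-level input events."""
    if action not in SEMANTIC_ACTIONS:
        raise ValueError(f"unknown semantic action: {action}")
    return SEMANTIC_ACTIONS[action]


# Example task: collect 5 coins, scored from a hypothetical "coins" field.
collect_coins = Task("collect_5_coins", lambda s: s.get("coins", 0) / 5)
partial = collect_coins.evaluate({"coins": 3})
done = collect_coins.evaluate({"coins": 7})
```

Under these assumptions, `partial` reports progress without success and `done` reports success, while `to_low_level("jump")` yields the key event a computer-use agent would have produced itself. Reading the score from state rather than from screenshots is what makes the outcome checkable without manual judgment.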