NUS Team Launches GameWorld Benchmark to Evaluate Multimodal AI Agents in 34 Browser Games
ME News reports that on April 17 (UTC+8), according to monitoring by Beating, a team from the National University of Singapore (NUS) released GameWorld, a benchmark designed to standardize the evaluation of multimodal large language models (MLLMs) as general agents in video games. The study notes that while video games offer an ideal closed-loop interactive testbed, existing evaluations are often limited by inconsistent input interfaces and manual, heuristic verification. GameWorld comprises 34 diverse browser games and 170 tasks, each paired with verifiable metrics derived from the game's underlying state so that outcomes can be assessed objectively. The research team tested two agent interfaces: a “computer-use” agent that directly outputs keyboard and mouse commands, and a general multimodal agent that acts in a semantic action space via semantic parsing. In a large-scale evaluation of 18 model-interface combinations, even the best-performing current AI agents remained far below human-level gaming ability. The study also identifies significant challenges for game agents in real-time interaction latency, sensitivity to contextual memory, and action effectiveness. The related paper and project code have been publicly released on Hugging Face and GitHub. (Source: BlockBeats)
