OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang et al. | June 8, 2026 | arXiv

Key Takeaway

Game benchmarks should measure how agents improve over rounds of iterative refinement, not just first-attempt performance, because the improvement trajectory reveals which VLMs can learn and adapt in interactive environments.

Summary

OmniGameArena is a benchmark for testing vision-language model (VLM) agents in 12 Unreal Engine 5 games across three play modes: solo, competitive, and cooperative. It introduces the Improvement Dynamics Curve, which measures how agents improve when given multiple chances to refine their strategies through self-reflection, showing how performance evolves beyond single-attempt scores.
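The idea of tracking scores across refinement rounds can be illustrated with a minimal sketch. This is not the paper's definition of the Improvement Dynamics Curve, which the summary does not give. The names `improvement_curve`, `gain_over_first_attempt`, `play`, and `reflect` are hypothetical, and the toy agent stands in for a real VLM and game.

```python
from typing import Callable, List


def improvement_curve(
    play: Callable[[str], float],
    reflect: Callable[[str, float], str],
    initial_strategy: str,
    rounds: int,
) -> List[float]:
    """Record the score of each attempt while the agent refines its strategy.

    Assumption: one attempt per round, followed by one self-reflection step
    that produces the strategy used in the next round.
    """
    strategy = initial_strategy
    scores: List[float] = []
    for _ in range(rounds):
        score = play(strategy)               # run one episode with the current strategy
        scores.append(score)
        strategy = reflect(strategy, score)  # revise the strategy from the outcome
    return scores


def gain_over_first_attempt(scores: List[float]) -> List[float]:
    """Score of each round relative to the first attempt."""
    return [s - scores[0] for s in scores]


# Toy stand-ins (hypothetical): each reflection appends one "+" to the
# strategy text, and the score simply counts those refinements.
def toy_play(strategy: str) -> float:
    return float(strategy.count("+"))


def toy_reflect(strategy: str, score: float) -> str:
    return strategy + "+"


curve = improvement_curve(toy_play, toy_reflect, "base", rounds=4)
gains = gain_over_first_attempt(curve)
print(curve)  # [0.0, 1.0, 2.0, 3.0]
print(gains)  # [0.0, 1.0, 2.0, 3.0]
```

A single-attempt benchmark would report only the first element of `curve`; the remaining elements are what a curve-based evaluation adds.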

evaluation agents multimodal

Key Terms

vision-language-model agentic-reflection skill-prompt tool-use