RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments

Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge | June 24, 2026 | arXiv

Key Takeaway

You can reverse-engineer an agent's decision logic from its behavior by combining passive observation with strategic experimentation. The approach applies both to policy interpretability and to opponent modeling in competitive settings.

Summary

RevengeBench is a benchmark for reconstructing hidden decision-making code from an agent's behavior in games. Researchers observe a hidden policy as it plays, can design custom opponents to probe its behavior, and then submit executable code that mimics it.
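The summary does not say how a submission is scored. The key terms below list an "action-distance-metric". As an assumption only, one simple form is sketched here: the mean total variation distance between the hidden and submitted policies' action distributions over a set of evaluation states. For deterministic policies this reduces to the fraction of states where the two pick different actions. All names here (`action_distance`, `hidden_policy`, `submitted_policy`, `hedging_policy`) are hypothetical.

```python
from typing import Callable, Dict, Iterable

ActionDist = Dict[str, float]        # action -> probability
Policy = Callable[[int], ActionDist]

def total_variation(p: ActionDist, q: ActionDist) -> float:
    """Total variation distance between two action distributions."""
    actions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in actions)

def action_distance(hidden: Policy, submitted: Policy, states: Iterable[int]) -> float:
    """Assumed metric: average per-state total variation (0 = identical behavior)."""
    states = list(states)
    return sum(total_variation(hidden(s), submitted(s)) for s in states) / len(states)

def hidden_policy(state: int) -> ActionDist:
    # Stand-in for the secret policy: raise with strength >= 7.
    return {"raise": 1.0} if state >= 7 else {"fold": 1.0}

def submitted_policy(state: int) -> ActionDist:
    # A reconstruction that gets the threshold off by one.
    return {"raise": 1.0} if state >= 8 else {"fold": 1.0}

def hedging_policy(state: int) -> ActionDist:
    # A submission that never commits to either action.
    return {"raise": 0.5, "fold": 0.5}
```

On states 0 to 10, the off-by-one reconstruction disagrees only at state 7, giving a distance of 1/11. The hedging submission scores 0.5 everywhere. The choice of evaluation states matters: including states that the hidden policy rarely reaches in ordinary play rewards reconstructions that capture the full decision logic rather than only its typical behavior.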

reasoning evaluation agents

Key Terms

inverse-problem opponent-modeling behavioral-probe policy-interpretability action-distance-metric