RP-Regret is a game-theoretic regret metric for repeated games against adaptive opponents. Minimizing it can lead to better equilibria than standard regret minimization, and the paper provides algorithms with provable guarantees for non-convex optimization.
This paper introduces Repeated Policy Regret (RP-Regret), a new way to measure how well a player performs in repeated games against opponents who adapt to past moves. Unlike standard regret metrics, which hold the opponent's observed moves fixed when evaluating alternatives, RP-Regret accounts for what the player could have achieved had it responded differently to the game history.
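To make the contrast with standard regret concrete, here is a minimal sketch. It does not implement the paper's RP-Regret, whose formal definition is not given in this summary. Instead it compares standard external regret with the classical policy-regret counterfactual, in which the adaptive opponent is allowed to react again to the alternative play. The prisoner's dilemma payoffs, the tit-for-tat opponent, and all function names are assumptions chosen purely for illustration.

```python
# Illustrative assumption: player's payoff in a prisoner's dilemma,
# indexed as PAYOFF[my_action][opponent_action].
C, D = 0, 1
PAYOFF = [[3, 0],
          [5, 1]]

def tit_for_tat(player_history):
    """Adaptive opponent: cooperate first, then copy the player's last move."""
    return C if not player_history else player_history[-1]

def play(player_actions, opponent=tit_for_tat):
    """Return (total payoff, opponent's action sequence) for a player sequence."""
    total, opp_actions = 0, []
    for t, a in enumerate(player_actions):
        b = opponent(player_actions[:t])  # opponent only sees past moves
        opp_actions.append(b)
        total += PAYOFF[a][b]
    return total, opp_actions

def external_regret(player_actions):
    """Best fixed action against the opponent moves that actually occurred."""
    realized, opp_actions = play(player_actions)
    best_fixed = max(sum(PAYOFF[a][b] for b in opp_actions) for a in (C, D))
    return best_fixed - realized

def policy_regret(player_actions):
    """Best fixed action when the opponent re-reacts to the alternative play."""
    realized, _ = play(player_actions)
    T = len(player_actions)
    best_fixed = max(play([a] * T)[0] for a in (C, D))
    return best_fixed - realized

always_defect = [D] * 10
print(external_regret(always_defect))  # 0
print(policy_regret(always_defect))    # 16
```

Against tit-for-tat, always defecting has zero external regret because the opponent's retaliation is treated as fixed, yet it forgoes the sustained cooperation that a different response to the history would have earned. That gap is the kind of counterfactual the summary says history-aware regret notions are meant to capture.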