Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas, Eser Aygün et al. | June 2, 2026 | arXiv

Key Takeaway

Diversity in RL emerges naturally once reward uncertainty is acknowledged: an agent that is unsure what the true reward function is will rationally explore multiple behaviors, which removes the need for ad hoc diversity bonuses.

Summary

This paper proposes a new way to train RL agents that naturally produces diverse behaviors by treating the reward function as uncertain rather than fixed. Instead of maximizing a single reward, the method optimizes over distributions of possible rewards, causing agents to explore multiple strategies without sacrificing performance.
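To make the idea concrete, here is a minimal sketch of how uncertainty over rewards can produce a non-deterministic policy. It is not the paper's algorithm. It assumes a single-state bandit with three actions, a hand-picked set of candidate reward vectors, and belief weights over them; the policy plays each candidate's best action with probability equal to that candidate's weight (a probability-matching rule chosen here purely for illustration). All names and numbers are hypothetical.

```python
import numpy as np


def greedy_policy(reward):
    """Deterministic policy that maximizes one fixed reward vector."""
    policy = np.zeros(len(reward))
    policy[np.argmax(reward)] = 1.0
    return policy


def uncertainty_induced_policy(candidate_rewards, weights):
    """Play each candidate reward's best action with that candidate's weight.

    Illustrative assumption: this probability-matching rule stands in for
    "optimizing over a distribution of rewards"; the paper may do otherwise.
    """
    policy = np.zeros(candidate_rewards.shape[1])
    for reward, weight in zip(candidate_rewards, weights):
        policy[np.argmax(reward)] += weight
    return policy


def entropy(policy):
    """Shannon entropy in nats, used here as a simple diversity measure."""
    nonzero = policy[policy > 0]
    return float(-np.sum(nonzero * np.log(nonzero)))


# Hypothetical candidate reward functions (rows) over three actions (columns).
candidate_rewards = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [1.0, 0.8, 0.1],
])
# Hypothetical belief over which candidate is the true reward.
weights = np.array([0.4, 0.4, 0.2])

# Baseline: collapse the uncertainty into one mean reward, then maximize it.
mean_reward = weights @ candidate_rewards
fixed = greedy_policy(mean_reward)

# Alternative: keep the uncertainty and let it shape the policy.
diverse = uncertainty_induced_policy(candidate_rewards, weights)

print("fixed-reward policy:      ", fixed, "entropy:", entropy(fixed))
print("uncertainty-aware policy: ", diverse, "entropy:", entropy(diverse))
print("expected mean reward:", fixed @ mean_reward, "vs", diverse @ mean_reward)
```

In this toy setup the fixed-reward policy puts all its mass on one action, while the uncertainty-aware policy splits its mass between the two actions that are optimal under some plausible reward. The last line prints the expected reward of each policy under the mean reward so the cost of the extra diversity can be inspected directly; whether that cost vanishes in the paper's setting depends on its actual objective, which this sketch does not reproduce.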

reasoning

Key Terms

reward-uncertainty behavioral-diversity entropy-regularization contextual-bandit policy-gradient