Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko | April 21, 2026 | arXiv

Key Takeaway

Entropy regularization makes the Bellman operator of a planning problem smooth. This smoothness enables algorithms with provable polynomial sample-complexity guarantees. For standard, non-regularized planning, no algorithm with such a worst-case guarantee is known.
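The source of that smoothness can be seen in a single regularized Bellman backup. The sketch below assumes Shannon entropy regularization with temperature lam. Under that assumption, the hard max over action values is replaced by lam * log(sum_a exp(q_a / lam)), and the gradient of this expression is the softmax policy. The function names and the example action values are hypothetical illustrations, not code from the paper.

```python
import math

def smooth_max(q, lam):
    """Entropy-regularized backup: lam * log(sum_a exp(q_a / lam)).

    Equals max over policies pi of (pi . q + lam * H(pi)), where H is the
    Shannon entropy. Subtracting max(q) first avoids overflow for small lam.
    """
    m = max(q)
    return m + lam * math.log(sum(math.exp((x - m) / lam) for x in q))

def softmax_policy(q, lam):
    """Gradient of smooth_max with respect to q: the Boltzmann (softmax) policy."""
    m = max(q)
    w = [math.exp((x - m) / lam) for x in q]
    z = sum(w)
    return [x / z for x in w]

q = [1.0, 0.9, -0.5]  # hypothetical action values at one state
for lam in (1.0, 0.1, 0.01):
    # The gap to the hard max lies in [0, lam * log(len(q))], and the
    # maximizing policy varies smoothly with q, unlike the argmax.
    print(lam, smooth_max(q, lam) - max(q), softmax_policy(q, lam))
```

As lam shrinks, the backup approaches the hard max and the softmax approaches the discontinuous argmax. A larger lam therefore buys smoothness at the cost of a larger deviation from the unregularized value.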

Summary

SmoothCruiser is a planning algorithm that estimates the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. It exploits the smoothness that the regularization induces in the Bellman operator. This yields a problem-independent sample complexity of order Õ(1/ε^4) for a desired accuracy ε. For non-regularized settings, no algorithm with guaranteed polynomial sample complexity in the worst case is known.
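One way smoothness can be exploited is by linearization. If f(q) = smooth_max(q) and q0 is a rough estimate of the unknown action values q, then f(q) ≈ f(q0) + ∇f(q0) · (q − q0). Because ∇f(q0) is the softmax policy, the correction term is an expectation over actions drawn from that policy. It can therefore be estimated by sampling one action at a time rather than estimating every action's value. The sketch below illustrates only this single step and is not the paper's full SmoothCruiser procedure. sample_q (a noisy stand-in for generative-model rollouts), q_rough, and all constants are hypothetical.

```python
import math
import random

def smooth_max(q, lam):
    """Entropy-regularized backup lam * log(sum_a exp(q_a / lam)), computed stably."""
    m = max(q)
    return m + lam * math.log(sum(math.exp((x - m) / lam) for x in q))

def softmax_policy(q, lam):
    """Gradient of smooth_max: the softmax policy over actions."""
    m = max(q)
    w = [math.exp((x - m) / lam) for x in q]
    z = sum(w)
    return [x / z for x in w]

def estimate_smooth_value(sample_q, q_rough, lam, n, rng):
    """First-order estimate of smooth_max(q_true, lam).

    sample_q(a) returns a noisy, unbiased sample of the unknown q_true[a]
    (hypothetical stand-in for a generative-model rollout). q_rough is a
    cheaper, less accurate estimate of q_true.
    """
    pi = softmax_policy(q_rough, lam)  # gradient of smooth_max at q_rough
    actions = range(len(q_rough))
    correction = 0.0
    for _ in range(n):
        a = rng.choices(actions, weights=pi)[0]  # draw a ~ gradient
        correction += sample_q(a) - q_rough[a]
    # f(q_rough) + grad f(q_rough) . (q_true - q_rough), estimated by sampling
    return smooth_max(q_rough, lam) + correction / n

rng = random.Random(0)
q_true = [1.0, 0.9, -0.5]           # hypothetical true action values
q_rough = [1.05, 0.85, -0.4]        # hypothetical rough estimate of q_true
lam = 0.5
sample_q = lambda a: q_true[a] + rng.gauss(0.0, 0.1)
est = estimate_smooth_value(sample_q, q_rough, lam, 2000, rng)
print(est, smooth_max(q_true, lam))
```

Because f is smooth, the linearization error is quadratic in the error of q_rough, with a constant that grows as lam shrinks. A moderately accurate rough estimate therefore suffices, and the remaining correction is a cheap sampled average.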


Key Terms

markov-decision-process entropy-gradient bellman-operator sample-complexity