Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Gugan Thoppe, L. A. Prashanth, Ankur Naskar, Sanjay Bhat | May 8, 2026 | arXiv

Key Takeaway

You can now use principled Q-learning algorithms for risk-sensitive decision-making under exponential utility, with mathematical guarantees that they converge to optimal policies. Previously, this setting lacked solid theoretical foundations.

Summary

This paper develops reinforcement learning algorithms that optimize exponential utility in discounted Markov decision processes (MDPs), an objective relevant to risk-sensitive applications. The authors prove that their Q-learning-style algorithms converge to optimal policies and provide theoretical guarantees on the convergence rate.
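To make the objective concrete, here is a minimal sketch of how an exponential-utility score differs from a plain expected return. It is not the paper's algorithm. It evaluates the standard entropic form (1/beta) * log E[exp(beta * G)] on sampled discounted returns G. The function names, the toy reward sequences, and the sign convention (negative beta treated as risk-averse for rewards) are assumptions made for illustration; the paper may use a different convention.

```python
import math

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t, accumulated backwards over the reward sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def entropic_risk(returns, beta):
    """(1/beta) * log(mean(exp(beta * G))), using log-sum-exp for stability.

    beta == 0 is treated as the risk-neutral limit (the plain mean).
    """
    if beta == 0:
        return sum(returns) / len(returns)
    scaled = [beta * g for g in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta

# Two hypothetical policies with the same mean discounted return
# but different spread (toy data, not taken from the paper).
gamma = 0.9
safe = [discounted_return([1.0, 1.0, 1.0], gamma)] * 4
risky = [discounted_return([r, r, r], gamma) for r in (0.0, 0.0, 2.0, 2.0)]

for name, sample in (("safe", safe), ("risky", risky)):
    print(name,
          "mean:", round(entropic_risk(sample, 0), 3),
          "beta=-1:", round(entropic_risk(sample, -1.0), 3),
          "beta=+1:", round(entropic_risk(sample, 1.0), 3))
```

Under these assumptions both policies have the same mean, yet the negative-beta score penalizes the high-variance policy and the positive-beta score rewards it. An expected-return criterion cannot draw that distinction.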

reasoning

Key Terms

Markov decision process, Q-learning, convergence, risk aversion, Bellman operator