You can now use principled Q-learning algorithms for risk-sensitive decision-making under an exponential utility objective, with mathematical guarantees that they find optimal policies. Previously, Q-learning for this objective lacked solid theoretical foundations.
This paper develops reinforcement learning algorithms that optimize exponential utility in decision-making problems, an objective that matters for risk-sensitive applications. The authors prove that their Q-learning-style algorithms converge to optimal policies and give theoretical guarantees on the rate of convergence.
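To make the objective concrete, here is a minimal sketch of tabular Q-learning for exponential utility. It is not the authors' algorithm, and it assumes a finite horizon, a small tabular MDP, and the entropic objective (1/beta) log E[exp(beta * total reward)], where beta < 0 is risk-averse, beta > 0 is risk-seeking, and beta must be nonzero. The sketch learns the exponentiated quantity W_h(s, a), an estimate of E[exp(beta * (r + V_{h+1}(s')))], by stochastic approximation and then reads off Q_h(s, a) = log(W_h(s, a)) / beta. The function name `exp_utility_q_learning`, the toy environment `gamble`, uniform exploration, and the 1/N step size are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def exp_utility_q_learning(step, states, actions, horizon, beta, episodes,
                           start_state, seed=0):
    """Hypothetical sketch: tabular Q-learning for the entropic objective
    (1/beta) * log E[exp(beta * sum of rewards)] over a finite horizon."""
    rng = random.Random(seed)
    # W starts at 1 = exp(beta * 0), i.e. Q = 0. It stays positive because
    # each update is a convex combination of positive numbers.
    W = [{(s, a): 1.0 for s in states for a in actions} for _ in range(horizon)]
    counts = [defaultdict(int) for _ in range(horizon)]

    def q(h, s, a):
        # Certainty-equivalent Q-value recovered from the exponentiated estimate.
        return math.log(W[h][(s, a)]) / beta

    def v(h, s):
        # Value after the last step is 0; otherwise act greedily on Q.
        if h == horizon:
            return 0.0
        return max(q(h, s, a) for a in actions)

    for _ in range(episodes):
        s = start_state
        for h in range(horizon):
            a = rng.choice(actions)  # uniform exploration (off-policy, as in Q-learning)
            r, s_next = step(s, a, rng)
            counts[h][(s, a)] += 1
            alpha = 1.0 / counts[h][(s, a)]  # decaying step size
            target = math.exp(beta * (r + v(h + 1, s_next)))
            W[h][(s, a)] = (1 - alpha) * W[h][(s, a)] + alpha * target
            s = s_next
    return q

def gamble(s, a, rng):
    # Hypothetical one-state toy problem.
    if a == 0:
        return 1.0, s  # safe action: reward 1 for sure
    return (2.5 if rng.random() < 0.5 else 0.0), s  # risky action: mean 1.25

averse = exp_utility_q_learning(gamble, ["s0"], [0, 1], 1, -1.0, 5000, "s0")
seeking = exp_utility_q_learning(gamble, ["s0"], [0, 1], 1, 1.0, 5000, "s0")
```

In this toy problem, a risk-neutral agent would prefer the risky action, because its mean reward is 1.25. With beta = -1, its certainty equivalent is -log(0.5 * (1 + e^-2.5)), about 0.61, so the risk-averse agent prefers the safe action (value 1). With beta = +1, the risky action's certainty equivalent rises to about 1.89, so the risk-seeking agent prefers it. Working in the exponentiated space keeps each update linear in the sampled target, which is what makes a standard stochastic-approximation step applicable.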