Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs — ThinkLLM