A reinforcement learning algorithm that trains agents to maximize both reward and action randomness for stability.