The process by which a reinforcement learning agent's decision-making strategy stabilizes toward optimal behavior.