Timing each switch between reward functions during training according to the policy's stage of development, rather than following a fixed schedule.
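One way to picture this is the minimal sketch below. It assumes the policy's development stage can be estimated from a moving average of some progress metric, such as evaluation success rate. The class name `StageBasedRewardSwitcher`, its `report` and `reward` methods, the thresholds, and the window size are all hypothetical choices made for illustration; they do not come from any particular library or paper.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical signature: a reward function maps a scalar task signal
# (for example, distance to goal) to a scalar reward.
RewardFn = Callable[[float], float]


class StageBasedRewardSwitcher:
    """Advance to the next reward function once the policy looks ready.

    Readiness is an assumption of this sketch: the mean of the last
    `window` progress reports must reach the current stage's threshold.
    """

    def __init__(
        self,
        reward_fns: Sequence[RewardFn],
        thresholds: Sequence[float],
        window: int = 20,
    ) -> None:
        if len(thresholds) != len(reward_fns) - 1:
            raise ValueError("need exactly one threshold per transition")
        self.reward_fns: List[RewardFn] = list(reward_fns)
        self.thresholds: List[float] = list(thresholds)
        self.stage = 0
        self._recent: Deque[float] = deque(maxlen=window)

    def reward(self, signal: float) -> float:
        """Score a task signal with the reward function of the current stage."""
        return self.reward_fns[self.stage](signal)

    def report(self, progress: float) -> bool:
        """Record one progress measurement; return True if a switch occurred."""
        self._recent.append(progress)
        if self.stage >= len(self.thresholds):
            return False  # already on the final reward function
        window_full = len(self._recent) == self._recent.maxlen
        mean_progress = sum(self._recent) / len(self._recent)
        if window_full and mean_progress >= self.thresholds[self.stage]:
            self.stage += 1
            self._recent.clear()  # gather fresh evidence for the next stage
            return True
        return False


def dense_reward(distance: float) -> float:
    """Shaped reward: closer to the goal is better (illustrative)."""
    return -distance


def sparse_reward(distance: float) -> float:
    """Sparse reward: 1.0 only within a 0.1 tolerance (illustrative)."""
    return 1.0 if distance < 0.1 else 0.0
```

In a training loop, `report` would be called after each evaluation round and `reward` would be used when scoring transitions. Requiring a full window before switching is a design choice in this sketch, intended to keep a single lucky evaluation from triggering a switch too early. A fixed schedule, by contrast, would advance `stage` at preset step counts regardless of how the policy is performing.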