Adjusting training signals at the task level to encourage specific model behaviors, such as longer reasoning chains for complex questions.
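
One way to picture this is a reward that receives a small bonus for reasoning length, scaled by how hard the task is. The sketch below is illustrative only: the function name `shaped_reward`, the `difficulty` score in [0, 1], and the `bonus_per_token` and `max_bonus` values are all assumptions made for the example, not details from the source.

```python
def shaped_reward(
    base_reward: float,
    reasoning_tokens: int,
    difficulty: float,
    bonus_per_token: float = 0.001,  # hypothetical bonus rate
    max_bonus: float = 0.5,          # hypothetical cap on the length bonus
) -> float:
    """Return the base reward plus a difficulty-scaled length bonus.

    `difficulty` is an assumed per-task score in [0, 1]: 0 means the task
    is trivial (no length bonus), 1 means the full bonus applies.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    if reasoning_tokens < 0:
        raise ValueError("reasoning_tokens must be non-negative")

    # Cap the bonus so that length alone cannot dominate the reward.
    length_bonus = min(reasoning_tokens * bonus_per_token, max_bonus)

    # Scale by task difficulty: easy tasks gain nothing from longer chains.
    return base_reward + difficulty * length_bonus


if __name__ == "__main__":
    # Same answer quality and reasoning length, different task difficulty.
    print(shaped_reward(1.0, reasoning_tokens=200, difficulty=0.0))
    print(shaped_reward(1.0, reasoning_tokens=200, difficulty=1.0))
```

Capping the bonus and tying it to difficulty are design choices in this sketch: they keep the signal from rewarding verbosity on easy questions, where longer chains are not the behavior being encouraged.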