A learned policy that generates corrective actions to move a system toward a specified target state or goal.