A reward function that is differentiable with respect to the model's outputs, so its gradients can be computed and used to optimize those outputs directly toward desired properties.
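
As a minimal sketch of the idea, the following Python example treats a model output as a list of numbers and nudges it uphill on a reward using the reward's gradient. Everything here is an assumption made for illustration: the names `reward`, `reward_grad`, and `gradient_ascent`, the choice of negative squared distance to a target as the reward, and the step size and step count. The gradient is written out by hand to keep the example dependency-free; in practice an automatic differentiation library would typically compute it.

```python
# Hypothetical illustration: none of these names or values come from the source.

def reward(output, target):
    """Differentiable reward: negative squared distance to a target."""
    return -sum((o - t) ** 2 for o, t in zip(output, target))


def reward_grad(output, target):
    """Analytic gradient of `reward` with respect to each output element."""
    return [-2.0 * (o - t) for o, t in zip(output, target)]


def gradient_ascent(output, target, lr=0.1, steps=100):
    """Move `output` uphill on the reward by repeatedly following its gradient."""
    current = list(output)
    for _ in range(steps):
        grad = reward_grad(current, target)
        # Ascent step: add the gradient because the reward is being maximized.
        current = [c + lr * g for c, g in zip(current, grad)]
    return current


target = [1.0, -2.0, 0.5]   # stands in for the "desired properties"
start = [0.0, 0.0, 0.0]     # stands in for an initial model output
optimized = gradient_ascent(start, target)
```

With these assumed settings each step shrinks the distance to the target by a constant factor, so `optimized` ends up close to `target` and its reward is higher than that of `start`. A reward without usable gradients (for example, a pass/fail check) would not support this kind of direct update.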