Improving model outputs by defining a reward function that scores quality and using it to guide learning toward better solutions.