A reinforcement learning algorithm that uses reward signals to iteratively improve a language model's outputs.