Group Relative Policy Optimization, a reinforcement learning algorithm for fine-tuning language models with reward signals.