Training Reward Saturation — Glossary — ThinkLLM