Proximal Policy Optimization (PPO) — Glossary — ThinkLLM