Policy Optimization — Glossary — ThinkLLM