Policy Gradient — Glossary — ThinkLLM