Policy Gradient Theorem — Glossary — ThinkLLM