On-Policy Learning — Glossary — ThinkLLM