Policy Optimization
techniques
Training an LLM, treated as a policy that maps a prompt to a distribution over outputs, to maximize expected reward using reinforcement learning techniques.
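
As a rough illustration of the idea, the sketch below runs a REINFORCE-style update on a toy softmax policy over three discrete actions, standing in for an LLM's distribution over outputs. The choice of REINFORCE, the moving-average baseline, and every name here (`reinforce`, `REWARDS`, the learning rate and step count) are assumptions made for the example, not something this entry specifies; real LLM training would use a neural policy and a method such as PPO.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Toy policy optimization: raise the probability of high-reward actions.

    `rewards[i]` is the (hypothetical) fixed reward for choosing action i.
    Returns the action probabilities after training.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(rewards))  # policy parameters, start uniform
    baseline = 0.0                   # running average reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choice(len(rewards), p=probs)  # sample from the policy
        reward = rewards[action]
        # Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs
        grad_log_prob = -probs
        grad_log_prob[action] += 1.0
        # Score-function (REINFORCE) ascent step on expected reward
        logits += lr * (reward - baseline) * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


REWARDS = np.array([0.0, 0.2, 1.0])  # assumed rewards for three actions
final_probs = reinforce(REWARDS)
expected_reward = float(final_probs @ REWARDS)
print(final_probs, expected_reward)
```

A uniform policy earns an expected reward of 0.4 on these numbers; after training, most of the probability mass should sit on the highest-reward action, so the expected reward rises above that starting point.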