Decoupled reinforcement learning — Glossary — ThinkLLM