Reward-guided Generation — Glossary — ThinkLLM