Reinforcement Learning from Human Feedback — Glossary — ThinkLLM