Inference-Time Reward Model — Glossary — ThinkLLM