Reward models work better when they treat evaluation as a flexible agentic task, dynamically choosing which evaluation methods to apply to each input rather than applying fixed criteria to all inputs.
This paper introduces Skill-RM, a unified reward model framework that treats reward evaluation as an agentic task. Instead of relying on separate evaluation methods (rule-based checks, reference comparisons, checklists, rubrics), Skill-RM dynamically selects and combines different types of evidence based on what each input needs, providing consistent feedback signals for training language models.
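
To make the idea of per-input evidence selection concrete, here is a minimal sketch, not the paper's implementation. It assumes three toy evaluators and a hand-written selection rule that stands in for the agentic selection step, which this summary does not specify. All names (`Sample`, `rule_check`, `reference_match`, `checklist_coverage`, `select_skills`, `reward`) are hypothetical, and the unweighted average used to combine scores is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Sample:
    """One response to score. Field names are illustrative assumptions."""
    prompt: str
    response: str
    reference: Optional[str] = None                     # gold answer, if one exists
    checklist: List[str] = field(default_factory=list)  # required phrases, if any


def rule_check(s: Sample) -> float:
    # Toy rule-based check: any non-empty response passes.
    return 1.0 if s.response.strip() else 0.0


def reference_match(s: Sample) -> float:
    # Exact match against the reference, ignoring case and outer whitespace.
    # Only called when a reference is present (see select_skills).
    same = s.response.strip().lower() == s.reference.strip().lower()
    return 1.0 if same else 0.0


def checklist_coverage(s: Sample) -> float:
    # Fraction of checklist items that appear in the response.
    hits = sum(item.lower() in s.response.lower() for item in s.checklist)
    return hits / len(s.checklist)


def select_skills(s: Sample) -> List[Callable[[Sample], float]]:
    # Assumed stand-in for agentic selection: pick evaluators based on
    # which kinds of evidence this particular input actually provides.
    skills: List[Callable[[Sample], float]] = [rule_check]
    if s.reference is not None:
        skills.append(reference_match)
    if s.checklist:
        skills.append(checklist_coverage)
    return skills


def reward(s: Sample) -> float:
    # Combine the selected evidence into one scalar (assumed: plain mean).
    scores = [skill(s) for skill in select_skills(s)]
    return sum(scores) / len(scores)
```

For example, `reward(Sample("Capital of France?", "Paris", reference="paris"))` uses the rule check and the reference comparison, while a sample with only a checklist uses the rule check and checklist coverage. The point of the sketch is that every input, whatever evidence it carries, is mapped to a scalar on the same scale.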