Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu et al. | June 2, 2026 | arXiv

Key Takeaway

Reward models are more effective when they treat evaluation as an agentic task, dynamically choosing which evaluation methods to apply to each input rather than applying one fixed set of criteria to every input.

Summary

This paper introduces Skill-RM, a unified reward model framework that treats reward evaluation as an agentic task. Instead of relying on separate evaluation methods (rule-based checks, reference comparisons, checklists, rubrics), Skill-RM dynamically selects and combines different types of evidence based on what each input needs, providing consistent feedback signals for training language models.
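
To make the idea of per-input evidence selection concrete, here is a minimal Python sketch. It is not the paper's implementation: every name in it (`Sample`, `Evidence`, `select_skills`, `evaluate`, and the three toy skills) is hypothetical, the selection step is a hand-written rule standing in for the agent's decision, and the unweighted mean is only an assumed way to combine evidence.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    """One item to score. All fields are illustrative assumptions."""
    prompt: str
    response: str
    reference: Optional[str] = None        # gold answer, if one exists
    checklist: Optional[List[str]] = None  # required points, if any


@dataclass
class Evidence:
    skill: str    # which evaluation skill produced this evidence
    score: float  # normalized to [0, 1]


def rule_check(s: Sample) -> Evidence:
    # Toy rule-based check: the response must not be empty.
    return Evidence("rule", 1.0 if s.response.strip() else 0.0)


def reference_match(s: Sample) -> Evidence:
    # Toy reference comparison: exact match after trimming whitespace.
    same = s.response.strip() == (s.reference or "").strip()
    return Evidence("reference", 1.0 if same else 0.0)


def checklist_coverage(s: Sample) -> Evidence:
    # Toy checklist: fraction of checklist items mentioned in the response.
    items = s.checklist or []
    hits = sum(item.lower() in s.response.lower() for item in items)
    return Evidence("checklist", hits / len(items) if items else 0.0)


def select_skills(s: Sample) -> List[Callable[[Sample], Evidence]]:
    # Stand-in for the agent's choice: use only the skills this input supports.
    skills: List[Callable[[Sample], Evidence]] = [rule_check]
    if s.reference is not None:
        skills.append(reference_match)
    if s.checklist:
        skills.append(checklist_coverage)
    return skills


def evaluate(s: Sample) -> Tuple[float, List[Evidence]]:
    # Gather evidence from the selected skills and combine it into one reward.
    evidence = [skill(s) for skill in select_skills(s)]
    reward = sum(e.score for e in evidence) / len(evidence)  # assumed: plain mean
    return reward, evidence


if __name__ == "__main__":
    math_item = Sample("What is 2 + 2?", "4", reference="4")
    essay_item = Sample(
        "Explain the function.",
        "It uses recursion and a base case.",
        checklist=["recursion", "base case", "complexity"],
    )
    for item in (math_item, essay_item):
        reward, evidence = evaluate(item)
        print([e.skill for e in evidence], round(reward, 3))
```

In this sketch the input with a reference answer is scored by the rule and reference skills, while the open-ended input is scored by the rule and checklist skills, yet both come back as a single reward on the same scale. That is the sense in which one evaluator can give a consistent signal across heterogeneous criteria.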

agents training

Key Terms

reward model, reinforcement learning from human feedback, evidence aggregation, agentic task