DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, Zhumin Chen et al. | June 15, 2026 | arXiv

Key Takeaway

By reversing the rubric generation process, that is, building evaluation criteria from evidence first and then creating questions aligned with them, you can train research agents more efficiently and with more reliable reward signals for reinforcement learning.

Summary

DeepRubric is a framework for creating high-quality training data that teaches AI research agents to write better reports. Instead of asking an AI to guess what makes a good report for a given question, it works backwards: it first builds the evaluation criteria from evidence, then creates a question aligned with those criteria, yielding matched question-rubric pairs. The authors report that this approach trains better agents 13x faster than previous methods.
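
To make the evidence-first ordering concrete, here is a minimal Python sketch. It is not the authors' implementation: the `EvidenceNode` and `Criterion` classes, the question template, and the substring-matching "judge" are all hypothetical stand-ins chosen for illustration. A real pipeline would presumably use language models both to write the question and to grade the report. The sketch only shows the direction of the data flow: evidence tree, then rubric, then question, then reward.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceNode:
    """One piece of evidence; children refine it (hypothetical structure)."""
    claim: str
    weight: float = 1.0
    children: list["EvidenceNode"] = field(default_factory=list)


@dataclass
class Criterion:
    """A single rubric item derived from one evidence node (hypothetical)."""
    description: str
    weight: float


def rubric_from_evidence(root: EvidenceNode) -> list[Criterion]:
    """Evidence first: each node in the tree becomes one criterion (pre-order)."""
    rubric = []
    stack = [root]
    while stack:
        node = stack.pop()
        rubric.append(Criterion(node.claim, node.weight))
        # Reverse so that children are visited left to right.
        stack.extend(reversed(node.children))
    return rubric


def question_from_rubric(topic: str, rubric: list[Criterion]) -> str:
    """Question second: a placeholder template standing in for an LLM call."""
    return f"Write a research report on {topic} covering {len(rubric)} key points."


def rubric_reward(report: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of criteria met; substring match is a stand-in judge."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    text = report.lower()
    met = sum(c.weight for c in rubric if c.description.lower() in text)
    return met / total


# Illustrative data only; not taken from the paper.
tree = EvidenceNode("solid-state batteries", 2.0, [
    EvidenceNode("energy density"),
    EvidenceNode("dendrite formation"),
])
rubric = rubric_from_evidence(tree)
question = question_from_rubric("solid-state batteries", rubric)
report = "Solid-state batteries improve energy density."
print(question)
print(rubric_reward(report, rubric))  # 0.75: two of three criteria met, weights 2 + 1 of 4
```

Because the rubric exists before the question does, the reward function never has to guess what a good answer looks like; the scalar it returns could then serve as the reward in a policy-gradient method such as GRPO, which the key terms mention.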

training, reasoning, evaluation

Key Terms

rubric-generation, evidence-tree, grpo, deep-research-agent, rubric-based-rewards