By reversing the rubric generation process (building evaluation criteria from evidence first, then creating questions aligned with those criteria), you can train research agents more efficiently and with more reliable reward signals for reinforcement learning.
DeepRubric is a framework that creates high-quality training data for teaching AI research agents to write better reports. Instead of asking an AI to guess what makes a good report for a given question, it works backwards: it first builds the evaluation criteria (the rubric) from evidence, then creates questions aligned with those criteria, producing matched question-rubric pairs. This approach trains better agents 13x faster than previous methods.
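To make the backwards workflow concrete, here is a minimal Python sketch of the ordering it describes: evidence first, rubric second, question last. It is an illustration only, not DeepRubric's implementation. Every name in it (`Criterion`, `TrainingExample`, `build_rubric`, `write_question`, `make_example`, `score_report`) is hypothetical, the template strings stand in for what would presumably be model-generated text, and the substring-matching reward is a toy stand-in for a real rubric-based grader.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item, tied to the evidence snippet that justifies it."""
    description: str
    evidence: str


@dataclass(frozen=True)
class TrainingExample:
    """A question paired with the rubric it was written to match."""
    question: str
    rubric: tuple[Criterion, ...]


def build_rubric(evidence: list[str]) -> tuple[Criterion, ...]:
    # Step 1 (hypothetical placeholder): derive one criterion per evidence snippet.
    return tuple(
        Criterion(description=f"The report states that {snippet}.", evidence=snippet)
        for snippet in evidence
    )


def write_question(topic: str, rubric: tuple[Criterion, ...]) -> str:
    # Step 2 (hypothetical placeholder): write a question the rubric can grade.
    return f"Write a research report on {topic} that covers {len(rubric)} key findings."


def make_example(topic: str, evidence: list[str]) -> TrainingExample:
    # The rubric is built before the question, never the other way round.
    rubric = build_rubric(evidence)
    return TrainingExample(question=write_question(topic, rubric), rubric=rubric)


def score_report(report: str, rubric: tuple[Criterion, ...]) -> float:
    # Toy reward (assumption): fraction of criteria whose evidence text
    # appears verbatim, ignoring case, in the report.
    if not rubric:
        return 0.0
    text = report.lower()
    hits = sum(criterion.evidence.lower() in text for criterion in rubric)
    return hits / len(rubric)


# Made-up evidence snippets, used purely for illustration.
example = make_example(
    "caching strategies",
    ["method A reduces latency", "method B increases memory use"],
)
reward = score_report("We found that method A reduces latency.", example.rubric)
```

Because each criterion carries the evidence it came from, the reward can be checked against something concrete rather than against a grader's guess about what a good report looks like, which is the intuition behind the more reliable reward signal.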