Co-designing queries and rubrics, rather than optimizing rubrics alone, solves a key bottleneck in rubric-based RL: vague queries lead to unusable rubrics, while overly narrow queries create unverifiable references that block learning.
QUBRIC co-designs queries and rubrics to enable reinforcement learning on tasks without verifiable rewards. The method transforms open-ended questions into scenario-based queries grounded in teacher insights, generates contrastive rubrics, and filters for learnability. It achieves gains of +5.5 points on ArenaHard and transfers to legal, moral, and narrative reasoning tasks.
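
The summary above does not say how rubric scores are computed or how learnability is measured, so the following is only an illustrative sketch under stated assumptions, not QUBRIC's actual implementation. It assumes a rubric is a weighted list of pass/fail criteria, that a response's reward is the weighted fraction of criteria it satisfies, and that a query-rubric pair counts as "learnable" when scores vary across several sampled responses (a pair on which every response scores the same gives no training signal). The names `RubricItem`, `rubric_score`, `is_learnable`, and the `min_spread` threshold are hypothetical, and the keyword checks stand in for what would presumably be an LLM judge.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, List


@dataclass
class RubricItem:
    description: str              # what the criterion checks
    weight: float                 # relative importance (hypothetical)
    check: Callable[[str], bool]  # stand-in for an LLM judge call


def rubric_score(response: str, rubric: List[RubricItem]) -> float:
    """Weighted fraction of rubric items the response satisfies, in [0, 1]."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.check(response))
    return earned / total


def is_learnable(responses: List[str], rubric: List[RubricItem],
                 min_spread: float = 0.1) -> bool:
    """Assumed filter: keep a pair only if scores differ across responses."""
    scores = [rubric_score(r, rubric) for r in responses]
    return len(scores) > 1 and pstdev(scores) >= min_spread


# Toy rubric for a legal-reasoning style query (keyword checks are placeholders).
rubric = [
    RubricItem("Cites the governing statute", 2.0, lambda r: "statute" in r),
    RubricItem("Weighs both sides", 1.0, lambda r: "both sides" in r),
]

mixed = ["cites the statute and weighs both sides", "short answer"]
saturated = ["statute, both sides", "the statute favors both sides equally"]

keep_mixed = is_learnable(mixed, rubric)          # scores differ
keep_saturated = is_learnable(saturated, rubric)  # every response scores 1.0
```

Under these assumptions the first pair is kept and the second is dropped: a rubric that every sampled response already satisfies cannot separate better answers from worse ones, which is the kind of dead end the filtering step is meant to avoid.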