Using detailed, instruction-specific rubrics to score model outputs significantly improves preference-based training for vision tasks, achieving 82.69% on benchmarks versus 75.82% with simpler outcome-based scoring.
This paper introduces rDPO, a method for improving visual AI models by using detailed rubrics (checklists of criteria) to evaluate and rank image responses. Instead of simple yes/no judgments, the approach creates specific evaluation criteria for each image-instruction pair, which helps the model learn finer distinctions in visual reasoning tasks.
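The core idea can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's implementation: all names are hypothetical, and the per-criterion check functions stand in for the judge model (e.g. an LLM/VLM) that would actually grade each rubric item. It shows how per-instruction rubric scores can rank candidate responses into a (chosen, rejected) pair for DPO-style training.

```python
# Hypothetical sketch of rubric-based preference-pair construction.
# In the real rDPO pipeline, a judge model scores each criterion; here
# simple string checks stand in for those judge calls.

def score_with_rubric(response, rubric):
    """Return the weighted fraction of rubric criteria the response satisfies.

    `rubric` is a list of (description, check_fn, weight) tuples, where
    check_fn is a stand-in for a judge-model call on one criterion.
    """
    total = sum(weight for _, _, weight in rubric)
    earned = sum(weight for _, check, weight in rubric if check(response))
    return earned / total

def build_preference_pair(responses, rubric):
    """Rank candidates by rubric score; return (chosen, rejected) for DPO."""
    ranked = sorted(responses,
                    key=lambda r: score_with_rubric(r, rubric),
                    reverse=True)
    return ranked[0], ranked[-1]

# Example rubric for an instruction like "Count the red apples in the image".
rubric = [
    ("states a numeric count", lambda r: any(ch.isdigit() for ch in r), 2.0),
    ("mentions the color red", lambda r: "red" in r.lower(), 1.0),
    ("refers to apples", lambda r: "apple" in r.lower(), 1.0),
]

responses = [
    "There are 3 red apples on the table.",
    "I see some fruit in the picture.",
    "The apples look ripe.",
]

chosen, rejected = build_preference_pair(responses, rubric)
```

Because each criterion is specific to the instruction, partially correct answers receive intermediate scores rather than a flat pass/fail, which is what lets the preference pairs capture finer distinctions than outcome-based scoring.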