A training approach that rewards models for partial progress on criteria rather than scoring each attempt only as a binary success or failure.
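
As a rough illustration of the difference, the sketch below compares a binary reward with one possible partial-credit reward. It assumes that each criterion is a simple pass/fail check and that all criteria carry equal weight. The function names (`binary_reward`, `partial_reward`) and the example criteria are hypothetical, chosen only for this sketch, and a real setup might weight criteria differently or score each one on a continuous scale.

```python
from typing import Callable, Sequence

# Hypothetical criterion type: a pass/fail check on a model output string.
Criterion = Callable[[str], bool]


def binary_reward(output: str, criteria: Sequence[Criterion]) -> float:
    """Return 1.0 only if every criterion passes, otherwise 0.0."""
    return 1.0 if all(check(output) for check in criteria) else 0.0


def partial_reward(output: str, criteria: Sequence[Criterion]) -> float:
    """Return the fraction of criteria that pass.

    Equal weighting is an assumption made for this sketch.
    An empty criteria list yields 0.0.
    """
    if not criteria:
        return 0.0
    passed = sum(1 for check in criteria if check(output))
    return passed / len(criteria)


# Illustrative criteria, invented for this example.
criteria = [
    lambda s: s.strip() != "",  # output is non-empty
    lambda s: "def " in s,      # output defines a function
    lambda s: "return" in s,    # output returns a value
    lambda s: len(s) < 80,      # output is short
]

# Meets three of the four criteria (it has no "return").
output = "def add(a, b): pass"

print(binary_reward(output, criteria))   # 0.0
print(partial_reward(output, criteria))  # 0.75
```

Under the binary scheme this output earns nothing, so it is indistinguishable from an empty response. The partial-credit scheme still separates it from weaker attempts, which gives the training signal something to work with before full success is reached.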