Using reference solutions to construct problem-specific reward signals that evaluate intermediate reasoning steps, not just final answers.
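As a rough illustration of the idea, the sketch below scores each intermediate step of a candidate solution against the steps of a reference solution, then blends that process score with a final-answer check. Everything here is an assumption made for illustration: the token-overlap (Jaccard) similarity, the function names `step_rewards` and `total_reward`, and the `step_weight` parameter are hypothetical and are not taken from the source, which does not specify how steps are compared. A real system would likely use a stronger matcher, such as a learned verifier or symbolic equivalence checks.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a step (a crude stand-in for real step matching)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def step_rewards(candidate_steps: list[str], reference_steps: list[str]) -> list[float]:
    """Reward each candidate step by its best overlap with any reference step.

    Assumption: a step is 'good' to the extent it resembles some step in the
    reference solution. The reference is what makes the signal problem-specific.
    """
    ref_tokens = [tokens(s) for s in reference_steps]
    return [
        max((jaccard(tokens(step), ref) for ref in ref_tokens), default=0.0)
        for step in candidate_steps
    ]


def total_reward(
    candidate_steps: list[str],
    candidate_answer: str,
    reference_steps: list[str],
    reference_answer: str,
    step_weight: float = 0.5,  # hypothetical blend between process and outcome
) -> float:
    """Blend the mean step-level reward with an exact-match final-answer reward."""
    per_step = step_rewards(candidate_steps, reference_steps)
    process = sum(per_step) / len(per_step) if per_step else 0.0
    outcome = 1.0 if candidate_answer.strip() == reference_answer.strip() else 0.0
    return step_weight * process + (1.0 - step_weight) * outcome


# Usage: a reference solution for "(2 + 3) * 4" and a flawed candidate.
reference = ["2 + 3 = 5", "5 * 4 = 20"]
flawed = ["2 + 3 = 6", "6 * 4 = 24"]
print(step_rewards(flawed, reference))
print(total_reward(flawed, "24", reference, "20"))
```

Because the reward is computed per step, a candidate that reaches a wrong final answer still receives partial credit for the steps that track the reference, which is the distinction from an answer-only reward.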