Reward signals derived from the model's own internal states, such as its confidence scores, rather than from external verification.
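As a rough illustration of the definition above, the sketch below scores a response using only the model's own output distributions, with no external checker involved. It assumes confidence is measured as the negative mean per-token entropy, which is just one possible choice; the function names `entropy` and `confidence_reward` and the example distributions are hypothetical and not taken from any particular library or paper.

```python
import math
from typing import Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def confidence_reward(token_dists: Sequence[Sequence[float]]) -> float:
    """Hypothetical self-generated reward: negative mean per-token entropy.

    Values closer to 0 mean the model was more confident in its own output.
    No external verification (tests, labels, graders) is consulted.
    """
    if not token_dists:
        raise ValueError("need at least one token distribution")
    return -sum(entropy(d) for d in token_dists) / len(token_dists)


# Made-up next-token distributions for two short responses (assumed data).
confident = [[0.97, 0.02, 0.01], [0.90, 0.05, 0.05]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]

r_confident = confidence_reward(confident)
r_uncertain = confidence_reward(uncertain)
```

Under this assumption, `r_confident` is greater than `r_uncertain`: the sharply peaked distributions earn the higher reward. Note that this measures how sure the model is, not whether it is correct, which is the key difference from an externally verified reward.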