Reward signals based on computational verification methods rather than the model's own internal signals.
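As an illustration only, the sketch below shows what such a verifier-based reward might look like for a math-style task. It assumes a hypothetical output convention where the model writes its final answer after an `Answer:` marker. The helper names `extract_answer` and `verifiable_reward` are hypothetical and do not come from any particular library. The reward is computed by an external check against a reference answer (here, exact numeric comparison via `fractions.Fraction`, falling back to string equality). The model's own probabilities or self-assessment play no part. Real systems may use other verifiers, such as symbolic math-equivalence checkers, unit tests, or compilers.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker (hypothetical output format)."""
    matches = re.findall(r"Answer:\s*([^\n]+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0.

    The signal comes from an external, programmatic check against a known
    reference answer, not from the model's own confidence or log-probabilities.
    """
    answer = extract_answer(completion)
    if answer is None:
        # No parseable final answer: nothing to verify, so no reward.
        return 0.0
    try:
        # Numeric comparison, so "0.5" and "1/2" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string comparison.
        return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_reward("Let me compute... Answer: 1/2", "0.5"))  # prints 1.0
    print(verifiable_reward("Answer: 12", "13"))  # prints 0.0
```

A binary pass/fail reward like this is simple to compute and hard for the model to game, because the model cannot raise it by merely sounding confident. The trade-off is that it gives no partial credit and only applies to tasks with a checkable reference.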