A measure of how reward quality and model confidence vary together, used to adjust training baselines.
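
As a rough illustration, "vary together" can be read as a sample covariance between per-sample rewards and confidence scores. The sketch below assumes that reading. The function names, the `rewards` and `confidences` inputs, and the linear baseline adjustment are all hypothetical choices made for illustration, not something the definition itself specifies.

```python
from statistics import fmean


def reward_confidence_covariance(rewards, confidences):
    """Unbiased sample covariance between rewards and confidence scores.

    Assumption: one scalar reward and one scalar confidence per sample.
    """
    if len(rewards) != len(confidences):
        raise ValueError("rewards and confidences must have the same length")
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_r = fmean(rewards)
    mean_c = fmean(confidences)
    return sum(
        (r - mean_r) * (c - mean_c) for r, c in zip(rewards, confidences)
    ) / (n - 1)


def adjusted_baselines(rewards, confidences):
    """Hypothetical adjustment: shift the mean-reward baseline per sample
    by a linear term in confidence, with slope cov(r, c) / var(c).

    This is an ordinary least-squares fit of reward on confidence, used
    here only as one plausible way to "adjust training baselines".
    """
    mean_r = fmean(rewards)
    mean_c = fmean(confidences)
    cov_rc = reward_confidence_covariance(rewards, confidences)
    var_c = reward_confidence_covariance(confidences, confidences)
    # With constant confidence there is nothing to regress on.
    slope = cov_rc / var_c if var_c > 0 else 0.0
    return [mean_r + slope * (c - mean_c) for c in confidences]
```

A positive value means high-confidence samples tend to earn higher rewards, a negative value means the opposite, and a value near zero means confidence carries little linear information about reward, in which case the adjusted baseline reduces to the plain mean reward.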