Reward quality is evaluated relative to the current policy's skill level, since reward rankings change as the policy improves.
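One way to make this concrete is a toy sketch, which assumes that "reward rankings" means the ordering of candidate reward functions by how useful they are. Everything in it is hypothetical and chosen only for illustration: the two reward functions, the `policy_samples` model of a policy as a window of true-quality values around its skill level, and the use of pairwise ordering accuracy as the quality measure. None of these come from the text above.

```python
from itertools import combinations


def pairwise_accuracy(reward_fn, true_qualities):
    """Fraction of sample pairs that the reward orders the same way as
    true quality. Pairs the reward scores as tied earn no credit."""
    pairs = [(a, b) for a, b in combinations(true_qualities, 2) if a != b]
    correct = sum(
        1 for a, b in pairs
        if (reward_fn(a) - reward_fn(b)) * (a - b) > 0
    )
    return correct / len(pairs)


def policy_samples(skill, spread=0.2, n=9):
    """Hypothetical policy model: n outputs whose true quality is evenly
    spaced in a window of +/- spread around `skill`, clipped to [0, 1]."""
    step = 2 * spread / (n - 1)
    return [min(1.0, max(0.0, skill - spread + i * step)) for i in range(n)]


def reward_coarse(q):
    """Hypothetical reward that saturates: it cannot separate outputs
    whose true quality is above 0.5."""
    return min(q, 0.5)


def reward_fine(q):
    """Hypothetical reward that is blind to outputs below 0.5 but
    separates everything above it."""
    return max(q - 0.5, 0.0)


REWARDS = {"coarse": reward_coarse, "fine": reward_fine}


def rank_rewards(skill):
    """Rank the candidate rewards, best first, by pairwise accuracy on
    samples drawn from a policy at the given skill level."""
    samples = policy_samples(skill)
    scores = {name: pairwise_accuracy(fn, samples) for name, fn in REWARDS.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_rewards(0.25))  # weak policy
    print(rank_rewards(0.75))  # strong policy
```

Under these assumptions, the saturating reward ranks first for the weak policy (skill 0.25) and last for the strong one (skill 0.75), because each reward is only informative in the quality range the policy actually produces. A real evaluation would replace the synthetic samples with rollouts from the current policy and the known true quality with whatever ground-truth signal is available.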