How well the ordering of actions induced by a supervision signal's scores agrees with their ordering under the true Q-values of a reference policy.
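One way to make this definition concrete is a pairwise rank-agreement statistic. The source does not name a specific metric, so the sketch below assumes Kendall's tau (the tau-a variant, which counts tied pairs as neither agreeing nor disagreeing) computed over the actions available in a single state. The function name `ranking_agreement` and the example numbers are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def ranking_agreement(scores, q_values):
    """Kendall tau-a between supervision scores and reference Q-values.

    Assumption: both sequences are indexed by the same actions in one state.
    Returns a value in [-1, 1]: 1 means the scores order every pair of
    actions the same way the Q-values do, -1 means every pair is reversed.
    """
    if len(scores) != len(q_values):
        raise ValueError("scores and q_values must have the same length")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        # The sign of the product tells us whether the pair is ordered
        # the same way (positive) or the opposite way (negative).
        product = (scores[i] - scores[j]) * (q_values[i] - q_values[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie in either list; it counts for neither side.

    n_pairs = len(scores) * (len(scores) - 1) // 2
    if n_pairs == 0:
        return 0.0  # fewer than two actions: no ordering to compare
    return (concordant - discordant) / n_pairs


# Hypothetical example: three actions whose scores and Q-values agree in order.
print(ranking_agreement([0.9, 0.2, 0.5], [1.0, 0.1, 0.4]))  # 1.0
```

Only the relative order matters here, so the scores need not be on the same scale as the Q-values. Other rank statistics (for example Spearman's correlation or top-1 agreement) would be equally reasonable readings of the definition.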