The difference in expected return between two policies, used to measure which policy makes better decisions starting from a given state.
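As a minimal sketch of how such a difference could be computed, assume that "expected return" means the discounted state value of a deterministic policy in a small tabular MDP. The two-state MDP, the discount factor, the two policies, and the function names `evaluate_policy` and `return_difference` are all hypothetical, chosen only for illustration.

```python
# Hypothetical two-state, two-action deterministic MDP (illustration only).
# NEXT_STATE[s][a] is the state reached from s under action a;
# REWARD[s][a] is the immediate reward for taking action a in state s.
NEXT_STATE = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
REWARD = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
GAMMA = 0.9  # assumed discount factor


def evaluate_policy(policy, gamma=GAMMA, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation for a deterministic policy {state: action}.

    Returns a dict mapping each state to its estimated discounted return.
    """
    values = {s: 0.0 for s in NEXT_STATE}
    for _ in range(max_iters):
        # One Bellman backup per state, using the previous value estimates.
        new_values = {
            s: REWARD[s][policy[s]] + gamma * values[NEXT_STATE[s][policy[s]]]
            for s in NEXT_STATE
        }
        delta = max(abs(new_values[s] - values[s]) for s in NEXT_STATE)
        values = new_values
        if delta < tol:
            break
    return values


def return_difference(policy_a, policy_b, state):
    """Expected return of policy_a minus that of policy_b, from `state`."""
    return evaluate_policy(policy_a)[state] - evaluate_policy(policy_b)[state]


# Two policies that differ only in state 0.
policy_stay = {0: 0, 1: 0}  # stays in state 0 forever, earning nothing
policy_move = {0: 1, 1: 0}  # moves to state 1, then collects reward 2 per step

diff_state0 = return_difference(policy_move, policy_stay, 0)
diff_state1 = return_difference(policy_move, policy_stay, 1)
```

Under these assumptions, `diff_state0` is positive (about 19), so `policy_move` is the better choice from state 0, while `diff_state1` is zero because both policies behave identically once state 1 is reached. The sign of the difference indicates which policy is preferable from that state, and its magnitude indicates by how much.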