Training examples in which the evidence is semantically related to the claim but contradicts it, testing whether models actually use the evidence rather than relying on topical similarity.
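To make the description concrete, here is a minimal sketch of what one such example and a simple evidence-sensitivity check might look like. The field names, the label strings ("SUPPORTS", "REFUTES"), the helper functions, and the sample sentences are all assumptions invented for illustration; they are not taken from the dataset itself. Word overlap is used only as a crude stand-in for semantic relatedness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvidenceExample:
    """Hypothetical record layout; the real schema may differ."""
    claim: str
    evidence: str
    label: str  # assumed label set: "SUPPORTS" or "REFUTES"


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (rough relatedness proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def evidence_sensitive(
    predict: Callable[[str, str], str],
    claim: str,
    supporting: str,
    contradicting: str,
) -> bool:
    """True if the prediction changes when the evidence is swapped."""
    return predict(claim, supporting) != predict(claim, contradicting)


# Invented illustration: the evidence shares most words with the claim
# but states the opposite fact.
example = EvidenceExample(
    claim="The Eiffel Tower is located in Berlin.",
    evidence="The Eiffel Tower is located in Paris, France.",
    label="REFUTES",
)


def claim_only_model(claim: str, evidence: str) -> str:
    """Toy baseline that ignores the evidence entirely."""
    return "SUPPORTS"


if __name__ == "__main__":
    print(round(token_overlap(example.claim, example.evidence), 2))
    print(evidence_sensitive(
        claim_only_model,
        example.claim,
        "The Eiffel Tower is located in Berlin.",
        example.evidence,
    ))
```

A model that only matches topics behaves like `claim_only_model` here: its prediction does not change when supporting evidence is swapped for contradicting evidence, which is the failure these examples are meant to expose.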