Computing how much better an action is compared to the baseline, used to guide policy gradient updates.