Training method that aligns models with human preferences by directly optimizing the difference between preferred and dispreferred outputs.