A training method that learns from pairwise comparisons between solutions rather than explicit reward signals.
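
As a rough illustration of learning from comparisons alone, the sketch below fits one scalar score per solution from (winner, loser) pairs. It assumes a Bradley-Terry-style logistic model of preferences, which the definition above does not name; the functions `pairwise_preference_loss` and `fit_scores` and the toy comparisons are hypothetical names and data chosen for this example.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred solution wins.

    Assumes P(preferred beats rejected) = sigmoid(score_preferred - score_rejected).
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def fit_scores(num_items, comparisons, lr=0.1, epochs=200):
    """Learn one score per item from (winner, loser) index pairs by SGD.

    No explicit reward is ever provided; the only signal is which item
    in each pair was preferred.
    """
    scores = [0.0] * num_items
    for _ in range(epochs):
        for winner, loser in comparisons:
            margin = scores[winner] - scores[loser]
            # Derivative of the loss with respect to the margin: -sigmoid(-margin).
            grad = -1.0 / (1.0 + math.exp(margin))
            scores[winner] -= lr * grad  # pushes the winner's score up
            scores[loser] += lr * grad   # pushes the loser's score down
    return scores


# Toy data (assumed): solution 0 beats 1, 1 beats 2, and 0 beats 2.
comparisons = [(0, 1), (1, 2), (0, 2)]
scores = fit_scores(3, comparisons)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

Only score differences matter in this formulation, so the learned values are meaningful relative to one another rather than as absolute rewards.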