A training technique that teaches a model to prefer certain outputs over others by learning from examples of better and worse responses.