A training technique that aligns a model's outputs with human preferences by combining supervised fine-tuning and preference learning in a single efficient training stage.
Adhering to complex, structured, or constrained instructions