ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control

Shelly Golan, Michael Finkelson, Ariel Bereslavsky, Yotam Nitzan, Or Patashnik | April 22, 2026 | arXiv

Key Takeaway

You can now train one diffusion model that handles multiple conflicting goals and let users choose their preferred trade-off at inference time, rather than training separate models or picking a single compromise upfront.

Summary

ParetoSlider trains a single diffusion model to handle multiple competing objectives simultaneously, letting users control trade-offs at inference time. Instead of committing to one fixed balance between goals (like image quality vs. prompt accuracy), the model learns the entire range of Pareto-optimal solutions, meaning trade-offs where no goal can improve without another getting worse, and accepts a preference weight as input to pick any point along that spectrum.
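To make the idea of "a preference weight picks a point along the spectrum" concrete, here is a minimal toy sketch in plain Python. It is not the paper's training procedure and involves no diffusion model: it only assumes two hypothetical reward scores per candidate output (image quality and prompt accuracy), filters the candidates down to the Pareto-optimal ones, and uses a weighted sum to select one of them. All names (`pareto_front`, `scalarize`, `pick`) and the numbers are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical reward pair: (image_quality, prompt_accuracy). Higher is better.
Rewards = Tuple[float, float]


def dominates(a: Rewards, b: Rewards) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Rewards]) -> List[Rewards]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(rewards: Rewards, w: float) -> float:
    """Weighted-sum scalarization; w in [0, 1] is the preference weight."""
    return w * rewards[0] + (1.0 - w) * rewards[1]


def pick(points: Sequence[Rewards], w: float) -> Rewards:
    """Select the point that maximizes the scalarized reward for weight w."""
    return max(points, key=lambda r: scalarize(r, w))


# Made-up candidate outputs, for illustration only.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.3, 0.3), (0.1, 0.95)]
front = pareto_front(candidates)  # (0.3, 0.3) is dominated by (0.7, 0.6) and drops out

# Sliding w from 1 to 0 moves the selection from quality-heavy to accuracy-heavy.
for w in (1.0, 0.5, 0.0):
    print(w, pick(front, w))
```

In this toy setting the weight selects among fixed candidates after the fact. The approach described in the summary instead feeds the weight to the model as an input, so one set of trained parameters can produce outputs for any chosen trade-off. Note also that a plain weighted sum can only reach points on the convex part of a frontier, which is a general property of this scalarization rather than a claim about the paper.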

training alignment applications

Key Terms

pareto-frontier multi-objective-reinforcement-learning preference-conditioning flow-matching early-scalarization