Stepwise credit assignment—rewarding each diffusion step for its own improvement rather than the final result—makes RL training of image generators more efficient and faster to converge.
This paper improves reinforcement learning for image generation models by assigning credit per diffusion step rather than uniformly across the trajectory. Instead of treating all steps equally, it recognizes that early steps determine composition while late steps refine details, and rewards each step according to its specific contribution to the final image. This leads to faster convergence and better sample efficiency.
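The core idea can be sketched with reward differencing: score intermediate denoising states with a reward model and credit each step with the improvement it produced. This is a minimal illustrative sketch, not the paper's actual algorithm; the score list and function names are hypothetical stand-ins for a real reward model over intermediate latents.

```python
def stepwise_advantages(step_scores):
    """Per-step credit: the reward improvement each denoising step contributed.

    step_scores[t] is a (hypothetical) reward-model score of the partially
    denoised sample after step t. The per-step advantage is the difference
    between consecutive scores, so credits telescope to the total gain.
    """
    return [step_scores[t + 1] - step_scores[t]
            for t in range(len(step_scores) - 1)]

# Toy trajectory: early steps (composition) improve the score most,
# late steps (detail refinement) add smaller gains.
scores = [0.10, 0.35, 0.50, 0.58, 0.60]
advantages = stepwise_advantages(scores)
```

Because the per-step credits sum to the overall improvement, a policy-gradient update weighted by these advantages rewards each step for its own contribution instead of broadcasting the final reward back to every step.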