You can get the benefits of expensive test-time sampling in RL by learning to filter action candidates early in the generation process, reducing compute without sacrificing performance.
FASTER is a method that speeds up reinforcement learning with diffusion-based policies by filtering action candidates during the denoising process, rather than waiting for denoising to complete. It frames this filtering as a decision problem with a learned value function, matching the performance of expensive sampling methods at a fraction of the computational cost.
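The core idea can be sketched as follows. This is a toy illustration, not FASTER's actual algorithm or API: every function name (`toy_denoise_step`, `toy_value_fn`, `sample_with_early_filtering`) and every numeric choice is a hypothetical stand-in. The sketch assumes the key mechanism from the summary: score partially-denoised candidates with a learned value function at an early step, prune to the top few, and finish denoising only the survivors.

```python
# Toy sketch of early candidate filtering during diffusion denoising.
# All names and dynamics here are illustrative stand-ins, not the method's API.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoise_step(actions, t):
    # Stand-in for one reverse-diffusion step: nudge samples toward the mode
    # and add noise that shrinks as t decreases.
    return actions * 0.9 + 0.1 * rng.normal(scale=t, size=actions.shape)

def toy_value_fn(actions):
    # Stand-in for a learned value function: here it simply prefers
    # actions near the origin.
    return -np.linalg.norm(actions, axis=-1)

def sample_with_early_filtering(n_candidates=64, keep=8, n_steps=10,
                                filter_step=3, dim=4):
    # Start every candidate from pure noise.
    actions = rng.normal(size=(n_candidates, dim))
    for step in range(n_steps):
        t = 1.0 - step / n_steps
        actions = toy_denoise_step(actions, t)
        # Filter early: score the partially-denoised candidates and keep the
        # top-k, so the remaining denoising steps run on far fewer samples.
        if step == filter_step:
            top = np.argsort(toy_value_fn(actions))[-keep:]
            actions = actions[top]
    # After full denoising, return the best surviving candidate.
    return actions[np.argmax(toy_value_fn(actions))]

action = sample_with_early_filtering()
print(action.shape)
```

Compared with filtering only after denoising finishes, most of the per-step compute here is spent on `keep` candidates instead of `n_candidates`, which is the source of the savings the summary describes.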