You can get the benefits of expensive test-time sampling in RL by learning to filter action candidates early in the generation process, reducing compute without sacrificing performance.
FASTER is a method that speeds up reinforcement learning with diffusion-based policies by filtering action candidates during the denoising process, rather than waiting for denoising to complete. It frames this filtering as a decision problem with a learned value function, matching the performance of expensive sampling methods at a fraction of the computational cost.
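The core idea can be sketched as follows. This is a toy illustration, not FASTER's actual algorithm or API: every function name (`toy_denoise_step`, `toy_value_fn`, `sample_with_early_filtering`) and every numeric choice is a hypothetical stand-in. The sketch assumes the key mechanism from the summary: score partially-denoised candidates with a learned value function at an early step, prune to the top few, and finish denoising only the survivors.

```python
# Toy sketch of early candidate filtering during diffusion denoising.
# All names and dynamics here are illustrative stand-ins, not the method's API.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoise_step(actions, t):
    # Stand-in for one reverse-diffusion step: nudge samples toward the mode
    # and add noise that shrinks as t decreases.
    return actions * 0.9 + 0.1 * rng.normal(scale=t, size=actions.shape)

def toy_value_fn(actions):
    # Stand-in for a learned value function: here it simply prefers
    # actions near the origin.
    return -np.linalg.norm(actions, axis=-1)

def sample_with_early_filtering(n_candidates=64, keep=8, n_steps=10,
                                filter_step=3, dim=4):
    # Start every candidate from pure noise.
    actions = rng.normal(size=(n_candidates, dim))
    for step in range(n_steps):
        t = 1.0 - step / n_steps
        actions = toy_denoise_step(actions, t)
        # Filter early: score the partially-denoised candidates and keep the
        # top-k, so the remaining denoising steps run on far fewer samples.
        if step == filter_step:
            top = np.argsort(toy_value_fn(actions))[-keep:]
            actions = actions[top]
    # After full denoising, return the best surviving candidate.
    return actions[np.argmax(toy_value_fn(actions))]

action = sample_with_early_filtering()
print(action.shape)
```

Compared with filtering only after denoising finishes, most of the per-step compute here is spent on `keep` candidates instead of `n_candidates`, which is the source of the savings the summary describes.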