You can get better reasoning from existing language models by resampling from high-uncertainty decision points rather than from random positions, with no retraining needed.
This paper shows how to sample better reasoning from language models efficiently and without extra training. Instead of restarting the reasoning at a random position, the method uses the model's own uncertainty to identify key decision points (such as choosing a proof strategy) and restarts from those points. This makes sampling much faster while producing better answers on math and coding tasks.
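
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes token-level Shannon entropy as the uncertainty signal (the summary does not say which measure is used), and the names `token_entropy`, `pick_restart_points`, and `truncate_for_resample`, along with the toy probabilities, are hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_restart_points(step_probs: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Return the indices of the k highest-entropy positions, in ascending order.

    ASSUMPTION: entropy stands in for whatever uncertainty score the paper uses.
    """
    entropies = [token_entropy(p) for p in step_probs]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


def truncate_for_resample(tokens: Sequence[str], restart_index: int) -> List[str]:
    """Keep the prefix before the uncertain position; a model would regenerate from there."""
    return list(tokens[:restart_index])


# Toy reasoning trace: the model is confident everywhere except at "Use",
# where it is choosing a proof strategy (hypothetical numbers).
tokens = ["Let", "x", "=", "2", ".", "Use", "induction"]
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
step_probs = [confident] * 5 + [uncertain] + [confident]

restart_points = pick_restart_points(step_probs, k=1)
prefix = truncate_for_resample(tokens, restart_points[0])
print(restart_points, prefix)
```

In a real pipeline the distributions would come from the model's softmax outputs at each generated token, and the truncated prefix would be fed back to the model to sample a new continuation from the uncertain point onward.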