Reasoning with Sampling: Cutting at Decision Points

Felix Zhou, Anay Mehrotra, Quanquan C. Liu | May 28, 2026 | arXiv

Key Takeaway

You can get better reasoning from existing language models by resampling from high-uncertainty decision points rather than from random positions, with no retraining needed.

Summary

This paper shows how to sample better reasoning from language models efficiently and without extra training. Instead of restarting the reasoning at a randomly chosen position, the method uses the model's own uncertainty to identify key decision moments (such as choosing a proof strategy) and restarts from those points. This makes sampling much faster while producing better answers on math and coding tasks.
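To make the idea of cutting at decision points concrete, here is a minimal sketch of how a restart position could be chosen from per-token uncertainty. It is an illustration, not the paper's algorithm: the function names (`token_entropy`, `cut_weights`, `sample_cut`), the toy distributions, and the rule of picking a position with probability proportional to its entropy are all assumptions made for this example. It also leaves out the resampling and accept/reject steps that a full sampler would need.

```python
import math
import random


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def cut_weights(step_distributions):
    """Turn per-position entropies into a distribution over cut positions.

    Assumption for this sketch: weight is proportional to entropy.
    If every position is deterministic, fall back to uniform weights.
    """
    entropies = [token_entropy(d) for d in step_distributions]
    total = sum(entropies)
    if total == 0.0:
        n = len(entropies)
        return [1.0 / n] * n
    return [h / total for h in entropies]


def sample_cut(step_distributions, rng):
    """Draw one restart position, favoring high-uncertainty positions."""
    weights = cut_weights(step_distributions)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy next-token distributions for a three-token trace (hypothetical values).
toy_trace = [
    [1.0, 0.0],  # no uncertainty: the model was certain here
    [0.5, 0.5],  # a genuine fork, e.g. choosing between two strategies
    [0.9, 0.1],  # mild uncertainty
]

weights = cut_weights(toy_trace)
cut = sample_cut(toy_trace, random.Random(0))
```

In this toy trace the deterministic first position gets zero weight, so a restart is never proposed there, while the evenly split second position receives the largest share. A uniform cut rule would instead spend a third of its proposals on the position where resampling cannot change anything.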

reasoning training

Key Terms

power distribution, entropy-cut Metropolis-Hastings, mixing time, token entropy