Small models often already produce the right answer among their candidate predictions; they just rank it poorly. Training them to re-rank their own outputs improves reasoning without any calls to an external model.
Small language models struggle with reasoning tasks compared to large models. This paper finds that when a small model predicts the wrong token, the token a large model would choose usually appears within the small model's top-8 candidates.
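To make that observation concrete, here is a minimal sketch of the measurement, assuming Hugging Face `transformers` and GPT-2-family models as illustrative stand-ins (not the paper's actual models): it checks whether the large model's greedy next token falls within the small model's top-8 candidates.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model pair for illustration; both share the GPT-2 vocabulary.
SMALL, LARGE = "gpt2", "gpt2-large"

tok = AutoTokenizer.from_pretrained(SMALL)
small = AutoModelForCausalLM.from_pretrained(SMALL).eval()
large = AutoModelForCausalLM.from_pretrained(LARGE).eval()

@torch.no_grad()
def large_token_in_small_topk(prompt: str, k: int = 8) -> bool:
    """Return True if the large model's greedy next token appears
    among the small model's top-k next-token candidates."""
    ids = tok(prompt, return_tensors="pt").input_ids
    small_logits = small(ids).logits[0, -1]    # small model's next-token logits
    large_logits = large(ids).logits[0, -1]    # large model's next-token logits
    topk_small = small_logits.topk(k).indices  # small model's k best candidates
    target = large_logits.argmax()             # token the large model would pick
    return bool((topk_small == target).any())

print(large_token_in_small_topk("The capital of France is"))
```

Run over a corpus, the fraction of positions where this returns True estimates how often the small model's top-8 already covers the large model's choice, which is the gap that re-ranking training aims to close.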