Language models can be trained to use internal memory blocks for reasoning instead of generating visible reasoning steps, achieving better compute efficiency without sacrificing reasoning performance.
This paper introduces Reasoning in Memory (RiM), a method that lets language models reason internally using fixed memory blocks instead of generating intermediate thoughts. Rather than writing out reasoning steps token by token, the model processes special token sequences in a single forward pass, which makes reasoning faster while maintaining quality on math and logic tasks.
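The efficiency argument comes down to how many sequential forward passes each approach needs. The sketch below is a toy illustration under stated assumptions, not the paper's implementation. `ToyModel`, `MEM_TOKEN`, `explicit_cot`, and `reasoning_in_memory` are hypothetical names. The "model" does no real reasoning and only counts forward passes. The sketch also assumes that the memory block is a fixed number of special tokens appended to the prompt.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical id for a special memory token (an assumption, not from the paper).
MEM_TOKEN = -1


@dataclass
class ToyModel:
    """Stand-in for a language model: it only counts forward passes."""
    forward_calls: int = 0

    def forward(self, tokens: List[int]) -> int:
        # One forward pass over the whole sequence; returns a toy "next token".
        self.forward_calls += 1
        return sum(t for t in tokens if t >= 0) % 100


def explicit_cot(model: ToyModel, prompt: List[int], num_steps: int) -> int:
    """Visible reasoning: each reasoning token needs its own sequential pass."""
    seq = list(prompt)
    for _ in range(num_steps):
        seq.append(model.forward(seq))  # generate one visible reasoning token
    return model.forward(seq)  # one more pass to produce the answer


def reasoning_in_memory(model: ToyModel, prompt: List[int], block_size: int) -> int:
    """Assumed RiM-style variant: append a fixed block of memory tokens and
    process the whole sequence in a single forward pass."""
    seq = list(prompt) + [MEM_TOKEN] * block_size
    return model.forward(seq)


cot_model, rim_model = ToyModel(), ToyModel()
explicit_cot(cot_model, [3, 5, 7], num_steps=8)
reasoning_in_memory(rim_model, [3, 5, 7], block_size=8)
print(cot_model.forward_calls, rim_model.forward_calls)  # 9 1
```

With eight reasoning steps, the visible chain of thought needs nine sequential passes, while the memory-block variant needs one. In a real transformer, the memory tokens would still be attended to, but they are processed in parallel within that single pass rather than generated one after another.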