Unlocking the Working Memory of Large Language Models for Latent Reasoning

Lukas Aichberger, Sepp Hochreiter | May 28, 2026 | arXiv

Key Takeaway

Language models can be trained to reason inside fixed internal memory blocks rather than by generating visible reasoning steps, which improves compute efficiency without sacrificing reasoning performance.

Summary

This paper introduces Reasoning in Memory (RiM), a method that lets language models reason internally using fixed memory blocks instead of generating intermediate thoughts. Rather than writing out reasoning steps token by token, the model processes a sequence of special tokens in a single forward pass, which makes reasoning faster while maintaining quality on math and logic tasks.
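The efficiency argument in the summary can be illustrated with a toy forward-pass count. This is only a sketch under stated assumptions, not the paper's implementation: `PassCounter`, `explicit_reasoning`, `memory_block_reasoning`, the `<mem>` placeholder token, and the block size of 8 are all hypothetical names and values chosen for illustration. The stand-in model does no real computation; it only counts how many times it is called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PassCounter:
    """Hypothetical stand-in for a language model; it only counts forward passes."""

    passes: int = 0

    def forward(self, tokens: list[str]) -> str:
        # One call processes all tokens in `tokens` together and
        # returns a single (dummy) next token.
        self.passes += 1
        return f"tok{self.passes}"


def explicit_reasoning(model: PassCounter, prompt: list[str], n_steps: int) -> list[str]:
    """Visible reasoning: every generated reasoning token costs one forward pass."""
    context = list(prompt)
    for _ in range(n_steps):
        context.append(model.forward(context))
    return context


def memory_block_reasoning(model: PassCounter, prompt: list[str], block_size: int) -> str:
    """Latent reasoning (assumed shape): append a fixed block of special tokens
    and process the whole block in a single forward pass."""
    context = list(prompt) + ["<mem>"] * block_size  # "<mem>" is an assumed token name
    return model.forward(context)  # first token after the memory block


if __name__ == "__main__":
    cot_model, rim_model = PassCounter(), PassCounter()
    explicit_reasoning(cot_model, ["Q"], n_steps=8)
    memory_block_reasoning(rim_model, ["Q"], block_size=8)
    print(cot_model.passes, rim_model.passes)
```

With the same budget of eight reasoning positions, the token-by-token loop calls the model eight times while the memory-block variant calls it once. The sketch ignores the cost of generating the final answer and says nothing about how the memory block is trained.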

reasoning, efficiency, training

Key Terms

working memory, latent reasoning, inference-time compute, curriculum learning