By retrieving learned reasoning skills at inference time instead of reasoning from scratch, you can reduce token usage and improve accuracy—making LLM reasoning cheaper and faster for practical deployment.
This paper proposes distilling reusable reasoning skills from past problem-solving attempts, storing them, and then retrieving and applying the relevant ones at inference time to guide new reasoning. Rather than rederiving every step from scratch, the model recalls applicable skills, avoiding redundant work and reaching solutions faster. Tests on coding and math tasks show reduced token usage alongside improved accuracy.
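The store-then-retrieve loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual system: the `SkillLibrary` class, the bag-of-words similarity, and all skill texts are hypothetical assumptions standing in for whatever skill representation and retriever the paper uses.

```python
# Hypothetical sketch of the retrieve-and-apply idea: a library of short
# reasoning skills, a similarity-based retriever, and a prompt builder
# that prepends retrieved skills so the model need not reason from scratch.
# All names and skill texts are illustrative, not the paper's API.
from collections import Counter
import math


def _bow(text):
    """Bag-of-words term counts (stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Stores short reasoning skills distilled from past solutions."""

    def __init__(self):
        self.skills = []  # list of (description, skill_text) pairs

    def add(self, description, skill_text):
        self.skills.append((description, skill_text))

    def retrieve(self, problem, k=2):
        """Return the k skills whose descriptions best match the problem."""
        query = _bow(problem)
        ranked = sorted(self.skills,
                        key=lambda s: _cosine(query, _bow(s[0])),
                        reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(library, problem):
    """Prepend retrieved skills as hints before the new problem."""
    hints = library.retrieve(problem)
    preamble = "\n".join(f"Skill: {h}" for h in hints)
    return f"{preamble}\nProblem: {problem}\nUse the skills above where relevant."


lib = SkillLibrary()
lib.add("two pointer technique for sorted array pair sums",
        "For pair-sum in a sorted array, move two pointers inward.")
lib.add("modular arithmetic for large exponent problems",
        "Reduce exponents with Fermat's little theorem before computing.")
prompt = build_prompt(lib, "Find two numbers in a sorted array that sum to a target")
```

In a real system, the bag-of-words matcher would presumably be replaced by a dense retriever, and skills would be written back to the library after each successful solve, but the control flow (retrieve, prepend, solve) is the same.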