You can catch and fix LLM reasoning errors at inference time by monitoring internal layer activations for phase shifts, then steering the model back on track. No retraining is needed, and it is roughly 5× cheaper than sampling multiple outputs.
This paper introduces a method for correcting reasoning errors in language models during generation by monitoring internal signals and rolling back when things go wrong. Instead of retraining, it detects a wrong turn by watching for sudden directional shifts in the model's internal computations, then resets the model's memory to a point before the error and injects a corrective signal to put generation back on course.
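The detect-then-correct loop described above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the shift detector, the cosine-similarity threshold, the steering vector, and all function names here are hypothetical stand-ins for the paper's actual monitoring and steering machinery.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def detect_phase_shift(activations, threshold=0.5):
    """Return the first step whose hidden state turns sharply away from
    the previous step's direction (cosine below threshold), else None.
    The threshold is an illustrative choice, not the paper's."""
    for t in range(1, len(activations)):
        if cosine(activations[t - 1], activations[t]) < threshold:
            return t
    return None

def rollback_and_steer(activations, kv_cache, steer_vec, alpha=1.0):
    """Truncate the cached state to just before the detected shift and
    add a corrective steering vector to the last kept activation."""
    t = detect_phase_shift(activations)
    if t is None:
        return activations, kv_cache  # no error detected; keep going
    kept = activations[:t]
    kept[-1] = kept[-1] + alpha * steer_vec  # inject corrective signal
    return kept, kv_cache[:t]

# Toy trace: the third hidden state reverses direction abruptly.
acts = [np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([-1.0, 0.0])]
print(detect_phase_shift(acts))  # step index of the detected wrong turn
```

In a real system the "memory" being truncated would be the transformer's key-value cache, and generation would resume from the rollback point with the steered state.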