You can catch and fix LLM reasoning errors at inference time by monitoring internal layer activations for phase shifts, then steering the model back on track. No retraining is needed, and it is roughly 5× cheaper than sampling multiple outputs.
This paper introduces a method for correcting reasoning errors in language models during generation by monitoring internal signals and rolling back when things go wrong. Instead of retraining, it detects a wrong turn by watching for sudden directional shifts in the model's internal computations, then resets the model's memory to a point before the error and injects a corrective signal to put generation back on course.
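The detect-then-correct loop described above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the shift detector, the cosine-similarity threshold, the steering vector, and all function names here are hypothetical stand-ins for the paper's actual monitoring and steering machinery.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def detect_phase_shift(activations, threshold=0.5):
    """Return the first step whose hidden state turns sharply away from
    the previous step's direction (cosine below threshold), else None.
    The threshold is an illustrative choice, not the paper's."""
    for t in range(1, len(activations)):
        if cosine(activations[t - 1], activations[t]) < threshold:
            return t
    return None

def rollback_and_steer(activations, kv_cache, steer_vec, alpha=1.0):
    """Truncate the cached state to just before the detected shift and
    add a corrective steering vector to the last kept activation."""
    t = detect_phase_shift(activations)
    if t is None:
        return activations, kv_cache  # no error detected; keep going
    kept = activations[:t]
    kept[-1] = kept[-1] + alpha * steer_vec  # inject corrective signal
    return kept, kv_cache[:t]

# Toy trace: the third hidden state reverses direction abruptly.
acts = [np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([-1.0, 0.0])]
print(detect_phase_shift(acts))  # step index of the detected wrong turn
```

In a real system the "memory" being truncated would be the transformer's key-value cache, and generation would resume from the rollback point with the steered state.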