You can extend a transformer's context length at inference time by reusing and accumulating the KV cache across sequence chunks: no retraining is needed, and the approach remains numerically stable even over very long sequences.
KV-Fold enables long-context inference by treating the key-value cache as an accumulator that is passed between sequence chunks. The model processes each chunk while attending to the cached keys and values from previous chunks, allowing it to handle contexts of up to 128K tokens without retraining or architectural changes.
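To make the accumulator idea concrete, here is a minimal sketch of chunked processing with a carried-over KV cache, written against the Hugging Face `transformers` API. It is a generic illustration of the pattern described above, not the KV-Fold reference implementation; the model name, chunk size, and input text are placeholder assumptions.

```python
# Sketch: feed a long input in chunks, folding each chunk's keys/values
# into a cache that is passed forward to the next chunk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM that supports use_cache
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

long_text = "..."  # placeholder: the long document to ingest
input_ids = tokenizer(long_text, return_tensors="pt").input_ids

chunk_size = 512          # assumption: chunk length for the example
past_key_values = None    # the accumulated KV cache carried between chunks

with torch.no_grad():
    for start in range(0, input_ids.size(1), chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        # Each chunk attends to the keys/values cached from earlier chunks.
        out = model(chunk, past_key_values=past_key_values, use_cache=True)
        # Fold the new chunk's KV entries into the accumulator.
        past_key_values = out.past_key_values

# past_key_values now covers the full ingested context and can seed generation.
```

In this pattern the cache grows with every chunk, so memory scales with total context length; whether attention over that accumulated cache stays well behaved beyond the model's training length is exactly the property the approach above claims to provide.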