Training-Free Looped Transformers

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu | May 22, 2026 | arXiv

Key Takeaway

You can improve the performance of frozen pretrained models by looping a block of internal layers at inference time. No retraining is needed, only a better strategy for applying the loop, based on ODE theory.
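One plausible way to write down the ODE framing is shown below. It is reconstructed from the paper's key terms (forward Euler step, damped sub-steps), not quoted from the paper, so the symbols $f_l$, $B$, $\Delta$, and $K$ are illustrative assumptions. A residual layer is one forward Euler step of size 1. Naive looping of a block repeats full-size steps, while damped sub-stepping splits one pass into $K$ smaller steps.

```latex
% Residual layer l: one forward Euler step of size 1 on dh/dt = f_l(h)
h_{l+1} = h_l + f_l(h_l)

% A block B of consecutive layers, written as a residual map
B(h) = h + \Delta(h), \qquad \Delta(h) = B(h) - h

% Naive looping: K full-size steps (total "time" K)
h^{(k+1)} = B\big(h^{(k)}\big) = h^{(k)} + \Delta\big(h^{(k)}\big), \qquad k = 0, \dots, K-1

% Damped sub-steps: K steps of size 1/K (total "time" 1)
h^{(k+1)} = h^{(k)} + \tfrac{1}{K}\,\Delta\big(h^{(k)}\big), \qquad k = 0, \dots, K-1
```

Under this reading, $K = 1$ recovers the original forward pass. Larger $K$ refines the Euler discretization of the same unit-time update instead of pushing the hidden state $K$ times further along it.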

Summary

This paper shows how to improve pretrained transformer models at test time by looping a middle section of layers, with no retraining. The key insight is to treat each loop as a smaller refinement step rather than a naive repetition of the block, an idea borrowed from numerical methods for solving ordinary differential equations (ODEs).
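The contrast between naive repetition and smaller refinement steps can be sketched on toy data. This is a minimal sketch, not the authors' implementation. The toy layers (`h -> h + tanh(W h)`) stand in for frozen transformer layers. All function names are hypothetical. The damping rule, which scales the block's update `B(h) - h` by `1/num_substeps`, is an assumption about what "damped sub-steps" means.

```python
import numpy as np


def make_residual_layer(dim, seed):
    """Toy stand-in for a frozen residual layer: h -> h + tanh(W h)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5 / np.sqrt(dim), size=(dim, dim))
    return lambda h: h + np.tanh(W @ h)


def apply_block(layers, h):
    """Run the layers in order (one ordinary pass)."""
    for layer in layers:
        h = layer(h)
    return h


def naive_loop(layers, h, start, end, repeats):
    """Baseline: repeat layers[start:end] with a full step each time."""
    h = apply_block(layers[:start], h)
    for _ in range(repeats):
        h = apply_block(layers[start:end], h)
    return apply_block(layers[end:], h)


def looped_forward(layers, h, start, end, num_substeps):
    """Hypothetical inference-time wrapper using damped sub-steps.

    The block update delta = B(h) - h is applied num_substeps times with
    step size 1/num_substeps, i.e. forward Euler on a finer grid over the
    same unit "time" as one ordinary pass through the block.
    """
    h = apply_block(layers[:start], h)
    block = layers[start:end]
    step = 1.0 / num_substeps
    for _ in range(num_substeps):
        delta = apply_block(block, h) - h  # the block's residual update
        h = h + step * delta               # damped (sub-)step
    return apply_block(layers[end:], h)


if __name__ == "__main__":
    layers = [make_residual_layer(16, seed=i) for i in range(8)]
    x = np.ones(16)
    base = apply_block(layers, x)
    # One sub-step reproduces the original forward pass (up to rounding).
    print(np.allclose(looped_forward(layers, x, 2, 6, 1), base))
    # Naive looping and damped sub-stepping diverge for K > 1.
    print(np.linalg.norm(naive_loop(layers, x, 2, 6, 4) - base),
          np.linalg.norm(looped_forward(layers, x, 2, 6, 4) - base))
```

With `num_substeps=1`, the wrapper reduces to the unmodified model, so the loop only changes behavior when more sub-steps are requested. Naive repetition, by contrast, moves the hidden state further with every extra pass.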

efficiency

Key Terms

looped-transformer forward-euler-step damped-sub-steps inference-time-wrapper