Training at the edge of stability (where optimization becomes chaotic) generalizes better because the optimizer converges to a lower-dimensional fractal attractor, and you can predict generalization by measuring the full structure of the loss landscape's curvature, not just a single scalar summary such as sharpness.
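For concreteness, here is a minimal sketch of how sharpness (the top Hessian eigenvalue) is typically tracked during training. It assumes PyTorch and a hypothetical `model`/`criterion` setup; the link to the edge of stability is that gradient descent with step size `lr` is linearly stable only while sharpness stays below `2/lr`.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest Hessian eigenvalue by power iteration on
    Hessian-vector products (the Hessian is never formed explicitly)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = (v @ hv).item()          # Rayleigh quotient with the current unit v
        v = hv / (hv.norm() + 1e-12)
    return eig

# Hypothetical usage inside a full-batch training loop:
#   loss = criterion(model(X), y)
#   sharpness = top_hessian_eigenvalue(loss, list(model.parameters()))
#   # at the edge of stability, sharpness hovers near the threshold 2 / lr
```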
This paper explains why training neural networks with large learning rates (which cause chaotic, oscillatory behavior) actually improves generalization. The authors model optimizers as random dynamical systems that converge to fractal attractors and introduce a 'sharpness dimension' to measure generalization.
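As a generic illustration of the dynamical-systems framing (not the paper's 'sharpness dimension', whose definition is the paper's own), one can run large-learning-rate gradient descent on a toy loss, treat the late iterates as samples of the attractor, and estimate that set's box-counting dimension. The toy loss `L(w) = w0**4/4 - w0**2/2 + w1**2/2` and the step size below are hypothetical choices picked to land GD in a bounded, chaotic regime.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Fit the slope of log(#occupied boxes) against log(1/eps)."""
    counts = []
    for eps in scales:
        boxes = {tuple(np.floor(p / eps).astype(int)) for p in points}
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
lr = 1.9                                     # large enough that GD oscillates chaotically
w = rng.uniform(-1, 1, size=2)
trajectory = []
for step in range(20000):
    g = np.array([w[0] ** 3 - w[0], w[1]])   # gradient of the toy loss
    w = w - lr * g
    if step > 2000:                          # discard the transient, keep the attractor
        trajectory.append(w.copy())

scales = np.array([0.2, 0.1, 0.05, 0.025, 0.0125])
dim = box_counting_dimension(np.array(trajectory), scales)
print(f"estimated attractor dimension: {dim:.2f}")
```

Box counting is only one of several dimension estimators; the paper's own metric ties the dimension to the landscape's curvature, which this toy sketch does not attempt.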