Training at the edge of stability (where optimization becomes chaotic) generalizes better because the optimizer converges to a lower-dimensional fractal attractor, and you can predict generalization by measuring the full structure of the loss landscape's curvature, not just a single scalar summary such as sharpness.
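For concreteness, here is a minimal sketch of how sharpness (the top Hessian eigenvalue) is typically tracked during training. It assumes PyTorch and a hypothetical `model`/`criterion` setup; the link to the edge of stability is that gradient descent with step size `lr` is linearly stable only while sharpness stays below `2/lr`.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest Hessian eigenvalue by power iteration on
    Hessian-vector products (the Hessian is never formed explicitly)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = (v @ hv).item()          # Rayleigh quotient with the current unit v
        v = hv / (hv.norm() + 1e-12)
    return eig

# Hypothetical usage inside a full-batch training loop:
#   loss = criterion(model(X), y)
#   sharpness = top_hessian_eigenvalue(loss, list(model.parameters()))
#   # at the edge of stability, sharpness hovers near the threshold 2 / lr
```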
This paper explains why training neural networks with large learning rates (which cause chaotic, oscillatory behavior) actually improves generalization. The authors model optimizers as random dynamical systems that converge to fractal attractors and introduce a 'sharpness dimension' to measure generalization.
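As a generic illustration of the dynamical-systems framing (not the paper's 'sharpness dimension', whose definition is the paper's own), one can run large-learning-rate gradient descent on a toy loss, treat the late iterates as samples of the attractor, and estimate that set's box-counting dimension. The toy loss `L(w) = w0**4/4 - w0**2/2 + w1**2/2` and the step size below are hypothetical choices picked to land GD in a bounded, chaotic regime.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Fit the slope of log(#occupied boxes) against log(1/eps)."""
    counts = []
    for eps in scales:
        boxes = {tuple(np.floor(p / eps).astype(int)) for p in points}
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
lr = 1.9                                     # large enough that GD oscillates chaotically
w = rng.uniform(-1, 1, size=2)
trajectory = []
for step in range(20000):
    g = np.array([w[0] ** 3 - w[0], w[1]])   # gradient of the toy loss
    w = w - lr * g
    if step > 2000:                          # discard the transient, keep the attractor
        trajectory.append(w.copy())

scales = np.array([0.2, 0.1, 0.05, 0.025, 0.0125])
dim = box_counting_dimension(np.array(trajectory), scales)
print(f"estimated attractor dimension: {dim:.2f}")
```

Box counting is only one of several dimension estimators; the paper's own metric ties the dimension to the landscape's curvature, which this toy sketch does not attempt.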