Flatness matters for generalization, but only when it is measured in Fisher geometry. Standard Euclidean flatness measures are misleading: they change when the network is reparametrized, even though the function it computes stays identical.
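The reparametrization problem can be made concrete with the rescaling symmetry of ReLU networks. The minimal sketch below assumes NumPy; the one-unit model, the data, and the helper names `predict`, `loss`, and `hessian_fd` are illustrative and not taken from the paper. Multiplying the first-layer weight by a > 0 and dividing the second-layer weight by a leaves every output unchanged, yet the trace of the loss Hessian, a common Euclidean sharpness measure, changes substantially.

```python
import numpy as np

# Illustrative data generated by a "teacher" with w1 * w2 = 2.
x = np.linspace(-1.0, 1.0, 21)
y = 2.0 * np.maximum(x, 0.0)

def predict(theta, x):
    """Hypothetical one-hidden-unit ReLU net: f(x) = w2 * relu(w1 * x)."""
    w1, w2 = theta
    return w2 * np.maximum(w1 * x, 0.0)

def loss(theta):
    """Mean squared error of the toy network on the toy data."""
    return np.mean((predict(theta, x) - y) ** 2)

def hessian_fd(f, theta, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    steps = np.eye(n) * eps
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(theta + steps[i] + steps[j])
                       - f(theta + steps[i] - steps[j])
                       - f(theta - steps[i] + steps[j])
                       + f(theta - steps[i] - steps[j])) / (4.0 * eps**2)
    return H

theta = np.array([1.0, 2.0])                           # a global minimum
a = 10.0
theta_scaled = np.array([a * theta[0], theta[1] / a])  # same function, new weights

same_function = np.allclose(predict(theta, x), predict(theta_scaled, x))
tr_orig = np.trace(hessian_fd(loss, theta))
tr_scaled = np.trace(hessian_fd(loss, theta_scaled))
print(same_function, tr_orig, tr_scaled)  # tr_scaled / tr_orig is about 20 here
```

Both weight settings lie on the same manifold of global minima (w1 * w2 = 2), so the network is equally good at each; only the Euclidean curvature differs, and it can be made arbitrarily large by choosing a larger a.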
This paper solves a long-standing problem in deep learning: why flat minima generalize better. The authors show that standard flatness measures fail because they are not invariant under reparametrization. Using the geometry induced by the Fisher Information Matrix, they define a reparametrization-invariant flatness measure that provably explains both SGD's bias toward flat minima and why those minima generalize well.
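Why Fisher geometry restores invariance can be seen with a small calculation. One simple way to express curvature in Fisher geometry, assumed here purely for illustration and not necessarily the paper's definition, is tr(F^{-1} H): the Hessian H measured relative to the Fisher matrix F. Under a linear change of coordinates w = D v, both matrices transform as D M D, so F^{-1} H becomes D^{-1} F^{-1} H D and its trace is unchanged, while tr(H) is not. The sketch below assumes NumPy, uses a linear least-squares model as a stand-in for a network, and uses the empirical Fisher (built from per-example gradients); `curvature_measures` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=n)

def curvature_measures(X, y, w):
    """Return (tr H, tr F^{-1} H) for the loss 0.5 * mean((X @ w - y)**2) at w."""
    m = len(y)
    r = X @ w - y              # residuals
    H = X.T @ X / m            # Hessian of the mean loss
    G = r[:, None] * X         # per-example gradients, one per row
    F = G.T @ G / m            # empirical Fisher (assumed stand-in for the true Fisher)
    return np.trace(H), np.trace(np.linalg.solve(F, H))

# Minimum in the original coordinates w.
w_star = np.linalg.lstsq(X, y, rcond=None)[0]

# Reparametrize w = D v: the model X @ w becomes (X @ D) @ v.
D = np.diag([10.0, 0.1])
X_v = X @ D
v_star = np.linalg.lstsq(X_v, y, rcond=None)[0]

same_function = np.allclose(X @ w_star, X_v @ v_star)
trH_w, fisher_w = curvature_measures(X, y, w_star)
trH_v, fisher_v = curvature_measures(X_v, y, v_star)
print(same_function)
print(trH_w, trH_v)        # Euclidean measure: changes with the coordinates
print(fisher_w, fisher_v)  # Fisher-relative measure: agrees up to rounding
```

The empirical Fisher is only a convenient choice for the sketch; the true Fisher, an expectation over the model's own predictive distribution, transforms the same way under a change of coordinates, so the invariance argument carries over.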