Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta | June 18, 2026 | arXiv

Key Takeaway

Flatness matters for generalization, but only when it is measured in Fisher geometry. Standard Euclidean measures are misleading because they change when the network is reparametrized, even though the function it computes stays identical.

Summary

This paper addresses a long-standing question in deep learning: why flat minima generalize better. The authors show that standard flatness measures fail because they are not invariant under reparametrization. Using the geometry induced by the Fisher Information Matrix, they define a reparametrization-invariant flatness measure that provably explains both SGD's bias toward flat minima and why those minima generalize well.
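
The reparametrization problem can be seen in a toy model. The sketch below is illustrative only and is not taken from the paper: it assumes a two-parameter model f(x) = a*b*x, a squared loss scaled by a hypothetical constant c, and a unit-variance Gaussian likelihood with E[x^2] = 1. The Fisher-normalized quantity tr(F^+ H) is a stand-in for an invariant measure, not the authors' definition of Fisher-geometric sharpness.

```python
import numpy as np

def hessian_at_min(a, b, c=3.0):
    # Loss L(a, b) = 0.5 * c * (a*b - w)^2, evaluated at a minimum (a*b == w).
    # The residual vanishes there, so the Hessian is c * g g^T with g = grad(a*b) = (b, a).
    g = np.array([b, a])
    return c * np.outer(g, g)

def fisher(a, b):
    # Fisher matrix of the assumed model y ~ N(a*b*x, 1) with E[x^2] = 1.
    g = np.array([b, a])
    return np.outer(g, g)

def euclidean_sharpness(a, b):
    # Trace of the Hessian: a common Euclidean flatness proxy.
    return np.trace(hessian_at_min(a, b))

def fisher_normalized_sharpness(a, b):
    # Hypothetical invariant proxy: Hessian measured relative to the Fisher metric.
    # The Fisher matrix is rank one here, so a pseudo-inverse with an explicit cutoff is used.
    f_pinv = np.linalg.pinv(fisher(a, b), rcond=1e-10)
    return np.trace(f_pinv @ hessian_at_min(a, b))

w = 2.0
results = []
for alpha in (1.0, 10.0):
    # Rescaling (a, b) -> (alpha * a, b / alpha) leaves the function a*b*x unchanged.
    a, b = alpha * 1.0, w / alpha
    results.append((a * b, euclidean_sharpness(a, b), fisher_normalized_sharpness(a, b)))

for product, euclid, fisher_norm in results:
    print(f"a*b={product:.2f}  tr(H)={euclid:.2f}  tr(F^+ H)={fisher_norm:.2f}")
```

Both parameter settings compute the same function, yet the Hessian trace differs by a large factor, while the Fisher-normalized value stays the same. This toy case only illustrates the invariance argument; it says nothing about the SGD or generalization results.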

training evaluation reasoning

Key Terms

fisher-information-matrix reparametrization-invariance pac-bayes stochastic-differential-equation