Generalization does not scale uniformly with width and data: the scaling behavior differs from one regime to the next, and the data's spectral structure determines how performance improves.
This paper analyzes how neural networks generalize as model size and training data scale together. Using a simplified quadratic network model with structured data, the authors derive exact formulas showing that generalization error follows different power laws depending on the ratio of parameters to samples, revealing distinct phases such as the onset of interpolation.
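
To make "different power laws in different regimes" concrete, the sketch below shows one common way to read a scaling exponent off measured errors: a least-squares line fit in log-log space, done separately on each regime. Everything in it is an illustrative assumption rather than something taken from the paper: the helper name `fit_power_law`, the breakpoint `n0`, and the exponents 1.0 and 0.25 are hypothetical, and the error curve is synthetic.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ~ C * n**(-alpha) by least squares in log-log space.

    Returns (alpha, C). Hypothetical helper, not from the paper.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return -slope, np.exp(intercept)

# Synthetic error curve with two regimes (made-up numbers, for illustration only):
# a decay of n**-1.0 below the breakpoint n0, and n**-0.25 above it,
# joined continuously at n0.
n = np.logspace(1, 5, 41)
n0 = 1e3
err = np.where(n < n0, n ** -1.0, n0 ** -1.0 * (n / n0) ** -0.25)

# Fit each regime separately; a single fit over the whole range
# would blend the two exponents into one misleading slope.
small = n < n0
alpha_small, _ = fit_power_law(n[small], err[small])
alpha_large, _ = fit_power_law(n[~small], err[~small])
alpha_global, _ = fit_power_law(n, err)

print(f"exponent below n0: {alpha_small:.3f}")
print(f"exponent above n0: {alpha_large:.3f}")
print(f"single global fit: {alpha_global:.3f}")
```

On real measurements the breakpoint is not known in advance, so the regime boundary would itself need to be estimated or taken from theory; the sketch fixes it only to keep the example short.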