LLMs have a fundamental capacity limit set by their signal-to-noise ratio. Scaling parameters or data without maintaining sufficient signal clarity degrades performance, which explains phenomena such as catastrophic overtraining and quantization failures that standard scaling laws cannot capture.
This paper explains why large language models sometimes get worse, not better, with more training or lower numerical precision. Using information theory, the authors model LLM training as sending a signal through a noisy channel. When the model or data is scaled up without keeping the signal clear relative to the noise, performance stops improving and degrades, producing a U-shaped curve.
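
The noisy-channel analogy can be made concrete with the classical Shannon capacity of an additive white Gaussian noise channel, C = 0.5 * log2(1 + S/N) bits per channel use. The sketch below is only a toy illustration of that textbook formula, not the paper's actual model: the function name `awgn_capacity_bits` and the power values are hypothetical, chosen so the arithmetic is easy to follow. It shows that scaling the signal buys nothing if the noise scales with it, and that capacity falls if the noise grows faster.

```python
import math


def awgn_capacity_bits(signal_power: float, noise_power: float) -> float:
    """Shannon capacity of a real AWGN channel, in bits per channel use.

    Textbook formula C = 0.5 * log2(1 + S/N); used here purely as an
    analogy, not as the paper's own capacity expression.
    """
    if signal_power < 0 or noise_power <= 0:
        raise ValueError("signal_power must be >= 0 and noise_power must be > 0")
    return 0.5 * math.log2(1.0 + signal_power / noise_power)


# Hypothetical numbers, chosen only for illustration.
baseline = awgn_capacity_bits(signal_power=15.0, noise_power=1.0)

# Scale the signal 10x, but let the noise scale 10x too: the ratio is unchanged.
scaled_same_snr = awgn_capacity_bits(signal_power=150.0, noise_power=10.0)

# Scale the signal 10x while the noise grows 50x: the ratio, and capacity, drop.
scaled_noisier = awgn_capacity_bits(signal_power=150.0, noise_power=50.0)

print(f"baseline:              {baseline:.2f} bits/use")
print(f"scaled, same S/N:      {scaled_same_snr:.2f} bits/use")
print(f"scaled, noise dominant: {scaled_noisier:.2f} bits/use")
```

In this toy setting, capacity depends only on the ratio S/N, so the "scaled" channel carries more raw power yet no more information. That is the intuition behind the summary's claim, under the stated assumption that the analogy to a Gaussian channel holds.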