Noise in transformers can synchronize token behavior and stabilize learning. This counterintuitive finding suggests randomness plays a constructive role in how these models process sequences.
This paper proves that transformer models, which have finite depth and width in practice, converge to a stochastic interacting particle system as depth and width scale up. The researchers show that token evolution follows a continuous-time process with noise-driven synchronization, meaning random perturbations actually help tokens align rather than drift apart.
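A small simulation can make the synchronization effect concrete. The sketch below is an illustrative toy model, not the paper's exact dynamics: tokens live on the unit sphere, each step applies an attention-style drift plus isotropic Gaussian noise (a simple projected Euler-Maruyama scheme), and a mean pairwise cosine similarity serves as the synchronization metric. All parameter names (`beta`, `noise_scale`) and the specific drift are assumptions made for the example.

```python
import numpy as np

def simulate_tokens(n_tokens=32, dim=16, n_steps=2000, dt=0.01,
                    noise_scale=0.1, beta=4.0, seed=0):
    """Toy interacting particle system with an attention-style drift.

    Each token X_i moves toward the softmax-weighted average of all tokens,
    plus Gaussian noise; tokens are renormalized after each step so they
    stay on the unit sphere (projected Euler-Maruyama, for illustration).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_tokens, dim))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # start on the unit sphere

    for _ in range(n_steps):
        logits = beta * (X @ X.T)                  # pairwise attention scores
        logits -= logits.max(axis=1, keepdims=True)
        A = np.exp(logits)
        A /= A.sum(axis=1, keepdims=True)          # row-wise softmax
        drift = A @ X                              # attention-weighted average
        noise = noise_scale * np.sqrt(dt) * rng.standard_normal(X.shape)
        X = X + dt * drift + noise
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # project back to sphere

    # synchronization metric: mean pairwise cosine similarity, excluding i == j
    G = X @ X.T
    return (G.sum() - n_tokens) / (n_tokens * (n_tokens - 1))

for sigma in (0.0, 0.05, 0.2):
    sim = simulate_tokens(noise_scale=sigma)
    print(f"noise={sigma:.2f}  mean cosine similarity={sim:.3f}")
```

Sweeping `noise_scale` this way gives a quick, qualitative check of how perturbation strength interacts with alignment in these toy dynamics; it is a sketch of the phenomenon the paper analyzes rigorously, not a reproduction of its results.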