Teacher forcing trains RNNs on chaotic systems under conditions that differ from how the model is actually used: during training, each prediction is conditioned on the true previous state rather than on the model's own output. This mismatch can let a model fit the data well statistically while predicting the actual dynamics poorly, a problem that worsens when multiple explanations are consistent with the data.
This paper reveals a fundamental mismatch between how teacher forcing (a common training technique) and the marginal likelihood (the true training objective) shape neural-network optimization for chaotic systems.
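The train/use mismatch can be illustrated with a toy sketch (assumptions mine, not from the paper: a polynomial one-step predictor stands in for the RNN, and the logistic map serves as the chaotic system). The one-step error measured under teacher forcing is near machine precision, yet a free-running rollout that feeds the model its own outputs diverges from the true trajectory, because chaos amplifies any tiny deviation:

```python
import numpy as np

def logistic(v, r=3.9):
    """Logistic map in its chaotic regime (r = 3.9)."""
    return r * v * (1 - v)

# Generate a chaotic ground-truth trajectory.
T = 200
x = np.empty(T)
x[0] = 0.5
for t in range(T - 1):
    x[t + 1] = logistic(x[t])

# "Teacher forcing": fit a cubic one-step predictor by least squares,
# where every target is conditioned on the TRUE previous state x[t].
X = np.vander(x[:-1], 4)                      # features [x^3, x^2, x, 1]
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)

def step(v):
    return np.polyval(coef, v)

# Teacher-forced error: one step ahead from ground truth (what training sees).
tf_err = np.mean((step(x[:-1]) - x[1:]) ** 2)

# Free-running error: feed the model its own outputs (how it is used).
y = np.empty(T)
y[0] = x[0] + 1e-6                            # tiny perturbation; chaos amplifies it
for t in range(T - 1):
    y[t + 1] = step(y[t])
fr_err = np.mean((y - x) ** 2)

print(f"teacher-forced MSE: {tf_err:.2e}, free-running MSE: {fr_err:.2e}")
```

The cubic fit is essentially exact for the (quadratic) logistic map, so the teacher-forced error is negligible; the free-running error is many orders of magnitude larger, which is precisely the statistically-good-but-dynamically-poor behavior described above.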