Teacher forcing trains RNNs on chaotic systems under conditions that differ from how the model is actually used: during training, each prediction is conditioned on the true previous state rather than on the model's own output. This mismatch can let a model fit the data well statistically while predicting the actual dynamics poorly, a problem that worsens when multiple explanations are consistent with the data.
This paper reveals a fundamental mismatch between how teacher forcing (a common training technique) and the marginal likelihood (the true training objective) shape neural-network optimization for chaotic systems.
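The train/use mismatch can be illustrated with a toy sketch (assumptions mine, not from the paper: a polynomial one-step predictor stands in for the RNN, and the logistic map serves as the chaotic system). The one-step error measured under teacher forcing is near machine precision, yet a free-running rollout that feeds the model its own outputs diverges from the true trajectory, because chaos amplifies any tiny deviation:

```python
import numpy as np

def logistic(v, r=3.9):
    """Logistic map in its chaotic regime (r = 3.9)."""
    return r * v * (1 - v)

# Generate a chaotic ground-truth trajectory.
T = 200
x = np.empty(T)
x[0] = 0.5
for t in range(T - 1):
    x[t + 1] = logistic(x[t])

# "Teacher forcing": fit a cubic one-step predictor by least squares,
# where every target is conditioned on the TRUE previous state x[t].
X = np.vander(x[:-1], 4)                      # features [x^3, x^2, x, 1]
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)

def step(v):
    return np.polyval(coef, v)

# Teacher-forced error: one step ahead from ground truth (what training sees).
tf_err = np.mean((step(x[:-1]) - x[1:]) ** 2)

# Free-running error: feed the model its own outputs (how it is used).
y = np.empty(T)
y[0] = x[0] + 1e-6                            # tiny perturbation; chaos amplifies it
for t in range(T - 1):
    y[t + 1] = step(y[t])
fr_err = np.mean((y - x) ** 2)

print(f"teacher-forced MSE: {tf_err:.2e}, free-running MSE: {fr_err:.2e}")
```

The cubic fit is essentially exact for the (quadratic) logistic map, so the teacher-forced error is negligible; the free-running error is many orders of magnitude larger, which is precisely the statistically-good-but-dynamically-poor behavior described above.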