Transformers for time series forecasting don't rely on superposition the way they do in language tasks, suggesting that forecasting may not require the compositional complexity that makes Transformers powerful for NLP.
This paper investigates how Transformers work internally for time series forecasting by analyzing their hidden representations using sparse autoencoders. The key finding: Transformers don't need complex, overlapping feature representations (superposition) to forecast well—their representations stay sparse and simple, which explains why basic linear models remain competitive.
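The analysis the paper describes (collecting hidden activations from a forecasting Transformer, fitting a sparse autoencoder to them, and checking how many features fire per input as a rough probe for superposition) can be illustrated with a minimal sketch. This is not the paper's code; the model sizes, dictionary width, and hyperparameters below are placeholder assumptions.

```python
# Minimal sketch (not the paper's implementation): fit a sparse autoencoder (SAE)
# to hidden activations and measure how many features activate per sample.
# All dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # overcomplete feature dictionary
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(z), z


def train_sae(activations, d_hidden=2048, l1_coef=1e-3, epochs=10, lr=1e-3):
    """Fit an SAE to a (n_samples, d_model) tensor of hidden activations."""
    sae = SparseAutoencoder(activations.shape[-1], d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, z = sae(activations)
        # Reconstruction error plus an L1 penalty that encourages sparse codes.
        loss = ((recon - activations) ** 2).mean() + l1_coef * z.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae


if __name__ == "__main__":
    # Stand-in for hidden states gathered from a forecasting Transformer;
    # in practice these would come from a forward hook on a chosen layer.
    acts = torch.randn(4096, 256)
    sae = train_sae(acts)
    with torch.no_grad():
        _, z = sae(acts)
    # Average number of active SAE features per sample: a count that stays low
    # relative to d_model is consistent with sparse, non-superposed representations.
    active_per_sample = (z > 0).float().sum(dim=-1).mean().item()
    print(f"mean active features per sample: {active_per_sample:.1f}")
```

In this kind of analysis, the interesting quantity is the per-sample feature count: if the model were packing many features into shared directions (superposition), a much wider dictionary with many simultaneously active features would typically be needed to reconstruct the activations well.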