You can replace attention with a linear-time polynomial mixer and get similar results with much faster inference—especially valuable for long sequences where attention becomes prohibitively expensive.
PoM (Polynomial Mixer) replaces the quadratic-cost attention mechanism in transformers with a polynomial-based token mixer that runs in time linear in sequence length. It compresses all tokens into a shared, learned polynomial representation, from which each token extracts the context it needs.
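To make the linear-time idea concrete, here is a minimal NumPy sketch of the general pattern: map keys and queries through a polynomial feature map, sum the keys' features against the values into one compact state in a single pass, and let each query read from that state. The feature map (`poly_features`), the degree-2 choice, and the normalization are illustrative assumptions, not PoM's actual parameterization.

```python
import numpy as np

def poly_features(x, degree=2):
    # Degree-2 polynomial feature map: [x, flattened outer product x⊗x].
    # Hypothetical choice; the real PoM feature map may differ.
    feats = [x]
    if degree >= 2:
        feats.append(np.einsum('td,te->tde', x, x).reshape(x.shape[0], -1))
    return np.concatenate(feats, axis=-1)

def polynomial_mixer(Q, K, V):
    # Compress all tokens into one polynomial state in a single pass
    # (O(T) in sequence length), then let each query read from it,
    # instead of forming the O(T^2) token-to-token attention matrix.
    phi_k = poly_features(K)                  # (T, F)
    phi_q = poly_features(Q)                  # (T, F)
    state = phi_k.T @ V                       # (F, d): the compact summary
    norm = phi_q @ phi_k.sum(axis=0)          # (T,): per-query normalizer
    return (phi_q @ state) / np.maximum(norm, 1e-6)[:, None]

T, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
Q, K = abs(Q), abs(K)  # nonnegative features keep the normalizer positive
out = polynomial_mixer(Q, K, V)
print(out.shape)  # (8, 4)
```

Because the state has a fixed size regardless of sequence length, each new token costs O(1) to mix in, which is where the inference speedup on long sequences comes from.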