Transformer attention can act as a feature learner for nonlinear functions during in-context learning, and this capability admits a theoretical analysis with concrete error bounds, bridging the gap between empirical success and mathematical understanding.
This paper explains how transformers perform in-context learning on nonlinear regression tasks. The authors show that attention mechanisms can automatically construct nonlinear features (such as polynomials or splines) from the examples in the prompt, enabling the model to solve complex regression problems without updating any weights.
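To make the mechanism concrete, here is a minimal sketch, not the paper's actual construction: a single softmax-attention head whose keys and values are the in-context examples (x_i, y_i) computes a Nadaraya-Watson kernel smoother, which is one standard way attention can perform nonlinear regression with no weight updates. The target function, the temperature, and the `attention_regression` helper are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical nonlinear ground-truth function for the demo
    return np.sin(3 * x) + 0.5 * x**2

# In-context "prompt": n labeled examples of the nonlinear target
n = 64
xs = rng.uniform(-2, 2, size=n)
ys = target(xs) + 0.05 * rng.normal(size=n)

def attention_regression(x_query, keys, values, temperature=0.1):
    """Softmax attention over (key, value) = (x_i, y_i) pairs.

    scores_i = -|x_query - x_i|^2 / temperature  (a Gaussian-kernel score)
    output   = sum_i softmax(scores)_i * y_i     (kernel-weighted average)
    """
    scores = -((x_query - keys) ** 2) / temperature
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values

x_q = 0.7
pred = attention_regression(x_q, xs, ys)
print(f"prediction at x={x_q}: {pred:.3f}, true value: {target(x_q):.3f}")
```

The prediction is formed entirely from the prompt at inference time: changing the examples changes the regression, while the "model" (the attention computation) stays fixed, which is the in-context learning behavior the paper analyzes.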