From Weights to Activations: Is Steering the Next Frontier of Adaptation?

Simon Ostermann, Daniil Gurgurov, Tanja Baeumel, Michael A. Hedderich, Sebastian Lapuschkin et al. | April 15, 2026 | arXiv

Key Takeaway

Steering (modifying activations at inference time) is a fundamentally different adaptation approach from weight updates or prompting: it is reversible, local, and requires no retraining, which makes it a practical alternative for customizing model behavior.

Summary

This paper argues that steering, i.e. modifying a model's internal activations at inference time, should be understood as a distinct form of model adaptation, on par with fine-tuning and prompting. The authors develop criteria for comparing steering with classical adaptation methods and propose a unified taxonomy that shows how steering enables local, reversible behavior changes without updating weights.
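To make the idea concrete, here is a minimal sketch of additive activation steering on a toy model. It is an illustration under stated assumptions, not the paper's method: the two-layer linear model, the weights `W1` and `W2`, the `forward` function, the steering vector, and the scale `alpha` are all hypothetical. The sketch only shows that a hidden activation can be shifted at inference time while the weights stay untouched, and that dropping the shift restores the original behavior.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical frozen weights of a toy two-layer linear model (assumption).
W1: Matrix = [[1.0, 0.0], [0.0, 1.0]]
W2: Matrix = [[1.0, -1.0]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def forward(x: Vector, steering: Optional[Vector] = None, alpha: float = 1.0) -> Vector:
    """Run the toy model, optionally adding alpha * steering to the hidden activation."""
    hidden = matvec(W1, x)
    if steering is not None:
        # The intervention acts on the activation only; W1 and W2 are never modified.
        hidden = [h + alpha * s for h, s in zip(hidden, steering)]
    return matvec(W2, hidden)


x = [2.0, 1.0]
baseline = forward(x)                                 # no intervention
steered = forward(x, steering=[0.0, 1.0], alpha=3.0)  # shift the second hidden unit
restored = forward(x)                                 # intervention removed

print(baseline, steered, restored)
```

In a real network, the same shift would typically be applied to one layer's hidden state during the forward pass (for example through a hook mechanism), which is what makes the change local to that layer and reversible by simply switching it off.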

training alignment

Key Terms

activation-steering parameter-efficient-fine-tuning model-adaptation