Adapting steering strength dynamically per context significantly improves LLM control compared to fixed steering, matching more complex methods like LoRA while remaining simpler and more interpretable.
This paper improves linear activation steering, a technique for controlling LLM behavior, by making the steering strength adapt to each input context instead of using one fixed strength for all tokens. The method, called CLAS, outperforms existing approaches across multiple benchmarks and models, offering a practical way to customize LLMs with limited training data.
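To make the contrast between fixed and context-dependent steering concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary above does not specify how CLAS computes the strength, so the linear form `alpha(h) = w . h + b`, the names `steer_fixed` and `steer_contextual`, and the toy vectors are all assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer_fixed(h: Vector, v: Vector, alpha: float) -> Vector:
    """Fixed steering: add the same multiple of the steering vector v
    to every hidden state h, regardless of context."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def steer_contextual(h: Vector, v: Vector, w: Vector, b: float) -> Vector:
    """Context-dependent steering: the strength is computed from h itself.

    ASSUMPTION: alpha(h) = w . h + b is a hypothetical parameterization,
    used only to illustrate the idea; the actual CLAS form may differ.
    """
    alpha = dot(w, h) + b
    return [hi + alpha * vi for hi, vi in zip(h, v)]


# Two toy hidden states standing in for two different contexts.
h1 = [1.0, 0.0]
h2 = [0.0, 1.0]
v = [1.0, 1.0]   # steering direction (hypothetical)
w = [2.0, 0.0]   # strength weights, learned in practice (hypothetical)
b = 0.0

fixed1 = steer_fixed(h1, v, 0.5)          # same strength 0.5 for both
fixed2 = steer_fixed(h2, v, 0.5)
ctx1 = steer_contextual(h1, v, w, b)      # strength 2.0 for this context
ctx2 = steer_contextual(h2, v, w, b)      # strength 0.0: left unchanged
```

With fixed steering, both hidden states are shifted by the same amount. In the contextual variant, `h1` is steered strongly while `h2` passes through untouched, which is the behavior a single global strength cannot express.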