LLMs can be aligned for safety without the usual trade-off in general capabilities by restricting safety training to specific tokens rather than retraining the model globally, and this approach works with minimal data.
SafeSteer is a method that makes LLMs safer without hurting their general abilities by focusing safety training only on the specific tokens that matter for safety decisions.
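
The summary above does not say how SafeSteer selects safety-relevant tokens or how it restricts training to them. As a rough illustration of the general idea only, the sketch below uses a per-token loss mask, one common way to limit training signal to chosen positions. The function name `masked_token_loss`, the `safety_mask` argument, and the toy inputs are all hypothetical and are not taken from SafeSteer.

```python
import math

def masked_token_loss(logits, targets, safety_mask):
    """Mean cross-entropy over only the positions where safety_mask is 1.

    Illustrative assumption: the mask is given; how safety-relevant
    tokens would actually be chosen is not specified in the source.
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, safety_mask):
        if not keep:
            continue  # tokens outside the mask contribute no loss
        # Numerically stable log-sum-exp over the vocabulary dimension.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]  # negative log-likelihood of the target
        count += 1
    return total / count if count else 0.0

# Toy example: three positions, vocabulary of two tokens.
toy_logits = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
toy_targets = [0, 0, 0]
loss_first_only = masked_token_loss(toy_logits, toy_targets, [1, 0, 0])
loss_all_tokens = masked_token_loss(toy_logits, toy_targets, [1, 1, 1])
```

Positions with a mask value of 0 add nothing to the loss, so in a real training loop they would receive no gradient from this objective. That is the sense in which the update is confined to the selected tokens.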