This paper examines how large language models internally process emotions by analyzing their activations with sparse autoencoders. The researchers find that emotion recognition unfolds in three distinct phases, with emotion-specific features emerging only in the late layers of the network.
The practical upshot: emotion detection can be improved by identifying and amplifying the small set of internal features that drive emotion predictions, without retraining the model.
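The amplification idea can be sketched in a few lines. The code below is a toy illustration, not the paper's implementation: the SAE weights are random placeholders (a real sparse autoencoder would be trained on model activations), the feature indices are hypothetical, and the intervention simply decodes the extra activation on the chosen features back into the residual-stream vector.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, N_FEATURES = 64, 256  # toy sizes, not taken from the paper

# Placeholder SAE weights (hypothetical; a real SAE would be trained).
W_enc = rng.standard_normal((D_MODEL, N_FEATURES)) / np.sqrt(D_MODEL)
b_enc = np.zeros(N_FEATURES)
W_dec = rng.standard_normal((N_FEATURES, D_MODEL)) / np.sqrt(N_FEATURES)

def encode(h):
    # ReLU encoder: sparse feature activations for an activation vector h.
    return np.maximum(h @ W_enc + b_enc, 0.0)

def amplify_features(h, feature_ids, scale=2.0):
    """Scale selected SAE features and add the change back into h.

    Decodes only the *extra* activation on the chosen features, so the
    rest of the representation is left untouched -- no retraining needed.
    """
    f = encode(h)
    delta = np.zeros_like(f)
    delta[feature_ids] = (scale - 1.0) * f[feature_ids]
    return h + delta @ W_dec

h = rng.standard_normal(D_MODEL)   # stand-in for a late-layer activation
emotion_features = [3, 17, 42]     # hypothetical emotion-related feature ids
h_steered = amplify_features(h, emotion_features, scale=3.0)
print(h_steered.shape)  # (64,)
```

With `scale > 1` the chosen features are boosted; `scale < 1` would suppress them, which is the same mechanism used to probe whether those features causally drive the emotion prediction.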