This paper examines how large language models internally process emotions by analyzing their activations with sparse autoencoders. The researchers find that emotion recognition unfolds in three distinct phases, with emotion-specific features emerging only in the late layers of the network.
The practical upshot: emotion detection can be improved by identifying and amplifying the small set of internal features that drive emotion predictions, without retraining the model.
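The amplification idea can be sketched in a few lines. The code below is a toy illustration, not the paper's implementation: the SAE weights are random placeholders (a real sparse autoencoder would be trained on model activations), the feature indices are hypothetical, and the intervention simply decodes the extra activation on the chosen features back into the residual-stream vector.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, N_FEATURES = 64, 256  # toy sizes, not taken from the paper

# Placeholder SAE weights (hypothetical; a real SAE would be trained).
W_enc = rng.standard_normal((D_MODEL, N_FEATURES)) / np.sqrt(D_MODEL)
b_enc = np.zeros(N_FEATURES)
W_dec = rng.standard_normal((N_FEATURES, D_MODEL)) / np.sqrt(N_FEATURES)

def encode(h):
    # ReLU encoder: sparse feature activations for an activation vector h.
    return np.maximum(h @ W_enc + b_enc, 0.0)

def amplify_features(h, feature_ids, scale=2.0):
    """Scale selected SAE features and add the change back into h.

    Decodes only the *extra* activation on the chosen features, so the
    rest of the representation is left untouched -- no retraining needed.
    """
    f = encode(h)
    delta = np.zeros_like(f)
    delta[feature_ids] = (scale - 1.0) * f[feature_ids]
    return h + delta @ W_dec

h = rng.standard_normal(D_MODEL)   # stand-in for a late-layer activation
emotion_features = [3, 17, 42]     # hypothetical emotion-related feature ids
h_steered = amplify_features(h, emotion_features, scale=3.0)
print(h_steered.shape)  # (64,)
```

With `scale > 1` the chosen features are boosted; `scale < 1` would suppress them, which is the same mechanism used to probe whether those features causally drive the emotion prediction.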