A mechanism-oriented taxonomy of how language encodes hidden meaning is more effective for LLM-based content moderation than taxonomies based on communicative intent or surface forms.
This paper presents a taxonomy of indirect linguistic expressions (coded language such as algospeak and euphemisms) that people use to evade content moderation. Rather than categorizing expressions by intent, the taxonomy is organized around the underlying encoding mechanisms: how meaning is hidden and how it is recovered.