Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection

Hamid Reza Firoozfar, Mohammadsadegh Abolhasani, Reza Mousavi, Paul Jen-Hwa Hu | June 25, 2026 | arXiv

Key Takeaway

A mechanism-oriented taxonomy of how language encodes hidden meaning is more effective for LLM-based content moderation than taxonomies based on communicative intent or surface forms.

Summary

This paper presents a taxonomy of indirect linguistic expressions (coded language such as algospeak and euphemisms) that people use to evade content moderation. Rather than categorizing expressions by communicative intent, the taxonomy focuses on the underlying encoding mechanisms: how meaning is hidden and how it is recovered.

safety evaluation

Key Terms

content-filter, coded-language, taxonomy, prompt-engineering