Vulnerability of Natural Language Classifiers to Evolutionary Generated Adversarial Text

Manjinder Singh, Alexander E. I. Brownlee, Mohamed Elawady | June 25, 2026 | arXiv

Key Takeaway

Genetic algorithms can attack NLP classifiers using only their output logits. By searching for semantically similar word substitutions, they achieve high success rates, which suggests that black-box adversarial attacks on language models are more powerful than previously demonstrated.

Summary

This paper presents GAversary, a genetic algorithm that generates adversarial text against NLP models treated as black boxes. It uses GloVe embeddings to find semantically similar word replacements that fool classifiers, and it achieves stronger attacks than existing methods such as BAE and A2T, though at the cost of modifying more words.
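
To make the idea concrete, here is a minimal sketch of a genetic word-substitution attack against a black-box classifier. It is not the authors' GAversary implementation. The embedding table, the classifier weights, and the names `classifier_logits`, `neighbours`, `margin`, and `attack` are all hypothetical stand-ins: a real attack would load GloVe vectors and query an actual model, and the paper's operators and fitness function may differ from the simple ones assumed here.

```python
import math
import random

# ASSUMPTION: tiny 2-d vectors standing in for real GloVe embeddings.
EMBEDDINGS = {
    "good": (0.90, 0.10), "great": (0.85, 0.20),
    "fine": (0.70, 0.30), "decent": (0.75, 0.35),
    "movie": (0.10, 0.90), "film": (0.15, 0.88), "picture": (0.20, 0.80),
}

# ASSUMPTION: per-word weights of a toy sentiment classifier (the "victim").
WEIGHTS = {"good": 2.0, "great": 0.5, "fine": -0.5, "decent": -0.2,
           "movie": 0.3, "film": -0.4, "picture": 0.0}


def classifier_logits(tokens):
    """Black box: the attack only ever sees [negative, positive] logits."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return [-score, score]


def predict(tokens):
    logits = classifier_logits(tokens)
    return logits.index(max(logits))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def neighbours(word, min_sim=0.9):
    """Words whose embedding is close to `word`; empty if word is unknown."""
    if word not in EMBEDDINGS:
        return []
    vec = EMBEDDINGS[word]
    return [w for w, v in EMBEDDINGS.items()
            if w != word and cosine(vec, v) >= min_sim]


def margin(tokens, label):
    """Fitness (lower is better): original-label logit minus best other logit."""
    logits = classifier_logits(tokens)
    return logits[label] - max(x for i, x in enumerate(logits) if i != label)


def mutate(tokens, original, rng):
    """Replace one position with a neighbour of the ORIGINAL word there."""
    positions = [i for i, w in enumerate(original) if neighbours(w)]
    child = list(tokens)
    if positions:
        i = rng.choice(positions)
        child[i] = rng.choice(neighbours(original[i]))
    return child


def crossover(a, b, rng):
    """Uniform crossover: each position is taken from one of the two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]


def attack(original, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    label = predict(original)
    population = [mutate(original, original, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: margin(t, label))
        if margin(population[0], label) < 0:  # prediction has flipped
            return population[0]
        elite = population[: pop_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                   original, rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = min(population, key=lambda t: margin(t, label))
    return best if margin(best, label) < 0 else None
```

Calling `attack(["a", "good", "movie"])` returns a token list of the same length that the toy classifier labels differently from the input, or `None` if the search budget runs out. The fitness here is a plain logit margin with no penalty on the number of substitutions, a simplification that is consistent with, but not taken from, the summary's remark that stronger attacks came with more modified words.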

safety evaluation

Key Terms

adversarial-examples genetic-algorithm black-box-testing semantic-similarity