A genetic algorithm can attack NLP models using only their output logits, reaching high success rates by searching over semantically similar word substitutions. This shows that black-box adversarial attacks on language models are more powerful than previously demonstrated.
This paper presents GAversary, a genetic algorithm that generates adversarial text against NLP models while treating the target model as a black box. It uses GloVe embeddings to find semantically similar replacement words that fool classifiers. Its attacks are stronger than those of existing methods such as BAE and A2T, though they modify more words.
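To make the idea concrete, here is a minimal sketch of a logit-only genetic word-substitution attack. It is not the GAversary implementation. The 2-D vectors in `EMB` are made-up stand-ins for GloVe embeddings, `black_box_logits` is a hypothetical toy classifier, and the similarity threshold, population size, truncation selection, and uniform crossover are all illustrative assumptions, not settings taken from the paper.

```python
import math
import random

# Toy 2-D vectors standing in for GloVe embeddings (made-up values).
EMB = {
    "good": (1.0, 0.9), "great": (1.0, 1.0), "fine": (0.9, 0.7),
    "decent": (0.8, 0.5), "movie": (-1.0, 0.2), "film": (-0.9, 0.25),
    "plot": (-0.5, 1.0), "story": (-0.55, 0.9),
}

# Hidden weights of a toy sentiment classifier; the attack never reads these.
_WEIGHTS = {"good": 2.0, "great": 2.5, "fine": 0.3, "decent": -0.2,
            "movie": 0.5, "film": -0.4, "plot": 0.2, "story": 0.1}


def black_box_logits(tokens):
    """Return [negative, positive] logits: the only signal the attack sees."""
    score = sum(_WEIGHTS.get(t, 0.0) for t in tokens)
    return [-score, score]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def neighbours(word, min_sim=0.9):
    """Words whose embedding is close to `word`, excluding the word itself."""
    if word not in EMB:
        return []
    return [w for w in EMB
            if w != word and cosine(EMB[w], EMB[word]) >= min_sim]


def genetic_attack(tokens, true_label, pop_size=20, generations=40, seed=0):
    """Search for substitutions that flip the predicted label, else None."""
    rng = random.Random(seed)
    cands = [neighbours(t) for t in tokens]        # substitutes per position
    slots = [i for i, c in enumerate(cands) if c]  # positions we may edit
    if not slots:
        return None

    def mutate(ind):
        child = list(ind)
        i = rng.choice(slots)
        child[i] = rng.choice(cands[i])
        return child

    def fitness(ind):
        # Margin of the best wrong class over the true class; > 0 means fooled.
        logits = black_box_logits(ind)
        wrong = max(l for k, l in enumerate(logits) if k != true_label)
        return wrong - logits[true_label]

    population = [mutate(tokens) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) > 0:
            return population[0]
        parents = population[: pop_size // 2]      # truncation selection
        children = [population[0]]                 # keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cross = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            children.append(mutate(cross))
        population = children
    return None


original = "good movie good plot".split()
adversarial = genetic_attack(original, true_label=1)
if adversarial is not None:
    changed = sum(a != o for a, o in zip(adversarial, original))
    print(" ".join(adversarial), f"({changed} of {len(original)} words changed)")
```

The fitness function reads nothing but the returned logits, which is what makes the search black-box. In this toy setup the label only flips after several words have been replaced, which mirrors the trade-off noted above: a stronger attack at the cost of modifying more words.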