AI-text detectors need feature augmentation and careful threshold calibration to work reliably across domains and generators; linguistic features such as readability are crucial for robustness under distribution shift.
This paper tackles the challenge of detecting AI-generated text across different domains and AI models. The authors train transformer-based detectors and find that, while these perform nearly perfectly in-distribution, performance degrades markedly when they are tested on new domains or on text produced by different AI generators.
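As a rough illustration of the augmentation-plus-calibration idea (not the paper's actual pipeline), the sketch below blends a hypothetical detector probability with a hand-rolled Flesch readability feature, then picks a decision threshold on labeled validation scores. All function names, the blending weight `alpha`, and the direction of the readability signal are assumptions for the sake of the example.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups; every word gets at least one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) \
           - 84.6 * (syllables / len(words))

def augmented_score(detector_prob: float, text: str, alpha: float = 0.9) -> float:
    # Blend the (hypothetical) transformer probability with a readability
    # feature normalized to [0, 1]. Treating lower readability as more
    # AI-like is an illustrative assumption, not a finding of the paper.
    readability = max(0.0, min(100.0, flesch_reading_ease(text))) / 100.0
    return alpha * detector_prob + (1 - alpha) * (1.0 - readability)

def calibrate_threshold(scores: list[float], labels: list[int]) -> float:
    # Choose the threshold that maximizes balanced accuracy on a labeled
    # validation set, rather than defaulting to 0.5.
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p and y)
        tn = sum(1 for p, y in zip(preds, labels) if not p and not y)
        acc = 0.5 * (tp / pos + tn / neg) if pos and neg else 0.0
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_t
```

Calibrating the threshold on held-out data from the *target* domain is one simple way to recover accuracy lost to distribution shift, since score distributions tend to move between domains even when their ordering is preserved.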