AI-text detection isn't just about how much AI content is present; it also depends on which edits were made, the domain, and the revision history. AI content in mixed-authorship documents can be harder to detect than in fully AI-generated ones, which exposes blind spots in current detection methods.
This paper introduces OpAI-Bench, a benchmark for detecting AI-generated text in documents that have been progressively edited by both humans and AI. Unlike existing benchmarks, which look only at final outputs, OpAI-Bench tracks how AI authorship signals change across multiple revision stages, edit types, and granularities (document, sentence, token, and span levels).
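To make the four granularities concrete, here is a minimal Python sketch of how per-token authorship labels for one revision could be rolled up into span, sentence, and document views. It is an illustration only, not OpAI-Bench's actual schema: the `Revision` class, its field names, the edit-type strings, and the toy revision history are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Revision:
    """One stage in a document's edit history (hypothetical schema)."""
    stage: int                    # position in the revision history
    editor: str                   # "human" or "ai"
    edit_type: str                # illustrative label, e.g. "paraphrase"
    sentences: List[List[str]]    # tokens grouped by sentence
    ai_labels: List[List[int]]    # same shape as sentences; 1 = AI-authored token


def flat_labels(rev: Revision) -> List[int]:
    """Token-level view: one label per token, in document order."""
    return [lab for sent in rev.ai_labels for lab in sent]


def ai_spans(labels: List[int]) -> List[Tuple[int, int]]:
    """Span-level view: maximal runs of AI tokens as half-open [start, end) ranges."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab != 1 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def sentence_fractions(rev: Revision) -> List[float]:
    """Sentence-level view: share of AI-authored tokens in each sentence."""
    return [sum(s) / len(s) if s else 0.0 for s in rev.ai_labels]


def document_fraction(rev: Revision) -> float:
    """Document-level view: share of AI-authored tokens overall."""
    labels = flat_labels(rev)
    return sum(labels) / len(labels) if labels else 0.0


# Toy history: a human draft, then two successive AI edits (invented data).
history = [
    Revision(0, "human", "draft",
             [["The", "cat", "sat", "."], ["It", "slept", "."]],
             [[0, 0, 0, 0], [0, 0, 0]]),
    Revision(1, "ai", "paraphrase",
             [["The", "cat", "rested", "quietly", "."], ["It", "slept", "."]],
             [[0, 0, 1, 1, 0], [0, 0, 0]]),
    Revision(2, "ai", "expand",
             [["The", "cat", "rested", "quietly", "."], ["It", "slept", "."],
              ["Then", "it", "woke", "."]],
             [[0, 0, 1, 1, 0], [0, 0, 0], [1, 1, 1, 1]]),
]

for rev in history:
    print(rev.stage, rev.edit_type, document_fraction(rev),
          sentence_fractions(rev), ai_spans(flat_labels(rev)))
```

Storing labels at the token level and deriving the coarser views from them keeps the granularities consistent with one another at every revision stage; a real benchmark may instead annotate each level independently.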