Synthetic QA generation for model training has hidden failure modes: biased coverage of documents and susceptibility to instruction injection. Simple fixes like anchoring questions to specific targets and filtering instruction-like text can substantially reduce these problems.
This paper shows that using synthetic question-answer pairs to train language models is riskier than commonly assumed. Models that generate QA pairs do not cover documents uniformly: they concentrate on salient regions and can be hijacked by artifacts such as markup.
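As a rough illustration of the two mitigations named above, the sketch below filters instruction-like sentences and then spreads anchor targets evenly across what remains. It is not the paper's implementation: the regex patterns, the function names (`filter_instruction_like`, `pick_anchors`, `build_prompts`), the prompt wording, and the choice of sentences as anchor units are all assumptions made for the example.

```python
import re

# Hypothetical patterns for instruction-like or markup text.
# A real filter would need tuning against the actual corpus.
INSTRUCTION_PATTERNS = [
    r"\bignore (?:all |any )?(?:previous|prior|above) instructions\b",
    r"^\s*(?:system|assistant|user)\s*:",
    r"</?[a-zA-Z][^>]*>",  # leftover markup tags
]
_INSTRUCTION_RE = re.compile(
    "|".join(f"(?:{p})" for p in INSTRUCTION_PATTERNS), re.IGNORECASE
)


def filter_instruction_like(sentences):
    """Drop sentences that match any of the assumed injection patterns."""
    return [s for s in sentences if not _INSTRUCTION_RE.search(s)]


def pick_anchors(sentences, n_questions):
    """Pick anchor sentences at evenly spaced positions in the document,
    so coverage does not depend on what the generator finds salient."""
    if not sentences or n_questions <= 0:
        return []
    n = min(n_questions, len(sentences))
    step = len(sentences) / n
    return [sentences[int(i * step)] for i in range(n)]


def build_prompts(sentences, n_questions):
    """Filter first, then build one anchored generation prompt per target."""
    clean = filter_instruction_like(sentences)
    return [
        f'Write one question answerable only from this sentence: "{anchor}"'
        for anchor in pick_anchors(clean, n_questions)
    ]
```

Each prompt names its target explicitly, so the generator cannot drift back toward the most prominent passage. Filtering happens before anchoring so that no anchor lands on an injected instruction or a markup fragment.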