AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA

Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed et al. | April 23, 2026 | arXiv

Key Takeaway

Current audio AI models fail dramatically on genuine audio understanding tasks, which suggests they exploit dataset biases and metadata rather than listening to and reasoning about sound.

Summary

AUDITA is a new benchmark dataset with real-world audio and human-authored trivia questions designed to test whether AI models can truly understand audio content rather than relying on shortcuts. Humans answer correctly 32% of the time, but state-of-the-art models score below 9%, revealing a significant gap in audio reasoning capabilities.

evaluation multimodal data

Key Terms

audio-visual-understanding, shortcut-learning, item-response-theory, long-context-handling