Evaluating Commercial AI Chatbots as News Intermediaries

Mirac Suzgun, Emily Shen, Federico Bianchi, Alexander Spangher, Thomas Icard et al. | May 21, 2026 | arXiv

Key Takeaway

AI chatbots excel at retrieving and synthesizing recent news but show three critical weaknesses: they systematically underperform on non-English content, their failures stem primarily from retrieval errors rather than reasoning mistakes, and they are easily misled by questions that embed subtle false premises.

Summary

This study evaluates six major AI chatbots (from the Gemini, Grok, Claude, and GPT model families) on their ability to answer factual news questions across six languages and regions.
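
As a rough illustration of how an evaluation of this kind might be tallied, the sketch below computes accuracy per language and counts failures by type. It is not the authors' pipeline: the record layout, the failure labels ("retrieval", "reasoning", "false_premise_accepted"), the language codes, and every row of data are hypothetical placeholders chosen only to make the example run.

```python
from collections import Counter

# Hypothetical graded records: (language, is_correct, failure_type).
# failure_type is None when the answer was graded correct.
# These rows are made-up placeholders, NOT results from the study.
records = [
    ("en", True, None),
    ("en", False, "retrieval"),
    ("lang_b", False, "retrieval"),
    ("lang_b", False, "reasoning"),
    ("lang_b", True, None),
    ("lang_c", False, "false_premise_accepted"),
]


def accuracy_by_language(rows):
    """Return {language: fraction of answers graded correct}."""
    totals = Counter()
    correct = Counter()
    for lang, ok, _failure in rows:
        totals[lang] += 1
        if ok:
            correct[lang] += 1
    return {lang: correct[lang] / totals[lang] for lang in totals}


def failure_breakdown(rows):
    """Count incorrect answers by their assigned failure type."""
    return Counter(failure for _lang, ok, failure in rows if not ok)


if __name__ == "__main__":
    for lang, acc in accuracy_by_language(records).items():
        print(f"{lang}: {acc:.2f}")
    for failure, count in failure_breakdown(records).most_common():
        print(f"{failure}: {count}")
```

Keeping the failure type separate from the correctness flag is what allows retrieval errors to be compared against reasoning errors, and grouping by language is what would surface a gap between English and non-English questions.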

evaluation multimodal data

Key Terms

retrieval-augmented generation, false-premise detection, multilingual bias, retrieval bias