AI chatbots excel at retrieving and synthesizing recent news but have three critical weaknesses: they systematically underperform on non-English content, their failures stem mainly from retrieval errors rather than reasoning mistakes, and they are easily fooled by questions that contain subtly false information.
This study evaluates six major AI chatbots (Gemini, Grok, Claude, and GPT models) on their ability to answer factual news questions across six languages and regions.