LLM citations are unreliable at scale, but the problem is measurable and fixable: models equipped with URL-checking tools can self-correct, cutting hallucinated citations from 3-13% to under 1%.
This paper shows that 3-13% of citation URLs produced by LLMs and research agents are fabricated outright (hallucinated), while another 5-18% are broken. The authors measure this across 10+ models and 200k+ URLs, then release urlhealth, a tool that checks whether URLs are real using the Wayback Machine and helps models self-correct, reducing broken citations by up to 79x.
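
The summary only tells us that urlhealth validates URLs via the Wayback Machine, so the sketch below is an illustration of that kind of check, not urlhealth's actual interface. It combines a live HTTP probe with the Wayback Machine's public availability API (archive.org/wayback/available); the function name `check_url` and the returned fields are hypothetical.

```python
# Minimal sketch of a URL-health check, assuming the approach described above:
# a hallucinated URL typically has neither a live page nor an archived capture.
import requests

WAYBACK_API = "https://archive.org/wayback/available"

def check_url(url: str, timeout: float = 10.0) -> dict:
    """Classify a citation URL as live, archived-only, or broken/fabricated."""
    result = {"url": url, "live": False, "archived": False, "snapshot": None}

    # 1. Is the URL reachable right now? (Some servers reject HEAD; a
    #    production checker would fall back to GET on failure.)
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        result["live"] = resp.status_code < 400
    except requests.RequestException:
        pass  # unreachable: fall through to the archive check

    # 2. Does the Wayback Machine hold a snapshot? The availability API
    #    returns the closest archived capture for a URL, if any exists.
    try:
        resp = requests.get(WAYBACK_API, params={"url": url}, timeout=timeout)
        snap = resp.json().get("archived_snapshots", {}).get("closest")
        if snap and snap.get("available"):
            result["archived"] = True
            result["snapshot"] = snap["url"]
    except (requests.RequestException, ValueError):
        pass  # treat API errors as "no snapshot found"

    return result

if __name__ == "__main__":
    # A URL with neither "live" nor "archived" set is a likely hallucination.
    print(check_url("https://example.com"))
```

A self-correcting loop would feed this verdict back to the model: keep live URLs, swap dead-but-archived ones for their snapshot, and ask the model to re-source anything that is neither.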