LLMs become reliable enough for mathematics research when their outputs are verified by formal proof checkers—this hybrid approach solved previously open problems at a practical cost, showing a path beyond LLM hallucination.
Researchers used large language models to automatically generate formal proofs in Lean, a proof verification language, to solve open mathematical problems. Their AI agent successfully proved 9 open Erdős problems and 44 OEIS conjectures, demonstrating that LLMs can contribute to real mathematical research when paired with formal verification systems that catch errors.