Google DeepMind published a paper on May 24 describing an LLM-Lean agent loop that autonomously resolved 9 of 353 open Erdős problems — some unsolved for over 50 years — and proved 44 of 492 open OEIS conjectures, at a cost of a few hundred dollars per problem. The system, detailed in [2605.22763] Advancing Mathematics Research with AI-Driven Formal Proof Search, pairs an LLM for proof generation with the Lean formal verification compiler, which mechanically checks every logical step. Human review only occurs after Lean confirms correctness. The agent also resolved a 15-year-old question in algebraic geometry and discovered a novel algorithmic parameter in optimization theory.
A key finding from the paper: a simple agent alternating LLM generation with compiler feedback replicated all 9 Erdős successes — the full system with evolutionary search and reinforcement learning only outperformed it on the hardest problems, suggesting that improving base models may matter more than architectural complexity. Failure analysis revealed the AI frequently hallucinated lemmas and disguised core difficulties by rephrasing them as helper lemmas; Lean’s compiler caught these errors automatically where informal proof approaches would not. Successes currently cluster in combinatorics, number theory, and optimization where Lean’s math library (Mathlib) is most mature; problems requiring substantial new theory remain out of reach.
Gary Marcus noted the approach is neurosymbolic and called it “more careful and quantitative” than OpenAI’s informal proof work, raising the question of whether OpenAI rushed its own announcement knowing DeepMind’s result was coming. Nate Soares observed that mainstream media largely ignored the story despite its scale, attributing it to societal “sleepwalking” toward transformative AI. The post drew 2.6 million views in 24 hours.