AI could radically change how math proofs are verified

Kevin Buzzard doesn’t sound like someone chasing a victory lap. He’s training computers to prove—really prove—one of math’s most famous problems, Fermat’s last theorem.
The irony is that resolving it isn’t the point anymore. There’s already an accepted proof, published in 1995, a sprawling construction spread across about 130 pages over two papers. It’s the kind of result where, to truly “know” it, you end up traversing a wide swath of mathematics you might not otherwise touch. Buzzard’s bet is that if a computer can verify something that large and interconnected, then it can become a tool for checking, scrutinizing, and eventually helping create new proofs.
In recent years, the idea has moved from niche to urgent. Buzzard and others have long worked on formalizing mathematics: rewriting definitions and theorems as precise computer code so that a specialized program can verify each step. It’s not just digitizing for convenience, either. Formalization “is a new paradigm … that essentially demands the proof writer be way more rigorous than usual,” as mathematician Emily Riehl puts it. The computer doesn’t casually fill in details; the person writing the proof has to supply them, in a language the machine can actually check.
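To make that concrete, here is a toy example of what formalized mathematics looks like in Lean 4. It is a sketch for illustration, not drawn from Buzzard’s project: both the statement and its justification are code, and the checker accepts the file only if every step is valid.

```lean
-- Commutativity of addition on the natural numbers, stated and
-- proved in Lean 4. `Nat.add_comm` is a lemma from Lean's library;
-- the kernel re-checks the derivation rather than taking it on trust.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the statement or the proof term were wrong in any way, Lean would reject the file outright; there is no notion of a proof that is “mostly” accepted.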
There’s a messier, more personal storyline underneath the goal of correctness. Buzzard began learning Lean, an interactive theorem prover and programming language designed for precisely this kind of verification. Lean first appeared in 2013, built by Leonardo de Moura, and it has also been used to formalize Viazovska’s work. Buzzard describes his own entry as a kind of midlife crisis, sparked when a long exchange over a paper he was reviewing left him unable to tell whether the argument was truly rigorous. That frustration turned into a broader question: could technology take the guesswork out of verifying math, so mathematicians can focus on doing something new rather than constantly peering under the hood of someone else’s reasoning?
If this sounds abstract, the project’s human scale is what makes it feel real. Buzzard’s formalization effort for Fermat’s last theorem launched in 2023, with support from the U.K.’s Engineering and Physical Sciences Research Council, and quickly grew beyond what he expected. About 30 people contributed Lean code at first; more than 60 have now had contributions verified and accepted. Number theorists he has never met have also reached out. Last August, he says, he went camping at a music festival for a week, then came back to find 7,000 unread messages about aspects of the proof.
The work itself is slow in the way only deep mathematics can be slow. Buzzard says the effort is rocky, with failed starts and moments where the team is “all over the place.” Even so, it has already produced milestones: in January, the team proved that a certain key object was finite, setting up the next step. Still, the incremental pace raises doubts about whether the project’s targeted timeline of five years is realistic.
There’s also a looming question now, bigger than Fermat. The AI boom has propelled attempts to combine large language models with theorem provers in hopes of autoformalization: automatically translating human-written mathematics into machine-checkable code. Some researchers believe such systems could eventually outpace humans at producing proofs. For mathematicians, that prospect is divisive, and it shows up in the classroom. Christian Szegedy argues humans might spend less time on mechanical solving and more on steering exploration, essentially trusting AI to prove the tedious lemmas. Patrick Shafto, tied to DARPA’s expMath work, expects that in a few years many young mathematicians will use AI as part of their routine. But others are uneasy, saying accuracy can’t be the only story, because making mistakes is part of learning, and because framing math as something AI “solves” could change funding, prestige, and teaching itself.
The key tension: for proof verification, “mostly right” isn’t right enough. Large language models can generate fluent text that sounds correct, but they’re not designed to guarantee correctness. Pairing them with Lean changes the game, but skeptics like Buzzard still worry about guardrails: whether machine-generated code actually captures the theorem the human intended. In his view, the future could still bring tools that digest the literature, but the path there won’t be automatic.
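That worry can be illustrated with a hypothetical mis-statement, again a Lean 4 sketch rather than anything from a real project: a theorem can pass the checker and still fail to say what the human intended, for instance when a hypothesis makes it vacuously true.

```lean
-- This "theorem" is accepted by Lean, but only because the hypothesis
-- `n < 0` is impossible for a natural number, so the statement is
-- vacuously true. It establishes nothing about n * n.
theorem looks_impressive (n : Nat) (h : n < 0) : n * n = 42 :=
  absurd h (Nat.not_lt_zero n)
```

The proof checker guarantees that the stated theorem follows from its hypotheses; whether the statement itself means what the mathematician wanted remains a human judgment, which is exactly the guardrail problem for machine-generated formalizations.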
Buzzard’s motivation stays simple. He wants an interactive theorem prover with a robust library of verified mathematics: something that can check work, help separate sloppy code from real advances, and make colleagues’ lives easier. “I just want to make my colleagues’ lives better,” he says. He’s not trying to replace them. If anything, he’s trying to help them keep doing the one thing machines, for now, don’t quite own: pushing into new territory while the details, at last, can be trusted.