AI is cracking maths—mathematicians fear their jobs

10-minute read


In San Francisco, mathematicians and AI researchers met with a shared sense of urgency: models once dismissed as unreliable are now scoring highly in elite contests, contributing to real research papers and even solving long-standing problems.

I stepped into a nondescript building in San Francisco for a hastily organised meeting. An unmarked pink door and a video doorbell were the only hints that anything profound was about to be discussed. Inside, the room felt charged with curiosity, which then tightened into something closer to dread. The question hovering over everyone was simple and hard to ignore: if someone like me could produce mathematics at the press of a button, what would that mean for the people whose careers were built on doing the same work the slow way?

Jacob Tsimerman, at the University of Toronto in Canada, helped organise the conference. “I think AI is going to come in a big way, and it will significantly revolutionise the field,” he said.

In the same weeks that excitement spread, concern did too. Jeremy Avigad at Carnegie Mellon University in Pennsylvania wrote in a recent essay that “we are running out of places to hide” and that “AI will soon be able to prove theorems better than we can.”

Not everyone sees the change as a threat. Terence Tao at the University of California, Los Angeles, has said mathematics is shifting from an era of “proof scarcity” to one of “abundance,” where many once-thorny problems could fall to AI. Instead of racing to be the first to find a proof, Tao argues, mathematicians might race to be the first to understand it.

The road from scepticism to upheaval has been surprisingly fast. In the past few years, AI has gone from an occasional curiosity to a tool that can make credible progress on tasks that previously required specialist intuition. Early attempts relied on custom-built neural networks aimed at specific problems, and even these were difficult to apply across fields. Mathematicians were largely unimpressed when ChatGPT launched in 2022, not least because GPT-3.5, the large language model that powered the first version of OpenAI’s chatbot, struggled with basic arithmetic and “spouted confident nonsense” on research-level questions.

But as large language models scaled up and were trained on increasing amounts of mathematical data, results began to change.

One of the first major signals came from work tied to the International Mathematical Olympiad (IMO), an elite competition for high-school students consisting of six extremely difficult questions. Researchers expected it would take years, possibly a decade, for AI systems to score highly. They were wrong.

In July 2024, Google DeepMind announced that its AlphaProof AI system could solve four out of six questions from that year’s IMO, enough for a silver-level performance. But AlphaProof wasn’t a pure large language model, and it had been fine-tuned for IMO-style questions such as geometry, leaving open how much further it could go.

Then, about a year later, Google and OpenAI announced gold-level performance, with OpenAI in particular using a less maths-focused model. The shift was enough to change how mathematicians talked about the possibility of machines proving theorems.

“People’s eyes really opened,” says Ravi Vakil at Stanford University in California.

The public impact arrived quickly. Tools once used for competitions began pushing into research-level work.

Thomas Bloom at the University of Manchester in the UK noticed the change in the last months of 2025. Bloom runs a website tracking progress on more than a thousand problems posed by the famous mathematician Paul Erdős. The problems are often easy to state but range widely in complexity, with many treated as signposts for mathematical progress.

Bloom began receiving comments on the site from people he didn’t recognise. At first, they were using GPT-5. Then, more recently released models appeared—leading to posts where AI assistance produced full-blown solutions. Bloom and his colleagues verified some of them as correct. These solutions took “non-trivial effort,” Bloom told me at the time.

“It’s incredible that AI is capable of that.”

Some of the work wasn’t coming from professional mathematicians, either. Kevin Barreto, in his second year of an undergraduate mathematics degree at the University of Cambridge, has solved numerous Erdős problems using AI, frequently with his collaborator Liam Price, who has no maths degree or formal training.

That made me wonder what would happen if I tried it myself.

Barreto told me the trick isn’t just asking a model to produce a proof. It’s also giving it a certain level of encouragement, with phrases like “try your best” or “don’t give up.”

“You try to encourage the model,” he said. “You try to hint it into believing the problem is of an easier difficulty than it actually is.”

Even with the right coaxing, success wasn’t guaranteed. For some problems, Barreto said it can take numerous attempts.

“Coaxing the correct proof strategy out of it is essentially like trying to play the lottery,” he said.

I chose an unsolved Erdős problem: number 710. It concerns a list of requirements that must be satisfied by a set of numbers, with the goal being to find a set with the smallest difference between the lowest and highest numbers. The idea, as I came to think of it, is like a group of picky hotel guests who each demand something, such as a bath or a sea view, and you have to find the shortest block of rooms that satisfies everyone.
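To make the hotel picture concrete, here is a toy sketch under a loudly stated assumption: guest k (for k from 1 to n) will accept only a room whose number is a multiple of k, and the block of rooms starts at n + 1. That rule is our illustrative choice to mirror the analogy and may not match the exact statement of Erdős problem 710. Seating everyone then becomes a bipartite matching problem, solved here with simple augmenting paths; the function names are hypothetical.

```python
def can_seat_everyone(n: int, rooms: list[int]) -> bool:
    """Try to give each guest k (1..n) a distinct room whose number is a multiple of k.

    The 'multiple of k' rule is an illustrative assumption, not Erdős 710 itself.
    """
    occupant: dict[int, int] = {}  # room number -> guest currently assigned to it

    def place(guest: int, tried: set[int]) -> bool:
        # Augmenting-path step: take a free acceptable room, or evict its
        # occupant if that occupant can be moved to another acceptable room.
        for room in rooms:
            if room % guest == 0 and room not in tried:
                tried.add(room)
                if room not in occupant or place(occupant[room], tried):
                    occupant[room] = guest
                    return True
        return False

    return all(place(guest, set()) for guest in range(1, n + 1))


def shortest_block(n: int) -> int:
    """Length of the shortest run of rooms n+1, n+2, ... that seats all n guests."""
    length = n  # at least one room per guest is needed
    while not can_seat_everyone(n, list(range(n + 1, n + 1 + length))):
        length += 1
    return length
```

For example, shortest_block(4) returns 5: with only rooms 5 to 8, guests 2, 3 and 4 would all be competing for rooms 6 and 8, so a fifth room (9) is needed. A brute-force search like this only explores small cases; it says nothing about a general answer, which is what a proof has to supply.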

Wanting to use the most powerful model available, I asked OpenAI for access to ChatGPT 5.5 Pro. It normally costs $200 a month, but I was provided access for free for this article.

My prompt hinted that a solution was within reach: “it just takes a few clever tricks.”

While the model worked, I stepped away from the machine and returned to the wider story. There, talk of a “golden age” was colliding with another reality: research papers were arriving with proofs that had increasingly been accelerated by AI.

In January, Vakil and his colleagues uploaded a paper in which they noted that “the proof of this result was obtained in conjunction with Google Gemini and related tools.” The result concerned a thorny problem about how sphere-like shapes can be linked to other mathematical objects called flag spaces, which can be described as collections of nesting-doll-like objects. The work promises a bridge between topology, which concerns the general properties of shapes, and algebraic geometry, which deals with shapes defined precisely by polynomial equations.

The task was hard partly because there are “a multitude of ways” the flag spaces and sphere-like shapes can correspond.

Vakil and his colleagues first gave a simpler version of what they wanted to prove to a custom AI model from Google DeepMind. The model found a mathematical structure they hadn’t previously seen. That made it clear to them how to generalise the result and write out the entire argument, which turned out to be simpler than it initially seemed.

“There’s no way the AI could do it by itself because it wouldn’t know the [correct] question. We absolutely told it what to do,” Vakil said.

At the same time, he described a shortcut the AI provided.

“The paper might never have happened because we might never have had the time to get together and figure out the argument,” he said. “It’s more how things will happen. The future will be some combination of human and machine.”

The boundary between human direction and machine output is already blurring.

In the same month as Vakil’s paper, Tony Feng at the University of California, Berkeley, published a paper detailing how he used Google’s Aletheia AI to calculate a previously unknown collection of numbers vital for translating between algebraic geometry and number theory. Building such bridges is an important goal in the Langlands programme, often described as a grand unified theory of mathematics. Feng said the “core mathematical content” was generated entirely by Aletheia.

And then came the headline-grabbing moment.

In May, OpenAI announced that it had used an unreleased model to solve an 80-year-old maths conjecture called the planar unit distance problem. OpenAI did not provide full details about the model, aside from saying it was a general-purpose AI rather than one trained specifically to do mathematics.

The reaction among mathematicians was stunned disbelief.

“It opens up a world of possibility,” says Alex Kontorovich at Rutgers University in New Jersey. “I can imagine projects I could undertake this summer, things that I know would have taken me five years that I would never have even started.”

That “world of possibility” carries a darker undertow. The question quickly becomes whether those possibilities reach far enough to threaten the central work of professional mathematicians—proof-writing, problem-setting, and the kind of slow reasoning that can take decades.

Could tools like these eventually crack the Riemann hypothesis? It is a deep question about the distribution of prime numbers and one of the Millennium Prize Problems, described as among the greatest challenges in mathematics.

Several mathematicians working for AI companies told me they thought a problem of that scale could fall in the next several years. Others cautioned that such problems sit in a wildly different class of difficulty from those already solved.

In that atmosphere, the April San Francisco conference wasn’t just about celebration. It was an attempt to map futures before they arrived all at once.

The ostensible goal was to come up with a way to track AI’s mathematical progress and where it might be headed. Daniel Litt, another conference organiser at the University of Toronto, said he hoped to understand where the models were and where they were going in mathematical capability. “It’s clear that the models are, in some sense, missing some capabilities that mathematicians have,” he said.

Testing AI has often relied on benchmarks: collections of problems whose answers are short and easy to verify, such as a single number. That makes it easy to show progress as a clean rising line on a graph, something companies can present clearly.
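To see why single-number answers make progress so easy to chart, consider a minimal, hypothetical grader: each problem has one reference number, a model's answer either matches it exactly or not, and the score is the fraction matched. The function names, problem IDs and answers below are invented for illustration and do not describe any real benchmark.

```python
from fractions import Fraction
from typing import Optional


def parse_answer(text: str) -> Optional[Fraction]:
    """Parse a short numeric answer such as '42', '-3/4' or '0.5' exactly."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None  # not a single number, so it cannot match


def score(reference: dict[str, str], submitted: dict[str, str]) -> float:
    """Fraction of problems whose submitted number equals the reference number."""
    correct = 0
    for problem_id, ref in reference.items():
        answer = parse_answer(submitted.get(problem_id, ""))
        if answer is not None and answer == parse_answer(ref):
            correct += 1
    return correct / len(reference)


reference = {"p1": "42", "p2": "1/2", "p3": "7"}  # invented answer key
submitted = {"p1": "42", "p2": "0.5", "p3": "8"}  # invented model output
print(score(reference, submitted))
```

Here two of the three answers match ("0.5" and "1/2" compare equal as exact fractions), so the score is two-thirds. A proof has no comparable one-line check, which is why it needs an expert to interpret it.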

But many mathematical tasks don’t reduce neatly. Proofs need interpretation by an expert.

Melanie Wood at Harvard University put it bluntly. “One big mistake that people make when they think about AI and math is to take the correlation of these skills in humans and think that it’s going to match some correlation in AI.”

Working groups at the conference produced a draft of a better tracking approach, though disagreement remained about how to compress the work of a mathematician into a short document.

A big chunk of the meeting involved free-flowing group discussions between mathematicians, hashing out what AI-led mathematics might look like. Would it be humans and machines working in lockstep, as Vakil suggested? Or would it be more like a slot machine, where you press a button and sometimes the output is dazzling?

Tsimerman didn’t like the slot-machine analogy. As a child he took part in maths competitions like the IMO, and “my experience of math is the act of solving problems,” he said. “And if I don’t do that anymore, I think I might prefer playing music or doing theatre or learning something else.”

He asked people in the room to indicate whether they would continue being mathematicians in his button-pushing vision. Only around half raised their hands.

Not everyone thought the exercise helped. Litt said “what I actually care about is understanding things and figuring out what’s true.” He noted that you can do that by posing and proving a conjecture, but you can also do it by “going over to your friend and asking them a question.”

Wood drew a different boundary. Even if these tools can solve difficult problems, mathematicians are often the ones who decide what is worth working on.

“Maths isn’t about solving puzzles just for the sake of it,” she said. “Mathematicians generally look for solutions that push the field forward.”

She described the test as practical: “Does it suggest a way to solve a lot of other problems, or is it only a solution for that particular problem?”

As the conference moved into its third day, excited murmurs rippled through the attendees. Overnight, an Erdős problem had been cracked, and this one was qualitatively different from the others.

Jared Lichtman at Stanford University had spent a large part of his PhD wrestling with a closely related problem that, after decades, many mathematicians had failed to solve. “It was a problem I was already independently very passionate about,” he said.

Price had elicited a solution to Erdős problem 1196 from a single request to ChatGPT 5.5 Pro. Erdős 1196 concerns “primitive” sets of numbers, which share a property with the prime numbers: no number in the set divides another. Erdős had calculated a number from these sets and conjectured that the largest value possible for any primitive set was about 1.6. Lichtman had proved Erdős was correct for that case, but wanted to do the same for a more restricted family of primitive sets. Erdős suspected the highest value for that family was 1, but that had remained harder to establish.
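For readers who want the definitions behind this passage: a set of integers greater than 1 is primitive if no element divides another, and the number Erdős attached to such a set is, as we understand it, the sum of 1/(a log a) over its elements. For the set of all primes that sum is about 1.6366, the "about 1.6" above. The sketch below, with function names of our own choosing, checks primitivity and evaluates the sum for a finite set; it illustrates the definitions only, not the proof.

```python
import math


def is_primitive(numbers: list[int]) -> bool:
    """True if no element divides a different element (assumes positive integers)."""
    values = set(numbers)
    largest = max(values, default=0)
    for a in values:
        # Look for a proper multiple of a that is also in the set.
        for multiple in range(2 * a, largest + 1, a):
            if multiple in values:
                return False
    return True


def erdos_sum(numbers: list[int]) -> float:
    """Sum of 1 / (a * log a) over the set; every element must exceed 1."""
    return sum(1 / (a * math.log(a)) for a in set(numbers))


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [n for n, is_prime in enumerate(sieve) if is_prime]
```

Calling erdos_sum(primes_up_to(10**5)) gives a partial sum that creeps only slowly towards 1.6366 as the limit grows, because the missing tail shrinks roughly like 1/log of the limit.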

The AI used a tool called the von Mangoldt function, something earlier attempts had missed.

“You can use the Von Mangoldt function to circumvent a lot of technical difficulties that all these previous approaches had used,” Lichtman said.
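The von Mangoldt function itself is simple to state: Λ(n) equals log p when n is a power of a single prime p, and 0 otherwise. How the AI's proof actually deployed it is not described here, so the sketch below shows only the definition; the name von_mangoldt is our own. A handy sanity check is the classical identity that Λ(d), summed over all divisors d of n, equals log n.

```python
import math


def von_mangoldt(n: int) -> float:
    """Return Λ(n): log p if n = p**k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    # The smallest divisor d >= 2 of n is always prime; if none is found
    # up to sqrt(n), n itself is prime.
    p = next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)
    # Divide out p completely; n is a power of p exactly when nothing is left.
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0


def log_via_divisors(n: int) -> float:
    """Sum of Λ(d) over the divisors d of n, which should equal log n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
```

For instance, the divisors of 360 that are prime powers are 2, 4, 8, 3, 9 and 5, contributing 3 log 2 + 2 log 3 + log 5 = log 360.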

Working with others, including Price, Barreto and Tao, Lichtman later adapted the technique to solve a related 60-year-old conjecture by Erdős.

“This is perhaps one of the first examples of an AI-generated proof having downstream impacts, which we are still exploring,” Lichtman said when posting about the work on social media.

That night, I finally returned to my own attempt.

After “thinking” for 22 minutes and 18 seconds, ChatGPT pinged me with a response. “Here is the clean proof,” it wrote, followed by dozens of lines of mathematics I couldn’t follow well enough to trust.

I asked myself the question the conference room had been circling for days. Had I solved a decades-old problem, cementing my name in the mathematical history books?

I fed the answer back into ChatGPT. Soon, it confirmed: “Yes — the main argument is correct.”

I wrote to Barreto, asking whether I might be on to something.

The answer came fast, and with it came a jolt of embarrassment. “It doesn’t look like it solves the problem,” Barreto replied.

What I had missed was that the AI had actually proved something different from the formula Erdős had hoped for—something already discovered by Erdős himself years earlier.

A professional mathematician might have spotted the mistake quickly. For me, it was lost in the noise.

Perhaps that’s the uncomfortable landing point the conference tried to reach: maybe there is a future for mathematicians after all, even if the job shifts toward understanding what machines produce.

Litt put it plainly. “I still want to know what’s going on,” he said. “A model can’t understand something for you.”


Matthew Patel 1 hour ago


4 Comments

Megan Porter says:
June 1, 2026 at 8:16 pm
So it’s basically cheating but faster?
Derek Whitman says:
June 1, 2026 at 8:17 pm
I don’t get why everyone’s “dreading jobs” like it’s brand new. It can solve math contests, cool, but mathematicians still gotta prove stuff right? Or is it just generating answers without meaning?
Angela Ruiz says:
June 1, 2026 at 8:18 pm
This headline sounds like my kid’s homework app lol. Like if AI can crack math, then why not just let it teach kids and teachers can go away? Idk, feels like they’re panicking for no reason but also… kinda scary.
Caleb Morgan says:
June 1, 2026 at 8:19 pm
“Unmarked pink door” is honestly the most important part here. Anyway, I saw someone say AI is already doing “long-standing problems” which means the whole profession is cooked. But also my cousin’s a math tutor and he says AI can’t really do the hard steps, so who knows. If it’s making high scores on contests then it’s legit enough to replace a ton of people, just saying. Also the article’s kinda vague on what “solving” means, like did it actually prove it or just guess better?

