AI just reached silver-medal level at the International Mathematical Olympiad

In July 2024, Google DeepMind published results showing that AlphaProof and AlphaGeometry 2 together solved four of the six problems from that year’s International Mathematical Olympiad, scoring 28 of 42 points, equivalent to a silver medal.

AlphaProof solved two algebra problems and one number theory problem, including the hardest problem of the contest, which only five human contestants solved. AlphaGeometry 2 solved the geometry problem. The two combinatorics problems went unsolved.

Let me tell you what I think this actually means.

What AlphaProof Actually Does

AlphaProof is not GPT. It’s not generating text and hoping it looks like a proof. It operates in Lean 4, a formal proof assistant in which every step is checked by a small trusted kernel. If the proof compiles, it is a correct proof of the statement as formalized. There’s no “hallucinating a plausible-looking argument” here.
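To make “if the proof compiles, it’s correct” concrete, here is a toy Lean 4 example. It has nothing to do with the IMO problems or with AlphaProof’s actual output. It only shows what it looks like when a checker, not a reader, decides whether an argument holds.

```lean
-- A toy statement: addition on natural numbers is commutative.
-- Lean accepts this only because `Nat.add_comm` really proves the goal.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A concrete fact that Lean checks by computation.
example : 2 + 2 = 4 := rfl

-- Changing the second statement to `2 + 2 = 5` makes the file fail to compile.
-- There is no way to "sound convincing" to the checker.
```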

The architecture couples a pre-trained language model with an AlphaZero-style reinforcement learning loop that operates in the formal proof space. The model proposes proof steps, Lean 4 checks them, and proofs that check are fed back as training signal, so the system gets better at finding steps that lead to a valid proof.
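The propose, check, reward cycle can be sketched in a few lines of Python. This is a toy under loud assumptions: the “proof” is a list of arithmetic steps, the “checker” is a plain function, and the “learning” is a weight bump. The names `STEPS`, `check`, and `search` are made up for illustration, and none of this models Lean or AlphaProof’s real training setup.

```python
import random

# Toy stand-in for a proof checker: a "proof" is a list of step names that
# must turn `start` into `goal`. Nothing here models Lean or AlphaProof.
STEPS = {
    "add_one": lambda n: n + 1,
    "double": lambda n: n * 2,
}


def check(start, goal, proof):
    """Return True only if every step is known and the final state equals goal."""
    state = start
    for name in proof:
        if name not in STEPS:
            return False
        state = STEPS[name](state)
    return state == goal


def search(start, goal, episodes=2000, max_len=6, seed=0):
    """Sample candidate proofs, keep verified ones, and reinforce their steps."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STEPS}  # the proposer's step preferences
    best = None
    for _ in range(episodes):
        length = rng.randint(1, max_len)
        names = list(weights)
        proof = rng.choices(names, weights=[weights[n] for n in names], k=length)
        if check(start, goal, proof):  # the checker is the only judge
            for name in proof:
                weights[name] += 0.1  # reward steps that appeared in a valid proof
            if best is None or len(proof) < len(best):
                best = proof
    return best


if __name__ == "__main__":
    found = search(start=1, goal=10)
    print(found)
```

The point of the sketch is the division of labor: the proposer is allowed to be wrong most of the time, because nothing counts until `check` says so.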

This is a fundamentally different paradigm from what we usually mean when we say “AI reasoning.”

Why the “AGI is here” Takes Are Wrong

Within 24 hours, Twitter was full of claims that this proves AGI is imminent. These claims are wrong, and here’s why.

IMO problems are a very specific distribution. They’re hard, yes. But they’re also well-scoped, formally specifiable, and come with an unambiguous success criterion: a proof either checks or it doesn’t. The real world rarely looks like this.

The system cannot generalize to unstructured mathematical research. AlphaProof can’t walk into a domain it’s never seen, identify what an interesting open question would be, formulate it formally, and then solve it. The problem has to be handed to it in a clean, structured form. For the IMO run, the problems were translated into formal language by hand before the system started work.

Symbolic verification is doing a lot of work. The reason this system is trustworthy is that Lean 4 checks every step. Strip away formal verification and you’re back to the hallucination problem.

Why the “Just Pattern Matching” Takes Are Also Wrong

On the other side: every time AI does something impressive, someone says “it’s just statistics” or “it’s just pattern matching on training data.”

This framing is intellectually lazy.

The problems AlphaProof solved were written for the 2024 contest. They were not in the training data as solved problems. The model had to compose techniques in a new way to produce proofs it had never been shown. Whether you call that “pattern matching” or “reasoning” is a definitional argument, but the output is a machine-checked proof of a problem the system had not seen before.

The gap between “learned to pattern-match” and “can produce new knowledge” is collapsing faster than most people expected.

What This Actually Signals

Here’s my read: we’re seeing the beginning of AI systems that are genuinely useful for formal knowledge work — not because they’re smarter than humans in general, but because formal systems (proofs, code, structured specifications) let us verify outputs without trusting the model.

This is the key insight: verifiability unlocks reliability.

When I build RAG systems or evals at work, I’m always fighting the verification problem. I can’t always know if the model’s answer is right. But if I can constrain the model to operate in a domain where outputs are checkable — code that runs, proofs that compile, SQL that returns correct results — suddenly the unreliability problem shrinks dramatically.
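Here is roughly what that constraint looks like in practice, as a minimal Python sketch. The helper names (`first_verified`, `passes_sort_tests`) and the hard-coded candidate strings are hypothetical stand-ins for model output, and the `exec` call is only acceptable because the input is a trusted toy. Real model output would need a sandbox.

```python
from typing import Callable, Iterable, Optional


def first_verified(candidates: Iterable[str],
                   verify: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes the checker, or None."""
    for candidate in candidates:
        try:
            if verify(candidate):
                return candidate
        except Exception:
            # A candidate that crashes the checker is treated as a failure.
            continue
    return None


def passes_sort_tests(source: str) -> bool:
    """Run the candidate's `sort_list` against a few known cases."""
    namespace = {}
    exec(source, namespace)  # trusted toy input only; sandbox real model output
    sort_list = namespace["sort_list"]
    cases = [[3, 1, 2], [], [5, 5, 1]]
    return all(sort_list(list(case)) == sorted(case) for case in cases)


# Hypothetical "model outputs": the first looks plausible but is wrong.
candidates = [
    "def sort_list(xs):\n    return xs",
    "def sort_list(xs):\n    return sorted(xs)",
]

accepted = first_verified(candidates, passes_sort_tests)
```

The wrong candidate never reaches the user, and I didn’t have to trust the generator to get that guarantee. I only had to trust the tests, which is also the catch: the result is exactly as reliable as the checker.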

AlphaProof is proof of concept (no pun intended) that AI + formal verification is a viable research paradigm. Expect to see this pattern applied in more domains: verified code generation, verified biological pathway proposals, verified legal argument structures.

The Thing I Keep Coming Back To

The hardest problem AlphaProof solved was cracked by only five human contestants at IMO 2024. Not because the others didn’t try. Because it was genuinely hard and the right insight was elusive.

A machine found it.

I don’t have a clean conclusion here. But I think the people confidently saying “AI will never do real mathematics” should update their priors. And the people saying “AGI is one more breakthrough away” should look more carefully at the specific constraints that made this possible.

The truth is somewhere more interesting than either camp is willing to admit.

I write about AI research through the lens of someone building with it every day. Subscribe to the newsletter if this kind of analysis is useful.