Thought Leadership

The 12-Hour Proof: When GPT-5.2 Discovered Physics That Humans Missed for 40 Years

GPT-5.2 spent 12 hours reasoning through a particle physics problem, discovered the textbook answer was wrong, and produced a formal proof. Harvard and Cambridge physicists verified it. What does this mean for science?
February 18, 2026 · 8 min read
MODEL: GPT-5.2 Pro (Extended Reasoning)
TASK: Simplify single-minus gluon amplitudes
REASONING TIME: 12 hours autonomous
PRIOR ASSUMPTION: A(1⁻, 2⁺, ..., n⁺) = 0 for all n
RESULT: Non-zero. Closed-form formula found.
VERIFICATION: Harvard · Cambridge · Princeton IAS · Vanderbilt
YEARS UNCHALLENGED: 40

PROOF VERIFIED: Formula holds. Recursion check passed.

A language model spent 12 hours reasoning through a particle physics problem. It found that a result physicists accepted for four decades was wrong. Then it wrote the proof. Physicists at Harvard, Cambridge, Princeton's IAS, and Vanderbilt verified it.

TL;DR: OpenAI's GPT-5.2 spent 12 hours autonomously reasoning through a problem in particle physics, discovered that a widely accepted textbook result was wrong, and produced a formal proof of a new formula for gluon scattering amplitudes. The preprint, co-authored with physicists from Harvard, Cambridge, Princeton IAS, and Vanderbilt, was published on February 13, 2026. The finding has already been extended from gluons to gravitons. The physics community is divided: is this genuine discovery, or sophisticated pattern matching? And does the distinction even matter?

The Problem Nobody Thought Was a Problem

For about four decades, theoretical physicists accepted something as settled fact. When you calculate the tree-level scattering amplitude for gluons (the particles that carry the strong nuclear force) and one gluon has negative helicity while the rest have positive helicity, the answer is zero. Nothing happens. The math cancels out.

This wasn't controversial. It was in the textbooks.

It was also wrong.

  • 12 hours of autonomous reasoning by GPT-5.2
  • 40 years the "zero amplitude" answer stood unchallenged
  • 5 institutions on the resulting preprint

The story of how that error got caught starts not in a university seminar room, but inside a language model's extended reasoning chain. On February 13, 2026, OpenAI published a preprint titled "Single-minus gluon tree amplitudes are nonzero," authored by Alfredo Guevara of Princeton's Institute for Advanced Study, Alex Lupsasca of Vanderbilt and OpenAI, David Skinner of Cambridge, Andrew Strominger of Harvard, and Kevin Weil on behalf of OpenAI. The paper presents a closed-form formula for single-minus gluon amplitudes valid for any number of external particles, along with a formal proof that the formula satisfies the Berends-Giele recursion relation.

The formula didn't come from a blackboard session. It came from GPT-5.2.

How the Discovery Actually Happened

Here's the thing about these amplitudes: the expressions get nightmarishly complicated as you add more gluons. Physicists had worked out specific cases by hand, computing amplitudes for configurations with up to six external gluons. Each step up in complexity didn't just double the difficulty. It grew superexponentially. The resulting expressions were enormous and tangled, and any non-zero pieces were assumed to come from intermediate terms that would ultimately cancel.

GPT-5.2 Pro was given these computed expressions and asked to simplify them. It did something unexpected. Rather than just reducing the algebra, it simplified the expressions so dramatically that a pattern emerged across the different cases. It then posited a general formula, one that would hold for all n, not just the handful of cases humans had ground through.

That alone would have been noteworthy. But the team pushed further. An internal, scaffolded version of GPT-5.2 was set loose on the problem with a single directive: prove the formula. For roughly 12 hours, the system reasoned autonomously, constructing a formal argument. When it finished, the proof was checked analytically against the Berends-Giele recursion relation. It held.
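For readers who want to see the shape of that check: the Berends-Giele recursion builds color-ordered gluon amplitudes out of off-shell currents J^μ. Schematically (normalizations and factors of the coupling vary between references), the current for m gluons is built from currents for fewer gluons:

\[
J^{\mu}(1,\dots,m) \;=\; \frac{-i}{P_{1,m}^{2}}\Bigg[\sum_{k=1}^{m-1} V_{3}^{\mu\nu\rho}\big(P_{1,k},P_{k+1,m}\big)\, J_{\nu}(1,\dots,k)\, J_{\rho}(k+1,\dots,m) \;+\; \sum_{1\le j<k\le m-1} V_{4}^{\mu\nu\rho\sigma}\, J_{\nu}(1,\dots,j)\, J_{\rho}(j+1,\dots,k)\, J_{\sigma}(k+1,\dots,m)\Bigg],
\]

where the single-gluon current J^μ(i) is just the polarization vector ε_i^μ, P_{i,j} = p_i + ... + p_j, and V₃, V₄ are the ordinary Yang-Mills three- and four-gluon vertices. The on-shell amplitude is recovered by amputating the final propagator and contracting with the last gluon's polarization. A candidate all-n formula that satisfies this recursion, given the correct low-point currents as seed data, has to agree with the amplitude the recursion builds, which is what makes the recursion a decisive check.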

"It chose a path no human would have tried."
Andrew Strominger, Harvard University

Strominger's comment gets at what makes this different from, say, AlphaFold predicting protein structures. This wasn't brute-force computation applied to a known framework. The AI identified a pattern that experienced physicists missed, generalized it, and then proved the generalization correct through a chain of mathematical reasoning that humans hadn't considered pursuing.

The Skeptics Have a Point

Let's be honest about what actually happened here, because the Hacker News thread with 500-plus comments raises legitimate concerns.

The skeptic's case: GPT-5.2 was given the answers for the low-multiplicity cases, up to n=6, computed entirely by human physicists. It spotted a pattern and generalized. This is sophisticated curve-fitting, not discovery. Humans did the hard work of computing the specific cases. The AI just noticed that the outputs weren't zero (which humans had also noticed in the intermediate steps) and found a formula that fit the data points. The 12-hour "proof" was produced by a system that has no understanding of what gluons are, what scattering means, or why any of this matters.

This is a serious objection and it deserves a serious response.

The counterargument is simple, and it's the one that bothers the skeptics most: spotting patterns in specific cases and generalizing to a universal formula is exactly how mathematical discovery has always worked. Euler did it. Ramanujan did it constantly, often without proofs at all. The Parke-Taylor formula for MHV (maximally helicity-violating) amplitudes, published in 1986, came from noticing patterns in computed examples and guessing a compact expression.
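For reference, the expression Parke and Taylor guessed is strikingly compact. In spinor-helicity notation, with gluons i and j carrying negative helicity and everything else positive, and leaving off conventional overall factors of the coupling and the momentum-conserving delta function, the tree-level MHV amplitude is

\[
A_{n}\big(1^{+},\dots,i^{-},\dots,j^{-},\dots,n^{+}\big) \;=\; \frac{\langle i\,j\rangle^{4}}{\langle 1\,2\rangle\,\langle 2\,3\rangle\cdots\langle n\,1\rangle}.
\]

One line, valid for any n, conjectured from worked examples and only later proven by Berends and Giele, via the same recursion used to check GPT-5.2's formula.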

Nobody called Parke and Taylor "sophisticated curve-fitters."

Key insight: The debate over whether pattern recognition counts as "real" discovery isn't new. It's the same argument mathematicians have had for centuries. What's new is that a non-human system is doing the pattern recognition, and that changes who gets uncomfortable.

There's also the verification argument. One reason AI might be particularly effective in theoretical physics and mathematics is that these fields come with built-in test suites. You can check a proposed formula against known recursion relations, boundary conditions, and limiting cases. The system doesn't need to "understand" physics in some deep philosophical sense. It needs to produce outputs that survive rigorous verification. GPT-5.2's formula survived.
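To make the "built-in test suite" idea concrete, here is a toy sketch in Python. Everything in it is an illustrative stand-in (a simple integer sequence, not anything from the paper), but the logic mirrors the checks described above: a conjectured closed form earns trust by reproducing every case computed the slow way, satisfying the relevant recursion, and getting the boundary case right.

# Toy illustration of verifying a conjectured closed-form formula.
# The real amplitude check involves spinor-helicity algebra and the
# Berends-Giele recursion; this stand-in only shows the shape of the idea.

def brute_force(n: int) -> int:
    """Compute the target quantity the slow, case-by-case way (here: 1 + 2 + ... + n)."""
    return sum(range(1, n + 1))

def conjectured_formula(n: int) -> int:
    """The compact closed form spotted from the computed cases."""
    return n * (n + 1) // 2

def satisfies_recursion(f, n_max: int) -> bool:
    """Check the recursion f(n) = f(n-1) + n, the analogue of a recursion relation."""
    return all(f(n) == f(n - 1) + n for n in range(2, n_max + 1))

if __name__ == "__main__":
    assert conjectured_formula(1) == 1                    # boundary condition
    assert all(conjectured_formula(n) == brute_force(n)   # matches every hand-computed case
               for n in range(1, 50))
    assert satisfies_recursion(conjectured_formula, 200)  # survives the recursion relation
    print("Conjecture survives all checks.")

The amplitude formula faced the far harder versions of the same three hurdles, and cleared them.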

What the Physicists Think

The reaction from the physics community hasn't been uniform, but the names attached to this work carry serious weight.

Nima Arkani-Hamed of the Institute for Advanced Study described the finding as a step toward "simple formula pattern recognition" tools, suggesting that AI systems could become standard instruments for identifying compact expressions hidden inside unwieldy calculations.

That framing matters. Arkani-Hamed isn't calling GPT-5.2 a physicist. He's calling it a new kind of tool for physicists, one that excels at a specific cognitive task (compressing complexity into pattern) that humans find extraordinarily difficult at scale.

The result has already been extended from gluons to gravitons, suggesting this wasn't a one-off fluke but the uncovering of deeper mathematical structure that connects different sectors of particle physics. That extension happened quickly precisely because the original formula was compact enough to work with, something that the sprawling, case-by-case human computations never afforded.

Before GPT-5.2

  • Amplitudes computed case-by-case up to n=6
  • Expressions grew superexponentially in complexity
  • Single-minus amplitudes assumed to be zero for all n
  • No compact general formula existed

After GPT-5.2

  • Closed-form formula valid for all n
  • Formal proof via Berends-Giele recursion
  • Result extended from gluons to gravitons
  • New avenue opened in amplitude research

The Credit Problem

Kevin Weil is listed on the arXiv preprint "on behalf of OpenAI." This is a polite workaround for an awkward question: should GPT-5.2 be listed as an author?

Current academic norms say no. Authorship implies accountability, and a language model can't be held accountable for errors, respond to peer review, or stand behind its claims. But the workaround creates its own oddity. A human who didn't do the mathematical work gets authorship credit as a proxy for a system that did.

This isn't going to get less weird. If AI systems keep producing verifiable results in physics and mathematics, the academic community will need a new framework for attribution. "On behalf of" works for now. It won't work for long.

The Real Question

Strip away the hype and the backlash, the breathless press releases and the dismissive "it's just autocomplete" takes, and you're left with a straightforward fact: a formula that humans didn't find for 40 years now exists, and it's correct.

The uncomfortable truth: Whether GPT-5.2 "truly discovered" something or "merely found a pattern" is a question about philosophy of mind, not physics. The formula works. The proof checks out. The result has already generated new findings. The universe does not care who, or what, wrote the equation.

The more productive question isn't "is this real intelligence?" It's "what happens next?" If AI systems can compress superexponentially complex expressions into compact formulas and then prove those formulas correct, the bottleneck in theoretical physics shifts. It moves from "can we compute this?" to "are we asking the right questions?" The hard part stops being the math and starts being the taste: knowing which problems are worth posing in the first place.

That's still a deeply human skill. For now.

Strominger said the AI chose a path no human would have tried. Maybe that's the finding that matters most. Not the formula itself, but the demonstration that there are paths through these problems that human intuition systematically overlooks. Forty years of physicists looked at single-minus amplitudes and saw zero. They weren't stupid. They weren't careless. They just couldn't see past what they expected to find.

Something without expectations looked at the same problem and saw something different.

That should make us curious, not threatened. The 12-hour proof didn't replace a physicist. It showed physicists a door they'd been walking past for four decades. What they do with it from here is entirely up to them.

Whether their new collaborator "understands" what it found is, frankly, beside the point.
