A language model spent 12 hours reasoning through a particle physics problem. It found that a result physicists accepted for four decades was wrong. Then it wrote the proof. Five institutions verified it.
The Problem Nobody Thought Was a Problem
For about four decades, theoretical physicists accepted something as settled fact. When you calculate the scattering amplitude for gluons (the particles that carry the strong nuclear force) and one gluon has negative helicity while the rest have positive helicity, the answer is zero. Nothing happens. The math cancels out.
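In the spinor-helicity shorthand that amplitude theorists use (sketched here in standard notation, not any one textbook's conventions), the accepted statement was that every single-minus tree amplitude vanishes identically:

$$
A_n^{\text{tree}}(1^-, 2^+, 3^+, \dots, n^+) = 0 \qquad \text{for all } n.
$$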
This wasn't controversial. It was in the textbooks.
It was also wrong.
The story of how that error got caught starts not in a university seminar room, but inside a language model's extended reasoning chain. On February 13, 2026, OpenAI published a preprint titled "Single-minus gluon tree amplitudes are nonzero," authored by Alfredo Guevara of the Institute for Advanced Study in Princeton, Alex Lupsasca of Vanderbilt and OpenAI, David Skinner of Cambridge, Andrew Strominger of Harvard, and Kevin Weil on behalf of OpenAI. The paper presents a closed-form formula for single-minus gluon amplitudes valid for any number of external particles, along with a formal proof that the formula satisfies the Berends-Giele recursion relation.
The formula didn't come from a blackboard session. It came from GPT-5.2.
How the Discovery Actually Happened
Here's the thing about these amplitudes: the expressions get nightmarishly complicated as you add more gluons. Physicists had worked out specific cases by hand, computing amplitudes for configurations with up to six external gluons. Each step up in complexity didn't just double the difficulty. It grew superexponentially. The resulting expressions were enormous and tangled, and everyone assumed that whatever looked nonzero in them was an artifact of intermediate terms that would ultimately cancel.
GPT-5.2 Pro was given these computed expressions and asked to simplify them. It did something unexpected. Rather than just reducing the algebra, it simplified the expressions so dramatically that a pattern emerged across the different cases. It then posited a general formula, one that would hold for all n, not just the handful of cases humans had ground through.
That alone would have been noteworthy. But the team pushed further. An internal, scaffolded version of GPT-5.2 was set loose on the problem with a single directive: prove the formula. For roughly 12 hours, the system reasoned autonomously, constructing a formal argument. When it finished, the proof was checked analytically against the Berends-Giele recursion relation. It held.
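For readers who want the shape of that check: the Berends-Giele recursion builds the off-shell current for a set of gluons out of currents for smaller sets, glued together by the color-ordered three- and four-gluon vertices. Schematically (conventions vary by reference, and indices are compressed here):

$$
J^{\mu}(1,\dots,n) = \frac{-i}{P_{1,n}^{2}} \Bigg[ \sum_{k=1}^{n-1} V_3^{\mu\nu\rho}\, J_{\nu}(1,\dots,k)\, J_{\rho}(k+1,\dots,n) + \sum_{1 \le j < k \le n-1} V_4^{\mu\nu\rho\sigma}\, J_{\nu}(1,\dots,j)\, J_{\rho}(j+1,\dots,k)\, J_{\sigma}(k+1,\dots,n) \Bigg]
$$

Here $P_{1,n}$ is the total momentum of gluons $1$ through $n$, and a single-gluon current is just that gluon's polarization vector. Because the recursion plus those one-gluon seeds determines every tree amplitude uniquely, showing that a closed-form expression satisfies it amounts to a proof of the formula for all $n$.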
"It chose a path no human would have tried."Andrew Strominger, Harvard University
Strominger's comment gets at what makes this different from, say, AlphaFold predicting protein structures. This wasn't brute-force computation applied to a known framework. The AI identified a pattern that experienced physicists missed, generalized it, and then proved the generalization correct through a chain of mathematical reasoning that humans hadn't considered pursuing.
The Skeptics Have a Point
Let's be honest about what actually happened here, because the Hacker News thread with 500-plus comments raises legitimate concerns. The sharpest objection runs like this: humans did the hard computational work, grinding out the amplitudes up to six gluons, and the model merely spotted a pattern in those ready-made examples and extrapolated it. On this view, GPT-5.2 didn't do physics. It did sophisticated curve-fitting.
This is a serious objection and it deserves a serious response.
The counterargument is simple, and it's the one the skeptics have the hardest time answering: spotting patterns in specific cases and generalizing to a universal formula is exactly how mathematical discovery has always worked. Euler did it. Ramanujan did it constantly, often without proofs at all. The Parke-Taylor formula for MHV (maximally helicity-violating) amplitudes, published in 1986, came from noticing patterns in computed examples and guessing a compact expression.
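For reference, here is the expression they guessed, in modern spinor-helicity notation (angle brackets are spinor inner products; coupling and normalization conventions are suppressed):

$$
A_n^{\text{tree}}(1^-, 2^-, 3^+, \dots, n^+) = \frac{\langle 1\,2 \rangle^4}{\langle 1\,2 \rangle \langle 2\,3 \rangle \cdots \langle n\,1 \rangle}.
$$

One line, valid for any $n$, conjectured from computed examples in 1986 and proved two years later by Berends and Giele, using the same recursion that would eventually verify GPT-5.2's formula.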
Nobody called Parke and Taylor "sophisticated curve-fitters."
There's also the verification argument. One reason AI might be particularly effective in theoretical physics and mathematics is that these fields come with built-in test suites. You can check a proposed formula against known recursion relations, boundary conditions, and limiting cases. The system doesn't need to "understand" physics in some deep philosophical sense. It needs to produce outputs that survive rigorous verification. GPT-5.2's formula survived.
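As a toy illustration of that workflow (not the paper's actual check, which was analytic, and with every function name here hypothetical), a numerical version of "survive the test suite" looks something like this:

```python
import random

def random_kinematics(n, rng):
    """Stand-in for an on-shell, momentum-conserving phase-space point
    for n external gluons. A real implementation would construct
    spinor-helicity variables; this is only a placeholder."""
    return [rng.gauss(0.0, 1.0) for _ in range(4 * n)]

def survives_checks(conjecture, reference, n_values=range(4, 11),
                    trials=200, tol=1e-9, seed=0):
    """Compare a conjectured closed-form amplitude against an
    independent reference computation (say, a direct Berends-Giele
    evaluation) at many random kinematic points."""
    rng = random.Random(seed)
    for n in n_values:
        for _ in range(trials):
            point = random_kinematics(n, rng)
            # One mismatch beyond tolerance falsifies the conjecture.
            if abs(conjecture(n, point) - reference(n, point)) > tol:
                return False
    return True
```

The asymmetry is the point: generating a good candidate formula is hard, but falsifying one is cheap. Fields with that property are unusually fertile ground for machine-generated conjectures.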
What the Physicists Think
The reaction from the physics community hasn't been uniform, but the names attached to this work carry serious weight.
Nima Arkani-Hamed of the Institute for Advanced Study described the finding as a step toward "simple formula pattern recognition" tools, suggesting that AI systems could become standard instruments for identifying compact expressions hidden inside unwieldy calculations.
That framing matters. Arkani-Hamed isn't calling GPT-5.2 a physicist. He's calling it a new kind of tool for physicists, one that excels at a specific cognitive task (compressing complexity into pattern) that humans find extraordinarily difficult at scale.
The result has already been extended from gluons to gravitons, suggesting this wasn't a one-off fluke but the uncovering of deeper mathematical structure that connects different sectors of particle physics. That extension happened quickly precisely because the original formula was compact enough to work with, something that the sprawling, case-by-case human computations never afforded.
Before GPT-5.2
- Amplitudes computed case-by-case up to n=6
- Expressions grew superexponentially in complexity
- Single-minus amplitudes assumed to be zero for all n
- No compact general formula existed
After GPT-5.2
- Closed-form formula valid for all n
- Formal proof via Berends-Giele recursion
- Result extended from gluons to gravitons
- New avenue opened in amplitude research
The Credit Problem
Kevin Weil is listed on the arXiv preprint "on behalf of OpenAI." This is a polite workaround for an awkward question: should GPT-5.2 be listed as an author?
Current academic norms say no. Authorship implies accountability, and a language model can't be held accountable for errors, respond to peer review, or stand behind its claims. But the workaround creates its own oddity. A human who didn't do the mathematical work gets authorship credit as a proxy for a system that did.
This isn't going to get less weird. If AI systems keep producing verifiable results in physics and mathematics, the academic community will need a new framework for attribution. "On behalf of" works for now. It won't work for long.
The Real Question
Strip away the hype and the backlash, the breathless press releases and the dismissive "it's just autocomplete" takes, and you're left with a straightforward fact: a formula that humans didn't find for 40 years now exists, and it's correct.
The more productive question isn't "is this real intelligence?" It's "what happens next?" If AI systems can compress superexponentially complex expressions into compact formulas and then prove those formulas correct, the bottleneck in theoretical physics shifts. It moves from "can we compute this?" to "are we asking the right questions?" The hard part stops being the math and starts being the taste: knowing which problems are worth posing in the first place.
That's still a deeply human skill. For now.
Strominger said the AI chose a path no human would have tried. Maybe that's the finding that matters most. Not the formula itself, but the demonstration that there are paths through these problems that human intuition systematically overlooks. Forty years of physicists looked at single-minus amplitudes and saw zero. They weren't stupid. They weren't careless. They just couldn't see past what they expected to find.
Something without expectations looked at the same problem and saw something different.
That should make us curious, not threatened. The 12-hour proof didn't replace a physicist. It showed physicists a door they'd been walking past for four decades. What they do with it from here is entirely up to them.
Whether their new collaborator "understands" what it found is, frankly, beside the point.