OpenAI dropped GPT-5.5, so we did the only reasonable thing: went live immediately and tried to break it.
In this off-the-cuff Neuron Live, Corey and Grant walk through OpenAI's GPT-5.5 release notes, benchmark claims, rollout details, and early access reactions before testing the model live across coding, reasoning, creativity, web research, and absurd prompt challenges. We also compare a few GPT-5.5 responses against Claude Opus 4.7, test Codex, build a new version of Cat Doom, and ask the important questions, like whether a sentient vending machine that only dispenses expired tuna salad deserves to live.
In this episode, we cover:
• What OpenAI says is new in GPT-5.5
• GPT-5.5’s improvements in coding, computer use, research, and knowledge work
• Early benchmark results across Terminal-Bench, GDPval, Frontier Math, BrowseComp, and scientific research tasks
• Why token efficiency may matter as much as raw intelligence
• GPT-5.5’s rollout across ChatGPT, Codex, Plus, Pro, Business, and Enterprise
• Live Codex testing with a one-shot Cat Doom game build
• Creative stress tests involving palindromes, time-traveling potatoes, dystopian vending machines, and Lord of the Rings product reviews
• First impressions of whether GPT-5.5 feels meaningfully different from GPT-5.4 and Claude Opus 4.7
This was not a formal benchmark run. It was a first-contact livestream: messy, fast, weird, and exactly the kind of test we like.
Subscribe for more AI breakdowns, live model tests, beginner-friendly explainers, and weirdly useful prompt experiments from The Neuron.
Sign up for The Neuron newsletter: https://www.theneuron.ai/
Follow along for more AI news, analysis, and live experiments.