GPT-5.5, Google's 8th-Gen TPU, and Why AI Is Finally Learning to Say 'I'm Not Sure'

Hi there,

Yesterday OpenAI shipped GPT-5.5 — the first fully retrained base model since GPT-4.5 — scoring 82.7% on Terminal-Bench 2.0 and nearly doubling Claude Opus 4.7 on the hardest math benchmark. The same week, Google Cloud Next revealed 8th-generation TPUs and its new Gemini Enterprise Agent Platform, while MIT CSAIL published a training trick that attacks the root cause of AI hallucination. A lot happened in 48 hours — here's the short version.

🔥 Featured Post

GPT-5.5, Google's 8th-Gen TPU, and Why AI Is Finally Learning to Say 'I'm Not Sure'

GPT-5.5 scored 82.7% on Terminal-Bench 2.0 and 39.6% on FrontierMath Tier 4 — almost double the 22.9% Claude Opus 4.7 achieved on the same tier.
Google's TPU 8t superpods scale to 9,600 interconnected chips and 2 petabytes of shared high-bandwidth memory, delivering 2.8–3× training performance over the previous generation.
Google's TPU 8i inference chip achieves 80% better price-performance and 2× performance per watt vs. Gen 7.
75% of all new code at Google is now AI-generated and approved by engineers, up from 50% last fall.
MIT CSAIL's RLCR method reduces calibration error by up to 90%, targeting the root cause of hallucination without sacrificing benchmark accuracy.

Read the full post →

📚 In Case You Missed It

Physical AI Hits the Real World: Sony's Ace Beats the Pros, ChatGPT Walks Into the Clinic, and Enterprise Agents Go GA — Sony's Ace robot beats pro table tennis players on the cover of Nature, OpenAI ships ChatGPT for Clinicians plus HealthBench Professional, and Microsoft's Frontier Suite hits GA — physical, medical, and enterprise AI all crossed into real-world deployment this week.

Vision Learns to Think, Codex Goes Everywhere, and Open Weights Claim the Coding Crown — GPT-Image 2 adds native reasoning to image generation, Codex ships 'for (almost) everything,' Z.ai's open-weight GLM-5.1 tops SWE-Bench Pro over GPT-5.4 and Opus 4.6, Meta's Llama 5 lands with 5M-token context, and Oracle inks 2.8 GW with Bloom Energy.

Inside smart-sdlc: The Skill-First Agentic Framework That Turns Copilot and Claude Into a Full SDLC Team — smart-sdlc is a markdown-only agentic SDLC framework that runs inside GitHub Copilot, Claude, or any AI assistant — six personas (Aria, Rex, Nova, Sage, Lead, Scout), six phases, zero runtime. Here's why the 'skill-first, no platform' bet is interesting.

More posts dropping every day. Stay curious.

— Bhanu @ superml.dev