Hi there,
Today's shift is less "one big model" and more "models getting specialised." OpenAI just turned image generation into a reasoning task with GPT-Image 2 and expanded Codex into almost every surface a developer touches. Meanwhile, the open-weight camp quietly took the SWE-Bench Pro crown with Z.ai's GLM-5.1, Meta's Llama 5 landed with a 5-million-token context and explicit "System-2" reasoning, and Oracle signed a 2.8 GW power deal with Bloom Energy to feed the whole thing.
The pattern underneath: the frontier isn't a single leaderboard anymore — it's fragmenting along capability lines, and open weights are a real seat at the table.
🔥 Featured Post
Vision Learns to Think, Codex Goes Everywhere, and Open Weights Claim the Coding Crown
- GPT-Image 2 ships with native reasoning, 2K output, and multi-image consistency — OpenAI calls it a "visual thought partner" that verifies its own outputs and produces up to eight coherent images from a single prompt.
- OpenAI also announced "Codex for (almost) everything" on April 21, pushing the Codex agent into more surfaces across the developer stack.
- Z.ai's (formerly Zhipu) GLM-5.1 — a 744B MoE with 40B active parameters, 200K context, MIT-licensed — scored 58.4 on SWE-Bench Pro, topping GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). First open-weight model to lead the coding-agent benchmark.
- Meta's Llama 5, released earlier in April, explicitly markets "System-2" deliberate reasoning and a 5-million-token context window, while a new proprietary Meta model, Muse Spark, launched alongside it — a notable crack in Meta's open-source-only posture.
- Oracle signed a deal for up to 2.8 GW of Bloom Energy fuel cells for AI data centers (1.2 GW already contracted), part of a US utility capex plan hitting $1.4T through 2030 driven almost entirely by AI power demand.
📚 In Case You Missed It
Inside smart-sdlc: The Skill-First Agentic Framework That Turns Copilot and Claude Into a Full SDLC Team — smart-sdlc is a markdown-only agentic SDLC framework that runs inside GitHub Copilot, Claude, or any AI assistant — six personas (Aria, Rex, Nova, Sage, Lead, Scout), six phases, zero runtime. Here's why the 'skill-first, no platform' bet is interesting.
AI's Trust Test: Surgical Robots, Broken Benchmarks, and the EU's 100-Day Countdown — NVIDIA's healthcare physical AI stack (Open-H, Cosmos-H, GR00T-H, Rheo) ships into real operating rooms, Berkeley researchers prove the top 8 agent benchmarks can be hacked, and the EU AI Act deadline is now 103 days away. Trust is the new frontier.
The Silicon Decoupling: Meta's 1GW MTIA, OpenAI's $20B Cerebras Deal, and AI's Quiet Escape From Nvidia — Meta's 1-gigawatt Broadcom MTIA deal, OpenAI's $20B Cerebras contract, and Perplexity's Personal Computer on Mac — three stories, one pattern: AI compute is decoupling from Nvidia and from the cloud.
More posts dropping every day. Stay curious.
— Bhanu @ superml.dev
