Physical AI Hits the Real World: Sony's Ace Beats the Pros, ChatGPT Walks Into the Clinic, and Enterprise Agents Go GA

Hi there,

The last 48 hours delivered the clearest signal yet that AI is leaving the benchmark lab and walking onto the field, into the clinic, and across the enterprise. Sony's Ace robot landed on the cover of Nature after beating professional table tennis players under ITTF rules, OpenAI shipped a free HIPAA-capable ChatGPT for Clinicians, and Microsoft put a date on Copilot's "Frontier Suite" going GA.

Three different domains, one pattern: the sim-to-real gap is finally being closed in production.

🔥 Featured Post

Physical AI Hits the Real World: Sony's Ace Beats the Pros, ChatGPT Walks Into the Clinic, and Enterprise Agents Go GA

Sony AI's Ace is on the cover of Nature (April 23). It pairs event-based vision with model-free RL, and it took three wins in five matches against elite players plus the first-ever win over a professional, all under ITTF rules and on unaltered equipment.
OpenAI launched ChatGPT for Clinicians (April 22): free for verified U.S. physicians, NPs, PAs, and pharmacists, and HIPAA-capable via a BAA. GPT-5.4 in the clinician workspace beat every base model and human physicians on OpenAI's new HealthBench Professional.
Physicians rated 99.6% of responses safe and accurate across 6,924 real clinical conversations during red-team testing.
Microsoft 365 E7 (Frontier Suite), which bundles Copilot, Work IQ, and Agent 365 with central agent governance, hits general availability May 1.
Infosys and OpenAI tied Codex into Topaz Fabric for enterprise SDLC.
The Waymo World Model (built on DeepMind's Genie 3) is now simulating tornadoes and elephants at billion-mile scale — giving us a clean lens on why real-world AI suddenly works.

Read the full post →

📚 In Case You Missed It

Vision Learns to Think, Codex Goes Everywhere, and Open Weights Claim the Coding Crown — GPT-Image 2 adds native reasoning to image generation, Codex ships 'for (almost) everything,' Z.ai's open-weight GLM-5.1 tops SWE-Bench Pro over GPT-5.4 and Opus 4.6, Meta's Llama 5 lands with 5M-token context, and Oracle inks 2.8 GW with Bloom Energy.

Inside smart-sdlc: The Skill-First Agentic Framework That Turns Copilot and Claude Into a Full SDLC Team — smart-sdlc is a markdown-only agentic SDLC framework that runs inside GitHub Copilot, Claude, or any AI assistant — six personas (Aria, Rex, Nova, Sage, Lead, Scout), six phases, zero runtime. Here's why the 'skill-first, no platform' bet is interesting.

AI's Trust Test: Surgical Robots, Broken Benchmarks, and the EU's 100-Day Countdown — NVIDIA's healthcare physical AI stack (Open-H, Cosmos-H, GR00T-H, Rheo) ships into real operating rooms, Berkeley researchers prove the top 8 agent benchmarks can be hacked, and the EU AI Act deadline is now 103 days away. Trust is the new frontier.

More posts dropping every day. Stay curious.

— Bhanu @ superml.dev