3, 2, 1: Health AI Brief
Every Friday
May 1, 2026

AI is reshaping healthcare fast. Below are 3 key AI developments, 2 studies, and 1 takeaway for this week to help you better lead with AI. Target read time: 5 minutes.

On April 28, the FDA announced two real-time clinical trial proofs of concept that stream endpoint data to the agency continuously rather than in batches at fixed milestones. AstraZeneca's Phase 2 trial for mantle cell lymphoma and Amgen's Phase 1b trial for small cell lung cancer are the first two. Commissioner Marty Makary framed the agency's stance directly: "For 60 years, we've been conducting clinical trials in the same way, where key data signals can take years to reach the FDA."

So what?
Continuous review of clinical trials is a major structural change to drug development. If it succeeds in compressing the time from trial completion to regulatory decision, this big pharma story becomes an even bigger payer story, because faster decisions would affect how many new drugs reach the market, how coverage decisions are made, and ultimately cost.

Read the FDA announcement → | Share your feedback with the FDA (by May 29) →

On April 29, the American Hospital Association and the West Health Institute announced a $12 million, 3-year national accelerator focused on 3 priority areas: EHR optimization, virtual care, and AI integration. The premise is deploying tested tools, not building new ones. Participating hospitals get access to a shared digital hub at nationalaccelerator.org with implementation playbooks and peer-learning networks. West Health Institute's CEO was direct about the program's stance: "This is not about inventing the future; it's about deploying it."

So what?
The pilot-to-production gap in healthcare AI is now an explicit programmatic target. What AHA chooses to curate will likely become a de facto reference list for hospital procurement teams. Instead of regulation that keeps vendors out, this feels like the opposite: a powerful accelerant for the chosen ones.

In a Viewpoint published April 29 in JAMA, Alon Bergman (UPenn), Bob Wachter (UCSF), and Zeke Emanuel (UPenn) argue that the current FDA medical-device paradigm is structurally wrong for adaptive, general-purpose clinical AI. Their proposal: a licensure framework grounded in continuous clinical evaluation, analogous to how individual physicians are licensed and monitored, rather than premarket clearance for a fixed product. The authors tie the timing to two converging pressures, a worsening physician shortage and rapid gains in AI clinical competency, which they argue can no longer be regulated as separate problems.

So what?
This opinion piece from policy heavyweights is worth reading. AI products change constantly, so yesterday's evaluation may not apply today, which makes continuous evaluation a necessity.

A study published in Science by researchers at Harvard University and Beth Israel Deaconess Medical Center evaluated OpenAI's o1 reasoning model on five diagnostic tasks, the most important being a real-world test on actual ER charts from Beth Israel patients. At the triage stage, when clinicians have the least information, o1 reached an exact or close diagnosis 67% of the time, more than 10 percentage points above the two physicians given the same cases (50–55%). On a separate clinical-reasoning quality task, o1 received a perfect score on 98% of cases versus 35% for attending physicians. The authors note that o1, released in late 2024, is already "ancient history" in machine learning time.

Why it matters
The most striking part isn't that o1 outperformed the physicians. It's that the gap was widest at the triage stage, when clinicians have the least information to work with. AI's edge is largest precisely where the stakes are highest and the data is thinnest. And because o1 is already "ancient history" by the authors' own description, the 67% figure is a floor, not a ceiling.

A cross-sectional evaluation in JAMA Network Open tested 21 frontier large language models (including GPT-5, Claude 4.5 Opus, Gemini 3.0 models, and Grok 4) against 29 standardized clinical vignettes from the MSD Manual, generating 16,254 responses. Cases were presented sequentially: differential diagnosis was tested at the start of each case, and final diagnosis after all the data had unfolded. Across every model, differential-diagnosis failure rates exceeded 80%. Final-diagnosis failure rates were under 40%. As lead author Arya Rao put it: "These models are great at naming a final diagnosis once the data is complete, but they struggle at the open-ended start of a case, when there isn't much information." The authors conclude that current off-the-shelf LLMs "cannot yet be relied on for unsupervised patient-facing clinical decision-making."

Why it matters
The split between differential diagnosis (over 80% failure) and final diagnosis (under 40% failure) is the actual story. Unfortunately, clinical work starts with the differential, and that is exactly where the models did their worst. Note: this is based on vignettes, and there was no comparison to how human physicians would have performed.

Regulators made big moves on AI this week.
AI's underlying capability keeps growing; while not perfect, it is only going in one direction. This week, regulation and oversight started catching up. The FDA launched its first real-time clinical trials with AstraZeneca and Amgen. The AHA and West Health committed $12 million to help hospitals scale tools that already work. And Alon Bergman, Bob Wachter, and Zeke Emanuel argued in JAMA that adaptive AI needs a licensure framework with ongoing rather than one-time evaluation. Three power centers (federal regulator, hospital association, and senior policy thinkers) are pushing for ways to make AI deployments faster.

Takeaway
Real-time clinical trials, AI deployment at scale, continuous clinical evaluation: these are all really about the same thing, speed with effective oversight. I believe that's only possible with AI overseeing AI. Anything else won't scale, won't be fast enough, or just won't be affordable.
