The evidence for AI is falling behind

3, 2, 1: Health AI Brief

Every Friday

May 22, 2026

AI is reshaping healthcare fast. Below are this week's 3 key AI developments, 2 studies, and 1 takeaway to help you lead with AI. Target read time: 5 minutes.

3 Market Signals

Commure raises $70M at a $7B valuation to scale autonomous revenue cycle agents

On May 19, Commure announced a $70 million round led by returning investor General Catalyst, with Sequoia Capital, Morgan Stanley, and Kirkland & Ellis joining. The post-money valuation is $7 billion. Commure says its agents now run about 85% of revenue cycle workflows, such as billing and claims tracking, without human involvement. Those agents operate across 500+ healthcare organizations and 3,000+ care sites, including HCA Healthcare and Tenet Healthcare.

So what?

US healthcare administrative work runs roughly $1 trillion a year, and revenue cycle is one of its largest line items. General Catalyst's Hemant Taneja described Commure's answer to that problem as "a robust system of autonomous agents." Commure's value is less as a copilot for a human worker and more as a stand-in for that worker.

Read the HIT Consultant coverage →

AMA gives patients a playbook for using AI on their own health

On May 20, the American Medical Association released a patient-facing infographic with five recommended uses (explore possibilities, simplify medical jargon, add personal context, understand treatment options, prep questions for a visit) and four cautions (don't use AI for diagnosis decisions, never in emergencies, protect personal information, never replace physician advice). AMA CEO John Whyte framed the guidance around patients using AI "to complement, not replace" their doctors.

So what?

A Gallup poll in April estimated that about 14 million Americans skipped a doctor visit after getting health advice from AI. That same month, the AMA was lobbying Congress to tighten safeguards on AI chatbots. This week, it handed patients a playbook for using them anyway. I'm reading that as the AMA working both fronts at once: pressing for regulation while accepting the consumer reality that patients already use these tools.

Read the AMA announcement → | View the infographic →

HHS launches AERO, an AI program to audit Medicaid and federal grants across 50 states

On May 21, HHS announced the Audit Enforcement and Risk Oversight (AERO) program. It will use ChatGPT and other AI tools to continuously analyze audit reports from all 50 states, going back at least five years and covering state Medicaid programs and federal grantees in research, addiction services, and other HHS-funded work. Enforcement levers include temporary payment withholding, fund suspension, and award termination. For context, the administration has already withheld $259 million in Medicaid funds from Minnesota and over $1 billion from California in earlier enforcement actions.

So what?

AI in the audit loop is half the story. The other half is the procedural shift: AERO can suspend payments on AI-flagged patterns and reconcile later, instead of investigating first and enforcing if substantiated. I'm reading that as a bet that the speed gain outweighs the false-positive cost.

Read the Healthcare Dive coverage → | Read the AP wire report →

2 Research Studies

npj Digital Medicine: AI graft-loss prediction didn't change kidney transplant conversations

The PRIMA-AI trial randomized 76 kidney transplant recipients with advanced graft dysfunction (eGFR < 30 mL/min/1.73 m²) 1:1 to usual care or to usual care plus an EHR-integrated machine learning model that predicts 1-year graft loss risk. The primary outcome was the share of patients who had structured conversations with their care team about treatment options after graft loss, measured over 12 months. The result: 14/36 (39%) in the intervention arm versus 16/40 (40%) in the control arm, with no difference between arms. Secondary outcomes showed no significant differences either. The authors attribute the null result to low and variable uptake of the tool and to workflow barriers.

Why it matters

A null result worth reading carefully. The model didn't fail technically; the workflow did. AI tools that don't change clinician behavior won't change patient outcomes either.

Read the PRIMA-AI trial →

npj Digital Medicine: Only 2.4% of healthcare AI studies are randomized trials

A scoping review analyzed 218 systematic reviews published from September 2023 to September 2024, extracting 4,667 primary studies of AI in medicine. Of those, 88.2% were preclinical (4,114), and only 2.4% were randomized controlled trials (113).

Why it matters

The 2.4% figure is lower than I would have expected. We're scaling AI deployment faster than the base of randomized trials is growing. Ideally, AI can help us accelerate the very trials that evaluate AI. That's the only way the evidence base keeps up.

Read the scoping review →

1 Key Insight

The evidence for AI is falling behind.

A new trial tested whether a deployed AI risk-prediction tool changed how kidney transplant teams talked to patients about graft loss. It didn't. PRIMA-AI — published this month in npj Digital Medicine — found no difference between the intervention and control arms. The model worked; the workflow didn't.

Zoom out. A scoping review in the same journal this month put a number on the broader pattern: only 2.4% of healthcare AI studies are randomized trials, and 88% are still preclinical.

Meanwhile, Commure just took agentic AI for revenue cycle to a $7 billion valuation, and HHS announced it would use ChatGPT to audit Medicaid programs and federal grantees across all 50 states. Capital, procurement, and deployment are all running ahead of the evidence base, and the gap widens every week.

Takeaway

We need to measure endpoints that matter (e.g., does this actually help patients?) using best-in-class evaluations such as randomized clinical trials. We are behind on both. Generating robust evidence on what matters has to move at the pace of AI.
