Less Intercoms, More Control Towers: Why Voice AI Isn’t the Point

Contributor: Kira Radinsky, PhD

In value-based care, the greatest risk is often not a bad decision but no decision – not intervening when signals are clear, not closing known gaps, or arriving too late for the intervention to matter. Too often, organizations adopt incremental tools that create motion but not control: activity without reliable follow-through, and engagement without operational execution.

That’s why the current wave of “voice AI” is easy to misunderstand. Healthcare is rapidly perfecting the talk layer: empathetic voices, natural dialogue, smoother call flows, faster answers. The progress is real and useful, but it’s also converging. As conversational capability becomes widely available, the advantage shifts from how well the system talks to what the system gets done.

Healthcare doesn’t need more voice; it needs voice with a brain. Not in a sci-fi sense, but in an operational sense: a system that can decide what matters now, route work to the right place, execute the next step, and verify completion. In other words, talking is becoming a commodity, and orchestration is becoming the differentiator.

Care isn’t a conversation; it’s a sequence of actions. Outcomes don’t change because someone had a good interaction; they change when the system reliably completes work despite real constraints: limited appointment supply, pharmacy friction, prior authorizations, transportation barriers, documentation requirements, and finite clinician capacity. A “great call” that ends with “please call your PCP” or “follow up with your pharmacy” is often just a polite handoff, and handoffs are where care breaks. The question, then, isn’t whether an AI can speak fluently. It’s whether the system can execute: coordinating the people, workflows, and follow-through that turn conversation into completed care.

One reason this distinction matters is that even highly successful “talk automation” often produces surprisingly modest throughput gains unless the surrounding workflow and incentives change. Ambient AI scribes are a useful example. They capture the conversation and generate the note, and they are often sold on a straightforward ROI story: reduce documentation time and clinicians will see more patients.

A January 2026 JAMA Network Open study at UCSF provides a reality check at scale. Across roughly 1.2 million ambulatory encounters and 1,565 physicians, adoption was associated with +0.80 encounters per week and +1.81 RVUs per week, with no evidence of increased claim denials. The authors translate this to about $3,044 in annual revenue per physician (JAMA Network), which is similar to the cost of the scribe system.

Those gains are real but modest, and their magnitude is instructive: automation does not convert 1:1 into throughput unless incentives and workflows are redesigned to turn saved time into capacity (setting aside suspected intangible benefits, such as reduced burden, experienced by the clinical team).

This is where “voice with a brain” becomes concrete. A scribe automates talk, but it doesn’t reallocate work across the team, route downstream tasks, resolve operational friction, or verify completion. It improves a component, not the system. In our experience, the durable economics of automation often show up in places that “RVUs per week” undercounts: retention, staffing flexibility, and role attractiveness (the “intangibles” mentioned above). The question is not whether voice AI saves time, but whether it converts that saved time into system-level execution. A voice-only tool may shave minutes off documentation or outreach and make clinicians feel better, yet leave throughput and outcomes largely unchanged because nothing downstream is redesigned.

Voice with a brain is different. The conversation becomes the trigger for an owned workflow: follow-up gets scheduled, the case is routed to the right team member, pharmacy barriers are resolved, documentation lands in the system of record, and completion is verified. That is how minutes saved turn into completed care, and completed care is what moves VBC economics.
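To make the “owned workflow” idea concrete, here is a minimal Python sketch under stated assumptions. The `Task`, `Status`, `ROUTING`, `tasks_from_call`, and `record_attempt` names, the role names, and the two-attempt escalation threshold are all hypothetical illustrations, not any vendor’s product or API. The structural point it shows: each barrier surfaced on a call becomes a task with a named owner, the task closes only on verified completion, a failed attempt leaves it open, and repeated failures escalate it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ATTEMPTED = "attempted"    # tried at least once, completion not yet confirmed
    VERIFIED = "verified"      # completion confirmed (e.g. in the system of record)
    ESCALATED = "escalated"    # repeated failures, handed to a supervisor


@dataclass
class Task:
    """One owned unit of follow-through triggered by a conversation (hypothetical model)."""
    kind: str              # e.g. "schedule_follow_up", "resolve_pharmacy_barrier"
    owner: str             # role or queue accountable for completion
    max_attempts: int = 2  # hypothetical threshold before escalation
    attempts: int = 0
    status: Status = Status.OPEN


# Hypothetical routing table: which role owns which kind of task.
ROUTING = {
    "schedule_follow_up": "scheduler",
    "resolve_pharmacy_barrier": "pharmacist",
    "medication_reconciliation": "care_nurse",
}


def tasks_from_call(barriers: list[str]) -> list[Task]:
    """Turn barriers identified on a call into owned tasks instead of advice."""
    # Unknown task kinds fall back to a care manager rather than an unmonitored inbox.
    return [Task(kind=b, owner=ROUTING.get(b, "care_manager")) for b in barriers]


def record_attempt(task: Task, verified: bool) -> Task:
    """Close a task only on verified completion; escalate after repeated failures."""
    task.attempts += 1
    if verified:
        task.status = Status.VERIFIED
    elif task.attempts >= task.max_attempts:
        task.status = Status.ESCALATED
        task.owner = "supervisor"
    else:
        task.status = Status.ATTEMPTED
    return task


tasks = tasks_from_call(["schedule_follow_up", "resolve_pharmacy_barrier"])
record_attempt(tasks[0], verified=True)    # e.g. appointment confirmed
record_attempt(tasks[1], verified=False)   # first attempt fails, task stays open
record_attempt(tasks[1], verified=False)   # second failure escalates
for t in tasks:
    print(t.kind, t.owner, t.status.value)
# schedule_follow_up scheduler verified
# resolve_pharmacy_barrier supervisor escalated
```

The design choice that matters is that `VERIFIED` is the only successful end state: an attempt, however pleasant the call, never closes a task on its own.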

The same pattern appears even more starkly when the “conversation” is the intervention itself. The “72-hour post-discharge phone call” is a classic example of talk without sufficient execution. A cluster-randomized trial in PLOS ONE found that a single post-discharge phone call had a small impact on care-transition quality but no effect on hospital utilization. (PLOS)

By contrast, transitional care programs that operationalize follow-through show meaningful reductions. The Care Transitions Intervention (Coleman et al.) reported rehospitalization rates of 8.3% vs. 11.9% at 30 days and 16.7% vs. 22.5% at 90 days (intervention vs. control). (PubMed) Other studies reported lower 30-day hospital utilization, with an incidence rate ratio of 0.695 (95% CI 0.515–0.937). (PMC)
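As a rough back-of-the-envelope reading of the figures above, the reported rates can be turned into absolute risk reduction (ARR), relative risk reduction (RRR), and number needed to treat (NNT). This is crude arithmetic on the published percentages, assuming the arms are otherwise comparable; it is not an adjusted estimate from the studies themselves.

```latex
\begin{aligned}
\text{ARR}_{30} &= 11.9\% - 8.3\% = 3.6 \text{ points}, \quad
\text{RRR}_{30} = \tfrac{3.6}{11.9} \approx 30\%, \quad
\text{NNT}_{30} = \tfrac{1}{0.036} \approx 28 \\
\text{ARR}_{90} &= 22.5\% - 16.7\% = 5.8 \text{ points}, \quad
\text{RRR}_{90} = \tfrac{5.8}{22.5} \approx 26\%, \quad
\text{NNT}_{90} = \tfrac{1}{0.058} \approx 17 \\
1 - \text{IRR} &= 1 - 0.695 = 0.305
\quad \text{(about 30\% lower; 95\% CI: 6.3\% to 48.5\% lower)}
\end{aligned}
```

On this crude reading, roughly one 30-day rehospitalization is avoided for every 28 patients enrolled in the intervention.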

The managerial takeaway is consistent: the unit of value isn’t “contact,” it’s “completion.” Calling works when it triggers an owned workflow (e.g., medication reconciliation, follow-up scheduling, barrier removal, escalation, and verification), not when it stops at advice.

When voice-first programs fail, they usually fail in repeatable ways. Barriers are identified (cost, confusion, side effects, transportation), but resolution is pushed to the patient or an unmonitored inbox. “Schedule follow-up” is recommended, but no one schedules, confirms, or records it in the system of record. Outreach is logged, but refill pickup, appointment attendance, or lab completion isn’t verified – so dashboards look busy while contract metrics don’t move. Or the member is willing, but the bottleneck is capacity, authorization, benefits, or logistics: legacy constraints that conversation alone can’t remove. These are all versions of the same root issue: talk isn’t connected to an execution engine.

A third lens on the same idea comes from preventive care. Client reminders are the intercom of cancer screening: they increase engagement by telling people they’re due (letters, emails, automated calls), but they don’t remove the operational barriers that determine whether screening actually gets completed. A recent evidence review found that reminders produce meaningful lift for low-friction actions; for example, colorectal screening via FOBT shows a median +11.5 percentage-point increase. (The Community Guide) In cervical screening, however, the median effect in the updated evidence is much smaller (+2.8 percentage points), with an incremental lift of +3.7 points when layered on provider interventions. This is consistent with the idea that once the remaining bottlenecks are logistical, engagement alone plateaus. (The Community Guide)

Patient navigation functions more like a control tower: it is explicitly designed to operationalize completion by reducing structural barriers and coordinating follow-through (scheduling, access troubleshooting, tracking, escalation). In the same evidence base, navigation shows larger gains where execution is the constraint; for example, colonoscopy uptake increases by a median +13.9 percentage points with a pooled RR 1.97, and cervical screening increases by a median +22.5 percentage points. (The Community Guide)

In value-based care, inaction is not a neutral baseline; it is a decision, with avoidable utilization and missed quality measures as the price. That’s why “voice AI” should be evaluated less on pure conversational fluency and more on whether it reliably converts interactions into completed work under real constraints. The practical questions are straightforward: What share of high-priority tasks complete end-to-end (not just attempted)? How is completion verified (appointments attended, medications picked up, labs completed), and where is that recorded? What happens when the first attempt fails, and how does escalation work? Does the tool integrate into existing workflows or create parallel ones? And what incentive and capacity mechanisms translate saved time into measurable outcomes?
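Several of these questions can be made measurable. The sketch below is a minimal Python illustration, assuming a hypothetical task log: `TaskRecord`, `ExecutionReport`, `execution_metrics`, and the verification labels are invented for this example and do not correspond to any specific product or standard. It separates the attempt rate (“contact”) from the verified completion rate (“completion”) and lists “silent failures”: tasks that were attempted but never verified and never escalated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskRecord:
    """One high-priority task as it might appear in an operations log (hypothetical schema)."""
    task_id: str
    attempted: bool
    # Hypothetical verification evidence, e.g. "appointment_attended",
    # "refill_picked_up", "lab_resulted"; None means completion was never confirmed.
    verification: Optional[str] = None
    escalated: bool = False


@dataclass
class ExecutionReport:
    attempt_rate: float
    verified_completion_rate: float
    # Attempted, never verified, never escalated: work that quietly stalled.
    silent_failures: list[str] = field(default_factory=list)


def execution_metrics(records: list[TaskRecord]) -> ExecutionReport:
    """Contrast 'contact' (attempted) with 'completion' (verified end-to-end)."""
    total = len(records)
    if total == 0:
        return ExecutionReport(attempt_rate=0.0, verified_completion_rate=0.0)
    attempted = sum(1 for r in records if r.attempted)
    # A task counts as complete only if verification evidence exists.
    verified = sum(1 for r in records if r.verification is not None)
    silent = [
        r.task_id
        for r in records
        if r.attempted and r.verification is None and not r.escalated
    ]
    return ExecutionReport(attempted / total, verified / total, silent)


log = [
    TaskRecord("t1", attempted=True, verification="appointment_attended"),
    TaskRecord("t2", attempted=True),                  # called, never confirmed
    TaskRecord("t3", attempted=True, escalated=True),  # failed, but escalated
    TaskRecord("t4", attempted=False),
]
print(execution_metrics(log))
# attempt_rate=0.75, verified_completion_rate=0.25, silent_failures=['t2']
```

A high attempt rate paired with a low verified completion rate is the “busy dashboard” pattern in numbers; the silent-failure list answers what happened to the attempts that did not complete.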

Voice will be everywhere. Orchestration will be rarer. The organizations that outperform will treat voice as an interface, not the product: they will build the control tower – the “brain” that routes, executes, and verifies follow-through. A conductor to synchronize these wonderfully talented new soloists.

Healthcare doesn’t need more voice. It needs voice with a brain, and the discipline to turn conversation into completed care. In value-based care, performance is a system property – relying on “solo” heroics is an expensive way to run operations.
