Corti's Symphony AI Surpasses OpenAI in Medical Speech-to-Text Accuracy

Copenhagen-based artificial intelligence firm Corti has unveiled Symphony for Speech-to-Text, a suite of clinical-grade speech recognition models. The models are engineered for high-fidelity real-time dictation, conversational transcription, and batch audio processing, and Corti says they post the highest accuracy rates documented for clinical speech recognition.

“Our core mission is to ensure that our AI scribes earn the unwavering trust of physicians, medical practitioners, patients, and indeed, the entire healthcare ecosystem,” stated Andreas Cleve, co-founder and CEO of Corti, in an exclusive interview with VentureBeat.

The performance metrics released by Corti underscore a significant trend in enterprise AI: in highly regulated and specialized sectors, domain-specific models can achieve superior results compared to those from large, generalist foundation model providers.

In a recently published research paper, Corti detailed how its new clinical-grade speech models achieved up to a 93% reduction in word error rates (WER) when benchmarked against leading generalist speech models and APIs on medical terminology.

Specifically, for English medical terminology, Symphony for Speech-to-Text recorded an exceptionally low WER of 1.4%. In contrast, leading generalist models demonstrated significantly higher error rates: OpenAI’s speech model registered 17.7% WER, ElevenLabs achieved 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%.
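For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The block below is a minimal sketch of that standard definition, not Corti's evaluation code; the function names and the example sentence are illustrative assumptions. It also checks that the reported figures are consistent with the "up to 93%" relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction, in percent, from baseline to improved."""
    return (baseline - improved) / baseline * 100


# Hypothetical sentence pair (not from Corti's test set): one substitution
# over nine reference words.
reference = "patient reports hyperthyroidism treated with methimazole 10 mg daily"
hypothesis = "patient reports hypothyroidism treated with methimazole 10 mg daily"
wer = word_error_rate(reference, hypothesis)
print(f"example WER: {wer:.3f}")

# WER figures as reported in the article, in percent.
symphony_wer = 1.4
baselines = {"OpenAI speech model": 17.7, "ElevenLabs": 18.1,
             "Whisper": 17.4, "Parakeet": 18.9}
reductions = {name: relative_reduction(value, symphony_wer)
              for name, value in baselines.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```

On these four figures the largest relative reduction is against Parakeet, which rounds to the 93% cited above. Production evaluations usually normalize casing, punctuation, and number formatting before scoring, which this sketch omits.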

Corti’s announcement marks a pivotal moment for developers within the healthcare industry. While broad-domain transcription services like OpenAI’s Whisper may suffice for general purposes, they often falter when confronted with specialized medical acronyms, complex drug dosages, clinical shorthand, and the challenging acoustic environments of settings like emergency rooms. Symphony for Speech-to-Text addresses this gap by offering developers a specialized, production-ready API meticulously designed for clinical workflows from the ground up.

The Agentic Era Necessitates Flawless Data Input

The introduction of Symphony for Speech-to-Text signifies a fundamental evolution in how healthcare leverages voice technology. Historically, medical speech recognition primarily focused on generating static text documents for physicians to review—essentially a digital replacement for a notepad.

However, as the healthcare sector rapidly advances into what industry experts term the “agentic era”—characterized by autonomous AI agents actively participating in clinical decision-making, Electronic Health Record (EHR) navigation, and real-time patient support—the transcribed text is no longer the endpoint. It is the critical foundational data layer.

“Voice has consistently been one of the most vital inputs in healthcare,” Cleve explained in a statement provided to VentureBeat. “What is changing is the subsequent processing of that captured speech. In the agentic era, speech recognition must transcend mere transcript generation; it needs to provide AI systems with precise clinical facts for accurate reasoning. A misheard medication, dosage, or symptom can compromise the reliability of every subsequent step. Symphony for Speech-to-Text equips healthcare builders with a speech layer accurate enough to navigate the complexities of clinical reality.”

This is where the compounding risk associated with high word error rates becomes acutely apparent. If a general-purpose AI model misinterprets a transcription—for instance, rendering “hyperthyroidism” as “hypothyroidism” or incorrectly noting a critical medication dosage—any downstream AI agent relying on that transcript will operate with fundamentally flawed data. Corti’s architecture is designed to mitigate this risk by delivering structured, clinically actionable output directly from the API, enabling downstream AI applications to reason based on accurate facts rather than imprecise, unformatted text.

The impact is particularly evident in Corti’s entity recall benchmarks. Symphony for Speech-to-Text achieved a recall rate of 98.3% for formatted clinical entities, including dosages, measurements, and dates. By contrast, Corti reported that the strongest general-purpose baseline reached a recall of only 44.3% on the same entities. That gap of 54 percentage points highlights the difference between a tool that enhances physician efficiency and one that could introduce significant medical liability.
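Entity recall is the share of reference entities that appear correctly in the system output. The sketch below assumes exact string matching on already-formatted entities, which may differ from how Corti scores its benchmark; the function name and the sample entities are hypothetical. It also separates the percentage-point gap from the relative ratio between the two reported recall figures.

```python
from collections import Counter


def entity_recall(reference_entities, predicted_entities) -> float:
    """Fraction of reference entities found in the prediction (multiset match)."""
    ref = Counter(reference_entities)
    pred = Counter(predicted_entities)
    total = sum(ref.values())
    if total == 0:
        raise ValueError("reference must contain at least one entity")
    # Counter intersection keeps the minimum count of each shared entity.
    matched = sum((ref & pred).values())
    return matched / total


# Hypothetical entities: the blood pressure reading is transcribed
# but not formatted, so it does not count as recalled.
reference = ["10 mg", "120/80 mmHg", "2024-03-01"]
predicted = ["10 mg", "120 over 80", "2024-03-01"]
recall = entity_recall(reference, predicted)
print(f"example recall: {recall:.3f}")

# Recall figures as reported in the article, in percent.
symphony_recall = 98.3
best_baseline_recall = 44.3
gap_points = symphony_recall - best_baseline_recall
ratio = symphony_recall / best_baseline_recall
print(f"gap: {gap_points:.1f} percentage points, ratio: {ratio:.2f}x")
```

Reading the difference as percentage points rather than percent matters here: in relative terms the reported recall is more than double the baseline.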

Challenging Industry Incumbents

While Corti’s benchmarks against contemporary AI model providers like OpenAI and ElevenLabs are compelling, the company is also setting its sights on established medical transcription giants. For years, Dragon Medical One has been considered the benchmark for dedicated clinician dictation. However, such legacy systems were primarily optimized for intentional dictation rather than serving as the underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.

In evaluations involving real-world English medical dictation, Corti achieved a WER of 4.6% against Dragon’s 5.7%, a relative improvement of about 19%. Corti also posted slightly higher medical term recall than Dragon (93.5% versus 92.9%). By offering this level of accuracy through an API endpoint, Corti empowers third-party developers, EHR vendors, and virtual care platforms to create their own advanced dictation and ambient listening tools that exceed the performance of legacy industry leaders.
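The Dragon comparison mixes a relative figure (WER) with two absolute ones (recall). As a rough check using only the numbers reported above, the following sketch derives both views; the helper name is an assumption made for illustration.

```python
def compare_error_rates(ours: float, theirs: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative improvement in percent)."""
    absolute = theirs - ours
    relative = absolute / theirs * 100
    return absolute, relative


# Figures as reported in the article, in percent.
wer_gap, wer_relative = compare_error_rates(ours=4.6, theirs=5.7)
recall_gap = 93.5 - 92.9  # higher is better, so Corti minus Dragon

print(f"WER: {wer_gap:.1f} points lower, {wer_relative:.1f}% relative improvement")
print(f"Medical term recall: {recall_gap:.1f} points higher")
```

The WER advantage works out to roughly 1.1 points in absolute terms, while the recall advantage is under one point, so the headline 19% applies to WER only.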

“Our objective is to foster the development of applications built upon our models,” Cleve remarked. “The ultimate goal is to disseminate the technology widely, ensuring it can provide maximum benefit to patients, doctors, and healthcare professionals.”

For Cleve and his co-founders, this mission is deeply personal. Cleve’s mother, a healthcare professional, suffered a severe injury from a patient attack and faced a long recovery. His endeavor to enhance healthcare processes is a tribute to her resilience.

Addressing the Global Healthcare Model Challenge

The demands of the healthcare sector extend far beyond English-speaking regions, and global health systems have historically been underserved by clinical Natural Language Processing (NLP) models. Corti’s new models are already being adopted by early users in linguistically complex environments, validating the technology’s effectiveness in demanding international markets.

Switzerland, for example, requires healthcare delivery across multiple languages, often within the same medical institution. This makes it a rigorous testing ground for multilingual medical speech models. Corti’s Symphony models have shown remarkable performance gains in non-English applications, achieving a WER of 2.4% in German (compared to 13.0% for the next best system) and 3.9% in French (versus 10.6%).

“In any clinical conversation, precision is paramount—a missed medication name, a misheard dosage, or an inaccurately transcribed symptom can fundamentally alter the meaning of an encounter,” said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement. “The accuracy of Symphony’s clinical terminology provides us with a robust foundation to integrate more trusted AI capabilities into clinical workflows via our Voicepoint Xenon platform. Enhancements in the speech layer, driven by Corti, lead to sharper, safer, and more effective workflows for clinicians in Switzerland.”

AI Verticalization and Specialization Drive Significant Gains

Today’s launch of Symphony for Speech-to-Text is not an isolated event but rather the latest development in Corti’s strategic focus on vertical AI. The broader Symphony platform, which supports clinical and administrative applications for a global network of EHR vendors and life sciences organizations, has consistently demonstrated the competitive advantage of specialized AI labs over broad-market tech giants.

This announcement follows two other major benchmark releases from Corti in recent weeks, each highlighting advancements in different areas of healthcare AI performance.

In April, the company reported that its Symphony for Medical Coding system outperformed general-purpose models by over 25% in clinical accuracy benchmarks, addressing one of healthcare’s most complex workflows.

Just last week, Corti announced that its primary clinical-grade model surpassed OpenAI’s performance on HealthBench Professional, a benchmark developed by OpenAI itself.

Collectively, these three milestones—medical coding accuracy, clinical reasoning capabilities, and speech-to-text performance—point to a growing industry consensus: generalized AI models are encountering limitations in highly regulated sectors.

AI models deployed in healthcare settings must possess a deep understanding of complex acronyms, sudden conversational interruptions, medical shorthand, specialty-specific terminology, and stringent compliance requirements. By training specifically on these unique edge cases, vertical AI developers like Corti are building a significant competitive advantage that companies relying solely on API calls to generalized large language models find difficult to surmount.

Product Availability and Offerings

The demonstrated performance gap is clearly resonating with developers. Corti reports a 30% increase in new platform sign-ups quarter-to-date, indicating a strong developer migration towards vertical, clinical-grade AI models over generalist APIs.

Corti, which already serves over 100 million patients annually across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the foundational engine for the next generation of healthcare software.

It is important to clarify that Corti is not launching the overarching Symphony platform today; rather, Symphony for Speech-to-Text represents a new, distinct capability within Corti’s existing ecosystem, accessible via dedicated API endpoints.

Symphony for Speech-to-Text is available starting today. Developers and enterprise architects can access the models through the Corti API console. Comprehensive technical documentation is provided to facilitate the integration of this clinical-grade speech layer into existing applications.

In a commitment to transparency, Corti has also published its detailed research paper outlining its methodology, alongside a separate comparison tool designed to enable industry-wide, transparent evaluation of medical speech recognition systems.

As the healthcare industry accelerates its adoption of AI-driven automation, the integrity of the foundational data layer becomes increasingly critical. Corti’s latest release serves as a potent reminder that in the medical field, generic AI solutions are insufficient. The future of specialized AI is here.

Business Takeaway: The superior accuracy of specialized AI models like Corti’s Symphony for Speech-to-Text in niche, regulated industries highlights a critical business strategy: vertical AI development can unlock significant competitive advantages over generalized solutions. Businesses operating in complex sectors should evaluate whether tailored AI solutions offer the precision and reliability needed to reduce risk and drive innovation, especially where data integrity is paramount.

Details can be found on the website: venturebeat.com
