Establishing a National Standard for Safe and Scalable AI in Healthcare

Duke University researchers have developed pioneering frameworks to evaluate the safety and performance of AI models in healthcare, advancing responsible AI adoption with scalable assessment tools for clinical settings.
Researchers at Duke University School of Medicine have introduced two frameworks for evaluating the safety, performance, and reliability of large language models (LLMs) used in healthcare settings. The evaluation tools, published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), are designed to ensure that AI systems meet high standards of quality and accountability before they are integrated into clinical practice.
As AI becomes increasingly integral to healthcare—supporting tasks such as generating clinical notes, summarizing patient interactions, and assisting with communication—there is a pressing need for rigorous, scalable assessment methods. The Duke frameworks address this gap by providing structured evaluation criteria.
The first framework, SCRIBE, focuses on ambient digital scribing tools: AI systems that produce clinical documentation from real-time patient conversations. SCRIBE combines expert review, automated scoring, and simulated stress testing to evaluate accuracy, fairness, coherence, and resilience. According to Dr. Chuan Hong, the framework aims to reduce the documentation burden on clinicians while guarding against unintended biases or omissions that could compromise care quality.
The second framework evaluates large language models used by platforms such as Epic to manage patient messages. The assessment compares clinician feedback with automated metrics, analyzing clarity, completeness, and safety to identify areas needing improvement. Co-author Dr. Michael Pencina emphasizes that responsible AI implementation requires continuous, rigorous evaluation throughout a system's lifecycle, not just initial testing.
These frameworks are crucial for establishing responsible AI adoption, offering healthcare leaders, developers, and regulators the tools needed to appraise and monitor AI performance. Ultimately, they help ensure AI technologies enhance care delivery safely without eroding trust or safety standards.
For more details and the full evaluation frameworks, see the original publications from Duke University in npj Digital Medicine and JAMIA.