Establishing a National Standard for Safe and Scalable AI in Healthcare

Duke University researchers have developed pioneering frameworks to evaluate the safety and performance of AI models in healthcare, advancing responsible AI adoption with scalable assessment tools for clinical settings.
Researchers at Duke University School of Medicine have introduced two frameworks for evaluating the safety, performance, and reliability of large language models (LLMs) used in healthcare settings. The evaluation tools, published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), are designed to ensure that AI systems meet high standards of quality and accountability before they are integrated into clinical practice.
As AI becomes integral to healthcare—supporting tasks such as generating clinical notes, summarizing patient interactions, and assisting with communication—there is a pressing need for rigorous, scalable assessment methods. The Duke frameworks address this gap by providing structured evaluation criteria.
One framework, SCRIBE, focuses on ambient digital scribing tools: AI systems that produce clinical documentation from real-time patient conversations. SCRIBE combines expert review, automated scoring, and simulated testing to evaluate accuracy, fairness, coherence, and resilience. According to Dr. Chuan Hong, the framework aims to reduce the documentation burden on clinicians while guarding against unintended biases or omissions that could compromise care quality.
A second framework evaluates large language models used by platforms such as Epic to manage patient messages. The assessment compares clinician feedback with automated metrics, analyzing clarity, completeness, and safety to identify areas needing improvement. Co-author Dr. Michael Pencina emphasizes that responsible AI implementation requires continuous, rigorous evaluation throughout a system's lifecycle, not just initial testing.
These frameworks give healthcare leaders, developers, and regulators the tools to appraise and monitor AI performance, helping ensure that AI technologies enhance care delivery without eroding trust or safety standards.
For more details, studies, and the full evaluation frameworks, visit the original publications in npj Digital Medicine and JAMIA, provided by Duke University.