Establishing a National Standard for Safe and Scalable AI in Healthcare

Duke University researchers have developed pioneering frameworks to evaluate the safety and performance of AI models in healthcare, advancing responsible AI adoption with scalable assessment tools for clinical settings.
Researchers at Duke University School of Medicine have introduced two frameworks for evaluating the safety, performance, and reliability of large language models (LLMs) used in healthcare settings. The evaluation tools, published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), are designed to ensure that AI systems meet high standards of quality and accountability before they are integrated into clinical practice.
As AI becomes increasingly integral to healthcare—supporting tasks such as generating clinical notes, summarizing patient interactions, and assisting with communication—there is a pressing need for rigorous, scalable assessment methods. The Duke frameworks address this gap by providing structured evaluation criteria.
The first framework, SCRIBE, focuses on ambient digital scribing tools: AI systems that produce clinical documentation from real-time patient conversations. SCRIBE combines expert review, automated scoring, and simulated stress testing to evaluate accuracy, fairness, coherence, and resilience. According to Dr. Chuan Hong, the framework aims to reduce the documentation burden on clinicians while guarding against unintended biases or omissions that could compromise care quality.
The second framework evaluates large language models used by platforms such as Epic to manage patient messages. The assessment compares clinician feedback with automated metrics, analyzing clarity, completeness, and safety to identify areas needing improvement. Co-author Dr. Michael Pencina emphasizes that responsible AI implementation requires continuous, rigorous evaluation throughout a system's lifecycle, not just initial testing.
These frameworks are crucial for establishing responsible AI adoption, offering healthcare leaders, developers, and regulators the tools needed to appraise and monitor AI performance. Ultimately, they help ensure AI technologies enhance care delivery safely without eroding trust or safety standards.
For more details and the full evaluation frameworks, see the original publications from Duke University in npj Digital Medicine and JAMIA.