Establishing a National Standard for Safe and Scalable AI in Healthcare

Duke University researchers have developed frameworks to evaluate the safety and performance of AI models in healthcare, advancing responsible AI adoption with scalable assessment tools for clinical settings.
Researchers at Duke University School of Medicine have introduced two frameworks for evaluating the safety, performance, and reliability of large language models (LLMs) used in healthcare settings. The evaluation tools, published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), are designed to ensure that AI systems meet high standards of quality and accountability before they are integrated into clinical practice.
As AI becomes increasingly integral to healthcare, supporting tasks such as generating clinical notes, summarizing patient interactions, and assisting with communication, there is a pressing need for rigorous and scalable assessment methods. The Duke frameworks address this gap by providing structured evaluation criteria.
One framework, SCRIBE, focuses on ambient digital scribing tools: AI systems that produce clinical documentation from real-time patient conversations. SCRIBE combines expert review, automated scoring, and simulated testing to evaluate accuracy, fairness, coherence, and resilience. According to Dr. Chuan Hong, the framework aims to reduce the documentation burden on clinicians while guarding against unintended biases or omissions that could compromise care quality.
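For readers curious what "automated scoring" across dimensions like these could look like in practice, here is a minimal, hypothetical sketch in Python. It is not the published SCRIBE implementation: the dimension names echo the article, but the class names, scorers, and thresholds are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class NoteSample:
    transcript: str      # source patient-clinician conversation
    generated_note: str  # note produced by the ambient scribe

def accuracy_score(sample: NoteSample) -> float:
    # Toy proxy: fraction of transcript terms that reappear in the note.
    # A real framework would use clinical NLP, not raw word overlap.
    transcript_terms = set(sample.transcript.lower().split())
    note_terms = set(sample.generated_note.lower().split())
    if not transcript_terms:
        return 0.0
    return len(transcript_terms & note_terms) / len(transcript_terms)

def coherence_score(sample: NoteSample) -> float:
    # Toy proxy: very short notes are unlikely to be coherent documentation.
    return min(len(sample.generated_note.split()) / 50.0, 1.0)

# Fairness and resilience would require cohort-level and perturbation-based
# testing, so only two of the four reported dimensions are stubbed here.
SCORERS: Dict[str, Callable[[NoteSample], float]] = {
    "accuracy": accuracy_score,
    "coherence": coherence_score,
}

def evaluate(sample: NoteSample) -> Dict[str, float]:
    """Run every registered dimension scorer; return per-dimension scores."""
    return {dim: scorer(sample) for dim, scorer in SCORERS.items()}

if __name__ == "__main__":
    sample = NoteSample(
        transcript="patient reports chest pain for two days",
        generated_note="Patient reports chest pain, duration two days.",
    )
    print(evaluate(sample))  # per-dimension scores in [0, 1]
```

The design point such frameworks share is keeping each dimension separately scored, so weaknesses can be audited and fixed independently rather than hidden inside a single aggregate grade.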
A second framework evaluates the large language models used by platforms such as Epic for managing patient messages. The assessment compares clinician feedback with automated metrics, analyzing clarity, completeness, and safety to identify areas needing improvement. Co-author Dr. Michael Pencina emphasizes that responsible AI implementation requires continuous, rigorous evaluation throughout a system's lifecycle, not just initial testing.
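To make the "clinician feedback versus automated metrics" comparison concrete, here is a small hypothetical example. The ratings, scores, and variable names below are invented, and the assumption that the messaging use case involves AI-drafted replies is ours; the study's actual metrics and data are in the JAMIA paper.

```python
from statistics import correlation  # Pearson's r (Python 3.10+ stdlib)

# Hypothetical 1-5 clinician safety ratings for five AI-drafted messages.
clinician_safety = [5, 4, 2, 5, 3]

# Hypothetical automated safety scores (0-1) for the same five drafts.
automated_safety = [0.95, 0.80, 0.40, 0.90, 0.55]

# High agreement suggests the automated metric can triage drafts at scale;
# low agreement flags where human review remains essential.
r = correlation(clinician_safety, automated_safety)
print(f"clinician vs. automated safety agreement: r = {r:.2f}")
```

Lifecycle evaluation of the kind Pencina describes would rerun comparisons like this periodically, since agreement measured at launch can degrade as models, prompts, and patient populations change.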
These frameworks are crucial for responsible AI adoption, giving healthcare leaders, developers, and regulators the tools needed to appraise and monitor AI performance. Ultimately, they help ensure AI technologies enhance care delivery without eroding trust or compromising patient safety.
For more details, studies, and the full evaluation frameworks, visit the original publications in npj Digital Medicine and JAMIA, provided by Duke University.