Navigating Risk: Standards Required for AI-Generated Clinical Summaries

by Barry P Chaiken, MD

The advent of artificial intelligence (AI) in generating clinical summaries presents unparalleled opportunities and significant challenges. As we stand on the cusp of integrating these technologies into everyday clinical practice, we must navigate these waters with caution and foresight.

A recent article in JAMA highlights the potential of large language models (LLMs) such as GPT-4 to streamline information gathering from electronic health records (EHRs). It underscores the need for transparent development of standards for LLM-generated clinical summaries, along with pragmatic clinical studies, to ensure their safe and prudent deployment. This resonates deeply with my conviction that as we embrace AI in clinical settings, we must prioritize accuracy, transparency, and standards to mitigate misinformation risks.

AI-generated clinical summaries can significantly impact patient care and medical research. They promise to alleviate physician burnout and enhance clinical decision-making by providing succinct, relevant, and accurate summaries of complex patient data. However, without stringent standards and clear indications that these summaries are AI-generated, there is a risk of introducing biases and inaccuracies into patient records, a scenario we must diligently avoid to uphold the highest standards of patient care and safety.

Variability and Clinical Decision-Making

LLMs, by design, do not produce a single, definitive output for a given input. Instead, they generate summaries based on a complex interplay of factors, including the training data they have been exposed to and the nuances of the algorithms that drive their predictions. This means that even with identical prompts or clinical information, the summaries produced by LLMs can differ in the conditions listed, the clinical history elements emphasized, or the organization and phrasing of the summary itself. The JAMA article illustrates this with examples in which GPT-4 was prompted to summarize deidentified clinical documents, producing summaries that varied in the patient conditions listed and the clinical history elements emphasized.
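For readers curious about the mechanism, the nondeterminism described above arises from weighted sampling over the model's predicted distribution at each step of generation. The following minimal Python sketch illustrates only that sampling principle; the three candidate phrasings and their probabilities are invented for demonstration, and no real LLM or vendor API is involved:

```python
import random

# Toy "next output" distribution for a single, invented summarization step.
# A real LLM computes such distributions over tens of thousands of tokens;
# these phrasings and weights are made up purely for illustration.
PHRASINGS = [
    ("summary emphasizing hypertension and diabetes history", 0.5),
    ("summary emphasizing diabetes, then hypertension", 0.3),
    ("summary emphasizing the recent cardiac workup", 0.2),
]

def sample_summary(rng: random.Random) -> str:
    """Draw one phrasing, weighted by probability, the way an LLM samples tokens."""
    choices, weights = zip(*PHRASINGS)
    return rng.choices(choices, weights=weights, k=1)[0]

# Identical "prompt", different random draws: the outputs can differ.
outputs = {sample_summary(random.Random(seed)) for seed in range(50)}
for summary in sorted(outputs):
    print(summary)
```

The point of the sketch is that nothing in the input changes between runs; variability comes entirely from the probabilistic draw, which is why two clinicians requesting the same summary can receive differently framed documents.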

Clinical Implications of Variability

This variability is not merely a technical issue; it has profound clinical implications. How information is organized and framed in a clinical summary can influence clinician interpretations and subsequent decisions. Differences in summaries can nudge clinicians toward different diagnostic or treatment paths, intentionally or unintentionally, depending on the information emphasized or omitted. This is particularly concerning given the high stakes of medical decision-making and the potential for such variability to affect patient care and outcomes. Moreover, unreliable summary data entered into the EHR degrades its value for medical research.

To this end, standards development should focus on accuracy and include measures to test for biases and errors that could have clinical implications. Such standards should be the product of a collective effort involving not just technology developers but also clinicians, regulatory bodies, and other stakeholders in the healthcare ecosystem. Furthermore, AI-generated summaries should be rigorously tested in clinical settings to quantify their benefits and potential risks before widespread adoption. While it is still unclear whether the responsibility for testing falls on the FDA, the technology companies developing these products, or the institutions implementing them, deploying “black box” LLMs without oversight is dangerous.


The probabilistic nature of LLMs presents a complex challenge for their application in generating clinical summaries. While these models hold the promise of streamlining the gathering of information from EHRs and improving the efficiency of clinical documentation, addressing the variability and uncertainty they introduce is crucial. Developing rigorous standards and conducting clinical studies to evaluate the impact of this variability on patient care are critical steps toward the safe and effective integration of AI technologies in healthcare.

Source: AI-Generated Clinical Summaries Require More Than Accuracy, JAMA, January 29, 2024

I look forward to your thoughts, so please submit your comments in this post and subscribe to my bi-weekly newsletter, Future-Primed Healthcare on LinkedIn.
