Artificial Intelligence December 15, 2023

Beyond Human Limits: The Potential and Precautions of AI in Medical Diagnostics

by Barry P Chaiken, MD

Integrating artificial intelligence (AI) into medical diagnostics has been a subject of extensive research and debate. A recent study published in NEJM AI, “Use of GPT-4 to Diagnose Complex Clinical Cases,” provides critical insights into the capabilities and limitations of AI in this field.

Study Overview

The study assessed the performance of GPT-4, a large language model (LLM), in diagnosing complex medical cases. It compared GPT-4’s diagnostic success rate with that of medical journal readers. The AI model correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers. Despite this success, the study underscores the need for further improvement, validation, and attention to ethical considerations before clinical implementation.
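To make the "simulated human readers" comparison more concrete, here is a minimal sketch of how such a comparison could work. It is not the study's code. It assumes each simulated reader answers every case independently and gets it right with a probability equal to the share of real readers who answered that case correctly. The per-case accuracies in `reader_accuracy` are hypothetical placeholders, not figures from the paper.

```python
import random


def simulate_reader_scores(p_correct_by_case, n_readers, seed=0):
    """Return the number of correct answers for each simulated reader.

    Each reader answers every case independently (a simplifying assumption)
    and is correct on a case with that case's probability.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_readers):
        score = sum(rng.random() < p for p in p_correct_by_case)
        scores.append(score)
    return scores


def share_outperformed(scores, model_score):
    """Fraction of simulated readers with strictly fewer correct answers."""
    return sum(s < model_score for s in scores) / len(scores)


# Hypothetical placeholder: every case is answered correctly by 36% of readers.
reader_accuracy = [0.36] * 38

# 57% of 38 cases is about 22 correct diagnoses.
model_correct = round(0.57 * 38)

scores = simulate_reader_scores(reader_accuracy, n_readers=10_000)
print(f"Model beats {share_outperformed(scores, model_correct):.2%} of simulated readers")
```

Because the readers are simulated rather than observed, the resulting percentage depends heavily on the per-case accuracies and on the independence assumption.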

Key Findings

Performance Assessment: GPT-4 was tested on 38 clinical case challenges, correctly diagnosing an average of 57% of cases. This performance was consistent over time and across different versions of GPT-4.

Temporal Analysis: The study included a temporal analysis to assess GPT-4’s performance on cases published before and after its training data cutoff in September 2021. GPT-4 showed better performance on cases published after this date, which suggests its accuracy was not simply the result of having seen those cases during training.

Limitations: The study acknowledges limitations, including the poorly characterized population of human journal readers and the unrealistic assumption that their answers were independent of one another.
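One further point worth keeping in mind when reading the 57% figure is the small sample of 38 cases. As a rough illustration, the sketch below computes a 95% Wilson score interval for a proportion. Treating the reported average as a single proportion out of 38 cases is an assumption made here for illustration, not an analysis from the study.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed over n trials."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 57% accuracy over 38 cases, treated as a single proportion (an assumption).
low, high = wilson_interval(0.57, 38)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from roughly 41% to 71%, which is one reason larger validation studies matter.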

Steps for Reliable AI Use in Diagnostics

Robust and Diverse Data Sets: AI models like GPT-4 must be trained on comprehensive, diverse, high-quality datasets. Proper training includes data from various demographics, medical conditions, and global regions, especially from low-income countries.

Clinical Validation and Trials: Before clinical implementation, AI models must undergo rigorous clinical validation and trials to ensure their safety, efficacy, and accuracy in real-world settings.

Ethical and Regulatory Considerations: Addressing ethical implications, such as transparency and data privacy, is crucial. Providers who use AI in treating patients must comply with healthcare regulations and standards and establish patient consent protocols.

Continuous Improvement and Monitoring: AI models should be continuously updated and monitored for performance, incorporating new medical research and feedback from healthcare professionals. Frequent validation of the models helps ensure their output is consistent with intended outcomes. 

Integration with Healthcare Systems: Effective integration of AI into existing healthcare systems, including compatibility with electronic health records and diagnostic equipment, is essential for practical utility. Interoperability remains a significant obstacle.

Education and Training for Healthcare Professionals: Before using AI in patient care, healthcare professionals need training in the proper use of these diagnostic tools, including how to interpret their outputs and integrate them into clinical decision-making.

Human Oversight: AI should support physicians’ decision-making under human oversight rather than replace them.
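The continuous-monitoring step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RollingAccuracyMonitor`, the window size, and the alert threshold are all hypothetical, and comparing diagnoses as exact strings is a simplification that a real system would replace with clinician adjudication or coded diagnoses.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track model accuracy over the most recent confirmed cases (hypothetical sketch)."""

    def __init__(self, window=50, alert_below=0.5):
        # Only the latest `window` outcomes are kept; older ones drop off.
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, model_diagnosis, confirmed_diagnosis):
        # Simplification: a case counts as correct only on an exact string match.
        self.outcomes.append(model_diagnosis == confirmed_diagnosis)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        # Flag for human review only once the window is full, to avoid
        # alerts based on a handful of cases.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        acc = self.accuracy()
        return window_full and acc is not None and acc < self.alert_below


monitor = RollingAccuracyMonitor(window=4, alert_below=0.5)
monitor.record("sarcoidosis", "sarcoidosis")
monitor.record("lupus", "lymphoma")
monitor.record("gout", "septic arthritis")
monitor.record("asthma", "heart failure")
print(monitor.accuracy(), monitor.needs_review())
```

A flag from a monitor like this would prompt review by healthcare professionals, in keeping with the human oversight step, rather than trigger any automatic action.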

The study of GPT-4’s performance in diagnosing complex clinical cases highlights the potential of AI in healthcare. However, it also emphasizes the need for careful and responsible integration of these technologies into medical practice. By following a path of careful development and frequent validation, clinicians can harness AI’s power to improve diagnostic accuracy and patient outcomes while ensuring safety, reliability, and ethical use.

Source: Use of GPT-4 to Diagnose Complex Clinical Cases, NEJM AI, November 9, 2023
