Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology – a recent scoping review

Ullah, Ehsan; Parwani, Anil; Baig, Mirza Mansoor; Singh, Rajendra

doi:10.1186/s13000-024-01464-7

Table 1 Critical analysis of key reviews related to the LLM applications in diagnostic medicine


Authors	Study title	Area / Focus	Outcomes	Challenges
Eysenbach, Gunther [13]	The Role of Large Language Models in Diagnostic Medicine: A Literature Review with a Focus on Digital Pathology	To examine the current state of knowledge regarding the use of LLMs for diagnostic medicine, with a particular emphasis on their applications in digital pathology.	Studies have reported improved diagnostic accuracy and efficiency when pathologists incorporate LLM-based tools in their workflow.	LLMs rely heavily on their training data, and biases in that data can result in biased or erroneous outputs; efforts are required to ensure diverse and representative training datasets. The limited interpretability of LLMs, which obscures the rationale behind their predictions, is a significant challenge, particularly in complex digital pathology cases. Data privacy, security, and ethical concerns arise when integrating LLMs in clinical settings, emphasizing the need for robust frameworks and guidelines.
Muftić, Fatima et al. [9]	Review of ChatGPT-based Diagnostic Medicine Applications	To explore the current state of knowledge regarding the use of ChatGPT in diagnostic medicine and its specific applications within this field.	ChatGPT can serve as a conversational agent, providing clinicians with real-time access to medical knowledge and literature, aiding in clinical decision-making.	ChatGPT’s responses are generated based on statistical patterns in the training data and may lack contextual understanding or accuracy in specific medical scenarios.
Hariri, Walid [5]	Lack of Contextual Understanding in ChatGPT Responses	This study examined ChatGPT’s contextual understanding in medical diagnosis.	It revealed limitations in the model’s ability to accurately interpret and respond to nuanced clinical scenarios, leading to potential inaccuracies or incomplete information.	The study emphasized the importance of cautious interpretation and validation of ChatGPT-generated responses by healthcare professionals.
Ma, Y. [14]	The potential application of ChatGPT in gastrointestinal pathology	This study evaluated the biases present in ChatGPT responses by analyzing the model’s outputs in various medical scenarios.	The study highlights ChatGPT’s ability to summarize patients’ charts and its potential application in digital pathology, education, and research.	The study mentions potential bias arising from the datasets used in training, the need for sufficient input information, and concerns about transparency and the generation of inaccurate content.
Brennan, Gregory [15]	Using ChatGPT to Write Pathology Results Letters	Utilizing ChatGPT to generate pathology results letters could automate the process, saving time and effort for pathologists.	ChatGPT lacks true understanding and context, which can be critical in pathology reports. Pathology findings may vary significantly depending on patient history, clinical context, and the specific case, and an AI language model may not be able to fully grasp these subtleties.	LLMs may struggle to handle rare or complex cases that require expert knowledge and interpretation. Uncommon findings might not be adequately covered in the training data, leading to potentially incorrect or inadequate results.
Sun et al. [2]	PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology	The researchers present PathAsst as a generative foundation AI assistant designed to improve diagnostic and predictive analytics in pathology.	PathAsst leverages the capabilities of the ChatGPT/GPT-4 language model, generating over 180,000 instruction-following samples. Additionally, they devise pathology-specific instruction-following data to allow PathAsst to interact effectively with pathology-specific models, enhancing its diagnostic capabilities.	The use of large language models and multimodal techniques can potentially enhance the accuracy and efficiency of pathology diagnostics, leading to improved patient care. However, to fully understand the findings and the impact of PathAsst, it is essential to read the full research paper, including the methodology, experimental results, and potential limitations.
Sorin et al. [16]	Large language model (ChatGPT) as a support tool for breast tumor board	The aim of this study is to evaluate ChatGPT as a support tool for breast tumor board decision making. The authors entered into ChatGPT-3.5 the clinical information of ten consecutive patients presented at a breast tumor board in their institution and asked the chatbot to recommend management.	ChatGPT’s recommendations were similar to the tumor board’s decisions. The first reviewer’s mean scores for the chatbot’s summarization, recommendation, and explanation were 3.7, 4.3, and 4.6, respectively. The second reviewer’s mean scores were 4.3, 4.0, and 4.3, respectively.	The authors present initial results on the use of an LLM as a decision support tool in a breast tumor board. Given the significant advancements, clinicians should be familiar with the potential benefits and harms of the technology.


ISSN: 1746-1596


Diagnostic Pathology
