Research Article

A Systematic Review of the Limitations and Associated Opportunities of ChatGPT

Received 15 Dec 2023, Accepted 12 Apr 2024, Published online: 08 May 2024

Abstract

This systematic review explores the limitations and opportunities associated with ChatGPT's application across various fields. Following a rigorous screening process of 485 studies identified through searches in Scopus, Web of Science, ERIC, and IEEE Xplore databases, 33 high-quality empirical studies were selected for analysis. The review identifies five key limitations: accuracy and reliability concerns, limitations in critical thinking and problem-solving, multifaceted impacts on learning and development, technical constraints related to input and output, and ethical, legal, and privacy concerns. However, the review also highlights five exciting opportunities: educational support and skill development, workflow enhancement, information retrieval, natural language interaction and assistance, and content creation and ideation. While this review provides valuable insights, it also highlights some gaps. Limited transparency in the studies regarding specific ChatGPT versions used hinders generalizability. Additionally, the extent to which these findings can be transferred to more advanced models like ChatGPT-4 remains unclear. By acknowledging both limitations and opportunities, this review offers a foundation for researchers, developers, and practitioners to consider when exploring the potential and responsible application of ChatGPT and similar evolving AI tools.

1. Introduction

In the rapidly evolving landscape of natural language processing, OpenAI's ChatGPT, introduced in November 2022, has emerged as a groundbreaking artificial intelligence model with versatile applications across various domains (Bubeck et al., Citation2023). Leveraging cutting-edge deep learning techniques, ChatGPT has demonstrated remarkable capabilities, producing well-written and coherent responses in conversational-style interactions (Hassani & Silva, Citation2023; Ray, Citation2023). As the demand for sophisticated language models continues to surge, it is crucial to critically assess the limitations that may hinder language models’ optimal performance. Recognizing and understanding these limitations is crucial for researchers, developers, and end-users alike.

Given ChatGPT's recent introduction in 2022, existing research is notably limited in its exploration of the model’s applications, primarily focusing on specific domains such as business, education, or academic research (e.g., Lo, Citation2023; Rahman et al., Citation2023; Singh & Singh, Citation2023). Reviews confined to specific fields risk obscuring hidden strengths or shortcomings that may manifest across various applications. Therefore, a comprehensive review is essential to unveil the broader spectrum of challenges and potentials inherent in ChatGPT's deployment across diverse domains.

This review delves into the wealth of empirical studies conducted on ChatGPT to unveil documented constraints that may encompass accuracy, ethical issues, and other technical limitations. Beyond an examination of limitations, our review endeavors to shed light on the opportunities that arise from understanding and addressing these constraints. By elucidating pathways for improvement, we aim to contribute to the ongoing discussion covering the enhancement of ChatGPT’s utility in various domains.

Accordingly, this systematic review endeavors to explore and synthesize the existing empirical literature to address two pivotal research questions:

  • What limitations of ChatGPT are documented in the prior empirical literature?

  • In light of the limitations identified, what opportunities exist for enhancing the utilization of ChatGPT?

This systematic exploration of ChatGPT's limitations and associated opportunities not only serves as a comprehensive resource for researchers and practitioners but also aims to foster a deeper understanding of the nuances involved in leveraging state-of-the-art language models. As we navigate the intricate landscape of artificial intelligence, a nuanced understanding of ChatGPT's strengths and weaknesses is paramount for harnessing its full potential and driving innovation in natural language processing.

2. A brief overview of research background on ChatGPT

Research has convincingly demonstrated that ChatGPT offers significant advantages and can contribute to various fields of study. Notably, it assists experts across different disciplines in composing reports for various experiments. For instance, Aydın and Karaarslan (Citation2023) found that ChatGPT can be a valuable tool for paraphrasing and academic writing in the healthcare field. Similarly, a study by Kumar (Citation2023) demonstrated that within biomedical sciences, ChatGPT can be employed to produce well-organized and grammatically sound English academic writing. Beyond these specific examples, ChatGPT offers broader advantages for users. It can address complex inquiries by providing comprehensive insights, ranging from general overviews to detailed analyses of intricate phenomena (Tan et al., Citation2023). Overall, the current body of research suggests that ChatGPT can be a valuable digital resource across diverse fields of study.

The literature highlights education as one of the primary domains where ChatGPT can make significant contributions (e.g., Bitzenbauer, Citation2023; Poole, Citation2022; Rudolph et al., Citation2023; Su & Yang, Citation2023). This potential, however, necessitates responsible use. Studies by Bitzenbauer (Citation2023) suggest that ChatGPT can enhance critical thinking skills among secondary school students in Germany. In another study, Poole (Citation2022) reported that ChatGPT benefits language teachers by assisting them in designing exercises and lesson plans. Additionally, ChatGPT can empower teachers to create personalized learning experiences and exercises tailored to individual student needs (Su & Yang, Citation2023). Furthermore, ChatGPT has the potential to revolutionize higher education, particularly in assessment, learning, and teaching methodologies (Rudolph et al., Citation2023).

While the integration of ChatGPT into the education sector appears promising, Su and Yang (Citation2023) advocate for careful consideration of several factors to maximize its effectiveness. These factors include determining the expected outcome, defining the appropriate level of automation, considering both the ethical and unethical aspects of use, and measuring the efficacy of ChatGPT in achieving the desired learning objectives.

The recommendations outlined by Su and Yang (Citation2023) for the field of education can be broadly applied to various fields of study where experts leverage ChatGPT for different purposes. In other words, it is crucial for experts in different fields to first determine their desired outcomes and then carefully consider the level of automation, ethical implications, and overall effectiveness of ChatGPT within their specific contexts. As an example, General Practitioners (GPs) writing reports to patients or colleagues could benefit from evaluating the following criteria: (1) identifying the clear purpose of the report, (2) considering the limitations and advantages of using ChatGPT for report writing (including the level of automation and ethical considerations), and (3) determining the appropriate level of human intervention to ensure accuracy, professionalism, and adherence to ethical guidelines. By following this approach, GPs can optimize the use of ChatGPT for report writing while maintaining control and responsibility for the final content.

Despite the promising applications, there is still a limited comprehensive understanding of ChatGPT's limitations and opportunities across fields, based on a synthesis of empirical findings. This systematic review aims to address this gap by critically examining existing empirical studies on ChatGPT.

3. Method

The current systematic review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for conducting and reporting systematic reviews, ensuring a transparent and methodologically rigorous approach throughout the review process (Page et al., Citation2021).

3.1. Search strategies

A comprehensive search strategy was implemented to locate primary studies that report empirical evidence on the limitations of ChatGPT. The search was conducted across various databases including Web of Science, Scopus, IEEE Xplore, and ERIC. Keywords utilized in our search comprised "ChatGPT" AND "limitations," along with synonyms such as "weakness," "drawback," "challenge," and "pitfall" (for detailed search strings, refer to Appendix A).

For establishing inclusion and exclusion criteria, studies were considered eligible if they were published as journal articles, provided empirical findings assessing ChatGPT's performance, and included a discussion on its limitations (details in Table 1). Publications characterized as conceptual or lacking an empirically supported examination of ChatGPT's performance and limitations were excluded. No date constraints were imposed on the search process. Results were managed using Covidence, a web-based systematic review management tool facilitating deduplication and screening.

Table 1. Inclusion and exclusion criteria.

The initial screening involved evaluating titles and abstracts collaboratively by all three authors to identify potentially relevant studies. Subsequently, the screening process progressed to a full-text evaluation of studies identified during the initial screening phase. This thorough examination was conducted in duplicate by the authors to ensure rigor and comprehensiveness. Any discrepancies encountered during the initial and full-text screening were resolved through discussion and consensus among the authors or, when necessary, by consulting an additional member of the research team. This systematic approach aimed to enhance the reliability and precision of the study selection process in the systematic review.

3.2. Data analysis

A thematic analysis, following the guidelines outlined by Braun and Clarke (Citation2006), was utilized to uncover and categorize the reported limitations of ChatGPT across the diverse studies included. This methodical approach involved thoroughly familiarizing ourselves with each study to gain a deep understanding of ChatGPT's limitations. Subsequently, initial codes were systematically generated to organize key concepts into a structured data extraction table. Importantly, data extraction was conducted in duplicate by two of the authors to ensure precision and reliability in capturing the nuances of ChatGPT's limitations.

As we explored relationships between these codes, initial themes began to emerge, offering a holistic view of recurring patterns that represented more abstract categories of ChatGPT's limitations. The subsequent review and refinement of these themes aimed to ensure clarity and precision in encapsulating the multifaceted challenges identified in the included literature. This qualitative approach, rooted in thematic analysis and bolstered by the dual extraction performed by two authors, provided a rigorous and structured framework for synthesizing the diverse findings across the included studies, contributing to a comprehensive understanding of ChatGPT's limitations.

Data visualizations throughout this review were created using the Matplotlib library in Python (Hunter, Citation2007).
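As an illustration of how such figures can be produced, the distribution of limitation instances reported in Section 4.2 can be rendered with a short Matplotlib script. This is a minimal sketch, not the authors' actual plotting code; the category labels and output filename are illustrative, while the percentages are those reported in this review.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted figure generation
import matplotlib.pyplot as plt

# Shares of limitation instances reported in Section 4.2 of this review
limitations = {
    "Accuracy and reliability": 47.06,
    "Critical thinking and problem-solving": 22.06,
    "Ethical, legal and privacy": 13.24,
    "Learning and development": 11.76,
    "Input/output constraints": 10.29,
}

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(list(limitations.keys()), list(limitations.values()))
ax.set_xlabel("Share of limitation instances (%)")
ax.set_title("Limitations of ChatGPT (n = 71 instances)")
ax.invert_yaxis()  # largest category on top
fig.tight_layout()
fig.savefig("limitations.png", dpi=300)
```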

4. Findings

4.1. Overview of the included studies

Figure 1 presents the PRISMA flowchart of searching for and screening studies for eligibility in this review. A review of 33 studies identified a diverse range of fields where ChatGPT limitations were investigated (see Figure 2). The most prevalent field of study was health (48.48%), highlighting the growing interest in understanding potential limitations of large language models in critical healthcare applications. This focus on health suggests a cautious approach to ensure responsible use of ChatGPT in this domain. Education (15.15%) emerged as the second most frequent field, indicating concern about potential shortcomings in educational settings. Engineering (9.09%) and other fields like psychology, chemistry, and physics (all around 3%) were also represented, showcasing a broader exploration of limitations across various disciplines. This distribution underscores the widespread interest in evaluating ChatGPT's limitations across diverse application areas, with a particular emphasis on ensuring its safe and effective use in healthcare and education.

Figure 1. Flowchart of stages of the review.


Figure 2. Distribution of the fields of research evaluating ChatGPT’s limitations.


In terms of the version of ChatGPT employed in the studies reviewed, 23 studies (69.7%) did not report the specific version of ChatGPT used. The remaining studies provided version information, with ChatGPT-3 (15.15%) appearing most frequently, followed by versions 3.5 (12.12%) and a single study exploring a combination of 3.5 and 4 (3.03%). To investigate the potential influence of version differences on the limitations identified in this review, we conducted a sub-analysis of studies examining versions 3 and 3.5. This analysis revealed no significant discrepancies from the overall findings on limitations and opportunities discussed below. Due to the limited presence of research on ChatGPT-4 (only one study identified), a similar sub-analysis for this version was not feasible. The limitations associated with the scarcity of research on ChatGPT-4 and the generalizability of the review’s conclusions on ChatGPT limitations will be addressed later in our discussion of the limitations of the review.

4.2. RQ1: What limitations of ChatGPT are documented in prior empirical literature?

Our analysis of 33 studies identified limitations associated with ChatGPT. The most prevalent limitation concerned accuracy and reliability, with these issues found in 47.06% of the total instances identified across all studies. This highlights ChatGPT's potential to generate misleading or incorrect information. Limitations in critical thinking and problem-solving were present in 22.06% of instances, suggesting shortcomings in handling complex scenarios that require independent analysis. Ethical considerations, including potential biases and discriminatory outputs due to training data, were observed in 13.24% of instances, raising concerns about ethical, legal, and privacy issues. Furthermore, limitations in understanding context and suitability for in-depth exploration of specialized topics were found in 11.76% of instances, potentially leading to adverse effects on users' learning and development. Finally, 10.29% of instances suggested limitations in handling diverse inputs and outputs, potentially hindering the usability of ChatGPT for complex tasks. These findings underscore the need for continued development and responsible use of large language models like ChatGPT (Figure 3 and Table 2).

Figure 3. Limitations of ChatGPT (n = 71 instances of limitations).


Table 2. Summary of the limitations of ChatGPT.

Table 3. Opportunities of ChatGPT identified in the included studies (n = 44 instances of opportunities).

4.2.1. Accuracy and reliability concerns

ChatGPT faces a significant limitation concerning its accuracy and reliability, particularly evident in evaluations within the health science domain (Ali, Citation2023; Ariyaratne et al., Citation2023; Clark, Citation2023). Ali's (Citation2023) examination revealed significant factual inaccuracies, with ChatGPT providing unreliable and poorly informed responses, especially on contentious health issues. Wagner and Ertl-Wagner's (Citation2023) findings further underscored this concern, indicating that up to 33% of ChatGPT's responses to radiology questions were inaccurate, a substantial deficiency in accuracy within the medical domain.

Au Yeung et al. (Citation2023) tasked ChatGPT with predicting medical diagnoses based on clinical histories. While the AI provided overall high-quality responses in terms of relevance (83%), it missed crucial diagnoses in 60% of its outputs. This deficiency poses a significant risk, particularly in healthcare contexts, where ChatGPT is likely to generate misleading outputs, potentially perpetuating harmful health beliefs or reinforcing biases.

Fergus et al.'s (Citation2023) study in the pharmaceutical program domain of chemistry found inconsistencies in ChatGPT's responses to test questions: each answer contained a different error attributed to technical randomness. Similarly, Hoch et al.'s (Citation2023) medical quiz study revealed significant domain-specific variations in ChatGPT's performance. It achieved a 72% accuracy rate for questions in allergology, a field studying hypersensitivity reactions of the immune system, while the remaining responses were inaccurate.

Beyond health science, accuracy concerns persist. Clark’s (Citation2023) evaluation in a chemistry test resulted in a concerning accuracy rate: only 44% of responses were correct, falling below the average score of participants. This inaccuracy extended to medical assessments, with ChatGPT falling short of the required score in the GP test. Reliability issues were noted by Clark (Citation2023), Duong and Solomon (Citation2023), and Seth et al. (Citation2023), highlighting inconsistencies in ChatGPT's answers to identical questions and criticizing its suitability as a source of sample answers for examinations. Lai (Citation2023) explored the AI chatbot’s potential use in addressing inquiries of library service users and found that it performed poorly on advanced research questions, complex inquiries, and queries involving locally specific information.

Seth et al. (Citation2023) further exposed a troubling aspect of ChatGPT's behavior: the generation of fake references, labeled as hallucination of references. Similar findings on the chatbot's hallucinations were reported in the study by McIntosh et al. (Citation2024). Wagner and Ertl-Wagner (Citation2023) discovered that 63.8% of ChatGPT's references in response to radiology questions were fabricated, accentuating broader reliability concerns. Hoch et al.'s (Citation2023) extensive study involving a medical board certification test likewise revealed that ChatGPT's accuracy varied significantly across domains.

In summary, the themes of accuracy and reliability emerge as prominent limitations in ChatGPT. These limitations encompass technical inaccuracies, inconsistencies in responses, and domain-specific performance challenges.

4.2.2. Limitations in critical thinking and problem-solving

A second limitation concerns ChatGPT's capability for accomplishing critical thinking, problem-solving, and mathematical tasks (Cascella et al., Citation2023; Clark, Citation2023; Giannos & Delardas, Citation2023). Cascella et al.'s (Citation2023) evaluation, involving the composition of a medical note, highlighted deficiencies in addressing causal relations among health conditions, indicating inadequacy in complex reasoning. Clark (Citation2023) emphasized the model’s proficiency in addressing general questions over problem-solving or skill-specific queries, while Duong and Solomon’s (Citation2023) study revealed ChatGPT's preference for memory-based questions rather than critical thinking tasks. Sanmarchi et al. (Citation2023) assessed ChatGPT's ability to design studies and suggest plastic surgery options, revealing limitations in constructing conceptual frameworks and narrative structures. Seth et al.'s (Citation2023) examination of ChatGPT's responses to plastic surgery questions highlighted inadequacies in addressing specialized topics, particularly critical thinking skills for complex issues like thumb arthritis.

In educational contexts, Giannos and Delardas (Citation2023) reported ChatGPT's subpar performance on critical thinking and mathematical questions, with more incorrect than correct responses. Parsons and Curry's (Citation2024) evaluation echoed these concerns. They assessed ChatGPT's capability in completing a graduate instructional design assignment for a 12th-grade media literacy course; the chatbot primarily provided superficial information and demonstrated a limited capacity to customize its responses or justify them with details. Rahman and Watanobe's (Citation2023) scrutiny of ChatGPT's mathematical capabilities revealed unsatisfactory performance in generating code and correcting errors, exposing weaknesses in basic mathematical tasks. Kortemeyer (Citation2023) further found that ChatGPT narrowly passed an introductory physics course and exhibited "many of the preconceptions and errors of a beginning learner" (p. 1).

The problem-solving capability of ChatGPT for coding practices was also questioned in the study by Shoufan (Citation2023), where the chatbot showed inconsistent responses and struggled to complete the given codes, even the ones it generated itself. Collectively, these findings underscore ChatGPT's limited capacity for critical thinking and problem-solving across diverse domains.

4.2.3. Multifaceted impact on learning and development

This section explores the multifaceted impact ChatGPT's responses might have on users’ learning and development. Concerns include learners’ potential overreliance on the tool leading to declines in critical thinking skills. Additionally, the risk of bias and incomplete information in ChatGPT's responses is another consideration. Finally, the potential psychological effects on vulnerable individuals seeking interaction and decision-making support from the AI tool warrant consideration.

In the realm of education, Alafnan et al. (Citation2023) highlighted the positive impact of ChatGPT in providing reliable input to answer test questions. However, they cautioned against overreliance and irresponsible use of the AI tool, emphasizing the potential consequences of "human unintelligence and unlearning" if not used judiciously (p. 60). Clark (Citation2023) echoed concerns about overreliance on ChatGPT, suggesting that excessive dependence could result in passivity and a decline in critical thinking skills among learners. Notably, the challenge of detecting logical fallacies in ChatGPT is a particular concern. The model’s ability to provide seemingly logical explanations, even when flawed, may mislead users who lack specific expertise in the subject matter.

Giannos and Delardas (Citation2023) assessed ChatGPT's capability for education and test preparation, concluding that while the AI chatbot is adept at providing tutoring support for general problem solving and reading comprehension, its limitations in scientific and mathematical knowledge and skills render it an unreliable independent tool for supporting students. They also underscored the potential for misuse, highlighting concerns about cheating and gaining unfair advantages during standardized admission tests.

Ibrahim et al. (Citation2023) raised the issue of potential bias in ChatGPT's responses, asserting that the model might be influenced by the dataset used for training, aligning more closely with the political and philosophical values of Western and more developed countries. Sallam et al. (Citation2023) echoed these concerns, particularly in medical education, where biased, outdated, and incomplete content in ChatGPT's responses could pose risks to learners. They noted potential adverse consequences, including discouraging critical thinking and communication skills among medical students.

Additionally, Stojanov (Citation2023) warned of the psychological impact on vulnerable individuals, such as those grieving or extremely shy, who may turn to ChatGPT for solace and interaction. Stojanov also highlighted the risk of individuals relying on the AI tool for crucial life decisions, potentially weakening their personal agency and responsibility. These varied concerns collectively emphasize the need for a cautious and informed approach to the integration of ChatGPT in educational settings.

4.2.4. Technical constraints related to input and output

The effectiveness of ChatGPT is further contingent upon technical constraints of its input and output. The text-only input and output of ChatGPT-3 and ChatGPT-3.5 pose challenges, particularly in disciplines like mathematics and chemistry, where communication often relies on signs and symbols. Fergus et al. (Citation2023) conducted examinations in the field of chemistry, revealing instances where ChatGPT struggled, particularly in tasks requiring the drawing of structures between reactants and products.

Furthermore, the efficacy of ChatGPT is influenced by the type of questions posed to it. Notably, the chatbot exhibited a significantly higher performance when responding to single-choice questions compared to multiple-choice questions (Hoch et al., Citation2023). In an extensive study encompassing 2,576 questions, Hoch et al. (Citation2023) observed a 63% accuracy rate for single-choice questions, in contrast to a 34% accuracy rate for multiple-choice questions.

The phrasing of prompts for ChatGPT responses is also a pivotal factor affecting the chatbot’s performance. Sallam et al. (Citation2023) acknowledged that the formulation of prompts, coupled with the word limit imposed on ChatGPT's output, could influence the amount of information generated, subsequently impacting the clarity and effectiveness of the responses. Similarly, Stojanov (Citation2023) reported that ChatGPT's inherent word limit in its output may result in responses containing incomplete information, posing challenges to comprehension.

4.2.5. Ethical, legal and privacy concerns

Previous studies have addressed academic integrity, legal, privacy, and ethical concerns associated with the use of ChatGPT (Au Yeung et al., Citation2023; Alafnan et al., Citation2023; Ibrahim et al., Citation2023; Sallam et al., Citation2023; Sanmarchi et al., Citation2023). Academic integrity emerges as a prominent concern, particularly given the difficulty most plagiarism detection software has in identifying content generated by ChatGPT. Fergus et al. (Citation2023) conducted an examination using Turnitin to assess plagiarism in ChatGPT's output, concluding that the Turnitin report failed to raise any alerts necessitating further investigation into academic integrity (p. 1674). This inability to detect generated content raises concerns about the potential misuse of ChatGPT and its impact on academic honesty.

Furthermore, educators face challenges in distinguishing between students' original work and content generated by ChatGPT, making assessment of individual abilities more complex. Alafnan et al. (Citation2023) argued that the high accuracy and reliability of ChatGPT's responses may impede instructors' ability to differentiate between independently working students and those heavily reliant on automation. This, in turn, can compromise the evaluation of learning outcomes, causing a significant challenge in assessing students' performance. The implications of ChatGPT for academic integrity, underscored by these studies, highlight the need for careful consideration and regulation in its educational use.

Other legal and ethical issues, including privacy and copyright infringements, were also raised in the literature. The answers generated by ChatGPT raise privacy concerns that may lead to further legal ramifications (Au Yeung et al., Citation2023; Ibrahim et al., Citation2023; Sallam et al., Citation2023; Sanmarchi et al., Citation2023). Notably, the potential biases in ChatGPT's responses, possibly leaning towards specific political parties or perspectives, raise red flags regarding the validity of its content (Au Yeung et al., Citation2023). Sallam et al. (Citation2023) specifically assessed responses to health and public education prompts, revealing concerns about plagiarism, copyright issues, academic dishonesty, and the absence of personal and emotional interactions, which are essential for communication skills in healthcare education.

4.3. RQ2: In light of the limitations identified, what opportunities exist for enhancing the utilization of ChatGPT?

Table 3 presents a list of opportunities for ChatGPT identified in this review, offering actionable insights for capitalizing on its strengths and capabilities.

4.3.1. Educational support and skill development

ChatGPT's impact on education is multifaceted. It provides educational content, aids in learning processes, and contributes to essential skills development. Scholars have discussed various ways ChatGPT can support this domain, including creating course materials, designing lesson plans and assessments, providing feedback, explaining complex knowledge, and personalizing the learning experience (Clark, Citation2023; Rahman & Watanobe, Citation2023). Day (Citation2023) suggests using ChatGPT to develop writing course materials. Drawing on Vygotsky’s sociocultural theory, Stojanov (Citation2023) discusses how ChatGPT could serve as a knowledgeable learning peer, aiding knowledge exploration. Similarly, Rahman et al. (Citation2023) discuss benefits for learners, educators, and researchers. Learners can employ ChatGPT as a learning assistant for exploring complex concepts, problem-solving, and receiving personalized guidance. Educators can leverage ChatGPT for lesson planning, generating customized resources and activities, answering student questions, and assisting with assessment. Researchers can improve their work by using ChatGPT to check and improve writing, request literature summaries, or suggest research ideas.

4.3.2. ChatGPT as a workflow enhancer

Beyond education, ChatGPT's ability to automate tasks and enhance professional workflows optimizes operational efficiency and resource utilization. In the construction industry, Prieto et al. (Citation2023) tested ChatGPT's application in creating a coherent and logical construction project schedule. Participants found it satisfactory and indicated its potential for automating preliminary and time-consuming tasks. Similarly, Sanmarchi et al. (Citation2023) suggest ChatGPT as a valuable tool for designing research studies and following international guidelines, for both experienced and less experienced researchers.

4.3.3. Information retrieval powerhouse

ChatGPT's prowess in retrieving and applying information across various domains empowers users with informed decision-making and problem-solving. Alafnan et al. (Citation2023) discovered that ChatGPT has the potential to function as a valuable platform for students seeking information on diverse topics. They asserted that ChatGPT's capabilities could potentially replace traditional search engines by offering students accurate and reliable information. Duong and Solomon (Citation2023) compared ChatGPT's ability to respond to genetics questions against human performance, revealing that the chatbot approached human-level proficiency. Stojanov (Citation2023) reported that ChatGPT played a crucial role in providing valuable content, aiding the ongoing pursuit of learning and the exploration of new knowledge.

4.3.4. Natural language interaction and assistance

ChatGPT's ability to engage users in natural conversations and provide human-like assistance positions it as a valuable virtual companion. Lahat et al. (Citation2023) explored using ChatGPT to answer 110 real-life medical questions from patients, finding it relatively useful and satisfactory, albeit with moderate effectiveness. Other scholars interacted with the chatbot for tasks such as creating a construction project (Prieto et al., Citation2023) or discussing a plastic surgery topic (Seth et al., Citation2023). Prieto et al. (Citation2023) highlighted that the conversation-based chatbot is advantageous compared to other single-prompted AI tools as it allows users to modify project aspects as needed.

4.3.5. Content creation and ideation

Finally, ChatGPT facilitates creative content generation, text transformation, and ideation processes, making it a versatile tool for content creators and innovators. Ariyaratne et al. (Citation2023) discussed using ChatGPT for research, suggesting that "the format of articles generated by ChatGPT can be used as a draft template to write an expanded version of the article" (p. 4). Similarly, ChatGPT can enhance research processes by assisting researchers in generating hypotheses, exploring literature, and translating research findings into a more understandable language (Cascella et al., Citation2023). In education, ChatGPT can be used to create course materials, such as for writing courses (Day, Citation2023). Regarding ideation capability, Clark (Citation2023) demonstrated that ChatGPT could support problem conceptualization in chemistry education. A similar conclusion was reached in engineering education, as Nikolic et al. (Citation2023) indicated that ChatGPT can support students by aiding in the generation of project ideas, providing information, assisting with project structure, delivering summaries, and offering feedback on ethical considerations and workplace health and safety risks associated with their projects. Text transformation is another advantageous feature of this generative AI tool: Prieto et al. (Citation2023) indicated that ChatGPT is useful for transforming research writing into more readily understandable language.

Figure 4. Opportunities for ChatGPT application.

5. Discussion

This systematic review identified five key limitations associated with ChatGPT's application across diverse fields. Accuracy and reliability emerged as a primary concern, particularly in critical domains like healthcare (Fergus et al., Citation2023). Additionally, limitations were found in ChatGPT's ability to perform complex cognitive tasks such as critical thinking and problem-solving (Clark, Citation2023). Studies identified potential negative effects on learners’ development due to overreliance on the tool, potentially hindering the development of critical thinking skills (Alafnan et al., Citation2023; Sallam et al., Citation2023). Studies also reported technical constraints related to ChatGPT's input and output. Finally, ethical considerations surrounding academic integrity, privacy, and copyright infringement emerged as limitations requiring careful attention when deploying ChatGPT in educational and professional settings (Ibrahim et al., Citation2023; Puthenpura et al., Citation2023).

The analysis of included studies also revealed five key themes highlighting potential opportunities presented by ChatGPT. One area of potential lies in information retrieval, where research suggests ChatGPT can be a valuable tool for finding information across various subjects. Another promising area is natural interaction and support, with ChatGPT's ability to hold natural conversations making it a potential candidate as a virtual companion or assistant in fields like medicine (Lahat et al., Citation2023) and creative endeavors (Seth et al., Citation2023). Studies also indicate that ChatGPT may automate tasks and improve workflow efficiency (Prieto et al., Citation2023; Sanmarchi et al., Citation2023). Within the educational domain, research explores its potential for personalized learning experiences, creating course materials, and supporting students (Day, Citation2023; Rahman et al., Citation2023). Finally, ChatGPT shows promise in creative text generation and supporting brainstorming processes, highlighting its potential as a tool for content creation, research, and generating new ideas (Ariyaratne et al., Citation2023; Cascella et al., Citation2023).

The limitations and associated opportunities of ChatGPT identified in this review align with findings from previous reviews on its affordances and limitations (e.g., Aydin & Karaarslan, Citation2023; Ray, Citation2023; Sok & Heng, Citation2024). Aydin and Karaarslan (Citation2023) highlight similar concerns in their review, including ChatGPT’s potential bias towards certain political views, its ability to deliver misleading information with equal confidence, and limitations in critical thinking and creativity. Similarly, the opportunities identified in this study resonate with the findings on opportunities in a review by Sok and Heng (Citation2024). They suggest that ChatGPT has the potential to enhance the field of higher education by stimulating innovative assessment methods, improving research writing and design, and boosting productivity. However, the current review, to the authors’ best knowledge, is the first systematic review following the PRISMA approach that specifically targets the limitations of ChatGPT, thereby providing more transparent and robust evidence on the limitations and related opportunities of ChatGPT.

This review advocates for a cyclical collaborative approach among researchers, practitioners, and developers as essential for the sustainable development of ChatGPT. Grounded in the understanding that ChatGPT presents a double-edged sword, with both opportunities and challenges, expert supervision is crucial (Alafnan et al., Citation2023; Amin et al., Citation2023; Au Yeung et al., Citation2023). To maximize its potential, the proposed model outlines a three-stage cyclical process. In stage one, researchers can explore methods to optimize ChatGPT's benefits and minimize limitations within specific fields. Stage two involves practitioners applying these research findings in real-world settings. Finally, developers can refine ChatGPT's technical capabilities based on both theoretical advancements and practical feedback from practitioners. This cyclical process necessitates all parties to remain updated on the latest developments and collaborate to ensure ChatGPT's continued evolution.

Figure 5. The cyclical evolution of ChatGPT by researchers, practitioners, and developers.

The current review is subject to several limitations. First, while this systematic review identified a substantial pool of 485 studies on ChatGPT limitations through searches in the Scopus, Web of Science, ERIC, and IEEE Xplore databases, it should be acknowledged that it may have overlooked potential studies not indexed within these major databases. Second, while this review offers a comprehensive analysis of ChatGPT's limitations and opportunities, it focuses on the reported findings within the included studies and does not assess their methodological quality. This limits the ability to definitively determine the generalizability of the findings or identify potential biases within the research. Third, the generalizability of the identified limitations to more advanced iterations of ChatGPT, such as version 4, remains unclear. While the included studies explored limitations of earlier versions (3 and 3.5), it is uncertain whether these limitations persist or change in the latest iteration. Additionally, a significant portion (69.7%) of the reviewed studies did not report the specific ChatGPT version used. This lack of transparency hinders our ability to definitively assess how limitations might vary across different versions.

However, although ChatGPT-4 reportedly leverages larger datasets and incorporates plugin functionalities, potentially leading to enhanced performance, previous scholars have indicated that many limitations identified in ChatGPT-3.5 still apply to ChatGPT-4. While advancements have been made, OpenAI (Citation2023) acknowledges that ChatGPT-4 still exhibits limitations from earlier versions, including hallucinations, unreliability, and a limited context window, and lacks the ability to learn from experience. Supporting this, Suchman et al. (Citation2023) found no demonstrable advantage for ChatGPT-4 in a medical test, even showing a performance deficit compared to the free version (ChatGPT-3.5) on gastroenterology self-assessment tests.

Regarding future research directions, it is crucial for researchers to explicitly report the specific version of ChatGPT used in their studies to enhance the generalizability and reliability of future research findings. This facilitates comparisons across studies and allows for a more nuanced understanding of how limitations evolve across ChatGPT versions. Developing best practices for educators, assessment methods that leverage ChatGPT's strengths, and research on its impact on learning outcomes are also essential. Finally, integration with specific domains presents a promising avenue for future research. Investigating the potential of integrating ChatGPT with specialized tools in various contexts, along with domain-specific training methods and the associated ethical considerations, is recommended.

6. Conclusion

This systematic review examined limitations and opportunities associated with ChatGPT's application across various fields. By analyzing 33 carefully screened empirical studies, it offers a comprehensive picture of ChatGPT's capabilities. The review identified five key limitations: accuracy concerns in critical domains like healthcare, limitations in complex cognitive tasks, potential negative impacts on learners’ development due to overreliance, technical constraints related to input and output, and ethical considerations surrounding privacy, copyright, and academic integrity. However, the review also highlights five opportunities. ChatGPT has the potential to be a valuable tool for users seeking information across various domains. Its ability to engage in natural conversations positions it as a potential virtual companion or assistant. The review also found promise in its ability to automate tasks and enhance workflows, leading to improved efficiency. Within education, ChatGPT presents opportunities for personalized learning experiences, course material creation, and student support. Finally, the review suggests promise in the ability of ChatGPT to generate creative text formats and support ideation processes, highlighting its potential as a tool for content creation, research, and brainstorming. By acknowledging both limitations and opportunities, this review offers valuable insights for researchers, developers, and users to consider when exploring the potential and responsible application of ChatGPT.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Ngo Cong-Lem

Ngo Cong-Lem is a Research Fellow at BehaviourWorks Australia, Monash University and a lecturer at the Faculty of Foreign Languages, Dalat University. His research interests involve educational technologies, evidence synthesis and translation, second language studies, and continuing professional learning.

Ali Soyoof

Ali Soyoof is a research fellow at University of Macau. His field of interest is Computer Assisted Language Learning (CALL), digital games, second language vocabulary learning, and Informal Digital Language Learning of English (IDLE).

Diki Tsering

Diki Tsering is a research officer at Monash Sustainable Development Institute. Her main interest lies in applying systematic review principles to deliver high-quality evidence reviews that translate research knowledge into practice and make positive contributions to society.

References

  • Alafnan, M. A., Dishari, S., Jovic, M., & Lomidze, K. (2023). ChatGPT as an educational tool: Opportunities, challenges, and recommendations for communication, business writing, and composition courses. Journal of Artificial Intelligence and Technology, 3(2), 60–68. https://doi.org/10.37965/jait.2023.0184
  • Ali, M. J. (2023). ChatGPT and lacrimal drainage disorders: Performance and scope of improvement. Ophthalmic Plastic and Reconstructive Surgery, 39(5), 515–514. https://doi.org/10.1097/IOP.0000000000002418
  • Amin, M. M., Cambria, E., & Schuller, B. W. (2023). Will affective computing emerge from foundation models and general artificial intelligence? A first evaluation of ChatGPT. IEEE Intelligent Systems, 38(2), 15–23. https://doi.org/10.1109/MIS.2023.3254179
  • Ariyaratne, S., Iyengar, K. P., Nischal, N., Chitti Babu, N., & Botchu, R. (2023). A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiology, 52(9), 1755–1758. https://doi.org/10.1007/s00256-023-04340-5
  • Au Yeung, J., Kraljevic, Z., Luintel, A., Balston, A., Idowu, E., Dobson, R. J., & Teo, J. T. (2023). AI chatbots not yet ready for clinical use. Frontiers in Digital Health, 5, 1161098. https://doi.org/10.3389/fdgth.2023.1161098
  • Aydin, Ö., & Karaarslan, E. (2023). Is ChatGPT leading generative AI? What is beyond expectations? Academic Platform Journal of Engineering and Smart Systems, 11(3), 118–134. https://doi.org/10.21541/apjess.1293702
  • Bitzenbauer, P. (2023). ChatGPT in physics education: A pilot study on easy-to-implement activities. Contemporary Educational Technology, 15(3), ep430. https://doi.org/10.30935/cedtech/13176
  • Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
  • Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M. T., & Zhang, Y. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. https://doi.org/10.48550/ARXIV.2303.12712
  • Cadamuro, J., Cabitza, F., Debeljak, Z., De Bruyne, S., Frans, G., Perez, S. M., Ozdemir, H., Tolios, A., Carobene, A., & Padoan, A. (2023). Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group. Clinical Chemistry and Laboratory Medicine, 61(7), 1158–1166. https://doi.org/10.1515/cclm-2023-0355
  • Cascella, M., Montomoli, J., Bellini, V., & Bignami, E. (2023). Evaluating the feasibility of ChatGPT in healthcare: An analysis of multiple clinical and research scenarios. Journal of Medical Systems, 47(1), 33. https://doi.org/10.1007/s10916-023-01925-4
  • Clark, T. M. (2023). Investigating the use of an artificial intelligence chatbot with general chemistry exam questions. Journal of Chemical Education, 100(5), 1905–1916. https://doi.org/10.1021/acs.jchemed.3c00027
  • Day, T. (2023). A preliminary investigation of fake peer-reviewed citations and references generated by ChatGPT. The Professional Geographer, 75(6), 1024–1027. https://doi.org/10.1080/00330124.2023.2190373
  • Duong, D., & Solomon, B. D. (2023). Analysis of large-language model versus human performance for genetics questions. European Journal of Human Genetics, 32(4), 466–468. https://doi.org/10.1038/s41431-023-01396-8
  • Fergus, S., Botha, M., & Ostovar, M. (2023). Evaluating academic answers generated using ChatGPT. Journal of Chemical Education, 100(4), 1672–1675. https://doi.org/10.1021/acs.jchemed.3c00087
  • Giannos, P., & Delardas, O. (2023). Performance of ChatGPT on UK standardized admission tests: Insights from the BMAT, TMUA, LNAT, and TSA examinations. JMIR Medical Education, 9, e47737. https://doi.org/10.2196/47737
  • Gregorcic, B., & Pendrill, A. M. (2023). ChatGPT and the frustrated Socrates. Physics Education, 58(3), 035021. https://doi.org/10.1088/1361-6552/acc299
  • Hassani, H., & Silva, E. S. (2023). The role of ChatGPT in data science: How AI-assisted conversational interfaces are revolutionizing the field. Big Data and Cognitive Computing, 7(2), 62. https://doi.org/10.3390/bdcc7020062
  • Hoch, C. C., Wollenberg, B., Lüers, J. C., Knoedler, S., Knoedler, L., Frank, K., Cotofana, S., & Alfertshofer, M. (2023). ChatGPT’s quiz skills in different otolaryngology subspecialties: An analysis of 2576 single-choice and multiple-choice board certification preparation questions. European Archives of Oto-Rhino-Laryngology, 280(9), 4271–4278. https://doi.org/10.1007/s00405-023-08051-4
  • Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
  • Ibrahim, H., Asim, R., Zaffar, F., Rahwan, T., & Zaki, Y. (2023). Rethinking homework in the age of artificial intelligence. IEEE Intelligent Systems, 38(2), 24–27. https://doi.org/10.1109/MIS.2023.3255599
  • Kortemeyer, G. (2023). Could an artificial-intelligence agent pass an introductory physics course? Physical Review Physics Education Research, 19(1), 1–18. https://doi.org/10.1103/PhysRevPhysEducRes.19.010132
  • Kumar, H. A. (2023). Analysis of ChatGPT tool to assess the potential of its utility for academic writing in biomedical domain. Biology, Engineering, Medicine and Science Reports, 9(1), 24–30. https://doi.org/10.5530/bems.9.1.5
  • Lahat, A., Shachar, E., Avidan, B., Glicksberg, B., & Klang, E. (2023). Evaluating the utility of a large language model in answering common patients’ gastrointestinal health-related questions: Are we there yet? Diagnostics, 13(11), 1950. https://doi.org/10.3390/diagnostics13111950
  • Lai, K. (2023). How well does ChatGPT handle reference inquiries? An analysis based on question types and question complexities. College & Research Libraries, 84(6), 974–995. https://doi.org/10.5860/crl.84.6.974
  • Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 410. https://doi.org/10.3390/educsci13040410
  • McIntosh, T. R., Liu, T., Susnjak, T., Watters, P., Ng, A., & Halgamuge, M. N. (2024). A culturally sensitive test to evaluate nuanced GPT hallucination. IEEE Transactions on Artificial Intelligence, 1–13. https://doi.org/10.1109/TAI.2023.3332837
  • Nikolic, S., Daniel, S., Haque, R., Belkina, M., Hassan, G. M., Grundy, S., Lyden, S., Neal, P., & Sandison, C. (2023). ChatGPT versus engineering education assessment: A multidisciplinary and multi-institutional benchmarking and analysis of this generative artificial intelligence tool to investigate assessment integrity. European Journal of Engineering Education, 48(4), 559–614. https://doi.org/10.1080/03043797.2023.2213169
  • OpenAI. (2023). GPT-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf
  • Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., … Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. International Journal of Surgery, 88(2021), 105906. https://doi.org/10.1016/j.ijsu.2021.105906
  • Parsons, B., & Curry, J. H. (2024). Can ChatGPT pass graduate-level instructional design assignments? Potential implications of artificial intelligence in education and a call to action. TechTrends, 68(1), 67–78. https://doi.org/10.1007/s11528-023-00912-3
  • Poole, F. (2022). Using ChatGPT to design language material and exercises. https://fltmag.com/chatgpt-design-material-exercises/
  • Prieto, S. A., Mengiste, E. T., & García de Soto, B. (2023). Investigating the use of ChatGPT for the scheduling of construction projects. Buildings, 13(4), 857. https://doi.org/10.3390/buildings13040857
  • Puthenpura, V., Nadkarni, S., DiLuna, M., Hieftje, K., & Marks, A. (2023). Personality changes and staring spells in a 12-year-old child: A case report incorporating ChatGPT, a natural language processing tool driven by artificial intelligence (AI). Cureus, 15(3), e36408. https://doi.org/10.7759/cureus.36408
  • Rahman, M. M., & Watanobe, Y. (2023). ChatGPT for education and research: Opportunities, threats, and strategies. Applied Sciences, 13(9), 5783. https://doi.org/10.3390/app13095783
  • Rahman, M., Terano, H. J. R., Rahman, N., Salamzadeh, A., & Rahaman, S. (2023). ChatGPT and academic research: A review and recommendations based on practical examples. Journal of Education, Management and Development Studies, 3(1), 1–12. https://doi.org/10.52631/jemds.v3i1.175
  • Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121–154. https://doi.org/10.1016/j.iotcps.2023.04.003
  • Rozado, D. (2023). The political biases of ChatGPT. Social Sciences, 12(3), 148. https://doi.org/10.3390/socsci12030148
  • Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning & Teaching, 6(1), 342–363. https://doi.org/10.37074/jalt.2023.6.1.9
  • Sallam, M., Salim, N., Barakat, M., & Al-Tammemi, A. (2023). ChatGPT applications in medical, dental, pharmacy, and public health education: A descriptive study highlighting the advantages and limitations. Narra J, 3(1), e103. https://doi.org/10.52225/narra.v3i1.103
  • Sanmarchi, F., Bucci, A., Nuzzolese, A. G., Carullo, G., Toscano, F., Nante, N., & Golinelli, D. (2023). A step-by-step researcher’s guide to the use of an AI-based transformer in epidemiology: An exploratory analysis of ChatGPT using the STROBE checklist for observational studies. Journal of Public Health, 1–36. https://doi.org/10.1007/s10389-023-01936-y
  • Segal, S., & Khanna, A. K. (2023). Anesthetic management of a patient with juvenile hyaline fibromatosis: A case report written with the assistance of the large language model ChatGPT. Cureus, 15(3), e35946. https://doi.org/10.7759/cureus.35946
  • Seth, I., Sinkjær Kenney, P., Bulloch, G., Hunter-Smith, D. J., Bo Thomsen, J., & Rozen, W. M. (2023). Artificial or augmented authorship? A conversation with a Chatbot on base of thumb arthritis. Plastic and Reconstructive Surgery. Global Open, 11(5), e4999. https://doi.org/10.1097/GOX.0000000000004999
  • Shoufan, A. (2023). Can students without prior knowledge use ChatGPT to answer test questions? An empirical study. ACM Transactions on Computing Education, 23(4), 1–29. https://doi.org/10.1145/3628162
  • Singh, H., & Singh, A. (2023). ChatGPT: Systematic review, applications, and agenda for multidisciplinary research. Journal of Chinese Economic and Business Studies, 21(2), 193–212. https://doi.org/10.1080/14765284.2023.2210482
  • Sok, S., & Heng, K. (2024). Opportunities, challenges, and strategies for using ChatGPT in higher education: A literature review. Journal of Digital Educational Technology, 4(1), ep2401. https://doi.org/10.30935/jdet/14027
  • Stojanov, A. (2023). Learning with ChatGPT 3.5 as a more knowledgeable other: An autoethnographic study. International Journal of Educational Technology in Higher Education, 20(1), 35. https://doi.org/10.1186/s41239-023-00404-7
  • Su, J., & Yang, W. (2023). Unlocking the power of ChatGPT: A framework for applying generative AI in education. ECNU Review of Education, 6(3), 355–366. https://doi.org/10.1177/20965311231168423
  • Suchman, K., Garg, S., & Trindade, A. J. (2023). Chat generative pretrained transformer fails the multiple-choice American College of Gastroenterology Self-Assessment Test. The American Journal of Gastroenterology, 118(12), 2280–2282. https://doi.org/10.14309/ajg.0000000000002320
  • Tan, T. F., Thirunavukarasu, A. J., Campbell, J. P., Keane, P. A., Pasquale, L. R., Abramoff, M. D., Kalpathy-Cramer, J., Lum, F., Kim, J. E., Baxter, S. L., & Ting, D. S. W. (2023). Generative artificial intelligence through ChatGPT and other large language models in ophthalmology: Clinical applications and challenges. Ophthalmology Science, 3(4), 100394. https://doi.org/10.1016/j.xops.2023.100394
  • Thirunavukarasu, A. J., Hassan, R., Mahmood, S., Sanghera, R., Barzangi, K., El Mukashfi, M., & Shah, S. (2023). Trialling a large language model (ChatGPT) in General practice with the applied knowledge test: Observational study demonstrating opportunities and limitations in primary care. JMIR Medical Education, 9, e46599. https://doi.org/10.2196/46599
  • Wagner, M. W., & Ertl-Wagner, B. B. (2023). Accuracy of information and references using ChatGPT-3 for retrieval of clinical radiological information. Canadian Association of Radiologists Journal, 75(1), 69–73. https://doi.org/10.1177/08465371231171125

Appendix A

Databases and Search Strings

Database: Scopus

Date of Search: 26 June 2023

Yield: 169

Search string: TITLE-ABS-KEY (chatgpt AND (limitation* OR challenge* OR drawback* OR problem* OR challenge* OR issue* OR concern* OR risk* OR disadvantage* OR flaw* OR weakness* OR shortcoming* OR pitfall* OR downside* OR bias* OR error* OR ethic*)) AND (LIMIT-TO (DOCTYPE, "ar"))

Database: Web of Science

Date of Search: 26 June 2023

Yield: 150

Search string: TS=(ChatGPT AND (limitation* OR challenge* OR drawback* OR problem* Or challenge* OR issue* OR concern* OR risk* OR disadvantage* OR flaw* OR weakness* OR shortcoming* OR pitfall* OR downside* OR bias* OR error* OR ethic*))

Database: ERIC

Search Date: 09 March 2024

Yield: 108

Search string: ChatGPT AND (limitation* OR challenge* OR drawback* OR problem* OR issue* OR concern* OR risk* OR disadvantage* OR flaw* OR weakness* OR shortcoming* OR pitfall* OR downside* OR bias* OR error* OR ethic*)

Database: IEEE Xplore

Search Date: 09 March 2024

Yield: 58 (Filters applied: ‘Journals’ and ‘Early Access Articles’)

Search string: ("ChatGPT" AND (limitation* OR challenge* OR drawback* OR problem* OR issue* OR concern* OR bias* OR risk* OR disadvantage* OR flaw OR flaws OR weakness OR weaknesses OR shortcoming OR shortcomings OR pitfall OR pitfalls OR downside OR downsides OR error OR errors OR ethic OR ethics))

Appendix B

A Summary of the Included Studies