
How trustworthy is ChatGPT? The case of bibliometric analyses

Article: 2222988 | Received 02 Apr 2023, Accepted 06 Jun 2023, Published online: 25 Jun 2023

Abstract

The introduction of the AI-powered chatbot ChatGPT by OpenAI has sparked much interest and debate among academic researchers. Commentators from different scientific disciplines have raised many concerns and issues, especially related to the ethics of using these tools in scientific writing and publications. In addition, there have been discussions about whether ChatGPT is trustworthy, effective, and useful in increasing researchers’ productivity. Therefore, in this paper, we evaluate ChatGPT’s performance on tasks related to bibliometric analysis by comparing the output provided by the chatbot with a recently conducted bibliometric study on the same topic. The findings show that there are large discrepancies and that ChatGPT’s trustworthiness is low in this particular area. Researchers should therefore exercise caution when using ChatGPT as a tool in bibliometric studies.

1. Introduction

ChatGPT is a highly developed large language model intended to respond to text-based queries and produce human-like natural language responses. It was developed by OpenAI, based in San Francisco, USA, using the Generative Pre-trained Transformer 3.5 (GPT-3.5) architecture and introduced to the general public in late 2022. Just two months after its release, ChatGPT had an estimated 100 million active users, making it the fastest-growing consumer application in history. In March 2023, OpenAI launched GPT-4 with additional features and capabilities.

There is already a considerable body of research on the use of ChatGPT in different areas, as illustrated by several recent review articles (e.g., Dwivedi et al., 2023; Sohail et al., 2023; Khosravi et al., 2023; Lo, 2023; Sallam, 2023). Many studies and reports have shown that ChatGPT is frequently used in academic writing, including essays, poems, stories, computer code, and even technical writing. As ChatGPT produces text that resembles human writing and has lower levels of plagiarism than anticipated, a number of papers have recently been published addressing its impact on scientific writing. For example, Dowling and Lucey (2023) find that ChatGPT can be a highly useful tool and research assistant in finance research.

However, the capacity of ChatGPT to produce original writing has raised questions and difficulties for academic science. While some people are embracing ChatGPT because of the improved learning possibilities, others are raising concerns about ethical issues, trustworthiness, and misleading data. For instance, Sharples (2022) recommends that rather than forbidding the use of these AI tools, educators and students should be encouraged to use them to enhance learning experiences. Similarly, McMurtrie (2023) suggests that advanced tools like ChatGPT will soon become a part of everyday writing.

Other commentators have been more skeptical. For example, some researchers (Graham, 2022; Salvagno et al., 2023) have pointed out the risks involved in relying on ChatGPT’s generated text or data. Despite mixed opinions about ChatGPT, its popularity continues to grow as it provides a powerful tool for generating high-quality written content. Scholarly writing is one of the fields most significantly impacted by ChatGPT. Several publications have explored the advantages and challenges of using ChatGPT in scientific writing (Biswas, 2023; Hill-Yardin et al., 2023; Koo, 2023; Omar et al., 2017; Salvagno et al., 2023) and have cautioned that, despite its impressive writing abilities, human judgment is still required (Kitamura, 2023). Some authors have even used ChatGPT to write entire articles and have assessed its reliability, plagiarism, and authentication capabilities in scientific writing (Cotton et al., 2023; King, 2023). The question of whether ChatGPT should be considered an author of scientific articles is currently being debated among scientific experts, and while some have given ChatGPT authorship credit (Salvagno et al., 2023), others have raised concerns (Lee, 2023; Stokel-Walker, 2023; Teixeira da Silva, 2023; Thorp, 2023). As a result, some leading journals, like Science (Thorp, 2023) and The Lancet (https://www.thelancet.com/pb/assets/raw/Lancet/authors/tl-info-for-authors.pdf), have updated their guidelines regarding the use of ChatGPT and similar AI-powered chatbots.

Against this background, we conducted an investigation to evaluate how ChatGPT 3.5 performs when asked to write an abstract for a bibliometric analysis, comparing the chatbot’s output with the results of a recently published bibliometric study on the same topic. This comparison helps evaluate the extent to which ChatGPT can be considered trustworthy and reliable in this type of research task.

2. Conducting bibliometric analysis using ChatGPT

We tasked ChatGPT 3.5 with creating abstracts for two bibliometric analyses to evaluate the accuracy and quality of its generated content. We selected a previously published study on curcumin in wound healing (Farhat et al., 2023) and requested ChatGPT to write an abstract using the same search string and database utilized in the original study. We created two versions of Query 1, one without any word limit and another limited to 300 words, while using the same search terms and database. The objective of the first query was to evaluate the consistency of ChatGPT in producing data. For the second query, we performed a bibliometric analysis on the Web of Science (WoS) database ourselves and then requested ChatGPT to generate an abstract using the same search terms and database to evaluate its real-time data curation capabilities. In the third query, we asked for references for the bibliometric analysis related to Query 1, and in Queries 4 and 5 we cross-questioned ChatGPT about its responses.
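For readers who wish to reproduce queries of this kind programmatically rather than through the chat interface, the following minimal Python sketch shows how a Query 1-style prompt could be submitted to GPT-3.5 via OpenAI’s API. The prompt wording, model name, and parameters are illustrative assumptions, not the exact queries used in this study.

```python
# Minimal sketch: submitting a Query 1-style prompt to GPT-3.5 via the OpenAI API.
# The prompt text and parameters are illustrative assumptions, not the exact queries
# posed to ChatGPT 3.5 in this study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Write a 300-word abstract for a bibliometric analysis of curcumin in wound "
    'healing, based on the search string ("turmeric" OR "curcum*") AND "wound" '
    "applied to the SCOPUS database. Report publication counts, top authors, "
    "countries, journals, and keywords."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # GPT-3.5, the version evaluated here
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Repeating the same call several times provides a simple check of the output consistency targeted by Query 1.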

3. Discussion

Bibliometric analysis is a valuable tool for identifying prolific authors, leading venues, top countries and their collaborative patterns, as well as the intellectual structure of a particular domain in the existing literature (Donthu et al., 2021; Ellegaard & Wallin, 2015; Zupic & Čater, 2015). This type of analysis involves processing significant amounts of unstructured data, such as the number of publications, keywords, and other relevant metrics. Bibliometric analysis can also be helpful in forecasting future trends in a particular academic topic (Farhat et al., 2023). The usefulness and impact of bibliometric reviews can be greatly enhanced through synthesis. The fundamental promise of bibliometrics rests in the capacity to synthesize knowledge, even though tracking trends and performing statistical analysis are important components (Maggio et al., 2021). Researchers can go beyond simple analysis by engaging in synthesis, which helps them gain a more complete picture of the research landscape. Synthesis requires combining the results of various investigations and identifying recurring themes and patterns in order to provide new insights and knowledge. It aims to produce an integrated and nuanced understanding of the subject that goes beyond the individual components of the studies (Perrier et al., 2016). Consequently, well-executed bibliometric studies can significantly contribute to the progress of a field and guide future research endeavours.

The accuracy of ChatGPT’s data curation capabilities was evaluated by conducting bibliometric analyses using search strings in either the SCOPUS or Web of Science databases. Table 1 compares ChatGPT’s responses with the data from the actual bibliometric papers. Despite the well-written presentation of quantitative data, ChatGPT provided inaccurate information about leading authors, countries, and venues. For instance, when asked to write a bibliometric review using the search keywords “turmeric” OR “curcum*” AND “wound” from the SCOPUS database, it reported only 246 articles, whereas the original study (Farhat et al., 2023) found 1284 articles, a quite significant discrepancy.

Table 1. Comparison of ChatGPT’s responses to bibliometric queries with the actual article data
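Document counts of the kind compared in Table 1 can also be verified directly against the database. As an illustration, the sketch below runs the same search string against Elsevier’s Scopus Search API and prints the total number of matching records; the field code TITLE-ABS-KEY and the API-key handling are assumptions, since the original study’s exact export settings are not reproduced here.

```python
# Sketch: retrieving the Scopus document count for the study's search string, so that
# figures reported by ChatGPT (e.g., 246 articles) can be checked against the database
# (1284 articles in the original study). Field code and API-key setup are assumptions.
import os
import requests

query = 'TITLE-ABS-KEY(("turmeric" OR "curcum*") AND "wound")'

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    params={"query": query, "count": 1},
    headers={
        "X-ELS-APIKey": os.environ["ELSEVIER_API_KEY"],  # personal Elsevier API key
        "Accept": "application/json",
    },
    timeout=30,
)
resp.raise_for_status()

total = resp.json()["search-results"]["opensearch:totalResults"]
print(f"Scopus records matching the search string: {total}")
```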

Furthermore, ChatGPT listed India, USA, and Iran as the top contributing countries, while the original study listed India, China, USA, and Iran. In terms of the most prolific authors, ChatGPT identified Kottarathil Abraham Jacob and Madhulika Bhagat, whereas the original study found Edy Meiyanto, Amirhossein Sahebkar, and Riris Jenie. Even if we disregard the order of the top countries, the three countries mentioned by ChatGPT are still among the top five countries listed in the original study. Although the authors identified by ChatGPT were not included in the original study’s list of authors, the top keywords and venues retrieved by ChatGPT were among the top 20 listed in the original study for their respective metrics, even though their order was inaccurate. Reframing the query did not affect the consistency of ChatGPT’s output.

To evaluate the real-time data curation capabilities of ChatGPT, we conducted a bibliometric analysis using a new set of search strings on the Web of Science database and then asked ChatGPT to conduct a similar analysis. The results indicated that ChatGPT generated inaccurate information regarding data collection. While our own search retrieved 681 articles, ChatGPT reported 2725 articles, far more than the actual number. While the top two contributing countries were accurate, the top institutions were not correct. Similarly, the top cited journals and funding agencies provided by ChatGPT did not align with our own findings. The top cited journals in the field were found to be Scientific Reports, Frontiers in Microbiology, and Microbiome, whereas ChatGPT provided a different ranking. The prominent researchers suggested by ChatGPT were Jian-Hua Zhao, Yang Zhang, and Gerard Wright, but Jian-Hua Zhao and Yang Zhang were not even included in the list of authors we retrieved, although Gerard Wright was identified as one of the top 30 prolific authors. The top three countries with the highest collaborations were found to be the USA, UK, and Germany, while ChatGPT identified the top three most frequent collaborators as the USA, China, and the UK. ChatGPT also did not include Canada, which was tied with the UK, so its information was incomplete.

In a recent study, ten research abstracts were collected from five high-impact medical journals, and ChatGPT was tasked with generating new abstracts based on their titles and journals. While the generated abstracts had patient cohort sizes similar to the original abstracts, the exact numbers were found to be fabricated (Gao et al., 2022). Despite this, reviewers found it surprisingly difficult to distinguish between the two sets of abstracts, though they noted that the AI-generated abstracts were vague and had a formulaic tone. Several recent studies have highlighted the difficulty faced by researchers in distinguishing between AI-generated and original abstracts (Else, 2023; Salvagno et al., 2023).

In Query 3, we asked ChatGPT for references related to bibliometric analysis, but some of the references provided were either non-existent or irrelevant to our study. When we cross-questioned ChatGPT about the sources, it apologized and generated a new set of references, but upon further investigation, those references were also non-existent. In Query 5, when we asked ChatGPT why it does not reject questions if it cannot provide relevant answers, it explained that it is programmed to respond to every query and cannot ignore any question. These findings highlight that ChatGPT is programmed to respond to every query regardless of accuracy and does not take responsibility for any errors. ChatGPT needs to become considerably more trustworthy before researchers can depend solely on the data it generates. Therefore, it is crucial to verify the accuracy of any data generated by ChatGPT, and bibliometric analysis may not be the most suitable task for it.
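One practical safeguard against the fabricated references observed in Queries 3–5 is to screen every ChatGPT-suggested citation against an external bibliographic source before use. The sketch below queries the public Crossref API for the closest matching records; the example reference string is hypothetical, not one actually returned by ChatGPT.

```python
# Sketch: screening a ChatGPT-suggested reference against Crossref. If no closely
# matching record (title/DOI) is returned, the citation is likely fabricated.
# The reference string below is a hypothetical example.
import requests

suggested_reference = (
    "Bibliometric analysis of curcumin and wound healing research, 2010-2020"
)

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": suggested_reference, "rows": 3},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    title = item.get("title", ["<no title>"])[0]
    print(item.get("DOI"), "|", title)
```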

While analyzing ChatGPT’s responses, we observed a glaring lack of synthesis in addition to the inaccurate data procurement when performing a bibliometric review. Instead of synthesizing the information, ChatGPT largely concentrates on reporting random data. This constraint prevents it from offering a thorough and perceptive analysis that extends beyond individual data points. Moreover, the lack of data reproducibility raises concerns about the reliability and trustworthiness of the results generated by ChatGPT. By acknowledging and addressing these limitations, we can work towards developing AI models that excel in both data analysis and synthesis, thus advancing the capabilities of bibliometric reviews.

4. Conclusion

In conclusion, this exploratory study finds that while ChatGPT has the potential to be a useful tool as a scientific writing assistant in terms of improving readability, language enhancement, rephrasing/paraphrasing and proofreading, etc., it should not, as of today, be used for retrieving bibliometric data or conducting bibliometric assessments. It is very important for researchers and students to keep this in mind. In recent years, bibliometric methods have become increasingly popular in many different research areas, and some might be tempted to take a shortcut and ask ChatGPT rather than carrying out the analyses by extracting data from databases and analyzing these data using appropriate software packages.

Table 2 summarizes the potential issues related to ChatGPT when conducting bibliometric analysis. It is our view that researchers should exercise caution when interpreting the results generated by ChatGPT and should verify the information using other sources. ChatGPT’s real-time data curation capabilities and data analytic techniques, specifically with electronic databases such as SCOPUS and WoS, need further refinement and validation to ensure trustworthiness, accuracy, and consistency.

Table 2. Summary of ChatGPT’s capabilities and related issues

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Biswas, S. (2023). ChatGPT and the future of medical writing. Radiology, 307(2), 223312. https://doi.org/10.1148/RADIOL.223312
  • Cotton, D. R. E., Cotton, P. A., & Shipway, J. R. (2023). Chatting and cheating. Ensuring academic integrity in the era of ChatGPT. EdArxiv. https://doi.org/10.35542/OSF.IO/MRZ8H
  • Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070
  • Dowling, M., & Lucey, B. (2023). ChatGPT for (finance) research: The Bananarama conjecture. Finance Research Letters, 53, 103662. https://doi.org/10.1016/j.frl.2023.103662
  • Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., & Wirtz, J.… Wright, R. (2023). “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642
  • Ellegaard, O., & Wallin, J. A. (2015). The bibliometric analysis of scholarly production: How great is the impact? Scientometrics, 105(3), 1809–1831. https://doi.org/10.1007/s11192-015-1645-z
  • Else, H. (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423. https://doi.org/10.1038/D41586-023-00056-7
  • Farhat, F., Athar, M. T., Ahmad, S., Madsen, D. Ø., & Sohail, S. S. (2023). Antimicrobial resistance and machine learning: Past, present, and future. Frontiers in Microbiology, 14, 1179312. https://doi.org/10.3389/fmicb.2023.1179312
  • Farhat, F., Sohail, S. S., Siddiqui, F., Irshad, R. R., & Madsen, D. Ø. (2023). Curcumin in wound healing—A bibliometric analysis. Life, 13(1), 143. https://doi.org/10.3390/life13010143
  • Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A. T. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv, 2022.12.23.521610. https://doi.org/10.1101/2022.12.23.521610
  • Graham, F. (2022). Daily briefing: Will ChatGPT kill the essay assignment? Nature. https://doi.org/10.1038/D41586-022-04437-2
  • Hill-Yardin, E. L., Hutchinson, M. R., Laycock, R., & Spencer, S. J. (2023). A Chat(GPT) about the future of scientific publishing. Brain, Behavior, and Immunity, 110, 152–154. https://doi.org/10.1016/J.BBI.2023.02.022
  • Khosravi, H., Shafie, M. R., Hajiabadi, M., Raihan, A. S., & Ahmed, I. (2023). Chatbots and ChatGPT: A bibliometric analysis and systematic review of publications in Web of Science and Scopus databases. arXiv preprint arXiv:2304.05436.
  • King, M. R., & ChatGPT. (2023). A conversation on artificial intelligence, chatbots, and plagiarism in higher education. Cellular and Molecular Bioengineering, 16(1), 1–2. https://doi.org/10.1007/s12195-022-00754-8
  • Kitamura, F. C. (2023). ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology, 307(2), 230171. https://doi.org/10.1148/RADIOL.230171
  • Koo, M. (2023). The importance of proper use of ChatGPT in medical writing. Radiology, 307(3). https://doi.org/10.1148/RADIOL.230312
  • Lee, J. Y. (2023). Can an artificial intelligence chatbot be the author of a scholarly article? Journal of Educational Evaluation for Health Professions, 20, 6. https://doi.org/10.3352/JEEHP.2023.20.6
  • Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 410. https://doi.org/10.3390/educsci13040410
  • Maggio, L. A., Costello, J. A., Norton, C., Driessen, E. W., & Artino, A. R., Jr. (2021). Knowledge syntheses in medical education: A bibliometric analysis. Perspectives on Medical Education, 10(2), 79–87. https://doi.org/10.1007/s40037-020-00626-9
  • McMurtrie, B. (2023). Teaching: Will ChatGPT change the way you teach? The Chronicle of Higher Education.
  • Omar, R., Mangukiya, O., Kalnis, P., & Mansour, E. (2017). Current status and future directions towards knowledge graph chatbots.
  • Perrier, L., Lightfoot, D., Kealey, M. R., Straus, S. E., & Tricco, A. C. (2016). Knowledge synthesis research: A bibliometric analysis. Journal of Clinical Epidemiology, 73, 50–57. https://doi.org/10.1016/j.jclinepi.2015.02.019
  • Sallam, M. (2023). The utility of ChatGPT as an example of large language models in healthcare education, research and practice: Systematic review on the future perspectives and potential limitations (pp. 1–34).
  • Salvagno, M., Taccone, F. S., & Gerli, A. G. (2023). Can artificial intelligence help for scientific writing? Critical Care, 27(1), 75. https://doi.org/10.1186/S13054-023-04380-2
  • Sharples, M. (2022, May 17). New AI tools that can write student essays require educators to rethink teaching and assessment. Blog. London School of Economics. https://blogs.lse.ac.uk/impactofsocialsciences/2022/05/17/new-ai-tools-that-can-write-student-essays-require-educators-to-rethink-teaching-and-assessment/
  • Sohail, S. S., Farhat, F., Himeur, Y., Nadeem, M., Madsen, D. Ø., Singh, Y., Atalla, S., & Mansoor, W. (2023). The future of GPT: A taxonomy of existing ChatGPT research, current challenges, and possible future directions. SSRN. https://doi.org/10.2139/ssrn.4413921
  • Stokel-Walker, C. (2023). ChatGPT listed as author on research papers: Many scientists disapprove. Nature, 613(7945), 620–621. https://doi.org/10.1038/D41586-023-00107-Z
  • Teixeira da Silva, J. A. (2023). Is ChatGPT a valid author? Nurse Education in Practice, 68, 103600. https://doi.org/10.1016/J.NEPR.2023.103600
  • Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313. https://doi.org/10.1126/SCIENCE.ADG7879
  • Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18(3), 429–472. https://doi.org/10.1177/1094428114562629