Research Article

Mega-authorship implications: How many scientists can fit into one cell?

Received 17 Nov 2023, Accepted 11 Feb 2024, Published online: 05 Mar 2024

ABSTRACT

The past 20 years have seen a significant increase in articles with 500 or more authors. This increase has presented problems in terms of determining true authorship versus other types of contribution, issues with database metadata and data output, and publication length. Items with 500 or more authors were deemed mega-author items, and a total of 5,533 such items were identified using InCites. Metadata about the items was then gathered from Web of Science and Scopus. Close examination of these items found that the vast majority covered physics topics, with medicine a far distant second place and only minor representation from other science fields. Mega-authorship saw significant events that appear to correspond to similar events in the Large Hadron Collider’s timeline, indicating that the projects for the collider are driving this heavy output. Some solutions are offered for the problems resulting from this phenomenon, partially driven by recommendations from the International Committee of Medical Journal Editors.

Introduction

Excel has a limit of 32,767 characters in a single cell. While this may seem like an enormous number of characters, some research records exceed this limit when fields such as author names and author identification numbers are exported from research databases. The overflow spills into extra rows, so the exported data no longer align with the field headers in the spreadsheet and must be cleaned up. Encountering this phenomenon multiple times with database output led to this investigation of how pervasive items with a large number of authors are and what the implications of such items are.

This article will focus on examining metadata about items with 500+ authors, which will be referred to as mega-author items. Different terminology is used to label the concept of works with a large number of authors, including mega-authorship (Byard and Vink 2021; Das and Sen 2001; Kretschmer and Rousseau 2001; Sen 1997) and hyperauthorship or hyper-authorship (Changa, Huang, and Chiu 2019; Nogrady 2023; Von Bergen and Bressler 2017). Terminology is also used for the phenomenon of an increasing number of authors (but not necessarily a large author count), including author inflation (Dong et al. 2016; Kretschmer and Rousseau 2001; Nuzzo 2021; Von Bergen and Bressler 2017), the simple too many authors (McConnell 1958), the specific (and with an acronym) Increase in the average Number of Authors per Publication (INAP) (Hosseini et al. 2022), and possibly the most popular, author proliferation or authorship proliferation (Byrne 1988; Camp and Escott 2013; Durani, Rimouche, and Ross 2007; King 2000; Lutnick et al. 2021; Modi et al. 2008; Papadakis 2021). Most works did not give a specific threshold number for a work to earn this label, with items using the author(ship) proliferation label typically just analyzing the data without giving anything approaching a threshold. McConnell (1958) uses three authors as a maximum author count before earning the too many authors label, a number that would not be unusual today. Other thresholds include 10 (Das and Sen 2001; Sen 1997), 15 (Papadakis 2021) and 100 (Byard and Vink 2021; Changa, Huang, and Chiu 2019). While Kretschmer and Rousseau (2001) do not explicitly define mega-authorship as more than 100 authors, they use that number throughout their paper as a threshold for their study of author inflation. These numbers, even 100, were deemed too low to use as a search minimum for gathering data because of the quantity of works with more than that number of authors. For the purposes of this article, the minimum will be 500. Given that this phenomenon is not limited to journal articles, but also includes conference papers and books (which are the two other major formats indexed by most research databases), the term “items” will be used to collectively cover these formats.

Literature review

How many authors is too much?

Publications with multiple authors are not a new concept. While this article focuses on items with 500+ authors, the phenomenon of a large number of authors has appeared in the literature for some time. In a Science letter, McConnell (1958) suggests that having too many authors is not usually justifiable (unless the item is a book). The threshold number? Three. McConnell gives an example of a specific item and distinguishes between authoring the item itself and doing work that is used in it without contributing to the writing. The stated solution is to use acknowledgments. McConnell’s threshold would today likely be broken by quite a large number of scientific journal articles on a regular basis.

Thirty years later, Byrne (1988) commented on two articles, one in physics with 104 authors and one in medicine with 193 authors. Byrne suggests this is a problematic phenomenon and encourages readers of his editorial to send in examples of other such works.

Regalado (1995) covers the increase in both medicine and physics items with many (more than 50) authors. Regalado proposed reasons why this phenomenon was occurring at the time, including an increase in multi-institution clinical trials and large projects involving accelerators. This short commentary indicates how some feel that anyone contributing to the work that leads to a paper should be included. The letter also quotes a journal editor who believes enforcing standards can be difficult. In a more recent study comparing genetics and high-energy physics, Changa, Huang and Chiu (2019) found several journals in their study to have a large percentage of hyperauthorship (100 or more authors), with Nature Genetics at 9.04%, European Physical Journal C at 13.27%, and Astroparticle Physics at 9.18%.

Medicine’s views on author quantity

As Regalado (1995) mentioned, medicine is known for having items with a large number of authors. Interestingly, several articles about mega-authorship within specific disciplines share a similar title along the lines of “How many [blanks] does it take … ?”, with the type of medical professional inserted in the [blank]. Modi et al. (2008) examined 70 years (1936–2006) of the cardiothoracic surgical literature for journal articles with larger numbers of authors. The findings were a decrease in items with one or two authors and an increase in the average number of authors (although this average was still under ten in 2006). Similar results were found for the orthopedic (Rahman and Muirhead-Allwood 2010), plastic surgery (Durani, Rimouche, and Ross 2007) and neurosurgery (King 2000) literature. Another factor in some of these studies was international collaborations.

A large study by An et al. (2020) of 121,397 peer-reviewed publications in the medical literature from 2005 to 2017 found that neurology, radiation oncology, pathology, psychiatry, and internal medicine were the subject areas with the greatest number of authors per article, in that order. They also found that while case reports saw a small increase in the mean number of authors (4.26 to 4.49 over the 2005–2017 period), literature reviews (3.53 to 5.69) and original research (5.87 to 8.51) saw larger increases.

More recent studies on orthopedics journal articles found similar trends in increased authorship numbers per article (Camp and Escott 2013; Lutnick et al. 2021). Related to the large influx of literature during the COVID-19 pandemic, Papadakis (2021) specifically focuses on case reports. There was a mean of 6.1 authors for identified reports and 14% of the titles had ten or more authors.

Tilak, Prasad and Jena (2015) examined articles from three medical journals and found that several categories (single-center randomized controlled trial, multi-center randomized controlled trial, and observational studies) all had increases in mean author count from 1960 to 2010. Interestingly, articles about multi-center controlled trial studies increased, with over 12 times as many in 2010 as in 1960, while there were over six times as many single-center items. The number of observational studies decreased for the same period. These changes, especially the growth in multi-center randomized controlled trials, are potentially factors in having more authors, since more locations will require authors from those institutions to be involved. This harkens back to Regalado’s (1995) commentary on this increase in author quantity. Similarly, during their examination of neurosurgical journals, Cole, Pacult and Lawton (2022) found 28% of the variation of the increase is due to studies involving multiple institutions or departments.

Perhaps one of the more interesting ways to address this issue is the approach by Agel et al. (2016). They present seven vignettes involving ways in which discrepancies, problems, disagreements, and other issues can arise for multi-authored works. Suggestions are given for ways to prevent such issues, especially discussing roles and authorship expectations beforehand and revisiting them as necessary. The authors of this study also suggest that certain roles not involved in the actual writing of the manuscript, which sometimes get authorship credit, should instead get alternative acknowledgments.

While most research focuses on the peer-reviewed journal literature, the issue of increased author count is not limited to this format. Although not up to the level of mega-authorship, Nuzzo (2021) found an increase in the average number of authors of letters to the editor in exercise science and physical therapy journals over each decade from the 1960s to the 2010s.

Other disciplines’ views on author quantity

While medicine seems to cover this topic quite thoroughly, it is not as heavily covered in the physical sciences. However, Wyatt (2012) does address the issue of author quantity in physics. Wyatt did a quick examination and saw an increase in author count, indicating that many of these were likely people who would have been listed in acknowledgments in the past. This author goes on to ponder potential reasons for this quantity, but also suggests there is perhaps more evidence of creativity on the authors’ part when there are just one or two authors. Ledbetter (2012) responded to Wyatt’s commentary by pointing out ambiguity in some authorship guidelines, which use vague terms like significant (which Ledbetter pulls from the American Physical Society’s guidelines) and refer to mundane tasks that should not result in being included as an author. Ledbetter points to the International Committee of Medical Journal Editors (ICMJE) as a more specific guide for determining who should be considered an author.

Can there be standards on authorship?

Are all these authors in mega-authored publications really authors? What did these people contribute to the work? The recommendations from the International Committee of Medical Journal Editors (n.d.-a) are that for someone to be considered an author, the person should be involved in:

  1. Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND

  2. Drafting the work or revising it critically for important intellectual content; AND

  3. Final approval of the version to be published; AND

  4. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

In these guidelines, ICMJE also recommends that “Contributors who meet fewer than all 4 of the above criteria for authorship should not be listed as authors, but they should be acknowledged.” Thus, many, if not most, of these mega-author items probably fail at least some of these criteria, which are combined with AND. So a failure to meet all of these indicates the person should not be listed as an author.

Von Bergen and Bressler (2017) believe that the ICMJE recommendations may be becoming a standard for authorship. But the authors also still see a lot of disagreement over what constitutes an author credit and note disciplinary differences. What is recommended for medical journals may not mesh with other disciplines. This is backed up by the fact that author quantity seems well covered in the medical literature, while other disciplines seem to be quieter on this topic.

Byard and Vink (2021) agree with the ICMJE recommendations. They point out that mega-authorship is incompatible with these recommendations and that journals should ask for details of contribution. They find that editors for their journal, Forensic Science, Medicine and Pathology, often remove some authors when asking for those details. Interestingly, they also suggest that authors check off adherence to ICMJE recommendations when submitting and that perhaps authors should be more detailed in their CVs as well when involved in mega-authored publications. However, Dong et al. (2016) studied four major gastroenterology journals and found that despite author contribution requirements, the number of listed authors still increased. Even with such rules in place, the number of authors can still grow, sometimes for valid reasons. It is worth noting, however, that this particular study did not focus on mega-authored titles, which are at a much higher authorship count than the usual range seen in the Dong et al. study.

Also referencing the ICMJE recommendations, Vučković-Dekić (2014) explores causes for concern about false authorship when it comes to multi-authorship. This author shares evidence from another study that indicates multi-authorship leads to an increase in false, undeserved, or gift authorship. Vučković-Dekić believes that educating researchers about these issues is important, as statements or submission requirements seem to not be sufficient.

A more discipline-neutral option is the Committee on Publication Ethics (COPE), which has extensive information on different publication ethics issues. A section of their site is dedicated to multiple issues related to authorship, including disputes about who should be on the list, determining author lists that have too many or too few authors, and other issues (Committee on Publication Ethics 2023). While the guidelines are not quite as specific as ICMJE as to who should be on an author list, the guidelines give a good bit more detail on the types of situations that might arise due to larger author lists.

Publishers themselves have something to say on this issue. Examining areas outside of medicine gives several examples of attempts to address author quantity. Institute of Physics (IOP) Publishing follows ICMJE’s authorship criteria (IOP Publishing n.d.), indicating its support for author lists that are limited to the authors of the manuscript. The American Physical Society (APS) does not appear to address author quantity limits, but indicates that authors should be those who make a significant contribution to the work, while those who made other contributions should be acknowledged (American Physical Society n.d.-a). As a reminder, Ledbetter (2012) points out the vagueness of words like “significant.” While it does not give a maximum quantity, APS’s Physical Review Letters does give some guidelines for higher quantities: submissions with 50 or more authors must provide the author list in an alternative format, the SPIRES Collaboration Author Lists XML Format, which is available for download from GitHub (American Physical Society n.d.-b). Finally, the American Institute of Physics (AIP) indicates that “Only persons who have significantly contributed to the research should be listed as authors.” AIP goes on to indicate that those with contributions below authorship level should be listed in acknowledgments (AIP Publishing n.d.). Thus, all three of these major physics societies mention requirements for being on an author list, but only one (IOP) is very explicit and uses the ICMJE recommendations.

John Wiley and Sons (2023) points to COPE within its Author Services in order to advise potential authors on different ethics issues. More specific to author quantity, other publishers who use vague criteria like substantial or significant include American Chemical Society (ACS Publications n.d.), American Society of Civil Engineers (n.d.), American Society of Mechanical Engineers (2020), and Elsevier (2020). Perhaps the most interesting example is Springer Nature (n.d.), which suggests that who should be on the author list should be considered through a research field lens, but gives, in the absence of such field-based guidelines, a modified version of ICMJE’s recommendations. Similar to Springer Nature, Taylor & Francis Group (n.d.) provides a modified version of the ICMJE recommendations, but also states that all of its medical/health journals should specifically adhere to the ICMJE recommendations.

Momeni et al. (2018), rather than examining the published literature, tried to get to the heart of the matter and see if there is any consistency in viewpoints on what roles should result in a credit as an author, an acknowledgment, or no credit or mention at all. Surveyed plastic surgery residents and fellows were given different scenarios and asked what type of credit someone should get. Three of the four scenarios had fairly even splits, with two having nearly a third each for the three choices and another with a nearly even split between authorship and contributor (but nobody for no credit or mention). Only one scenario had a large majority for one option. While this was a very focused and fairly small sample of people, it illustrates part of the reason why standardization can be difficult to achieve.

Nogrady (2023) shared some views from multiple fields to illustrate that the way authors are listed can vary widely. Some disciplines have the first author as the most important, while others have the last author as most important. Some may group authors by roles, while others may list everyone alphabetically. Some journals may be wary of submissions with large author lists, while others allow such submissions.

It is worth noting that individual journals often have their own guidelines that could be more specific than publisher-level guidelines. Given there are tens of thousands of journals, reviewing journal-specific guidelines for authorship quantities is impractical. That being said, Table 1 has summaries of author listing policies for the journals with the most mega-authored articles as found in this study:

Table 1. Author listing policy summaries.

European Physical Journal C may seem contradictory by suggesting both disciplinary and ICMJE-adapted guidelines. Journal of High Energy Physics seems to be the most forceful in its requirements when addressing collaborations. Otherwise, who should be on an author list or how it should be shared seems fairly optional and up to the submitters.

Limited agreement, but some themes

Thus, there is disparity among scientists, publishers, and disciplines as to what constitutes authorship. It is worth noting that IOP aligns with the ICMJE recommendations, while two other prominent physics societies are vaguer. But while vague, the other two physics publishers do seem to believe significant contributions should be the deciding factor. Some scientists see issues with larger numbers of authors, with Wyatt (2012) suggesting that creativity is diluted with more authors. So some may see hundreds of authors as incompatible with ICMJE recommendations or even specific publisher statements. Thus, some of these author names potentially fall into the false, undeserved, or gift authorship categories by some standards or viewpoints. Others, such as some of those whose views Regalado (1995) shared, may think any contribution that leads to a publication’s authoring is worthy of inclusion.

The goal of the research in this article is to answer several questions:

RQ1:

How many works are there with these enormous numbers of authors?

RQ2:

In what disciplines and source titles is this happening?

RQ3:

Should these people all actually be listed as authors?

Following the answers to these questions are recommended solutions to the issues caused by mega-authorship.

Methodology

Neither Scopus (https://www.scopus.com/) nor Web of Science (https://www.webofscience.com/) allows for searching by author quantity. It was observed that the phenomenon of overflowing of data in Excel output happened with author counts over 2,000 or so (this varied, of course, as the number of characters in an author’s name can vary widely). In order to be sure to catch such items, but to also get a larger representation of items with many authors, even if they do not cause an overflow, the author count 500 was chosen as the minimum. This number was chosen since 100 gives too many items to be considered unusual. The number 500 was chosen since the number of results was manageable, giving more results and thus more representation than 1,000.
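The observation that overflow begins at roughly 2,000 authors can be sanity-checked with simple arithmetic. The sketch below estimates how many names fit in one Excel cell; the average name length of 14 characters and the two-character "; " separator are illustrative assumptions, not values measured in this study.

```python
EXCEL_CELL_LIMIT = 32_767  # maximum characters Excel stores in a single cell


def max_authors_in_cell(avg_name_len: float, separator: str = "; ") -> int:
    """Estimate how many names fit in one cell when joined by `separator`.

    n names need n * avg_name_len + (n - 1) * len(separator) characters,
    so n <= (limit + len(separator)) / (avg_name_len + len(separator)).
    """
    sep_len = len(separator)
    return int((EXCEL_CELL_LIMIT + sep_len) // (avg_name_len + sep_len))


# Assumed average of 14 characters per name (hypothetical value).
print(max_authors_in_cell(14))
```

Under these assumptions the estimate lands close to the 2,000-author level noted above; longer or shorter names shift it accordingly.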

After several unsuccessful searches to try to get mega-author content in both Scopus and Web of Science, InCites (https://incites.clarivate.com/), a companion tool to Web of Science, was found to allow for author quantity searching as a part of one of its analysis tools. InCites uses data from Web of Science to enable analysis for research disciplines, organizations, publications, funding agencies, and areas related to scholarly output. This tool can do more analysis and create reports, something either not possible or harder to do using just Web of Science.

Examining all areas and all years (1980–2022), a list of 5,557 items with 500+ authors and associated data was exported to Excel from Web of Science. The detailed process is:

  1. Log into InCites.

  2. Choose Analyze > Research Areas.

  3. On left menu, choose Authors per Document and choose 500 as the minimum and update results.

  4. Choose All years (1980-2023) under Publication Date.

  5. At this point, the documents for each research area can be selected and exported to Web of Science for viewing and compiling into a single list.

  6. The list can then be exported with needed data, for analysis into Excel.

Note: This was done on January 13, 2023.

While Scopus does cover more journals, neither it nor Web of Science allows for searching by author count. Scopus does export all data about these equivalent items, but the author output exceeds the character limit once it reaches a certain length and the contents overflow into other cells, resulting in problematic data quality. Web of Science does not do this, but it instead outputs incomplete data. For example, authors are listed with a semicolon between each name. A formula can be used to calculate the number of semicolons in the author field; adding one (since the last author would not have one) gives the total number of authors.

=LEN(cell#)-LEN(SUBSTITUTE(cell#,";",""))

Unfortunately, Web of Science simply ends with the last author that will fit in the cell and outputs two semicolons after the last name that fits. This results in the author quantity being incorrect for those items.
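The semicolon-counting approach and its truncation problem can be sketched outside Excel as well. The following is a minimal Python illustration, not the procedure used in this study; the function name is hypothetical, and treating a trailing ";;" as the truncation marker is an assumption based on the behavior described above.

```python
def count_authors(author_field: str) -> tuple[int, bool]:
    """Return (author_count, truncated) for a semicolon-delimited author field.

    Assumption: a field cut off at the cell limit ends with two semicolons,
    as described in the text for Web of Science exports.
    """
    text = author_field.strip()
    truncated = text.endswith(";;")
    # Split on semicolons and ignore empty pieces left by the trailing marker.
    names = [name.strip() for name in text.split(";") if name.strip()]
    return len(names), truncated


# Placeholder names, not real records.
print(count_authors("Doe, J; Roe, R; Poe, E"))
print(count_authors("Doe, J; Roe, R;;"))
```

When the second value is True, the count is only a lower bound and the item needs to be checked against another source.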

To deal with this, another formula was used to look for the total character count and identify cells which hit this limit. Anything over 32,000 characters was flagged for closer examination, which resulted in over a thousand flagged items. To get an accurate quantity, these items were then searched in Scopus, since it reports an author quantity when there is a large quantity of authors (enough to open a side panel when requesting to see the longer author list). This number was recorded to replace the erroneous one found using the aforementioned formula.

=LEN(cell#)
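The same length check can be expressed as a short Python sketch. The function name and the dictionary of record identifiers to author fields are hypothetical; only the 32,000-character threshold comes from the text.

```python
def flag_long_cells(cells: dict[str, str], threshold: int = 32_000) -> list[str]:
    """Return the keys of author fields longer than `threshold` characters.

    `cells` maps a record identifier (hypothetical) to its exported author
    field. Flagged records may have been truncated at Excel's cell limit.
    """
    return [key for key, value in cells.items() if len(value) > threshold]


# Synthetic example: one field above the threshold, one well below it.
sample = {"record-1": "x" * 32_500, "record-2": "Doe, J; Roe, R"}
print(flag_long_cells(sample))
```

Setting the threshold slightly below the 32,767-character limit errs on the side of flagging a few extra records rather than missing truncated ones.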

However, 24 items were removed from further examination due to discrepancies as to whether they were mega-author items or not (i.e., Scopus had a much lower author count), leaving 5,533 items to view. These removed items will have some significance later. All but 12 of these items (8 books, 4 conference papers) were journal articles.

The remaining data were then analyzed using Excel to find calculations and trends by formats, years, subjects, group authors, and source titles. This was done using COUNTIF formulas in Excel to count occurrences for each.
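For readers working outside Excel, the COUNTIF-style tallies can be approximated with Python's standard library. This is a sketch under assumed field names ("format", "year") and made-up records, not the study's data or workflow.

```python
from collections import Counter


def count_by(records: list[dict], field: str) -> Counter:
    """Count how often each value of `field` occurs, like one COUNTIF per value."""
    return Counter(r[field] for r in records if r.get(field) is not None)


# Hypothetical records for illustration only.
records = [
    {"format": "Article", "year": 2019},
    {"format": "Article", "year": 2019},
    {"format": "Book", "year": 2012},
]
print(count_by(records, "format").most_common())
print(count_by(records, "year")[2019])
```

A Counter returns zero for values that never occur, which mirrors COUNTIF returning 0 when no cell matches.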

Results and discussion

The number of authors ranged from 500 (the minimum chosen for mega-author content) to a high of 5,502. The average number of authors for the 5,533 items was 1,531.19 and the median was 1,112. Thus, about half of the items in this study have 1,112 or more authors, and the mean sitting well above the median suggests that a smaller group of items with very large author lists pulls the average upward.
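The gap between a mean and a median can be illustrated with a toy example. The author counts below are invented for illustration and are not taken from the study's data; the sketch only shows how a few very large values raise the mean while leaving the median unchanged.

```python
from statistics import mean, median


def summarize(author_counts: list[int]) -> dict[str, float]:
    """Return the minimum, maximum, mean, and median of a list of author counts."""
    return {
        "min": min(author_counts),
        "max": max(author_counts),
        "mean": mean(author_counts),
        "median": median(author_counts),
    }


# Invented counts: three modest lists and two very large ones.
toy_counts = [500, 600, 700, 3000, 5000]
print(summarize(toy_counts))
```

In this toy list the two largest values lift the mean far above the middle value, the same pattern as a mean of 1,531.19 against a median of 1,112.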

Author counts over time

Mega-authorship was relatively steady during the beginning of this time period, with the yearly average author count for mega-author items ranging between 521.68 and 640.21 from 1989 to 2009. The average of the yearly averages for this period was 574.37. Then, in 2010, mega-authorship skyrocketed, with the average increasing by 150.8% over the previous year; it has not gone below the 2010 average author count since. See Figure 1.

Figure 1. Average authors per year for items with 500+ authors.
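The yearly averages and year-over-year change behind a figure of this kind can be computed as in the following sketch. The input pairs are hypothetical, and the function names are illustrative rather than part of the study's workflow. Note that an increase of 150.8% means the new average is about 2.5 times the previous one.

```python
from collections import defaultdict


def yearly_average(records: list[tuple[int, int]]) -> dict[int, float]:
    """Average author count per year from (year, author_count) pairs."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of counts, number of items]
    for year, n_authors in records:
        totals[year][0] += n_authors
        totals[year][1] += 1
    return {year: s / c for year, (s, c) in sorted(totals.items())}


def percent_change(old: float, new: float) -> float:
    """Percentage increase (positive) or decrease (negative) from old to new."""
    return (new - old) / old * 100.0


# Hypothetical (year, author_count) pairs, not the study's data.
averages = yearly_average([(2009, 500), (2009, 700), (2010, 1500)])
print(averages)
print(percent_change(averages[2009], averages[2010]))
```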


Publication quantities

The first item identified by InCites as being over 500 authors was published in 1989 (Aarnio et al. 1989). The number of items published per year with over 500 authors remained under 200 items until 2011, after which it has never gone below 200 (although partial data from 2022 is closer to that number than any other year since 2011). In 2019, a peak was reached with 527 mega-authored items. See Figure 2.

Figure 2. Number of items with 500+ authors.


Subjects

Based on the subjects assigned to each publication venue by the author after examining its coverage, physics overwhelmingly dominates mega-authored content, with 91.13% of the items being from this subject area. Medicine is an extremely distant second place, with 4.37%. A few other subject areas saw minor representation. Notable science areas not represented include chemistry and engineering (although interdisciplinary titles may cover these subjects). Arts, humanities, and social sciences are not represented among the mega-author sources (unless interdisciplinary is counted). See Figure 3.

Figure 3. Subject distribution for items with 500+ authors.


Publications

All but 12 of the items were published in journals. Four of these were in conference proceedings and eight in books. However, several of the book items appear to be close matches to journal articles, with these items appearing to be either reprints or close revisions published in a book from the same paper. Given the dominance of physics, this should not be a major surprise as physics research is very heavily oriented toward journals.

In terms of the titles, the top ten venues for mega-authorship are all physics journals, with the 11th one being a medical title (Lancet). In fact, the top five titles account for 76.61% of all mega-author items published. See Table 2 for a list of sources with mega-author content. Shaded items are titles that agree with the ICMJE recommendations, per International Committee of Medical Journal Editors (n.d.-b).

Table 2. Source titles with impact factor and item count.

The Impact Factor is provided for these titles, showing that many of these are highly-cited journals. While impact factor is only partially reliable as an indicator of journal quality, these higher numbers indicate these journals are seeing high citation rates. In other words, these are not minor journals with low citation rates publishing mega-authored content.

Discrepancies

Revisiting the 24 items that had discrepancies in author counts between Web of Science and Scopus, it became clear on examining the items that Web of Science chose to list contributors as authors, while Scopus chose to stick with the main authors and/or the group (project name) author. In these cases, the contributors were obviously separated from the authors and labeled differently. Thus, these 24 items give a clue about differing perspectives on authors versus contributors and how these may be treated differently by both publications and databases. It is worth noting that both of these databases have options for group authors (Group Author in Web of Science and Author Collaboration in Scopus are search fields that can be used in their advanced searches).

This seems to indicate that more of these mega-author items may be cases where contributors to the project are listed as such in the publication, but the database has chosen to list them as authors. Without examining the full text of all 5,533 items, it is impossible to determine how often databases list non-author contributors as authors. The 24 discrepancy items show that this can happen, so it is possible that fewer items truly have this quantity of authors listed on their actual publication, with their numbers inflated by databases when indexed.

Why the increase in mega-authorship?

Clearly from the results there are thousands of items, mostly journal articles, with 500 or more authors. The vast majority of these are in physics. So what might be some reasons for this large number of items, especially with physics?

One major contributing factor to the quantity of mega-author titles is the Large Hadron Collider (LHC). This particle accelerator project involves thousands of scientists and is making many discoveries in high energy physics and related areas. Regalado (1995), as mentioned earlier, indicated the effect of accelerators on high authorship counts, and the results found in this article suggest the tradition continues.

When examining the content of the mega-author items, a large number of them are projects related to or at the LHC. In fact, four projects have variations of their names listed in the Group Authors field for over 3,100 of the identified mega-author titles: Compact Muon Solenoid (CMS), A Toroidal LHC Apparatus (ATLAS), Large Hadron Collider beauty (LHCb), and A Large Ion Collider Experiment (ALICE). As an interesting side note, one of these projects is responsible for the mode author quantity, with the ALICE collaborative having 34 articles all with 1,019 authors (another LHC article not from ALICE was the 35th).

Examining the timeline of the LHC and mega-author research data points, there is some alignment. In particular, noteworthy LHC events (such as the discovery of the Higgs boson and the COVID-19 restrictions) seem to have corresponding events in the mega-author scholarship timeline. See Figure 4.

Figure 4. Timeline of 500+ authors content events vs. LHC events.

Source: Cid Vidal and Cid Manzano (2023)

What’s wrong with mega-authorship?

This question goes to the heart of the third research question. While there tends to be agreement that people should get credit for their work, the type of credit seems to be where agreement does not exist. Although listing everyone who had some contribution to the work that resulted in a specific publication as an author does give credit for that work, this practice can blur the lines of authorship. After all, many publications or disciplines do not have such a practice, and who may be included can vary. For example, a librarian may have helped someone find some of the resources listed in a bibliography, but they do not (usually, at least) get author credit for doing so.

It is likely that none of the works examined in this study would adhere to the ICMJE recommendations on what constitutes author credit. But as mentioned previously, different disciplines and scientists may disagree on who deserves an author credit. Figure 5 illustrates the major problems with mega-authorship, drawing on issues seen during this study and on thoughts from the literature review.

Figure 5. Problems with mega-authorship content.


Of course, some questions that might be raised are likely difficult to answer, or their answers may vary widely depending on the situation, such as:

  • Who has the rights to any awards won based on a mega-authorship publication?

  • Who faces consequences when potential problems occur, like accusations of plagiarism, misconduct, etc.?

  • What type of credit does one get for tenure and promotion for being one of hundreds or thousands of authors?

  • Does it diminish one’s contributions to be in a list with so many people?

  • Is there an upper limit on the number of authors, beyond which even journals that have previously published mega-author content would decline to credit everyone?

Collaborators vs. Authors – and possible solutions

One thing that became clear during this research is that the lines between collaborator and author are blurred to some degree (for example, when applying the ICMJE criteria). Figure 6 illustrates the most common scenarios seen across the examined publications. The 24 items that were removed, as mentioned in the Methodology section, provided some good examples of alternatives to listing all names as authors but, as noted, were handled differently by Web of Science and Scopus.

Figure 6. 500+ names and how they are handled.


Scenario 3 is more accurate than the other two. Several of the 24 discrepancy items that fit this scenario indicated that the listed authors were writing on behalf of the named project. Some publishers noted that all contributors to the project are coauthors (which muddies the line between author and collaborator). The collaborators in the project were then listed somewhere separate. This scenario gives more information about the exact authors and, if the collaborators are in a separate file and not in the published item, has fewer negative impacts than Scenario 1 or 2. However, even if the publisher follows Scenario 3, databases may still decide to list all of the collaborators as authors, which they sometimes seem to do. Unfortunately, most databases do not seem to have a separate field for contributors who are not authors (but perhaps they should).

However, the ideal would be a Scenario 4 for mega-author content, which is outlined in Figure 7. While this article specifically studied items with 500+ authors, this process would likely be best applied at a lower threshold. Each person’s name would go through this list to consider where they are best placed.

Figure 7. Possible process to determine author vs. collaborator status.


As previously mentioned, many items in this study were listed as having group authors. However, given that these were in addition to the hundreds or thousands of individual names, publishers or databases are not currently or consistently using group names as a way to deal with authorship of mega-authored works.

Do these suggestions align with publishers?

But in the long run, are publishers or editors going to check in on these things if the corresponding author insists the author list is legitimately that long? Would the publisher push back and demand details of contributions to the manuscript?

As mentioned in the literature review, there is a mix of approaches to how publishers view who should be allowed on author lists. Some use terms like “significant contribution” that can be taken to justify mega-authorship lists. In other cases, ICMJE (or similar/modified) recommendations result in publishers like IOP and Taylor & Francis being more detailed in their views on who should be on the author list. Springer Nature takes a mixed approach, recommending alignment with the norms of the research field but providing modified ICMJE criteria in the absence of such norms. But given the different approaches seen among major physics society publishers, practices may not be consistent even within a single research field.

Supplemental options

A number of journal articles that followed Scenario 3 had a separate file available, sometimes along with other supplemental files, listing the collaborators (sometimes with a role). This is somewhat similar to what McConnell (1958) suggested, although back then this was all in print, the list had to be at the end of the document (or perhaps in an appendix to the book, journal issue, etc.), and far fewer authors were involved than seen in this study. Today, these thousands of contributors, who are not technically authors, can be listed in a separate file. Ideally, databases would retroactively correct past listings to reflect this.

An item not among those retrieved for this study, but listed by Guiness World Records (2021) as “Most authors on a single peer-reviewed academic paper” with 15,025 authors, is COVIDSurg Collaborative and GlobalSurg Collaborative (2021). The work indicates the collaboratives as authors, but also indicates that individual members are all authors. Two individual names are listed as corresponding authors. This item was not retrieved because Web of Science indexes it with only the two corresponding authors named on the PDF of the article as authors, along with the two collaboratives as group authors. The larger author list, divided up by roles, appears only in an online supplementary file and not in the PDF. The individuals are specifically indicated to be authors, even though some of the roles are not related to the writing of the actual article, such as “Hospital Leads,” “Dissemination Committee,” and “Local Collaborators” (see Note 1). As previously mentioned, some journals’ author guidelines provide options for alternative ways to share lengthy author lists (Physical Review D, n.d.; Physical Review Letters, n.d.). These could be adapted for people who contributed in other ways.

Others have developed tools more complex than lists or supplementary tables to better specify contributions and potentially end the listing of hundreds or thousands of names as if they were equal and with ambiguous roles. Holcombe (2019) recommends that CRediT (Contributor Roles Taxonomy), which is already part of some journal management systems, be made standard and be used to identify the specific roles of contributors. This allows for more targeted recognition of each person’s exact function in large studies, and also allows other researchers to identify and potentially contact the people whose roles match a specific need. If implemented by publishers, this would improve how names are listed. In addition to CRediT, Vasilevsky et al. (2021) give other options, including Rescognito, Discogs, Mozilla Open Badges, Contributions table, Contributor Attribution Model, Scholarly Contributions and Roles Ontology, and Manubot.
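As a rough illustration of the role-based lookup that a taxonomy such as CRediT makes possible, the following minimal Python sketch indexes contributors by role. The role names are the 14 CRediT roles; the function name `index_by_role` and the contributor names are hypothetical and are not taken from any item in this study or from any journal management system.

```python
from collections import defaultdict

# The 14 role names of the CRediT (Contributor Roles Taxonomy) vocabulary.
CREDIT_ROLES = {
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
}


def index_by_role(contributions):
    """Map each role in use to the sorted list of people who hold it.

    `contributions` maps a person's name to a list of CRediT role names.
    Roles outside the taxonomy raise ValueError so typos are caught early.
    """
    by_role = defaultdict(list)
    for person, roles in contributions.items():
        unknown = set(roles) - CREDIT_ROLES
        if unknown:
            raise ValueError(f"{person}: unknown role(s) {sorted(unknown)}")
        for role in roles:
            by_role[role].append(person)
    return {role: sorted(people) for role, people in by_role.items()}


# Hypothetical contributors, for illustration only.
contributions = {
    "Researcher A": ["Conceptualization", "Writing – original draft"],
    "Researcher B": ["Software", "Data curation"],
    "Researcher C": ["Software", "Investigation"],
}

roles = index_by_role(contributions)
print(roles["Software"])  # the people to contact about the software
```

A structure like this keeps every name attached to what that person actually did, so a reader looking for a particular kind of expertise does not have to guess from an undifferentiated list of names.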

While these are all possible solutions to this issue, they leave a few problems. First, databases that index these publications will need to address (and ideally, incorporate) roles beyond authorship. Second, existing content with such information would ideally be updated in databases to include these differing roles. Finally, there would ideally be more consistency in how author and contributor roles are described in order to give a fuller picture of their function.

Future directions

One area for potential future study would be to examine the full text of the items from this study to determine if the authorship quantity information retrieved from databases is flawed. Again, it is likely that more than 24 items from the original set were in error, and that many more of these items list people as collaborators rather than authors, but the databases chose to record them as authors. It may be found that some items list a group authorship, but databases are choosing to list individual names as well, even if they are indicated as collaborators. It would be valuable to determine whether the actual official published version lists all, some, or no individual names as authors, where individual names are listed, and how group authorship comes into play.

So in all likelihood, the number of items that genuinely list all of the people associated with the item as authors is lower than the final 5,533. This was not examined further because checking this quantity of items, in addition to the other lines of research, was infeasible and outside the scope of this article. This would have consequences for databases, especially those like Web of Science and Scopus that use such data to produce impact information. This line of research could have significant implications for the reliability of impact data in such cases.

Conclusion

The examination of 5,533 items flagged by InCites as having 500 or more authors found that 91.13% of the items were from physics, with medicine and other science disciplines falling at an extreme distance. Many of these items were found to be related to the Large Hadron Collider, with the progression of events at the LHC corresponding to points of interest in the graphs related to the quantity of publications and average number of authors for the 500+ author works.

Unfortunately, this quantity of authors presents problems. Simply listing all authors and their affiliations can result in a ballooning publication length and cause problems with data output from databases such as Web of Science and Scopus as the number climbs higher.
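The data output problem can be made concrete with a small Python sketch. It assumes the Excel limit of 32,767 characters per cell noted in the Introduction; the function name `split_for_excel`, the "; " separator, and the placeholder author names are hypothetical and do not reflect the export format of any particular database.

```python
EXCEL_CELL_LIMIT = 32_767  # maximum number of characters Excel stores in one cell


def split_for_excel(names, sep="; ", limit=EXCEL_CELL_LIMIT):
    """Pack names into as few cell-sized strings as possible.

    Names are never cut in the middle. A single name longer than `limit`
    is kept whole, so it would still overflow its cell.
    """
    cells, current = [], ""
    for name in names:
        candidate = name if not current else current + sep + name
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                cells.append(current)
            current = name
    if current:
        cells.append(current)
    return cells


# Hypothetical record with 5,000 placeholder names (not data from this study).
authors = [f"Author {i:04d}, A." for i in range(5000)]
joined = "; ".join(authors)

print(len(joined) > EXCEL_CELL_LIMIT)  # does the author field overflow one cell?
cells = split_for_excel(authors)
print(len(cells), [len(c) for c in cells])
```

Splitting on name boundaries, rather than letting the spreadsheet truncate or wrap the text, is one way to keep each output row aligned with its field headers.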

ICMJE has recommendations that discourage listing people as authors who did not have a specific level of contribution to an item. However, not everyone may agree with these recommendations and some disciplines may differ in what they see as appropriate.

While some items list a collaborative group as an author, some that do so additionally list all individual names from that group as authors. Some specify author roles, some of which are seemingly not tied to the actual manuscript authorship. Perhaps such practices will eventually lead to more specificity about roles and thus more distinction between those directly involved in authorship of items and those who contributed in other important ways. Some works list those who wrote the text of the specific work as authors, while listing those who contributed in some other fashion in a separate file. There are tools that make this information more available and detailed. Such solutions seem more ideal and less ambiguous, but it seems as if some databases have not even caught up to supplemental file contributor lists yet. Authors, publishers, and databases will need to collaborate to come up with a solution to this issue.

At the heart of mega-authorship is the desire to give credit to people who contribute in some fashion to a work. There may be many reasons why so many people want this credit, including recognition for their work, evidence for tenure and promotion, and other career-related factors. Ideally, solutions will become more commonly implemented to detail people’s specific contributions and unblur the lines between author and contributor.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/08989621.2024.2318790

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

Notes

1. Note that the link to supplementary information in the PDF for this journal article goes to the incorrect location. The DOI provided in the reference list goes to the publisher site, with the correct supplementary file. For a direct link to this .zip file, see https://academic.oup.com/bjs/article/108/9/1056/6182412#supplementary-data.

References