Contributions

Expanding the Coverage of Conflict Event Datasets: Three Proofs of Concept

Pages 367-397 | Received 07 Apr 2023, Accepted 20 Aug 2023, Published online: 15 Dec 2023
 

ABSTRACT

Many contemporary studies on political violence/social unrest rely on conflict event datasets derived primarily from major international/national news reports. Yet, a large body of research identifies systematic patterns of ‘missingness’ in these data, calling into question statistical results drawn from them. In this project, we explore three specific opportunities for additional data collection to help recover systematically excluded events and to potentially assist in addressing resulting bias. We find that all three approaches result in additional and often systematically different material than that reported in news-based datasets, and we reflect on the advantages and drawbacks of these approaches.

Acknowledgements

We thank the seven freelance journalists who collaborated with us and the many additional news media professionals who agreed to be interviewed for this research. For comments on this project, we are grateful to participants of the Empirical Studies of Conflict (ESOC) University of California Faculty Meeting and Civil Wars’ editorial team and anonymous referees. For their support in executing this project, we thank the following University of California, Merced departments: Customer Relations and Research and Academic Procurement Team; Center for Business Services and Solutions; Institutional Review Board Office; Office of Research and Economic Development; Tax Services; and Accounts Payable. We received approval from the University of California-Merced Institutional Review Board #UCM2021-125.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/13698249.2023.2254988.

Notes

1. As described below, this effort involves various comparisons with existing conflict event data. We have sought to use these data responsibly and in good faith. The overall goal of this exercise is to identify means by which these existing datasets might be further improved to the collective benefit of the dataset curators and their users, including potential governmental funders. As such, this effort is in no way intended to aid in the development of datasets (or other products) that serve as competitors for these existing conflict event datasets. Instead, the intention is to provide their curators insights about the nature of missing or likely missing incidents from their previous data collection efforts that might inform future collection efforts to their benefit. This research is not intended to negatively depict these conflict datasets or their curators in any manner. Indeed, we have invested a substantial number of work hours in this project precisely because we consider news report based conflict event datasets to be such a critical resource to academic (and potentially other) communities seeking to understand, forecast and otherwise engage conceptually with political violence and social unrest globally. To the best of our knowledge, there are presently no viable alternatives to the existing news report based conflict event datasets that track conflict and/or social unrest on a global basis. As such, and given how extensively these data are used within academia and government/intergovernmental entities, understanding how these datasets might be further improved is an important public good.

2. In economics, see Voors et al. (Citation2012); Minoiu and Shemyakina (Citation2014); Manacorda and Tesei (Citation2020); political science, see Choi (Citation2010); Fortna (Citation2015); Steinert-Threlkeld (Citation2017); climate, atmospheric sciences, and oceanography, see O’Loughlin et al. (Citation2014); Hoffmann et al. (Citation2020); and ecology and evolutionary biology, see Daskin and Pringle (Citation2018).

3. Gleditsch et al. (Citation2014) show that datasets include increasingly disaggregated statistics on (1) the actors in conflict such as ethnic minorities (including the ‘All Minorities at Risk’ (Birnir et al. Citation2018) and ‘Ethnic Power Relations’ (Vogt et al. Citation2015) datasets); (2) strategies and tactics of conflict such as improvised explosives and terrorist attacks; and (3) conflict beyond violence such as non-violent protests (see the ‘Non-violent and Violent Campaigns and Outcomes Data’ dataset (Chenoweth and Lewis Citation2013)).

4. Some of these governmental actors are identified publicly. We have learned the identities of various other government/intergovernmental users through interviews with foreign affairs professionals. See A.1 for a description of these interviews.

5. ICEWS incorporates the data formerly included in the WITS database (Bowie Citation2017).

6. Such datasets track terrorism (EDTG) (Hou et al. Citation2020), one-sided ethnic attacks (EOSV) (Fjelde et al. Citation2021), violence against refugees (POSVAR) (Gineste and Savun Citation2019), electoral violence (DECO and CREV) (Birch and Muchlinski Citation2020, Fjelde and Hoglund Citation2022) and violence against peacekeepers (PAR) (Lindberg Bromley Citation2018). Others have created new media-based datasets on specific regions or topics: these include country-specific conflict measurements (BFRS and OCVED) (Bueno de Mesquita et al. Citation2015, Osorio and Beltrán Citation2019); data on suicide attacks (CPOST) (Pape et al. Citation2021); violent and non-violent electoral contestation (ECAV) (Daxecker et al. Citation2019); water-related conflict (WARICC) (Bernauer et al. Citation2012); and non-violent resistance in conflict settings (Chenoweth et al. Citation2019).

7. As of August 26th 2023, Google Scholar citations of the articles introducing/describing these datasets are: ACLED (Raleigh et al. Citation2010): 1,975; GDELT (Leetaru and Schrodt Citation2013): 843; GED (Sundberg and Melander Citation2013): 1,233; GTD (LaFree Citation2010): 861; ICEWS (O’Brien Citation2010): 350; and SCAD (Salehyan et al. Citation2012): 541.

8. These include Afghanistan, Burkina Faso, Burundi, China, Colombia, El Salvador, Iraq, Israel, Libya, Mali, Mexico, Pakistan, the Palestinian Territories, the Philippines, South Korea, Sudan, Syria, Rwanda, Ukraine, Venezuela, Yemen and Zimbabwe.

9. These include, but are not limited to, Al Jazeera, the British Broadcasting Corporation (BBC), BuzzFeed, Der Spiegel, France 24, The Guardian, The HuffPost, The New York Times, Public Radio International, Reuters, and The Wall Street Journal. In some cases, outlets are not listed here following interviewee requests for anonymity.

10. In Figure 1, we plot the global distribution of government-imposed internet outages from 2016 through 2022. Journalists operating in conflict zones may also engage in self-censorship and under-report events due to safety concerns (Larreguy et al. Citation2020). Outlets sympathetic to one side of a conflict may selectively report events.

Figure 1. This figure shows that many of the countries experiencing political violence/social unrest around the world are the same ones whose governments are shutting down information and communication technologies that likely underpin the news media’s ability to report violence in those countries.

Sources: AccessNow (Citation2016), Schvitz et al. (Citation2022).

11. Interviewee 1, 2022. Reporter from a major wire service.

12. However, it is also important to note that some events are only covered with a written article. Another interviewee reported that budget constraints led their outlet to cover protests in Sudan in a written article, but without photojournalism: ‘Ever since the Ukraine crisis, we have not had the ability to cover [most] of the protests… in Sudan with video and photos because the budget just isn’t there. We’ve still, as text reporters, been able to cover them [in writing]’. (Interviewee 1, 2022. Reporter from a major wire service). Thus, while editorial considerations appear to drive greater coverage of smaller events with photo and video only, budgetary considerations can limit this.

13. Interviewee 2, 2022. Sub-regional News Director for a major wire service.

14. For instance, as technologies develop, video content might be used to estimate crowd sizes when they are not reported/estimated in news reports, or it might serve as an alternative estimate.

15. Fatalities: All Data, B’Tselem: The Israeli Information Center for Human Rights in the Occupied Territories

16. Comprehensive Listing of Terrorism Victims in Israel (September 1993–Present), www.jewishvirtuallibrary.org/comprehensive-listing-of-terrorism-victims-in-israel

17. Mapping Terrorism in the West Bank, FDD Visuals, www.fdd.org/analysis/2022/12/12/mapping-terrorism-in-the-west-bank/

20. Johnston’s Archive, ‘Chronology of Terrorist Attacks in Israel Introduction’, www.johnstonsarchive.net/terrorism/terrisrael.html

21. Interviewee 5, 2022. Staff Writer at the New York Times Magazine (former Wall Street Journal writer in the Middle East).

22. Interviewee 2, 2022. Sub-regional News Director for a major wire service.

23. Interviewee 10, 2022. Journalist with The New York Times.

24. Interviewee 7, 2022. Freelance journalist who worked with major North American outlets and others, including BBC, PRI, France 24, and Canadian Public Broadcasting.

25. Interviewee 4, 2022. Former cable news executive.

26. Interviewee 8, 2022. BuzzFeed News Reporter/former Reuters reporter.

27. Interviewee 3, 2022. Freelance journalist/former New York Times reporter.

28. Interviewee 10, 2022. Journalist with The New York Times.

29. Interviewee 6, 2022. Reuters reporter in Latin America.

30. Interviewee 1, 2022. Reporter for a major wire service; Interviewee 2, 2022. Sub-regional News Director for a major wire service.

31. Interviewee 2, 2022. Sub-regional News Director for a major wire service.

32. Interviewee 8, 2022. BuzzFeed News Reporter/former Reuters reporter.

33. Interviewee 9, 2023. Freelance human rights journalist; Interviewee 10, 2022. Journalist with The New York Times; Interviewee 11, 2022. Staff Writer at The New York Times Magazine.

34. Interviewee 9, 2023. French Freelance Journalist.

35. Such an approach may not entirely eliminate this source of bias, as editorial pressures surely influence where and how journalists focus their time and efforts in the first place. Yet, significant mismatches in what journalists learn about vs. what they report would provide insight into the nature of editorial bias and potentially provide the direction for measuring/estimating it and potentially using such inferences in statistical analyses (e.g. in establishing upper/lower bounds).

36. Future work might compare patterns to SCAD as well if/when that dataset has been updated to include recent events.

37. We estimate, with Bayesian logistic regression, $P(U_i = 1 \mid \mathbf{1}_i^V, \nu_c, \tau_t) = \operatorname{logit}^{-1}(\gamma \mathbf{1}_i^V + \nu_c + \tau_t)$, where $U_i$ indicates whether a given incident $i$ was not previously captured by ACLED and the indicator variable $\mathbf{1}_i^V$ captures whether that event is estimated to have involved violence. Country and year fixed effects are given by $\nu_c$ and $\tau_t$, respectively. Predicted probabilities from alternative models with either country or year fixed effects are displayed in grey in Figure 2 and are effectively unchanged. We generate uncertainty estimates using quasi-Bayesian Monte Carlo simulation. Linear probability model results, which we generate given possible incidental parameter biases that fixed effects can introduce in logistic regression, are consistent (see the accompanying R code).
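The model in note 37 can be illustrated with a minimal Python sketch. It is not the authors' R code: the data are synthetic, the coefficient values are arbitrary, and the estimator is a plain maximum-likelihood logit fitted by Newton-Raphson rather than the Bayesian fit the note describes. The simulation step draws coefficient vectors from a multivariate normal centred on the estimates, which is one common reading of quasi-Bayesian Monte Carlo simulation and is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the article's data; all values are arbitrary).
n, n_countries, n_years = 4000, 5, 3
country = rng.integers(0, n_countries, size=n)
year = rng.integers(0, n_years, size=n)
violent = rng.integers(0, 2, size=n).astype(float)   # indicator 1_i^V
true_gamma = -0.8                                     # assumed effect of violence
true_nu = rng.normal(0.0, 0.5, size=n_countries)      # country effects nu_c
true_tau = rng.normal(0.0, 0.3, size=n_years)         # year effects tau_t


def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


# U_i = 1 means the incident was not previously captured.
u = rng.binomial(
    1, sigmoid(true_gamma * violent + true_nu[country] + true_tau[year])
).astype(float)


def design(violence):
    """Intercept, violence indicator, and country/year dummies (first level dropped)."""
    cols = [np.ones(n), violence]
    cols += [(country == c).astype(float) for c in range(1, n_countries)]
    cols += [(year == t).astype(float) for t in range(1, n_years)]
    return np.column_stack(cols)


def fit_logit(X, y, n_iter=100, tol=1e-10):
    """Newton-Raphson maximum likelihood; returns estimates and their covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ beta)
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov


beta_hat, cov_hat = fit_logit(design(violent), u)
gamma_hat = beta_hat[1]

# Simulation step: draw coefficient vectors, then average predicted
# probabilities over the observed country/year mix for each draw.
draws = rng.multivariate_normal(beta_hat, cov_hat, size=1000)
p_violent = sigmoid(design(np.ones(n)) @ draws.T).mean(axis=0)
p_nonviolent = sigmoid(design(np.zeros(n)) @ draws.T).mean(axis=0)

for label, sims in [("violent", p_violent), ("non-violent", p_nonviolent)]:
    lo, hi = np.percentile(sims, [2.5, 97.5])
    print(f"P(not previously captured | {label}): {sims.mean():.3f} [{lo:.3f}, {hi:.3f}]")
```

Dropping either the country or the year dummies from `design` gives the kind of alternative specification the note mentions. With many fixed-effect levels and few observations per level, the incidental parameter concern raised in the note would apply to this sketch as well.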

Figure 2. This figure displays the results of comparing photo/video content tracked by the news media based data (blue) vs. those newly identified from the materials (gray). The upper-left figure depicts differences across countries. The upper-right figure displays the predicted probability of not being previously identified when violence is and is not assessed to have been associated with the event. Finally, the bottom figure displays the locations of newly and previously identified events (These plots involve, but are not limited to, the use of data from acleddata.com).


38. See the accompanying R code. This speaks to a more general possible use of the photo/video records: they might be used not only to reduce the number of systematically undercovered events but also in imputation efforts intended to estimate overall levels of underreporting more generally.

39. We compare only the attacks logged by each dataset meeting our inclusion criteria. We exclude datasets’ attacks from analysis based on their classification of actors involved and other key terms. In a limited number of cases, ambiguities in event details could potentially lead to events that were indeed captured by these datasets being dropped. However, we do not believe that we systematically under-count dataset event coverage of any particular type of event.

40. As described in A.3, to determine whether a given event captured by the journalists was included in ACLED, members of our research team manually inspected each event. In some cases, differences in coordinates, dates, or description of the cause(s) and nature of the event between the events reported by the journalists and those included in ACLED made it difficult to determine whether an event reported by the journalists was indeed newly identified. In these cases, we create two datasets: one in which such cases are assumed to be newly identified and one in which they are not. We then calculated the statistics reported in this section using both datasets in order to produce plausible upper and lower bounds.
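The bounding logic in note 40 can be sketched briefly in Python. This is an illustration under stated assumptions, not the authors' procedure: the `Event` record, its `match_status` labels, and the example counts are all hypothetical. The lower bound treats every ambiguous case as already captured, and the upper bound treats every ambiguous case as newly identified.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    match_status: str  # hypothetical labels: "matched", "new", or "ambiguous"


def new_share_bounds(events):
    """Return (lower, upper) bounds on the share of newly identified events."""
    total = len(events)
    if total == 0:
        raise ValueError("no events to compare")
    definite_new = sum(e.match_status == "new" for e in events)
    ambiguous = sum(e.match_status == "ambiguous" for e in events)
    lower = definite_new / total                 # ambiguous cases assumed captured
    upper = (definite_new + ambiguous) / total   # ambiguous cases assumed new
    return lower, upper


# Made-up example: 4 clearly new, 2 ambiguous, 4 clearly matched.
example = (
    [Event(f"n{i}", "new") for i in range(4)]
    + [Event(f"a{i}", "ambiguous") for i in range(2)]
    + [Event(f"m{i}", "matched") for i in range(4)]
)
bounds = new_share_bounds(example)
print(bounds)  # (0.4, 0.6)
```

Any other statistic reported from the journalist data could be bounded the same way by computing it once on each of the two datasets.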

41. These numbers are conservative as we subset only to those attacks reported by the journalist in which insurgent and separatist force involvement is described. Other potentially responsive cases are dropped from this calculation.

42. Beitbridge, Gutu, Hwedza, Marondera, Mutoko, Nyanga.

43. Nevertheless, this may change. We understand from internal discussions with one news agency that efforts are underway to explore providing specific spatial details extracted from photos/videos.

44. For instance, with greater photo/video coverage, efforts to estimate crowd sizes, participants, whether or not violence occurred, etc. may benefit from the multiple resources.

45. For instance, ACLED reports using Arabic, but not Hebrew, language sources to collect data on Israel/Palestine (ACLED Citation2020).

46. Please see the accompanying R code for a description of the UNMISS-GED and UNMISS-ACLED comparisons.

Additional information

Funding

The work was supported by the University of California Merced [Start-up/incidental funding].

Notes on contributors

Andrew Shaver

Andrew Shaver is an assistant professor of political science at the University of California, Merced and founding director of the Political Violence Lab.

Hannah Kazis-Taylor

Hannah Kazis-Taylor is a PhD student in Politics at Princeton University studying Middle East politics and political violence.

Claudia Loomis

Claudia Loomis is a graduate of UC Merced’s political science programme and a research intern in the Political Violence Lab.

Mia Bartschi

Mia Bartschi is a former research intern in the Political Violence Lab.

Paul Patterson

Paul Patterson is a former research intern in the Political Violence Lab.

Adrian Vera

Adrian Vera is a research intern in the Political Violence Lab.

Kevin Abad

Kevin Abad is a former research intern in the Political Violence Lab.

Saher Alqarwani

Saher Alqarwani is a former research intern in the Political Violence Lab.

Clay Bell

Clay Bell is a research intern in the Political Violence Lab.

Sebastian Bock

Sebastian Bock is a former research intern in the Political Violence Lab.

Kieran Cabezas

Kieran Cabezas is a former research intern in the Political Violence Lab.

Heidi Felix

Heidi Felix is a former research intern in the Political Violence Lab.

Jennifer Gonzalez

Jennifer Gonzalez is a former research intern in the Political Violence Lab.

Christopher Hoeft

Christopher Hoeft is a former research intern in the Political Violence Lab.

Aileen Ibarra Martinez

Aileen Ibarra Martinez is a former research intern in the Political Violence Lab.

Kai Keltner

Kai Keltner is a former research intern in the Political Violence Lab and an undergraduate student at Vanderbilt University.

Jessica Moroyoqui

Jessica Moroyoqui is a former research intern in the Political Violence Lab.

Kieko Paman

Kieko Paman is a former research intern in the Political Violence Lab.

Ethan Ramirez

Ethan Ramirez is a former research intern in the Political Violence Lab.

Priscilla Reis

Priscilla Reis is a former research intern in the Political Violence Lab.

Juan Jose Rodriguez

Juan Jose Rodriguez Jr. is a research intern with the Political Violence Lab and an alumnus of the University of California, Riverside.

Jazmin Santos-Perez

Jazmin Santos-Perez is a former research intern in the Political Violence Lab.

Katha Komal Sikka

Katha Komal Sikka is a research fellow with the Political Violence Lab and M.A. student at Columbia University.

Arjan Singh

Arjan Singh is a former research intern in the Political Violence Lab.

Cassidy Tao

Cassidy Tao is a former research intern in the Political Violence Lab.

Richard Tirado

Richard Tirado is a former research intern in the Political Violence Lab.

Aishvari Trivedi

Aishvari Trivedi is a former research intern in the Political Violence Lab.

Lillian Xu

Lillian Xu is a former research intern in the Political Violence Lab.

Margaret You

Margaret You is a former research intern in the Political Violence Lab.

Meriam Eskander

Meriam Eskander is a former research intern in the Political Violence Lab.