Expanding the Coverage of Conflict Event Datasets: Three Proofs of Concept

Pages 367-397 | Received 07 Apr 2023, Accepted 20 Aug 2023, Published online: 15 Dec 2023

ABSTRACT

Many contemporary studies on political violence/social unrest rely on conflict event datasets derived primarily from major international/national news reports. Yet, a large body of research identifies systematic patterns of ‘missingness’ in these data, calling into question statistical results drawn from them. In this project, we explore three specific opportunities for additional data collection to help recover systematically excluded events and to potentially assist in addressing resulting bias. We find that all three approaches result in additional and often systematically different material than that reported in news-based datasets, and we reflect on the advantages and drawbacks of these approaches.

Introduction

Academics and policymakers rely on cross-national conflict event datasets derived wholly or partly from news reports; however, a significant body of research has revealed systematic issues with such datasets. In this 25th Anniversary Special Issue article, we pilot diverse approaches to supplementing existing datasets and offer recommendations to address potential sources of systematic ‘missingness’.Footnote1

Empirical conflict research has shifted away from analysing broad global patterns in conflicts using country-level data (e.g., conflict onset (Fearon and Laitin Citation2003), duration (Collier et al. Citation2004, De Rouen and Sobek Citation2004) and settlement (Walter Citation1997)). Recent literature has taken a more fine-grained approach, analysing patterns in individual violent events or conflict dynamics in smaller regions (see Berman et al. (Citation2018) for a broad review). To do so, scholars across different fieldsFootnote2 have increasingly relied on conflict event datasets. Datasets on various aspects of conflict, including such conflict-event datasets, have improved by becoming increasingly disaggregated, i.e. reporting more precise details rather than topline statistics (e.g. yearly measures at the country level).Footnote3 Disaggregated incident-level datasets have been used to examine at a more micro-level not only civil wars and insurgencies (Berman et al. Citation2011, Crost et al. Citation2014, Sexton Citation2016, Condra et al. Citation2018), but also terrorism (Laktabai Citation2020, Mroszczyk and Abrahms Citation2021, Tin et al. Citation2021, Hoeffler et al. Citation2022), social unrest including protest activity (Sutton et al. Citation2014, Bodnaruk Jazayeri Citation2016, Klein and Regan Citation2018, Ives and Lewis Citation2020), and other forms of political violence.

Disaggregated conflict-event datasets are also both used and funded by a variety of public sector actors including United Nations entities, United States governmental agencies, and other national governments.Footnote4 These datasets, which are broadly focused on political violence/social unrest and often global in coverage, include the Armed Conflict Location and Event Data project (ACLED) (Raleigh et al. Citation2010); the Global Database of Events, Language, and Tone (GDELT) (Leetaru and Schrodt Citation2013); the Georeferenced Event Dataset (GED) (Sundberg and Melander Citation2013); the Global Terrorism Database (GTD) (LaFree and Dugan Citation2007); the Integrated Crisis Early Warning System database (ICEWS) (Boschee et al. Citation2015);Footnote5 and the Social Conflict Analysis Database (SCAD) (Salehyan et al. Citation2012). Derivative datasets have built on these existing media-based datasets to focus on more specific aspects of conflicts.Footnote6 Finally, the rising collation of news reports into datasets has generated subsequent integration efforts to improve the accuracy of relevant data (Zhukov et al. Citation2017, Donnay et al. Citation2019).

These conflict event datasets, which underpin increasingly micro-level research on conflict by academics and policymakers, are constructed largely or wholly from major international and national news media reports.Footnote7 However, a growing body of research has identified patterns of systematic missing data in media-based conflict event datasets (which we cite at length below). We pilot and test approaches to recovering missing data, and show that our approaches can help correct patterns of systematic missing data in existing conflict event datasets.

Given the prominence of media-based datasets, we do not seek to discourage their use (though, we encourage caution!) but rather suggest ways of enriching these data to benefit academic scholarship and governments’ policy and programming. In the discussion, we reflect on the reliability of alternative data sources and how these approaches might help address sources of systematic ‘missingness’ in existing cross-national datasets. We focus on recommendations that entities funding and developing conflict event datasets might consider adopting.

We pilot three different approaches to expanding existing datasets. Our approach and conclusions are informed by in-depth interviews with media professionals familiar with reporting on political violence/social unrest in countries around the worldFootnote8 who work, or have reported as freelance journalists, for major outlets.Footnote9

Our first effort uses photo and video journalism, rather than written articles, to track previously unidentified incidents of political protest and social unrest. Second, we integrate records of violent incidents from local-language media, NGOs and local authorities. Third, we contract in-country journalists to log all relevant events they learn about; we then compare the events they identify to those reported in existing conflict-event data.

In brief, we find that all three approaches result in additional and often systematically different material than that reported in news report based data, and we reflect in the discussion on the advantages and drawbacks of each approach.

Through this paper, we make several contributions. First, we help advance the debate over the use of conflict event datasets based on news reports. Rather than a black-or-white recommendation of simply avoiding or using these datasets, we not only encourage actively working to improve them but offer tangible solutions for doing so. Second, our collaborations with local journalists through independent contracts provide a model for sustained and potentially broader engagement with journalists around the world. Such partnerships between academia and news media, particularly in the IR space, are currently rare, making them all the more urgent and necessary. Finally, if/as the data augmentation processes we recommend are implemented, the newly updated data might be used to retest prominent research findings that rely on the existing conflict event datasets.

This paper proceeds as follows: first, we outline the shortcomings of using news reporting to build protest and violence datasets. Second, we enumerate our three independent approaches to expanding these datasets, namely (1) photo and video journalism, (2) local-language media, NGOs and local authorities, and (3) the independent contracting of in-country journalists. Third, we present our results, and finally, we reflect on our results and conclude with further suggestions on additional data sources.

Shortcomings of News Reporting for Building Protest and Violence Datasets

A significant body of research has shown that cross-national datasets tend to omit certain events; but more importantly, they are likely to systematically miss particular types of events. Skewed patterns of reporting are particularly problematic as systematic mismeasurement is likely to produce bias in statistical estimates.

Scholars have identified patterns in omissions of violent events based on geography, time, type of violence and identity of the perpetrator. Geographically, an event in a populous area is more likely to be covered than an event in a remote one (Kalyvas Citation2004, Eck Citation2012, Weidmann Citation2015, Dietrick and Eck Citation2020); events in ‘Western’ countries are also covered at higher rates (Behlendorf et al. Citation2016). The timing of an event also influences its media coverage: instances of political violence are significantly under-reported prior to elections when compared to post-election reporting (Von Borzyskowski and Wahman Citation2021). Furthermore, the type of violence can influence reporting: media outlets disproportionately cover more severe (Croicu and Eck Citation2022) and sensational forms of violence (e.g. bombings) (Zhukov and Baum Citation2019, Shaver et al. Citation2022).

Given that cross-national datasets rely on media reporting, they may reproduce these biases in media coverage. Conflict events may go unreported in international media for a host of reasons. As Shaver et al. (Citation2022) detail, journalists face a variety of restrictions on their ability to report on events, including difficulties accessing remote areas. Media outlets may be influenced or directly controlled by the local government, impacting whether and how they cover events (Miller et al. Citation2022). For instance, governments can deliberately restrict journalists’ access to information with internet blackouts, as likely occurred following protests in Iran in 2022 (Campbell Citation2023) and in India in 2018 (Hussain Citation2023).Footnote10

The effects of systematic underreporting are significant. In a large-scale ‘reverse replication’ exercise, Shaver et al. (Citation2022) attempt to recover the results of a large number of articles published in leading economics and political science journals using media-derived conflict data in place of high-quality administrative data originally used in those studies. They find that the majority are irrecoverable.

Paths Forward? Exploring Plausible Supplements to Major News Article-Derived Data

Given the established limitations of conflict event datasets that rely on major news media reports, how can we supplement these datasets to limit missing data problems?

Schutte et al. (Citation2022) highlight some solutions for remedying discrepancies in data collection, including recommended statistical analyses and conceptual framings. Similarly, Donnay et al. (Citation2019) put forth MELTT, or Matching Event Data by Location, Time, and Type, as an inexpensive methodology designed to improve data collection accuracy for spatially aggregated and machine-coded datasets. Von Borzyskowski and Wahman (Citation2021) suggest employing a survey-based approach for studies with a small scope and cross-referencing the results with dataset findings to reduce errors. When working on a larger scale where multiple datasets contain relevant data, Cook et al. (Citation2017) introduce a model that estimates misclassification and appropriately weights risk-model probabilities, which they find significantly reduces the impact of biases on the data. Otto (Citation2013) offers broader suggestions, including increasing transparency of coding procedures, using more exact definitions and ensuring researchers utilise appropriate statistical models.
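While the implementations behind these tools differ, such integration efforts ultimately turn on a rule for deciding when two records drawn from different datasets describe the same incident. The sketch below gives a minimal, illustrative version of this kind of location/time/type matching in Python; it is not the MELTT implementation or any of the cited authors' code, and the field names and the 25 km/one-day windows are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import radians, sin, cos, asin, sqrt

@dataclass
class Event:
    source: str    # which dataset the record comes from
    day: date      # reported event date
    lat: float     # reported latitude
    lon: float     # reported longitude
    kind: str      # coarse event type, e.g. 'protest', 'attack'

def km_between(a: Event, b: Event) -> float:
    """Great-circle (haversine) distance between two events in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def likely_same_incident(a: Event, b: Event,
                         max_km: float = 25.0, max_days: int = 1) -> bool:
    """Flag two records as candidate duplicates if they share a type and fall
    within assumed spatial and temporal windows."""
    return (a.kind == b.kind
            and abs((a.day - b.day).days) <= max_days
            and km_between(a, b) <= max_km)

# Toy example: one record from each of two hypothetical datasets.
rec_a = Event('dataset_A', date(2023, 1, 10), -15.84, -70.02, 'protest')
rec_b = Event('dataset_B', date(2023, 1, 11), -15.83, -70.03, 'protest')
print(likely_same_incident(rec_a, rec_b))  # True under the assumed thresholds
```

In practice, such matching rules are tuned to the datasets at hand (geoprecision codes, typologies, and reporting lags all vary), and flagged pairs are typically reviewed manually rather than merged automatically.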

In the rest of this section, we propose and carry out preliminary tests of three strategies for supplementing media-derived conflict event data.

Identifying Events from Photo- and Video-Journalism

Our first data effort to track the types of incidents systematically overlooked by media and conflict event datasets uses photojournalism, rather than print journalism.

Conflict-event datasets generally rely on written news articles. Yet, in addition to producing news articles about incidents of violence and social unrest, major global news organisations like Agence France-Presse (AFP), the Associated Press (AP) and Reuters frequently capture video and photos of events. Crucially, these outlets often do not publish a written story about events captured by photo and/or video. As an intervieweeFootnote11 described:

There’s a considerably lower bar for covering protests and such for video than for text. If we are covering a protest for text… before you write a story about it, you would want there to be a reason – and the reason is usually that it represents a large portion of society going out to the streets… you want thousands of people, at least, representing the grievances of many more thousands of people. With photo and video especially, they cover protests a lot, even when there is as few as fifteen people there…Footnote12

We test whether photo and video archives maintained by the AFP Forum (Citationn.d.) and the AP Newsroom (Citationn.d.) can help expand conflict event datasets. For this proof of concept, we focused specifically on comparing protest and riot events tracked by ACLED, given that photo-/video-journalism and that particular dataset both often focus on protests and social unrest.

We examined patterns of reporting across 11 countries, chosen to ensure generally broad geographic representation: Brazil, Colombia, Egypt, France, Haiti, Myanmar, Nicaragua, Pakistan, Peru, South Africa and Yemen. We describe our video and photo search processes and comparisons with ACLED in detail in A.2.

Through a subscription with the AP, we also gained access to the outlet’s planned coverage of events by media type. For each news event that the organisation plans to cover, the means of coverage (photo, video and/or text) is indicated, allowing us to validate an interviewee’sFootnote13 description of heterogeneity across reporting types. For a one-month period beginning on 23 May 2023, we monitored the planned news coverage of expected protest and similar activity. To identify such events, we used search terms consistent with our previous efforts investigating biases in photo and video journalism (again, see A.2 for details). For each identified incident of expected social unrest, we recorded the intended coverage method. For instance, on 29 May 2023, AP reported plans to cover the ‘[p]rotest over Saudi execution of two Bahraini men over militant activities’, with a video segment but no written article.

During this one-month effort, we identified 65 planned coverage events deemed relevant. Of those, 38 events (58.46 per cent) were to be covered by text and 25 (38.46 per cent) were to be covered by photo or video but not with text. More generally, 62 (95.38 per cent) were to be covered with photo or video. The extremely high rate of photo/video coverage is informative because, in addition to the events that do not receive text coverage that can be detected through photo/video, these resources might also be used in the identification of additional details that do not appear in the written articles.Footnote14
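The shares above are simple proportions over the 65 logged planning entries. A minimal sketch of the tally follows; the entries shown are illustrative placeholders rather than the actual AP planning log.

```python
# Each planned-coverage entry records which media types the outlet intended to use.
# These entries are illustrative placeholders, not the actual coverage log.
planned = [
    {'event': 'protest A', 'text': True,  'photo_or_video': True},
    {'event': 'protest B', 'text': False, 'photo_or_video': True},
    {'event': 'protest C', 'text': True,  'photo_or_video': False},
]

n = len(planned)
text_share = sum(e['text'] for e in planned) / n
visual_only_share = sum((not e['text']) and e['photo_or_video'] for e in planned) / n
visual_share = sum(e['photo_or_video'] for e in planned) / n

print(f'text: {text_share:.2%}, photo/video only: {visual_only_share:.2%}, '
      f'any photo/video: {visual_share:.2%}')
```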

Detecting Events from Local Sources

Our second effort involves tracking events covered by local-language news sites, civil society actors and government to identify systematic patterns of missing data in existing datasets. While international/national media may tend to cover larger-scale events and those otherwise aligned with particular editorial preferences, local news sources and organisations based on the ground may be more likely to also report smaller scale events, given local populations’ interest. A number of scholars have empirically evaluated whether reports by local media and civil society organisations suffer from fewer biases than aggregated event datasets that often draw on international news sources. Demarest and Langer (Citation2018) show that in Nigeria, datasets relying on local, as opposed to international, news sources log significantly more protest and political violence events. Clarke (Citation2021) also finds that local media sources on protests in Egypt capture many more events than existing datasets, but still reveal some biases of existing datasets in undercounting smaller events outside of the capital, as compared to local activist groups’ records. Davenport and Ball (Citation2002) identify different biases of violence reporting by newspapers, human rights organisations and interviews in Guatemala.

However, local media may not offer better coverage of violent events than current datasets in all cases. The quality of local media reporting depends on local conditions including availability of communications technology (Croicu and Kreutz Citation2016, Weidmann Citation2016), regime type (Baum and Zhukov Citation2015), severity of conflict (Davies and True Citation2017) and geographical region, with sparser event coverage in Africa (Dietrick and Eck Citation2020). Cultural biases may lead local media to omit certain forms of violence, such as violence against women or ethnic minorities (Davies and True Citation2017). Shaver et al. (Citation2022) describe how threats to local journalist safety can skew reporting.

For a proof of concept of improving existing datasets with local-language reporting, we piloted the use of local media, governmental and NGO sources to identify political violence events in Israel/Palestine. As described in the supplementary material, we track only extrajudicial violence and not military activity. To track local-language media coverage of political violence events, we used the Hebrew-language version of Yedioth Ahronot (Citationn.d.), a popular mainstream news site in Israel. We identified all articles including the term ‘attack’ and manually determined whether an article described a case of political violence meeting the criteria of existing datasets. We also logged political violence events covered by local NGO/watchdog organisations and government bodies that were not reported in existing datasets. We drew on catalogues created by civil society organisations aligned with both sides of the conflict. We used catalogues of attacks compiled by the Israeli anti-occupation non-profit B’Tselem,Footnote15 the American Jewish non-profit Jewish Virtual Library,Footnote16 the Foundation for the Defense of Democracy,Footnote17 the pro-Palestinian DC-based think tank the Jerusalem Foundation,Footnote18 the Israeli government-linked Meir Amit Intelligence and Terrorism Information Center,Footnote19 and a catalogue by Dr. Wm. Robert Johnston.Footnote20 Finally, the Israel Defense Forces published a list of political violence events covering September 2016 through October 2016; we logged events recorded here but absent from existing datasets.

Direct Information Sharing from Journalists

Our third approach to collecting data on the types of violent incidents often overlooked by international/national media – and thus conflict event datasets derived from their reporting – takes a different approach: we pilot directly contracting journalists on the ground to report on all incidents of political violence and social unrest about which they hear.

Our in-depth interviews with media professionals make clear that journalists learn about many more events than are ultimately reported by the outlets they write for: ‘There is a lot of violence that happens and it can’t all be written about…’Footnote21

Interviewees highlighted various factors that affect ultimate reporting likelihood. Generally, the more people involved or affected by an event; the more novel the event; and the more high-profile individuals or groups involved, the more likely it is to be covered. Below, we draw from the interviews, highlighting some of the exclusionary criteria and examples of reporting bias they shared.

In conflict settings, often only fatal events (and particularly attacks with many fatalities) are covered. ‘If people do not die, [there is] much less chance that we are going to be writing about it’.Footnote22

Another interviewee who reported extensively on conflict in Colombia, made a similar observation: ‘Sadly and tragically, news events that involve… casualties, deaths often rise in importance and how they are viewed. That’s kind of just the nuts and bolts of our business. It elevates a news event in a way that it wouldn’t otherwise’.Footnote23

Reflecting on reporting on violence in Burkina Faso, an interviewee described how attacks ‘against security forces get absolutely no traction… it happens too regularly unfortunately… Sometimes it’s six [military personnel killed]; sometimes it’s ten… but, in terms of international news, it never makes headlines anymore’.Footnote24

The identity of the people affected can also determine coverage. ‘…I hate to be blunt about it, but all lives are not considered equal in the eyes of journalists from the lowest level to the highest’.Footnote25

The international wire services ‘put a lot more emphasis on reporting who is harmed if that person is American or Western’.Footnote26

Another interviewee offered examples from the Iraq War: ‘If ten Iraqi people get killed, that’s nothing. That is not even worth a story in the New York Times. [And even more so] if they are killed in remote parts’.Footnote27

Concerning the perpetrators, an interviewee described ‘one of the more frustrating aspects of reporting [on conflict in Colombia is] that international news organisations would often pay much more attention to atrocities that were carried out by the guerillas… than atrocities carried out by paramilitaries or by government forces’.Footnote28

In settings where protests and social unrest are more common, the focus is often instead on the number of individuals involved. Describing coverage of protest activity, an interviewee described their organisation ‘limit[ing] the number and type [of protests] we report on because of their newsworthiness’,Footnote29 which two intervieweesFootnote30 characterised in terms of participant numbers: protests typically would not be covered unless they reached thousands.

One interviewee further highlighted that even major protests are less likely to be covered the longer they last given the ‘repetitive’ nature of the events and that ‘stories end up looking alike’.Footnote31 Another interviewee echoed this: ‘There is a calculus of whether something is newsworthy: supposing there is a protest in a place where there is always protests, you wouldn’t necessarily write about that’.Footnote32

Several interviewees described a lack of editorial interest in reports of violence in particular countries or during particular periods.Footnote33 For instance, an interviewee described the difficulty they faced in placing stories of events they uncovered during the 2016 through 2021 period of conflict preceding Russia’s invasion of Ukraine.Footnote34

These factors influencing event coverage clearly highlight significant differences between events journalists learn about and what they publish; further, this list of editorial pressures is certainly not exhaustive. Ideally, accessing the complete set of incidents that journalists learn about in the course of their reporting (not just those that are reported) can help circumvent editorial bias.Footnote35 This is precisely what we seek to do through a series of collaborations with freelance reporters in our third proof of concept.

To better understand the set of events that journalists learn about in the course of their reporting, we entered into partnerships with seven freelance journalists who have all written for major international news media outlets. They reported to us all incidents of political violence and social unrest that they learned about in the course of their work, regardless of whether or not they considered the events newsworthy or likely to be published. Collectively, they covered events in Mozambique, Pakistan, Peru, South Africa, and Zimbabwe. We then compared the events that these journalists identified with the events reported by ACLED for these same countries over the same respective time periods. Additional details of these arrangements and our approach to comparing events appears in A.3.

These countries were chosen on the basis of (1) the success of our efforts in identifying journalists to collaborate with and (2) the nature of ongoing or expected unrest. The countries vary significantly in the type of political violence they are experiencing and span different regions of the world. For instance, one journalist described how ongoing violence in Pakistan’s Balochistan province is underreported by the news media relative to other parts of the country. Peru recently experienced significant unrest in its Puno region, where protesters established roadblocks and temporarily shut down several airports (Al Jazeera Citation2023); despite a recent respite in unrest, violence is expected to resume. Zimbabwe has experienced some violence as the 2023 elections approached, with more political violence and government repression expected as the election date was confirmed and drew nearer.

Results

Our three efforts each identified new incidents that were not previously tracked by news-based datasets. While the efforts varied in the number of new incidents they uncovered, each revealed incidents that differed systematically from those that were tracked by existing media-based datasets.

Identifying Events from Photo- and Video-Journalism

We find that the news-based datasets did not cover a significant portion of events tracked by photo- and video-journalism outlets, and that the events not tracked by the news-based datasets often systematically differed from those included in the datasets.

However, we found that a relatively small number of events overall were reported only in photo/video journalism but not in print news reporting. Photographed/video-recorded events not captured by the news-report based data made up a small proportion of total events related to social unrest – from just shy of one per cent in one case (Myanmar) to approximately four per cent in another (Nicaragua).Footnote36 Relative to the other two proofs of concept, this approach thus generated much less overall new material (though, as we discuss later, these results are based on a limited set of source materials, which future efforts might expand).

Although the volume of newly identified events is relatively limited, we find that previously and newly identified events vary significantly. Newly identified incidents may serve not only to expand overall content but to help specifically in expanding the set of activities that are systematically underreported in existing event datasets. Using our estimates of whether the photographed/video-recorded events involved violence, we calculate the predicted probability of inclusion in ACLED.Footnote37 We find that violent events were 16.48 percentage points more likely to be included in the media based data (42.66 per cent vs. 26.18 per cent) (see Figure 2). Furthermore, when we compare the set of newly identified events with the overall body of ACLED events involving social unrest (that is, all comparable events from the same countries and time periods), we again find that non-violent events were significantly less likely to be captured.Footnote38
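For illustration, the sketch below shows one way such predicted probabilities of inclusion can be computed: a logistic regression of an inclusion indicator on a violence indicator, with predictions at each value of the covariate. The data and specification here are assumptions for demonstration purposes only, not the model or estimates underlying the figures reported above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative event-level data: one row per photographed/video-recorded event.
# 'included' = 1 if the event also appears in the media-based dataset,
# 'violent'  = 1 if the event involved violence. All values are made up.
df = pd.DataFrame({
    'included': [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    'violent':  [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# Fit a logit of inclusion on the violence indicator.
res = smf.logit('included ~ violent', data=df).fit(disp=0)

# Predicted probability of inclusion for violent vs. non-violent events.
pred = res.predict(pd.DataFrame({'violent': [1, 0]}))
print(f'violent: {pred.iloc[0]:.2%}, non-violent: {pred.iloc[1]:.2%}')
```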

We also observe significant cross-country differences. Of the photos and videos depicting social unrest in Myanmar, for instance, only 14.96 per cent were not tracked by ACLED. In contrast, of the incidents identified in South Africa, we estimate that 62.96 per cent were not captured.

Given that the photo/video-only events often differed systematically from those recorded in existing datasets, there are compelling reasons to augment existing conflict event datasets with photo and video based records; we discuss these recommendations further in the conclusion.

Detecting Events from Local Sources

Our second data collection effort, which draws on local media, non-profit and government sources, approximately doubled the number of attacks identified in Israel/Palestine by existing cross-national datasets from 2000–2023 (we identified 4,516 relevant attacks across the ACLED, RAND and GTD datasets, and uncovered an additional 4,191 incidents).Footnote39

We find that the three existing datasets we tested systematically omit certain kinds of attacks; the omissions follow the patterns identified by Shaver et al. (Citation2022). We find that non-fatal attacks are systematically omitted, with existing datasets identifying only 50 per cent of the non-fatal attacks we have logged, as compared to 72 per cent of all identified fatal attacks.

We also find that unarmed attacks, or attacks using homemade weapons, are disproportionately omitted from existing datasets (see Figure 3). While almost all attacks using guns have previously been identified by existing cross-national datasets, existing datasets have identified less than half of unarmed attacks and attacks targeting property, and almost no attacks using rocks.

Figure 3. This figure displays the results of our data collection efforts on extrajudicial violence in Israel/Palestine from 2000–March 2023, based on nonprofits, local governments, and local-language media. The first figure shows the percentage of all identified attacks (pooling existing datasets and our original dataset) logged within the time period each dataset was active that each dataset had included, by attack type. The second and third graphs compare all three datasets’ coverage of attacks by fatality and region, respectively, to the complete set of attacks identified by any of the datasets or our original efforts (these plots involve, but are not limited to, the use of data from acleddata.com).


As suggested by our interviews, existing datasets disproportionately miss attacks in more dangerous regions. Existing datasets cover 23 per cent more of the identified attacks within Israel as compared to attacks within the occupied West Bank, which has suffered approximately four times as many attacks in the period under study.

The identity of the perpetrator also predicts omission from cross-national datasets. Existing datasets have significantly more complete coverage of attacks perpetrated by Palestinians as compared to attacks perpetrated by Israelis.

Direct Information Sharing from Journalists

Our third data collection effort, contracting local journalists to report on violent incidents, also substantially expands the set of events reported by the media-based data. We believe that such efforts can help mitigate patterns in missing data in existing conflict-event datasets. We provide the broad descriptive statistics for each country in turn:

In the most modest case, in South Africa, the journalist reported 19 events, 36.84 to 52.63 per cent of which were newly identified.Footnote40 We estimate that these newly identified events make up 3.61 to 5.15 per cent of the 194 total comparable events tracked by the news report based data during this period.

In contrast, in Pakistan, the two journalists reported 184 events, 61.41 to 76.09 per cent of which we estimate to be newly identified events. Again, for comparison, the newly identified events are estimated to make up between 19.25 to 23.85 per cent of the total 587 events tracked by the news report based dataset during the same period. Of particular note, a substantial number of these newly identified incidents depict extreme levels of violence in the country’s Balochistan province that are virtually invisible in the news-report based data, which we discuss in more detail below.

Results from Zimbabwe are also stark: the two journalists reporting from that country logged 31 events, of which 62.07 to 68.97 per cent were newly identified. The newly identified events account for 75.00 to 83.33 per cent of the 24 total events tracked by the news media data during this period.

Finally, in Mozambique the journalist reported 18 events, of which we estimate 22.22 to 50.00 per cent were newly identified (making up between 13.33 and 30 per cent of the total number of news report based dataset entries). And in Peru, the journalist reported 34 incidents, of which 55.88 to 61.76 per cent were newly identified, making up 11.24 to 12.43 per cent of the 169 incidents tracked by the news report based data.

Importantly, the journalists often picked up classes of events that systematically differed from those that appear in the news report based data. The heterogeneity of results across countries makes generalising difficult. Thus, we instead remark on a few prominent findings across country cases, which reveal the potential power of involving journalists directly in the reporting/data collection process.

In Pakistan, we make two observations. First, the journalists reported a substantial number of deaths associated with armed conflict and social unrest beyond those reported in the news report based data. Over the one month of reporting, we estimate that they tracked between 59 and 83 additional fatalities (between 33.91 and 47.70 per cent of the total number of fatalities reported by the news report based data). Second, they tracked a substantial number of armed attacks involving insurgent and separatist forces that were not captured by the media-based data. We estimate that the journalists captured between 71 and 91 additional attacks during their one month of reporting (resulting in many dozens of previously untracked injuries and deaths).Footnote41

As Figure 4 reveals, much of this fighting occurs in Pakistan’s Balochistan region, which was clearly highly undercovered relative to the provinces of Khyber Pakhtunkhwa and Sindh. As one of the reporting journalists described to us, ‘[l]awlessness, Balochistan’s remote location, strict army control, and inadequate communication and infrastructure were the dominant factors’ that limit reporting in the area.

Figure 4. This figure displays the distribution of incidents of political violence and social unrest tracked by the journalists with whom we contracted (blue) alongside the distribution of comparable events tracked by news report based conflict event data (red) over the same time periods (These plots involve, but are not limited to, the use of data from acleddata.com).


In Zimbabwe, we note the broad geographic coverage of the two journalists’ activities. Nearly one third of the incidents of political violence and social unrest they reported occurred across six districtsFootnote42 (7.41 per cent of the country’s 81) in which the media based data recorded no activity.

Finally, we note that in three of the five countries, the share of newly identified violent attacks (relative to all detected attacks) exceeded the share of newly identified incidents of social unrest (relative to all captured incidents of social unrest). While our sample is small, this result may point to important cross-country heterogeneities in marginal returns to journalist engagement across different forms of violence/unrest. For instance, in Peru, we estimate that between 50 and 60 per cent of incidents of social unrest captured by the journalists were newly identified. In contrast, more than 70 per cent of attacks were newly identified. Similarly, in Zimbabwe, whereas 33.33 per cent of incidents of social unrest were newly identified, we estimate that between 66.67 and 75 per cent of attacks were newly identified. We observe a similar pattern in South Africa.

Discussion & Conclusion

In this article, we explore a series of methods that curators of conflict event datasets might engage in to supplement existing efforts. We find that all three efforts can be used to identify incidents not tracked in news report-based conflict data. Below, we reflect on the advantages and disadvantages of each approach, and on additional possible avenues for collecting incident-level data on political violence.

Reflections on the Photo/Video Effort

Although the number of events that we newly identify is modest, we first note that we did not consult the universe of professional news photo and video media for this effort. This paper’s analysis relies only on entries from the AP (photo and video) and AFP (photo only). However, other major databases exist (e.g. AFP video, Bloomberg (Citationn.d.), Reuters (Citationn.d.) and EPA Images (Citationn.d.)), which if collectively consulted would result in a wider and potentially more substantial set of newly identified events.

Furthermore, incorporating photo and video materials into the conflict event datasets may be relatively straightforward and, perhaps more importantly, sustainable. Just as the curators of existing conflict event datasets have established data streams consisting of written news article content, they might also establish subscriptions and collection methods with news organisations to regularly augment their materials with details extracted from photos and videos.

AI language models might be used to increase the efficiency of collecting conflict event data from photo- and video-journalism records. For instance, data extracted from photo- and video-journalism metadata through existing application programming interfaces might then be filtered through an AI language model to classify incidents of political violence and social unrest (a minimal sketch of such a pipeline appears at the end of this subsection).

Limitations of this effort are similar to those associated with relying on written news articles to identify and describe incidents of political violence and social unrest. Specifically, only those details reported in the photo/video’s title or caption, or that can be gleaned from the photo/video itself, can be translated into the rows of datasets with incident-level details. For instance, event locations associated with photos/videos are often general (e.g., city name), limiting the precise spatial identification of events.Footnote43 Differences across news media platforms sometimes produce discrepancies in how dates are reported – creating the need for deeper critical analysis of existing creation, arrival, and event dates to determine actual incident dates. Additionally, for a given event (e.g. for a particular protest), multiple photos/videos may be produced capturing that event. While there may be advantages to this approachFootnote44, it also complicates efforts to identify unique events. Furthermore, when individual photos or videos are used to extract details about an event, there is a risk that particular details related to the event may be missed. For instance, one event photo may depict violence while another does not.
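As a concrete illustration of the pipeline suggested above, the following sketch filters photo/video metadata records by unrest-related keywords and then passes each caption to a classifier. The metadata fields, keyword list, and the classify_caption stand-in (a crude keyword heuristic used here only so the example runs) are all assumptions; a production pipeline would query a wire service’s actual metadata API and call a real language model at that step.

```python
from typing import TypedDict

class MediaRecord(TypedDict):
    caption: str   # caption or title attached to the photo/video
    city: str      # location reported in the metadata
    date: str      # ISO date string from the metadata

UNREST_KEYWORDS = ('protest', 'riot', 'demonstration', 'clash', 'strike')

def classify_caption(caption: str) -> str:
    """Stand-in for a language-model call that would label the caption as
    'violent unrest', 'non-violent unrest', or 'not relevant'. A crude keyword
    heuristic is used here purely so the sketch runs end to end."""
    text = caption.lower()
    if any(word in text for word in ('clash', 'injured', 'killed')):
        return 'violent unrest'
    if any(word in text for word in UNREST_KEYWORDS):
        return 'non-violent unrest'
    return 'not relevant'

def extract_candidate_events(records: list[MediaRecord]) -> list[dict]:
    """Keep records whose captions mention unrest, then classify each one."""
    candidates = []
    for rec in records:
        if any(word in rec['caption'].lower() for word in UNREST_KEYWORDS):
            candidates.append({**rec, 'label': classify_caption(rec['caption'])})
    return candidates

# Illustrative metadata entries (not actual wire-service records).
sample = [
    {'caption': 'Protesters clash with police near the port', 'city': 'Callao', 'date': '2023-01-12'},
    {'caption': 'Fishing boats return at dawn', 'city': 'Callao', 'date': '2023-01-12'},
]
print(extract_candidate_events(sample))
```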

Reflections on the Effort to Detect Events from Local Sources

Our pilot data collection effort tracking incidents of political violence in Israel/Palestine shows that local civil society organisations and government authorities have tracked a significant number of violent events omitted from existing cross-national datasets. We have found that including events from these government or NGO actors in datasets is significantly easier than reviewing local media records. Logging attacks from local media sources is labour-intensive and often requires language skills. Further, research assistants often cannot code local media reports without an understanding of a country’s geography, the factions in a conflict, and the ability to recognise a perpetrator’s background from a name. On the other hand, government and NGO reports are typically easier for foreign researchers to code.

Yet, researchers must be cognisant of governments’ and NGOs’ incentives to exaggerate or underplay certain forms of violence based on their political interests. We omitted a significant number of NGO reports that did not meet our threshold for a violent incident (e.g. involving only a verbal exchange of insults), or that did not provide sufficient information about an attack. Researchers must verify the quality of the NGO and its reporting, and perhaps triangulate different NGO/media reports.

Despite the challenges related to using local-language media sources, we have found that local-language media often covers more incidents than English-language national media, and may be a useful data source. Conflict-event datasets and academics might consider using automated translation of local-language media in order to review publications in languages that are not widely spoken.Footnote45
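To make the suggestion concrete, the sketch below shows what automated triage of local-language coverage could look like: headlines are machine-translated and filtered for attack-related terms before manual review, mirroring the keyword search we describe for the Hebrew-language effort above. The machine_translate function is a placeholder (a toy lookup table) standing in for whichever translation service a team actually uses.

```python
ATTACK_TERMS = ('attack', 'stabbing', 'shooting', 'bombing')

def machine_translate(text: str, source_lang: str = 'he') -> str:
    """Placeholder for a machine-translation call. A real pipeline would send
    the text to a translation service; a toy lookup keeps the sketch
    self-contained."""
    toy_dictionary = {'פיגוע דקירה בירושלים': 'Stabbing attack in Jerusalem'}
    return toy_dictionary.get(text, text)

def flag_for_review(headlines: list[str]) -> list[tuple[str, str]]:
    """Translate each headline and keep those mentioning attack-related terms,
    returning (original, translation) pairs for a human coder to review."""
    flagged = []
    for headline in headlines:
        translated = machine_translate(headline)
        if any(term in translated.lower() for term in ATTACK_TERMS):
            flagged.append((headline, translated))
    return flagged

# Illustrative headlines (one Hebrew, one irrelevant placeholder).
print(flag_for_review(['פיגוע דקירה בירושלים', 'Municipal budget approved']))
```

Flagged items would still require manual coding, both to verify the translation and to confirm that the underlying event meets the dataset’s inclusion criteria.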

Reflections on Contracting with Local Journalists

Contracting with local journalists comes with a wide variety of benefits. Chief amongst these is the ability to work with them to obtain the specific details associated with each event of interest to researchers.

Journalists provided us with a more precise location of incidents than would typically be reported in a news article. Per our agreement, they provided specific details related to the weapons employed in attacks and the precise coordinates at which an event took place. While journalists writing news articles that form the basis of much of the existing conflict event datasets likely also have access to such details, the news report writing process typically does not provide a mechanism for conveying that information.

Furthermore, only a small number of journalists are required to achieve substantial increases in reporting. In particular, in those countries in which we hired two journalists, increases in newly identified events relative to levels of events reported in the media based data were substantial – though, of course, returns to additional journalists are likely to vary substantially across countries given differences in their sizes (geographic and population), government restrictions, and levels of ongoing political violence. In short, direct and continuing engagement with journalists may be more feasible than expected given high marginal returns at low numbers.

Nevertheless, there are limitations as well. While the collaborations with journalists overcome substantial editorial biases, such biases do not disappear entirely. For instance, the journalist with whom we collaborated in Peru remarked, after working with us, on the difficulty of learning about events when many in the country were eager to move past recent unrest: ‘[A]fter a couple extremely tumultuous months, the country seems to be exhausted and wanting to avoid anything related to political violence’. Broader industry editorial pressures may simply make identifying particular content difficult even where the individual journalist is not themself bound by the editorial constraints of their principal employer(s).

Furthermore, freedom from editorial constraints does not necessarily address impediments that make learning about events difficult in the first place. For instance, as one of our journalists reported to us after completing the assignment, ‘[a]nother issue was that most of the cases of political violence were/are happening in remote rural areas where victims are not even reporting the cases’. One of the Pakistan-based journalists described the limitations of reporting given governmental restrictions: ‘To control the narrative, the Pakistani military has imposed restrictions on media outlets… As a result, mainstream media sources do not cover all militant attacks, except major attacks that occur in cities, like Peshawar’. Direct collaborations with journalists may partially circumvent some of these issues (as our analysis shows); but they are likely to persist, continuing to produce some degree of systematic ‘missingness’ in the data in the process (though attenuated relative to reliance on news articles alone).

Steps Forward

The set of efforts we pilot are by no means comprehensive, and scholars seeking to incorporate conflict event data in their own work might consider parallel efforts. For instance, various high-quality administrative records have been released from government sources, and other similar records may exist.

Researchers might pursue available channels for requesting administrative data from relevant governments and international organisations to potentially acquire non-media data on political violence. For instance, to the best of our knowledge, the U.S. military has not released ‘SIGACTs’-type data related to its engagements in countries like Libya and Yemen. Given the comprehensive nature of data released by the U.S. Defense Department related to its Operations Iraqi Freedom, Enduring Freedom and Inherent Resolve, it stands to reason that similar records probably exist. Wartime records may also be accessible in archives. For instance, both the U.S. Department of Veterans Affairs, through its Official Military Activities Report (OMAR) (Aragao Citation2019) database, and Shaver et al. (Citation2023) have extracted fine-grained conflict details for the Vietnam War from the National Archives and Records Administration’s archival base camp data (NARA Citation2023), which are electronically available for download (see Figure 5).

Figure 5. This figure displays the distribution of wartime incidents tracked by U.S. forces during the Vietnam war for the year 1969.

Sources: Shaver et al. (Citation2023); NARA (Citation2023).

We also note past and ongoing efforts worldwide to track political violence through other means. For instance, we recognise the recent efforts of Solstad (Citation2023) to track wartime activity in Ukraine through the use of satellite data on temperature anomalies. Another example comes from the various United Nations missions that have collected high quality civilian casualty data. An excellent example of this is the United Nations Mission in South Sudan (UNMISS) – the UN’s peacekeeping mission for the country – which is engaged in a large-scale effort to track civilian casualties across that country. UNMISS collects a large quantity of fine-grained data on violence against civilians. The UNMISS initiative itself provides an important proof of concept for supplementary collection methods to media-based datasets. Indeed, in comparing the number of civilian casualties that UNMISS tracked in South Sudan between 2019 and 2021 with GED, we estimate that UNMISS tracked 981 (2019), 2,336 (2020), and 1,856 (2021) additional fatalities (UNMISS Citation2021, Citation2022, Citation2023a, Citation2023b). In percentage terms, GED’s civilian fatality numbers make up 13.26, 3.67, and 2.67 per cent of UNMISS’ totals. Similarly, we estimate that UNMISS tracked an additional 160 (2019), 1,763 (2020), and 966 (2021) civilian fatalities compared to ACLED, whose counts equate to around 85.85, 27.30, and 49.34 per cent of UNMISS’ totals for those years.Footnote46

As discussed above, future efforts to collect data on political violence and social unrest are likely to be increasingly augmented with AI technologies – to potentially include utilising computer vision, enabling machines to analyse and extract information from visual inputs including images, videos, graphics, and text. Indeed, the existing literature already includes some proofs of concept for applying such technologies to conflict analyses. For example, Mueller et al. (Citation2021) trained an AI model to identify structural damage in satellite imagery in Syrian cities, and Aronson (Citation2018) developed an AI model to classify objects in citizen video to identify human rights violations in Aleppo. Further, Radeva (Citation2021) demonstrated the use of visual AI in analysing documentary evidence through processing text, document format, graphics, and predefined objects.

We close, however, with a focus on data generating processes. As AI methods enable us to collect increasingly large conflict datasets, deep, ongoing collaborations with experts familiar with the data generating processes, such as journalists, will be essential to ensure that such future efforts do not fall victim to the same patterns of selection that have skewed existing datasets. As David Hand (Citation2020), emeritus professor of mathematics at Imperial College London, writes, ‘while it helps to have lots of data – that is, “big data” – size is not everything. And what you don’t know, the data you don’t have, may be even more important in understanding what’s going on than the data you do have… [T]he problems of dark [missing] data… are ubiquitous’. We hope that our proofs of concept not only provide specific paths forward but serve to encourage greater and sustained attention to those processes.


Acknowledgements

We thank the seven freelance journalists who collaborated with us and the many additional news media professionals who agreed to be interviewed for this research. For comments on this project, we are grateful to participants of the Empirical Studies of Conflict (ESOC) University of California Faculty Meeting and Civil Wars’ editorial team and anonymous referees. For their support in executing this project, we thank the following University of California, Merced departments: Customer Relations and Research and Academic Procurement Team; Center for Business Services and Solutions; Institutional Review Board Office; Office of Research and Economic Development; Tax Services; and Accounts Payable. We received approval from the University of California-Merced Institutional Review Board #UCM2021-125.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/13698249.2023.2254988.

Additional information

Funding

The work was supported by the University of California Merced [Start-up/incidental funding].

Notes on contributors

Andrew Shaver

Andrew Shaver is an assistant professor of political science at the University of California, Merced and founding director of the Political Violence Lab.

Hannah Kazis-Taylor

Hannah Kazis-Taylor is a PhD student in Politics at Princeton University studying Middle East politics and political violence.

Claudia Loomis

Claudia Loomis is a graduate of UC Merced’s political science programme and a research intern in the Political Violence Lab.

Mia Bartschi

Mia Bartschi is a former research intern in the Political Violence Lab.

Paul Patterson

Paul Patterson is a former research intern in the Political Violence Lab.

Adrian Vera

Adrian Vera is a research intern in the Political Violence Lab.

Kevin Abad

Kevin Abad is a former research intern in the Political Violence Lab.

Saher Alqarwani

Saher Alqarwani is a former research intern in the Political Violence Lab.

Clay Bell

Clay Bell is a research intern in the Political Violence Lab.

Sebastian Bock

Sebastian Bock is a former research intern in the Political Violence Lab.

Kieran Cabezas

Kieran Cabezas is a former research intern in the Political Violence Lab.

Heidi Felix

Heidi Felix is a former research intern in the Political Violence Lab.

Jennifer Gonzalez

Jennifer Gonzalez is a former research intern in the Political Violence Lab.

Christopher Hoeft

Christopher Hoeft is a former research intern in the Political Violence Lab.

Aileen Ibarra Martinez

Aileen Ibarra Martinez is a former research intern in the Political Violence Lab.

Kai Keltner

Kai Keltner is a former research intern in the Political Violence Lab and an undergraduate student at Vanderbilt University.

Jessica Moroyoqui

Jessica Moroyoqui is a former research intern in the Political Violence Lab.

Kieko Paman

Kieko Paman is a former research intern in the Political Violence Lab.

Ethan Ramirez

Ethan Ramirez is a former research intern in the Political Violence Lab.

Priscilla Reis

Priscilla Reis is a former research intern in the Political Violence Lab.

Juan Jose Rodriguez

Juan Jose Rodriguez Jr. is a research intern with the Political Violence Lab and an alumnus of the University of California, Riverside.

Jazmin Santos-Perez

Jazmin Santos-Perez is a former research intern in the Political Violence Lab.

Katha Komal Sikka

Katha Komal Sikka is a research fellow with the Political Violence Lab and M.A. student at Columbia University.

Arjan Singh

Arjan Singh is a former research intern in the Political Violence Lab.

Cassidy Tao

Cassidy Tao is a former research intern in the Political Violence Lab.

Richard Tirado

Richard Tirado is a former research intern in the Political Violence Lab.

Aishvari Trivedi

Aishvari Trivedi is a former research intern in the Political Violence Lab.

Lillian Xu

Lillian Xu is a former research intern in the Political Violence Lab.

Margaret You

Margaret You is a former research intern in the Political Violence Lab.

Meriam Eskander

Meriam Eskander is a former research intern in the Political Violence Lab.

Notes

1. As described below, this effort involves various comparisons with existing conflict event data. We have sought to use these data responsibly and in good faith. The overall goal of this exercise is to identify means by which these existing datasets might be further improved to the collective benefit of the dataset curators and their users, including potential governmental funders. As such, this effort is in no way intended to aid in the development of datasets (or other products) that serve as competitors for these existing conflict event datasets. Instead, the intention is to provide their curators insights about the nature of missing or likely missing incidents from their previous data collection efforts that might inform future collection efforts to their benefit. This research is not intended to negatively depict these conflict datasets or their curators in any manner. Indeed, we have invested a substantial number of work hours in this project precisely because we consider news report based conflict event datasets to be such a critical resource to academic (and potentially other) communities seeking to understand, forecast and otherwise engage conceptually with political violence and social unrest globally. To the best of our knowledge, there are presently no viable alternatives to the existing news report based conflict event datasets that track conflict and/or social unrest on a global basis. As such, and given how extensively these data are used within academia and government/intergovernmental entities, understanding how these datasets might be further improved is an important public good.

2. In economics, see Voors et al. (Citation2012); Minoiu and Shemyakina (Citation2014); Manacorda and Tesei (Citation2020); political science, see Choi (Citation2010); Fortna (Citation2015); Steinert-Threlkeld (Citation2017); climate, atmospheric sciences, and oceanography, see O’Loughlin et al. (Citation2014); Hoffmann et al. (Citation2020); and ecology and evolutionary biology, see Daskin and Pringle (Citation2018).

3. Gleditsch et al. (Citation2014) show that datasets include increasingly disaggregated statistics on (1) the actors in conflict, such as ethnic minorities (including the ‘All Minorities at Risk’ (Birnir et al. Citation2018) and ‘Ethnic Power Relations’ (Vogt et al. Citation2015) datasets); (2) strategies and tactics of conflict, such as improvised explosives and terrorist attacks; and (3) conflict beyond violence, such as non-violent protests (see the ‘Non-violent and Violent Campaigns and Outcomes Data’ dataset (Chenoweth and Lewis Citation2013)).

4. Some of these governmental actors are identified publicly, though we have learned about the identities of various other government/intergovernmental users through interviews with foreign affairs professionals. See A.1 for a description of these interviews.

5. ICEWS incorporates the data formerly included in the WITS database (Bowie Citation2017).

6. Such datasets track terrorism (EDTG) (Hou et al. Citation2020), one-sided ethnic attacks (EOSV) (Fjelde et al. Citation2021), violence against refugees (POSVAR) (Gineste and Savun Citation2019), electoral violence (DECO and CREV) (Birch and Muchlinski Citation2020, Fjelde and Hoglund Citation2022) and violence against peacekeepers (PAR) (Lindberg Bromley Citation2018). Others have created new media-based datasets on specific regions or topics: these include country-specific conflict measurements (BFRS and OCVED) (Bueno de Mesquita et al. Citation2015, Osorio and Beltrán Citation2019); data on suicide attacks (CPOST) (Pape et al. Citation2021); violent and non-violent electoral contestation (ECAV) (Daxecker et al. Citation2019); water-related conflict (WARICC) (Bernauer et al. Citation2012); and non-violent resistance in conflict settings (Chenoweth et al. Citation2019).

7. As of August 26th 2023, Google Scholar citations of the articles introducing/describing these datasets are: ACLED (Raleigh et al. Citation2010): 1,975; GDELT (Leetaru and Schrodt Citation2013): 843; GED (Sundberg and Melander Citation2013): 1,233; GTD (LaFree Citation2010): 861; ICEWS (Obrien Citation2010): 350; and SCAD (Salehyan et al. Citation2012): 541.

8. These include Afghanistan, Burkina Faso, Burundi, China, Colombia, El Salvador, Iraq, Israel, Libya, Mali, Mexico, Pakistan, the Palestinian Territories, the Philippines, South Korea, Sudan, Syria, Rwanda, Ukraine, Venezuela, Yemen and Zimbabwe.

9. These include, but are not limited to, Al Jazeera, the British Broadcasting Corporation (BBC), BuzzFeed, Der Spiegel, France 24, The Guardian, The HuffPost, The New York Times, Public Radio International, Reuters, and The Wall Street Journal. In some cases, outlets are not listed here following interviewee requests for anonymity.

10. In Figure 1, we plot the global distribution of government-imposed internet outages from 2016 through 2022. Journalists operating in conflict zones may also engage in self-censorship and under-report events due to safety concerns (Larreguy et al. Citation2020). Outlets sympathetic to one side of a conflict may selectively report events.

Figure 1. This figure shows that many of the countries experiencing political violence/social unrest around the world are the same ones whose governments are shutting down information and communication technologies that likely underpin the news media’s ability to report violence in those countries.

Sources: AccessNow (Citation2016), Schvitz et al. (Citation2022).

11. Interviewee 1, 2022. Reporter from a major wire service.

12. However, it is also important to note that some events are only covered with a written article. Another interviewee reported that budget constraints led their outlet to cover protests in Sudan in a written article, but no photojournalism: ‘Ever since the Ukraine crisis, we have not had the ability to cover [most] of the protests… in Sudan with video and photos because the budget just isn’t there. We’ve still, as text reporters, been able to cover them [in writing]’. (Interviewee 1, 2022. Reporter from a major wire service). Thus, while editorial considerations appear to drive greater coverage of smaller events with photo and video only, budgetary considerations can limit this.

13. Interview 2, 2022. Sub-regional News Director for a major wire service.

14. For instance, as technologies develop, video content might be used to estimate crowd sizes when they are not reported/estimated in news reports. Or they might serve as an alternative estimate.

15. Fatalities: All Data, B’Tselem: The Israeli Information Center for Human Rights in the Occupied Territories

16. Comprehensive Listing of Terrorism Victims in Israel (September 1993–Present), www.jewishvirtuallibrary.org/comprehensive-listing-of-terrorism-victims-in-israel

17. Mapping Terrorism in the West Bank, FDD Visuals, www.fdd.org/analysis/2022/12/12/mapping-terrorism-in-the-west-bank/

20. Johnston’s Archive, ‘Chronology of Terrorist Attacks in Israel Introduction’, www.johnstonsarchive.net/terrorism/terrisrael.html

21. Interviewee 5, 2022. Staff Writer at the New York Times Magazine (former Wall Street Journal writer in the Middle East).

22. Interviewee 2, 2022. Sub-regional News Director for a major wire service.

23. Interviewee 10, 2022. Journalist with The New York Times.

24. Interviewee 7, 2022. Freelance journalist who worked with major North American outlets and others, including BBC, PRI, France 24, and Canadian Public Broadcasting.

25. Interviewee 4, 2022. Former cable news executive.

26. Interviewee 8, 2022. BuzzFeed News Reporter/former Reuters reporter.

27. Interviewee 3, 2022. Freelance journalist/former New York Times reporter.

28. Interviewee 10, 2022. Journalist with The New York Times.

29. Interviewee 6, 2022. Reuters reporter in Latin America.

30. Interviewee 1, 2022. Reporter for a major wire service; Interviewee 2, 2022. Sub-regional News Director for a major wire service.

31. Interviewee 2, 2022. Sub-regional News Director for a major wire service.

32. Interviewee 8, 2022. BuzzFeed News Reporter/former Reuters reporter.

33. Interviewee 9, 2023. Freelance human rights journalist; Interviewee 10, 2022. Journalist with The New York Times; Interviewee 11, 2022. Staff Writer at The New York Times Magazine.

34. Interviewee 9, 2023. French Freelance Journalist.

35. Such an approach may not entirely eliminate this source of bias, as editorial pressures surely influence where and how journalists focus their time and efforts in the first place. Yet, significant mismatches in what journalists learn about vs. what they report would provide insight into the nature of editorial bias and potentially provide the direction for measuring/estimating it and potentially using such inferences in statistical analyses (e.g. in establishing upper/lower bounds).

36. Future work might compare patterns to SCAD as well if/when that dataset has been updated to include recent events.

37. We estimate, with Bayesian logistic regression, $P(U_i = 1 \mid \mathbb{1}^V_i, \nu_c, \tau_t) = \mathrm{logit}^{-1}\left(\gamma \mathbb{1}^V_i + \nu_c + \tau_t\right)$, where $U_i$ indicates whether a given incident $i$ was not previously captured by ACLED and the indicator variable $\mathbb{1}^V_i$ captures whether that event is estimated to have involved violence. Country and year fixed effects are given by $\nu_c$ and $\tau_t$, respectively. Predicted probabilities from alternative models with either country or year fixed effects are displayed in grey in Figure 2 and are effectively unchanged. We generate uncertainty estimates using quasi-Bayesian Monte Carlo simulation. Linear probability model results are consistent (see the accompanying R code); we report these given the incidental-parameter bias that fixed effects can introduce in logistic regression.
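
For readers wishing to approximate this specification, the following is a minimal R sketch rather than the accompanying replication code: the data frame events and its columns (uncaptured, violent, country, year) are hypothetical stand-ins for the variables defined above, and the use of arm::bayesglm and MASS::mvrnorm is one possible implementation of the Bayesian logit and quasi-Bayesian simulation, not necessarily the one used in our code.

library(arm)   # bayesglm(): Bayesian logistic regression (one possible implementation)
library(MASS)  # mvrnorm(): multivariate normal draws for quasi-Bayesian simulation

# Bayesian logit of 'not previously captured by ACLED' on a violence indicator,
# with country and year fixed effects (hypothetical data frame 'events').
fit <- bayesglm(uncaptured ~ violent + factor(country) + factor(year),
                family = binomial(link = "logit"), data = events)

# Quasi-Bayesian Monte Carlo: simulate coefficients from their approximate posterior
# and convert them to predicted probabilities with the violence indicator set to 1.
draws <- mvrnorm(5000, mu = coef(fit), Sigma = vcov(fit))
X1    <- model.matrix(~ violent + factor(country) + factor(year),
                      data = transform(events, violent = 1))
p_violent <- plogis(X1 %*% t(draws))  # rows: incidents; columns: simulation draws

# Linear probability model as a check on incidental-parameter bias in the logit.
lpm <- lm(uncaptured ~ violent + factor(country) + factor(year), data = events)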

Figure 2. This figure displays the results of comparing photo/video content tracked by the news media-based data (blue) vs. that newly identified from the materials (gray). The upper-left figure depicts differences across countries. The upper-right figure displays the predicted probability of not being previously identified when violence is and is not assessed to have been associated with the event. Finally, the bottom figure displays the locations of newly and previously identified events. (These plots involve, but are not limited to, the use of data from acleddata.com.)


38. See the accompanying R code. This speaks to a more general possible use of the photo/video records: they might be used not only to reduce the number of systematically under-covered events, but also be incorporated into imputation efforts intended to estimate overall levels of underreporting more generally.
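
To illustrate the kind of imputation such records could feed (this is not the procedure used in the accompanying code), a simple two-source capture-recapture calculation treats the news-based dataset and the photo/video archive as partially overlapping samples of the same underlying event population; all counts below are hypothetical.

# Hypothetical counts of unrest events observed in each source.
n_news  <- 420   # events in the news-based dataset
n_media <- 310   # events recovered from photo/video archives
n_both  <- 180   # events matched across the two sources

# Chapman-corrected Lincoln-Petersen estimate of the total number of events,
# which assumes (strongly) independent and homogeneous capture probabilities.
N_hat <- (n_news + 1) * (n_media + 1) / (n_both + 1) - 1
share_missed_by_news <- 1 - n_news / N_hat
round(c(estimated_total = N_hat, share_missed_by_news = share_missed_by_news), 2)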

39. We compare only the attacks logged by each dataset meeting our inclusion criteria. We exclude datasets’ attacks from analysis based on their classification of actors involved and other key terms. In a limited number of cases, ambiguities in event details could potentially lead to events that were indeed captured by these datasets being dropped. However, we do not believe that we systematically under-count dataset event coverage of any particular type of event.

40. As described in A.3, to determine whether a given event captured by the journalists was included in ACLED, members of our research team manually inspected each event. In some cases, differences in coordinates, dates, or description of the cause(s) and nature of the event between the events reported by the journalists and those included in ACLED made it difficult to determine whether an event reported by the journalists was indeed newly identified. In these cases, we create two datasets: one in which such cases are assumed to be newly identified and one in which they are not. We then calculated the statistics reported in this section using both datasets in order to produce plausible upper and lower bounds.

41. These numbers are conservative as we subset only to those attacks reported by the journalist in which insurgent and separatist force involvement is described. Other potentially responsive cases are dropped from this calculation.

42. Beitbridge, Gutu, Hwedza, Marondera, Mutoko, Nyanga.

43. Nevertheless, this may change. We understand from internal discussions with one news agency that efforts are underway to explore providing specific spatial details extracted from photos/videos.

44. For instance, with greater photo/video coverage, efforts to estimate crowd sizes, participants, whether or not violence occurred, etc. may benefit from the multiple resources.

45. For instance, ACLED reports using Arabic, but not Hebrew, language sources to collect data on Israel/Palestine (ACLED Citation2020).

46. Please see the accompanying R code for a description of the UNMISS-GED and UNMISS-ACLED comparisons.

References

  • AccessNow, 2016. Keep It On: fighting internet shutdowns around the world, 2016–2022, technical report.
  • ACLED, 2020. Israel/Palestine sourcing profile. Available from: https://acleddata.com/acleddatanew/wp-content/uploads/2021/11/ACLED_Israel-Palestine-Sourcing-Profile_April-2020.pdf.
  • AFP Forum, n.d. Available from: www.afpforum.com
  • Al Jazeera, 2023. Peru Anti-Government protesters clash with police in Puno, Al Jazeera (January).
  • AP Newsroom, n.d. Available from: https://newsroom.ap.org/editorial-photos-videos .
  • Aragao, C., 2019. War on a thumb drive: how SigAct data could transform care for veterans.
  • Aronson, J.D., 2018. Computer vision and machine learning for human rights video analysis: case studies, possibilities, concerns, and limitations. Law & Social Inquiry, 43 (4), 1188–1209. doi:10.1111/lsi.12353.
  • Baum, M.A. and Zhukov, Y.M., 2015. Filtering revolution: reporting bias in international newspaper coverage of the Libyan civil war. Journal of Peace Research, 52 (3), 384–400. doi:10.1177/0022343314554791.
  • Behlendorf, B., Belur, J., and Kumar, S., 2016. Peering through the kaleidoscope: variation and validity in data collection on terrorist attacks. Studies in Conflict & Terrorism, 39 (7–8), 641–667. doi:10.1080/1057610X.2016.1141004.
  • Berman, E., Felter, J.H., and Shapiro, J.N., 2018. Small wars, big data. In: Small wars, big data. Princeton, NJ.: Princeton University Press. doi:10.23943/9781400890118.
  • Berman, E., Shapiro, J.N., and Felter, J.H., 2011. Can hearts and minds be bought? The economics of counterinsurgency in Iraq. Journal of Political Economy, 119 (4), 766–819. doi:10.1086/661983.
  • Bernauer, T., et al., 2012. Water-related intrastate conflict and cooperation (WARICC): a new event dataset. International Interactions.
  • Birch, S. and Muchlinski, D., 2020. The dataset of countries at risk of electoral violence. Terrorism and Political Violence, 32 (2), 217–236. doi:10.1080/09546553.2017.1364636.
  • Birnir, J.K., et al., 2018. Introducing the AMAR (All Minorities At Risk) data. Journal of Conflict Resolution, 62 (1), 203–226. doi:10.1177/0022002717719974.
  • Bloomberg, n.d. Bloomberg Media Distribution. Available from: https://www.bloomberg.com/distribution [Accessed 18 August 2023].
  • Bodnaruk Jazayeri, K., 2016. Identity-based political inequality and protest: the dynamic relationship between political power and protest in the Middle East and North Africa. Conflict Management and Peace Science, 33 (4), 400–422. doi:10.1177/0738894215570426.
  • Boschee, E., et al., 2015. ICEWS coded event data.
  • Bowie, N.G., 2017. Terrorism events data: an inventory of databases and data sets, 1968–2017. Perspectives on Terrorism, 11 (4), 50–72.
  • Bueno de Mesquita, E., et al., 2015. Measuring political violence in Pakistan: insights from the BFRS dataset. Conflict Management and Peace Science, 32 (5), 536–558. doi:10.1177/0738894214542401.
  • Campbell, E., 2023. Mahsa Amini and the future of internet repression in Iran.
  • Chenoweth, E., Hendrix, C.S., and Hunter, K., 2019. Introducing the nonviolent action in violent contexts (NVAVC) dataset. Journal of Peace Research, 56 (2), 295–305. doi:10.1177/0022343318804855.
  • Chenoweth, E. and Lewis, O.A., 2013. Nonviolent and violent campaigns and outcomes (NAVCO) data project, version 2.0, campaign year data, codebook. In: Josef Korbel School of International Studies. University of Denver.
  • Choi, S.-W., 2010. Fighting terrorism through the rule of law? Journal of Conflict Resolution, 54 (6), 940–966. doi:10.1177/0022002710371666.
  • Clarke, K., 2021. Which protests count? Coverage bias in Middle East event datasets. Mediterranean Politics, 28 (2), 302–328. doi:10.1080/13629395.2021.1957577.
  • Collier, P., Hoeffler, A., and Söderbom, M., 2004. On the duration of civil war. Journal of Peace Research, 41 (3), 253–273. doi:10.1177/0022343304043769.
  • Condra, L.N., et al., 2018. The logic of insurgent electoral violence. American Economic Review, 108 (11), 3199–3231. doi:10.1257/aer.20170416.
  • Cook, S.J., et al., 2017. Two wrongs make a right: addressing underreporting in binary data from multiple sources. Political Analysis, 25 (2), 223–240. doi:10.1017/pan.2016.13.
  • Croicu, M. and Eck, K., 2022. Reporting of non-fatal conflict events. International Interactions, 48 (3), 450–470. doi:10.1080/03050629.2022.2044325.
  • Croicu, M. and Kreutz, J., 2016. Communication technology and reports on political violence: cross-national evidence using African events data. Political Research Quarterly, 70 (1), 19–31. doi:10.1177/1065912916670272.
  • Crost, B., Felter, J., and Johnston, P., 2014. Aid under fire: development projects and civil conflict. American Economic Review, 104 (6), 1833–1856. doi:10.1257/aer.104.6.1833.
  • Daskin, J.H. and Pringle, R.M., 2018. Warfare and wildlife declines in Africa’s protected areas. Nature, 553 (7688), 328–332. doi:10.1038/nature25194.
  • Davenport, C. and Ball, P., 2002. Views to a kill: exploring the implications of source selection in the case of Guatemalan state terror, 1977–1995. Journal of Conflict Resolution, 46 (1), 427–450. doi:10.1177/0022002702046003005.
  • Davies, S.E. and True, J., 2017. The politics of counting and reporting conflict-related sexual and gender-based violence: the case of Myanmar. International Feminist Journal of Politics, 19 (1), 4–21. doi:10.1080/14616742.2017.1282321.
  • Daxecker, U., Amicarelli, E., and Jung, A., 2019. Electoral contention and violence (ECAV): a new dataset. Journal of Peace Research, 56 (5), 714–723. doi:10.1177/0022343318823870.
  • Demarest, L. and Langer, A., 2018. The study of violence and social unrest in Africa: a comparative analysis of three conflict event datasets. African Affairs, 117 (467), 310–325. doi:10.1093/afraf/ady003.
  • De Rouen, K.R., Jr and Sobek, D., 2004. The dynamics of civil war duration and outcome. Journal of Peace Research, 41 (3), 303–320. doi:10.1177/0022343304043771.
  • Dietrick, N. and Eck, K., 2020. Known unknowns: media bias in the reporting of political violence. International Interactions, 46 (6), 1043–1060. doi:10.1080/03050629.2020.1814758.
  • Donnay, K., et al., 2019. Integrating conflict event data. Journal of Conflict Resolution, 63 (5), 1337–1364. doi:10.1177/0022002718777050.
  • Eck, K., 2012. In data we trust? A comparison of UCDP GED and ACLED conflict events datasets. Cooperation and Conflict, 47 (1), 124–141. doi:10.1177/0010836711434463.
  • EPA Images, n.d. Epa. Available from: https://epaimages.com/search.pp0 [Accessed 18 August 2023].
  • Fearon, J.D. and Laitin, D.D., 2003. Ethnicity, insurgency, and civil war. American Political Science Review, 97 (1), 75–90. doi:10.1017/S0003055403000534.
  • Fjelde, H., et al., 2021. Introducing the ethnic one-sided violence dataset. Conflict Management and Peace Science, 38 (1), 109–126. doi:10.1177/0738894219863256.
  • Fjelde, H. and Hoglund, K., 2022. Introducing the deadly electoral conflict dataset (DECO). Journal of Conflict Resolution, 66 (1), 162–185. doi:10.1177/00220027211021620.
  • Fortna, V.P., 2015. Do terrorists win? Rebels use of terrorism and civil war outcomes. International Organization, 69 (3), 519–556. doi:10.1017/S0020818315000089.
  • Gibilisco, M. and Steinberg, J., 2022. Strategic reporting: a formal model of biases in conflict data. American Political Science Review, 1–17. doi:10.1017/S0003055422001162.
  • Gineste, C. and Savun, B., 2019. Introducing POSVAR: a dataset on refugee-related violence. Journal of Peace Research, 56 (1), 134–145. doi:10.1177/0022343318811440.
  • Gleditsch, K.S., Metternich, N.W., and Ruggeri, A., 2014. Data and progress in peace and conflict research. Journal of Peace Research, 51 (2), 301–314. doi:10.1177/0022343313496803.
  • Hand, D.J., 2020. Dark data: why what you don’t know matters. Princeton, NJ.: Princeton University Press.
  • Hoeffler, A., et al., 2022. Tracking the SDGs: a methodological note on measuring deaths caused by collective violence. The Economics of Peace & Security Journal, 17 (2), 32–46. doi:10.15355/epsj.17.2.32.
  • Hoffmann, R., et al., 2020. A meta-analysis of country-level studies on environmental change and migration. Nature Climate Change, 10 (10), 904–912. doi:10.1038/s41558-020-0898-6.
  • Hou, D., Gaibulloev, K., and Sandler, T., 2020. Introducing extended data on terrorist groups (EDTG). Journal of Conflict Resolution, 64 (1), 199–225. doi:10.1177/0022002719857145.
  • Hussain, B., 2023. Kashmir registers highest number of internet restrictions globally.
  • Ives, B. and Lewis, J.S., 2020. From rallies to riots: why some protests become violent. Journal of Conflict Resolution, 64 (5), 958–986. doi:10.1177/0022002719887491.
  • Kalyvas, S.N., 2004. The urban bias in research on civil wars. Security Studies, 13 (3), 160–190. doi:10.1080/09636410490914022.
  • Klein, G.R. and Regan, P.M., 2018. Dynamics of political protests. International Organization, 72 (2), 485–521. doi:10.1017/S0020818318000061.
  • LaFree, G., 2010. The global terrorism database (GTD). Perspectives on Terrorism, 4 (1), 24–46.
  • LaFree, G. and Dugan, L., 2007. Introducing the global terrorism database. Terrorism and Political Violence, 19 (2), 181–204. doi:10.1080/09546550701246817.
  • Laktabai, V.K., 2020. Using GIS to assess the risk of terrorism: a case study of Garissa County. Thesis (PhD). University of Nairobi
  • Larreguy, H., et al., 2020. Don’t read all about it: drug trafficking organizations and media reporting of violence in Mexico.
  • Leetaru, K. and Schrodt, P.A., 2013. GDELT: Global Data on Events, Location, and Tone, 1979–2012. In: ISA Annual Convention. vol. 2, Citeseer, 1–49.
  • Lindberg Bromley, S., 2018. Introducing the UCDP peacemakers at risk dataset, Sub-Saharan Africa, 1989–2009. Journal of Peace Research, 55 (1), 122–131. doi:10.1177/0022343317735882.
  • Manacorda, M. and Tesei, A., 2020. Liberation technology: mobile phones and political mobilization in Africa. Econometrica, 88 (2), 533–567. doi:10.3982/ECTA14392.
  • Miller, E., et al., 2022. An agenda for addressing bias in conflict data. Scientific Data, 9 (1), 593. doi:10.1038/s41597-022-01705-8.
  • Minoiu, C. and Shemyakina, O.N., 2014. Armed conflict, household victimization, and child health in Côte d'Ivoire. Journal of Development Economics, 108, 237–255. doi:10.1016/j.jdeveco.2014.03.003
  • Mroszczyk, J. and Abrahms, M., 2021. Terrorism in Africa: explaining the rise of extremist violence against civilians, E-International Relations.
  • Mueller, H., et al., 2021. Monitoring war destruction from space using machine learning. Proceedings of the National Academy of Sciences, 118 (23), e2025400118. doi:10.1073/pnas.2025400118.
  • NARA, 2023. Electronic records relating to the Vietnam War — archives.gov. Available from: https://www.archives.gov/research/military/vietnam-war/electronic-data-files [Accessed 30 June 2023].
  • Obrien, S.P., 2010. Crisis early warning and decision support: contemporary approaches and thoughts on future research. International Studies Review, 12 (1), 87–104. doi:10.1111/j.1468-2486.2009.00914.x.
  • O’Loughlin, J., Linke, A.M., and Witmer, F.D., 2014. Effects of temperature and precipitation variability on the risk of violence in Sub-Saharan Africa, 1980–2012. Proceedings of the National Academy of Sciences, 111 (47), 16712–16717. doi:10.1073/pnas.1411899111.
  • Osorio, J. and Beltrán, A., 2019. Enhancing the detection of criminal organizations in Mexico using ML and NL, 1–7.
  • Otto, S., 2013. Coding one-sided violence from media reports. Cooperation and Conflict, 48 (4), 556–566. doi:10.1177/0010836713507668.
  • Pape, R.A., Rivas, A.A., and Chinchilla, A.C., 2021. Introducing the new CPOST dataset on suicide attacks. Journal of Peace Research, 58 (4), 826–838. doi:10.1177/0022343320978260.
  • Radeva, E., 2021. The potential for computer vision to advance accountability in the Syrian crisis. Journal of International Criminal Justice, 19 (1), 131–146. doi:10.1093/jicj/mqab015.
  • Raleigh, C., et al., 2010. Introducing ACLED: an armed conflict location and event dataset. Journal of Peace Research, 47 (5), 651–660. doi:10.1177/0022343310378914.
  • Reuters, n.d. Reuters. Available from: https://www.reutersagency.com/en/content-types/pictures/ [Accessed 18 August 2023].
  • Salehyan, I., et al., 2012. Social conflict in Africa: a new database. International Interactions, 38 (4), 503–511. doi:10.1080/03050629.2012.697426.
  • Schutte, S., Kelling, C., and Al-Ameri, T., 2022. A Monte Carlo analysis of false inference in spatial conflict event studies. PLoS One, 17 (4), 1–22. doi:10.1371/journal.pone.0266010.
  • Schvitz, G., et al., 2022. Mapping the international system, 1886–2017: the CShapes 2.0 dataset. Journal of Conflict Resolution, 66 (1), 144–161. doi:10.1177/00220027211013563.
  • Sexton, R., 2016. Aid as a tool against insurgency: evidence from contested and controlled territory in Afghanistan. American Political Science Review, 110 (4), 731–749. doi:10.1017/S0003055416000356.
  • Shaver, A., et al., 2022. News media reporting patterns and our biased understanding of global unrest, Technical Report. Empirical Studies of Conflict Project.
  • Shaver, A., et al., 2023. Global patterns of political violence and social unrest. Working Paper.
  • Solstad, S.U., 2023. Using satellite data to track Ukraine’s counter-offensive. Available from: https://github.com/TheEconomist/the-economist-war-fire-model.
  • Steinert-Threlkeld, Z.C., 2017. Spontaneous collective action: peripheral mobilization during the Arab Spring. American Political Science Review, 111 (2), 379–403. doi:10.1017/S0003055416000769.
  • Sundberg, R. and Melander, E., 2013. Introducing the UCDP Georeferenced event dataset. Journal of Peace Research, 50 (4), 523–532. doi:10.1177/0022343313484347.
  • Sutton, J., Butcher, C.R., and Svensson, I., 2014. Explaining political Jiu-jitsu: institution-building and the outcomes of regime violence against unarmed protests. Journal of Peace Research, 51 (5), 559–573. doi:10.1177/0022343314531004.
  • Tin, D., et al., 2021. Terrorism-related chemical, biological, radiation, and nuclear attacks: a historical global comparison influencing the emergence of counter-terrorism medicine. Prehospital and Disaster Medicine, 36 (4), 399–402. doi:10.1017/S1049023X21000625.
  • UNMISS, 2020. Annual brief on violence affecting civilians, January–December 2020. Available from: https://unmiss.unmissions.org/sites/default/files/unmiss_annual_brief_violence_against_civilians_2020_final_for_publication.pdf [Accessed 30 June 2023].
  • UNMISS, 2021. Annual brief on violence affecting civilians, January–December 2020. Available from: https://www.ohchr.org/sites/default/files/2022-03/unmiss_hrd_annual_brief_2021.pdf [Accessed 30 June 2023].
  • UNMISS, 2022. Annual brief on violence affecting civilians, January–December 2021. Available from: https://www.ohchr.org/sites/default/files/Documents/Countries/SS/unmiss_annual_brief_violence_against_civilians_2020.pdf [Accessed 30 June 2023].
  • UNMISS, 2023a. Annual brief on violence affecting civilians, January–December 2022. Available from: https://www.ohchr.org/sites/default/files/Documents/Countries/SS/unmiss_annual_brief_violence_against_civilians_2020.pdf [Accessed 30 June 2023].
  • UNMISS, 2023b. Annual brief on violence affecting civilians, January–March 2023. Available from: https://reliefweb.int/report/south-sudan/unmiss-brief-violence-affecting-civilians-january-march-2023 [Accessed 30 June 2023].
  • Vogt, M., et al., 2015. Integrating data on ethnicity, geography, and conflict: the ethnic power relations data set family. Journal of Conflict Resolution, 59 (7), 1327–1342. doi:10.1177/0022002715591215.
  • Von Borzyskowski, I. and Wahman, M., 2021. Systematic measurement error in election violence data: causes and consequences. British Journal of Political Science, 51 (1), 230–252. doi:10.1017/S0007123418000509.
  • Voors, M.J., et al., 2012. Violent conflict and behavior: a field experiment in Burundi. American Economic Review, 102 (2), 941–964. doi:10.1257/aer.102.2.941.
  • Walter, B.F., 1997. The critical barrier to civil war settlement. International Organization, 51 (3), 335–364. doi:10.1162/002081897550384.
  • Weidmann, N.B., 2015. On the accuracy of media-based conflict event data. Journal of Conflict Resolution, 59 (6), 1129–1149. doi:10.1177/0022002714530431.
  • Weidmann, N.B., 2016. A closer look at reporting bias in conflict event data. American Journal of Political Science, 60 (1), 206–218. doi:10.1111/ajps.12196.
  • Yedioth Ahronot, n.d. Available from: https://www.ynet.co.il/home/0,7340L-8,00.html [Accessed 18 August 2023].
  • Zhukov, Y.M. and Baum, M.A., 2019. How selective reporting shapes inferences about conflict. Unpublished working paper, 1–7. doi: 10.1177/0022343319836697.
  • Zhukov, Y.M., Davenport, C., and Kostyuk, N., 2017. Introducing xSub: a new portal for cross-national data on subnational violence. Journal of Peace Research, 56 (4). doi:10.1177/0022343319836697.