ABSTRACT
COVID-19 surveillance across the United States is essential to tracking and mitigating the pandemic, but data representing cases and deaths may be affected by attribute, spatial, and temporal uncertainties. COVID-19 case and death data are essential to understanding the pandemic and serve as key inputs for prediction models that inform policy decisions; consistent information across datasets is critical to ensuring coherent findings. We implement an exploratory data analytic approach to characterize, synthesize, and visualize spatial-temporal dimensions of uncertainty across commonly used datasets for case and death metrics (Johns Hopkins University [JHU], the New York Times [NYT], USAFacts, and 1Point3Acres). We scrutinize data consistency to assess where and when disagreements occur, potentially indicating underlying uncertainty. We examine differences in cumulative case and death rates to highlight discrepancies and identify spatial patterns. Data are assessed using pairwise agreement (Cohen’s kappa) and agreement across all datasets (Fleiss’ kappa) to summarize changes over time. Findings suggest the highest agreement among the Centers for Disease Control and Prevention (CDC), JHU, and NYT datasets. We find nine discrete type-components of information uncertainty for COVID-19 datasets, reflecting various complex processes. Understanding the processes and indicators of uncertainty in COVID-19 data reporting is especially relevant to public health professionals and policymakers, who need to accurately understand and communicate information about the pandemic.
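For readers unfamiliar with the two agreement statistics named in the abstract, the following is a minimal illustrative sketch in Python and not the analysis code used in this study. It assumes, purely for illustration, that each county's rate from each dataset has already been binned into a categorical label (for example "low" or "high"). The function names `cohen_kappa` and `fleiss_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of units where both sources give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each source's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use one identical label throughout
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per unit, one label per source."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    n_units = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every unit needs the same number (>= 2) of labels")
    totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Share of agreeing source pairs for this unit.
        agreement_sum += (sum(v * v for v in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    mean_agreement = agreement_sum / n_units
    # Chance agreement from the pooled label proportions.
    expected = sum((v / (n_units * n_raters)) ** 2 for v in totals.values())
    if expected == 1.0:
        return 1.0
    return (mean_agreement - expected) / (1 - expected)


# Toy, made-up labels for four hypothetical counties from two sources.
source_a = ["low", "low", "high", "high"]
source_b = ["low", "high", "high", "high"]
pairwise = cohen_kappa(source_a, source_b)

# Toy labels for two hypothetical counties from three sources in full agreement.
overall = fleiss_kappa([["low", "low", "low"], ["high", "high", "high"]])
```

Both statistics rescale observed agreement by the agreement expected from chance alone, so a value of 1 indicates complete agreement and a value near 0 indicates agreement no better than chance.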
Acknowledgments
This research was made possible by the open-source and open-access efforts of the New York Times and Johns Hopkins University, and by the continued public availability of data from the CDC and USAFacts. Thanks to 1Point3Acres for continued data use permissions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed here.
Data availability and code availability statement
The data used at the time of writing and the code notebook for this analysis are available at https://github.com/geoda/covid-uncertainty. An interactive version of key figures in this paper is available at https://observablehq.com/@uscovidatlas/data-uncertainty-national-us-covid-data. To explore differences between datasets for particular counties over time, see the interactive code notebook available at https://colab.research.google.com/drive/1iRKtRsNf-tBYJN6Jx_0bYgQpVY0p3onj.