A web-based analytical framework for the detection and visualization space-time clusters of COVID-19


ABSTRACT

The COVID-19 pandemic has had a profound impact worldwide and continues to spread due to various mutations of the virus. Many governmental and nonprofit agencies at different levels have quickly developed COVID-19 dashboards to disseminate information on the pandemic to the public. However, most of these systems have mainly distributed “plain” information (e.g. cases, death counts, vaccination), and rarely provided insights that can be gained from spatiotemporal analyses, such as the detection of emerging clusters. The results from these analyses hold tremendous potential for health policymakers as they try to identify ways to slow down transmission. We present a web-based geographic framework to detect and visualize space-time clusters of COVID-19. Our tightly coupled framework integrates the prospective space-time scan statistic and local indicators of spatial association (LISA) with novel 2D and 3D interactive visuals in a cyber environment (http://159.223.164.41/app/). We illustrate the applicability of our approach using COVID-19 data for the continental US. Our framework is portable to other regions that may experience infectious diseases and is flexible enough to handle data of different spatial and temporal granularities. This paper fits within an effort to integrate space-time analytics for the monitoring of infectious diseases in a web environment, ultimately improving health surveillance systems.


Introduction

The coronavirus disease 2019 (COVID-19) has continually spread worldwide after the World Health Organization declared it a pandemic on 11 March 2020. Even with the increase in the vaccination rate, the rapid emergence of novel variants still poses challenges as countries worldwide attempt to slow down the transmission among the population. As of October 2022, the Centers for Disease Control and Prevention reported more than 97 million cases and over one million deaths of COVID-19 in the US alone.

To inform the public about risk and to facilitate the response of healthcare providers, dashboards have become a popular way to visualize and share COVID-19 information worldwide. Ivanković et al. (Citation2021) evaluated 158 dashboards (out of a total of 265) from 53 countries by 30 June 2020; 70.3% of them included maps as the main visualization feature, typically used to illustrate indicators (e.g. cases, deaths, or rates). Fareed et al. (Citation2021) assessed COVID-19 dashboards for all state governments in the US and noted that three dashboard vendors (ESRI, Tableau, and Microsoft Power BI) were the most popular; the primary goal of these dashboards was to communicate and inform citizens on the intensity and geographic distribution of the disease. In that sense, most COVID-19 dashboards are designed to provide basic information that the general public can interact with and consume. At the same time, it is equally important for health policymakers to have access to COVID-19 surveillance systems that integrate interactive results of spatial and space-time analyses. As the pandemic continues to unfold, fully equipped COVID-19 dashboards will need to be integrated into routine health surveillance systems, rather than serving only as the temporary monitoring and communication tools they were early in the pandemic (Barbazza et al., Citation2021).

Several COVID-19 studies have implemented spatial or space-time cluster detection methods to understand COVID-19 transmission. In epidemiology, spatial or space-time cluster detection is an essential method that can guide public health interventions (Kirby et al., Citation2017; Pfeiffer et al., Citation2008). Desjardins et al. (Citation2020) were the first to use the prospective space-time scan statistic (Kulldorff, Citation2001) to detect and evaluate emerging clusters of COVID-19 in the US, and many studies followed a similar approach for the monitoring of COVID-19 in various countries or regions, such as Brazil (Gomes et al., Citation2020; Martines et al., Citation2021), Hong Kong (Kan et al., Citation2021), Bangladesh (Masrur et al., Citation2020), and Spain (Rosillo et al., Citation2021). Several studies have used other methods to detect geographical clusters at different times. For instance, Siljander et al. (Citation2022) used two local spatial autocorrelation methods, the Local Indicators of Spatial Association (LISA) (Anselin, Citation1995) and the Getis-Ord Gi* statistic (Getis & Ord, Citation2010), to identify hot spot areas of COVID-19 in Helsinki, Finland, and then applied the space-time scan statistic to detect clusters of high relative risk. In fact, according to Lan and Delmelle (Citation2022), the three most popular algorithms for detecting space-time clusters of infectious diseases were the space-time scan statistic, global Moran’s I, and LISA, in that order.

Unlike many COVID-19 dashboards that integrate neither exploratory spatial data analysis (ESDA) nor statistics for cluster detection, several studies have attempted to deploy these methods in a web-based surveillance system for COVID-19. Hohl et al. (Citation2020) developed a web application of daily prospective space-time clusters in the US, while Rosillo et al. (Citation2021) created a similar application for monitoring COVID-19 in Spain during the summer of 2020. However, these two websites are no longer updated and are limited in terms of visualization, as they merely show the size of the clusters. One notable exception is Kolak et al. (Citation2021), who integrated the LISA statistic to track hotspots of COVID-19 in the US at the daily level. However, the visualization from this application only provides the spatial extent of clusters from the LISA algorithm, which could mislead the interpretation of results. For instance, all the counties within “High-High” clusters may be interpreted as having the same high level of transmission risk, which is hardly the case. Other information, such as the significance of clusters (i.e. the p-value), is also important to decision-making.

Another crucial issue that is not receiving enough attention in the literature is how to visualize space-time clusters. Many COVID-19 dashboards are not able to incorporate the temporal dimension other than through an animation (Lan et al., Citation2021). With that approach, the maps mostly reflect the distribution of COVID-19 counts of cases, deaths, or rates at a specific date, also known as a time slice (see Gomes et al., Citation2020; Kan et al., Citation2021; Masrur et al., Citation2020; Rosillo et al., Citation2021). When the animation spans a long period of time, the amount of space-time information provided may be too dense, leading to cognitive overload (Harrower, Citation2007). Several studies have incorporated 3D visualization to represent space-time clusters, where the z-axis represents time. Hohl et al. (Citation2022) used 3D voxels to represent significant clusters of dengue fever in Colombia, while Desjardins et al. (Citation2022) used the space-time cylinder to represent the size of space-time clusters of dengue fever in that area. Similarly, Guchhait et al. (Citation2023) and Purwanto et al. (Citation2021) used space-time bins to represent the time stamps of hot/cold spots of COVID-19. However, none of these studies implemented 3D visualization in a web-based platform.

Ideally, a COVID-19 surveillance system designed for health policymakers needs to support spatiotemporal analysis for cluster detection and robust visualization to facilitate the monitoring of the disease on a regular (e.g. daily) basis. However, most spatial and space-time cluster detection algorithms (e.g. the scan statistic and local spatial autocorrelation methods) require an application (e.g. SaTScan and GeoDa) or libraries (through R, for instance) to estimate the presence and magnitude of clusters, and commercial GIS software is weak in supporting interactive, space-time visualization of disease clusters. Thus, a tightly coupled system, in which different modules are connected into one system, is attractive. Using this approach, epidemiologists or health decision-makers can utilize spatiotemporal analysis and visualization to uncover important underlying patterns without the need to move from one software package to another (Delmelle et al., Citation2011). To the best of our knowledge, a tightly coupled system for space-time cluster detection and visualization is not available.

To fill this gap, this study proposes a web-based geographic framework for the detection and visualization of space-time clusters for infectious diseases. To demonstrate our system, we develop an automatic surveillance system that uses the prospective space-time scan statistic and the LISA algorithm at the county level in the continental US. The system retrieves daily updated COVID-19 data. The objectives of our tightly coupled system are: (1) to implement automatic and customized space-time cluster detection for a given geography and specific time range; (2) to generate novel 2D and 3D visual features of space-time clustering; and (3) to develop a tightly coupled system that incorporates daily data updates and the components for objectives one and two. This system is named US COVID-19 YuTu and is described further in the sections below. Our approach is innovative in that it utilizes open-source tools to seamlessly integrate data analysis and visualization in a tightly coupled framework. Further, it utilizes cutting-edge web technologies to streamline the development of the analysis and geovisualization process.

Methodology

Data

In this study, the COVID-19 Data Repository prepared by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Dong et al., Citation2020)Footnote¹ is used to extract daily US COVID-19 data (JHU CSSE COVID-19 Data) at both state and county levels. Halpern et al. (Citation2021) claimed the dataset from JHU is one of the closest to the one from CDC when compared to other commonly used COVID-19 datasets.

For this paper, we use data from 22 January 2020, the date CDC confirmed the first US coronavirus case in Washington state, to the most recent date that our website has been updated. Table 1 is an example of daily case data retrieved on 18 October 2021. Daily COVID-19 data are extracted and updated into corresponding databases. Attributes of the COVID-19 data include the federal information processing standards (FIPS) code, county name, state name, date, latitude, longitude, counts of confirmed cases, and counts of deaths. In addition, the latest available population figures and boundaries are retrieved from the US Census Bureau. The population data for the 48 conterminous states and their counties are sourced from the 2019 American Community Survey (ACS) 5-year (2015–2019) estimates of the resident population (see Bureau, Citation2019), and the boundary data are from the 2020 TIGER/Line shapefiles (see Bureau, Citation2020). Rates are calculated by dividing the number of cases, averaged over seven days, in a geographic region by the population in that region.
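The rate calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the case counts are cumulative (so daily cases are first differences), and the function name `seven_day_rate` and the `per` scaling factor are hypothetical.

```python
def seven_day_rate(cumulative_cases, population, per=100_000):
    """Return the 7-day average of new daily cases per `per` residents.

    Assumption: `cumulative_cases` holds cumulative confirmed counts ordered
    by date, so the last 8 values yield 7 daily increments.
    """
    if len(cumulative_cases) < 8:
        raise ValueError("need at least 8 cumulative counts")
    window = cumulative_cases[-8:]
    # Daily new cases are differences of consecutive cumulative counts.
    daily = [later - earlier for earlier, later in zip(window, window[1:])]
    seven_day_average = sum(daily) / 7
    return seven_day_average / population * per
```

Setting `per=1` gives the plain ratio of averaged cases to population described in the text.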

Table 1. Example of daily case data from JHU.


Methodology

This section introduces the framework and workflow for detecting and visualizing space-time clusters of infectious diseases using COVID-19 data in the conterminous US. The framework is based on a “tight-coupling” system with customized spatial and temporal settings. It incorporates data extraction capabilities, cluster detection, and geovisualization in a web-based GIS environment, and it is split into a server side (procedures running on the server) and a client side (procedures running in the user’s web browser), as illustrated in Figure 1. Our system is accessible at http://159.223.164.41/app/.

Figure 1. The framework of the YuTu system to detect space-time clustering of COVID-19.

The server side is articulated around three types of servers for different purposes: a database server, an interface server, and a method server; each server in this system is deployed as a Docker container. The database server stores all the relevant input data and output results. Disease information at the county level is extracted daily from the JHU CSSE COVID-19 database and imported into our COVID-19 database using a processing script written in Python, which removes unnecessary attributes. The other two datasets are population and cartographic boundaries from the US Census Bureau. The cluster detection algorithms run on the method server, which generates the results. This process is repeated every night after the data are automatically retrieved and processed. The interface server connects the client and database sides for visualizations.

The client side is the graphical user interface (GUI) of the COVID prototype. It consists of basic online map functions, including zoom, pan, etc. The default homepage contains the distribution of US COVID-19 space-time clusters at the county level. The geovisualization of space-time clusters is displayed in both two and three dimensions. Our framework is articulated around three modules implemented on individual servers: a) an analysis module (Method Server), b) a visualization module (Interface Server), and c) a data processing module (Database Server).

a) Analysis module

Among various methods of disease space-time detection, the local indicators of spatial association (LISA) and space-time scan tests are two popular methods.

Local indicators of spatial association-LISA

Anselin (Citation1995) introduced LISA as a decomposition of global indicators into the contribution of each individual observation, which can detect significant local clustering around an individual location and identify spatial nonstationarity and outliers. For a region i, the local indicator of spatial association $I_{i}$ is defined as:

$$I_{i} = \frac{(n - 1) (x_{i} - \bar{x})}{\sum_{j = 1}^{n} (x_{j} - \bar{x})^{2}} \sum_{j = 1}^{n} w_{ij} (x_{j} - \bar{x}) \tag{1}$$

where $x_{i}$ is the value of the variable of interest (here, the disease rate) in region i, $\bar{x}$ is the mean of $x_{i}$ (i = 1, …, n), and $w_{ij}$ is the spatial weight between regions i and j (typically derived from an adjacency matrix). The LISA algorithm assigns regions to different groups (e.g. High-High, Low-Low, High-Low, Low-High), each with an associated p-value. A location categorized as High-High (Low-Low) is a region exhibiting high (low) rates, surrounded by other regions with similarly high (low) values. A High-Low category characterizes a county with high rates, surrounded by low rates. This could be indicative of an area experiencing a more rapid increase in cases than would be expected, while surrounding regions do not experience such rapid growth. Low-High represents outlier regions of low value surrounded by high values. The LISA statistic (Anselin, Citation1995) is purely a cross-sectional method that does not take temporal information into account; to follow clusters over time, it has to be repeated for each time slice. There are ample examples of this repetitive approach to identify clusters of infectious diseases (see Ghosh & Cartone, Citation2020; Sugumaran et al., Citation2009). However, the LISA statistic is likely to lead to the discovery of false negatives and false positives.
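Equation (1) and the grouping into High-High, Low-Low, High-Low, and Low-High can be illustrated with a small pure-Python sketch. It is not the implementation used in the system, which relies on a dedicated library; the function names are hypothetical, the weights are assumed to be supplied as a dense matrix, and the permutation test that produces the p-values is deliberately left out.

```python
def local_moran(x, w):
    """Local Moran's I for each region, following Equation (1).

    x: list of rates, one per region.
    w: spatial weights as a list of lists (w[i][j]); assumed here to be a
       row-standardized adjacency matrix with zeros on the diagonal.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [value - mean for value in x]
    sum_sq = sum(d * d for d in dev)  # denominator of Equation (1)
    result = []
    for i in range(n):
        # Weighted sum of the neighbors' deviations from the mean.
        lag = sum(w[i][j] * dev[j] for j in range(n))
        result.append((n - 1) * dev[i] / sum_sq * lag)
    return result


def lisa_group(x, w):
    """Label each region by its own value and its neighbors' weighted value."""
    n = len(x)
    mean = sum(x) / n
    labels = []
    for i in range(n):
        lag = sum(w[i][j] * (x[j] - mean) for j in range(n))
        own = "High" if x[i] > mean else "Low"
        neighbors = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{neighbors}")
    return labels
```

In practice a label is only reported for regions whose statistic is significant, which requires a conditional permutation test not shown here.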

Space-time scan statistics

Kulldorff and Nagarwalla (Citation1995) introduced the spatial scan statistic as a test for detecting clusters by assessing the likelihood ratios of events inside and outside of circular scanning windows, adjusted for the density of the population. The radii of the windows are varied continuously from zero to the maximum bandwidth, e.g. to a size containing a certain percentage of the population. The window with the maximum likelihood ratio is defined as a cluster, and only regions located within this window are considered to “belong” to that cluster.

Kulldorff et al. (Citation1998) further expanded the spatial scan statistic to incorporate the temporal dimension: the circular window becomes a cylinder whose height represents time (Figure 2).

Figure 2. The illustration of space-time scan statistics.

As the statistic is designed to detect clusters, the null hypothesis H0 is that the risk of infection within a cylinder Z is the same as the risk outside the cylinder; the alternative hypothesis Ha is that the risk of infection within a cylinder Z is larger than the risk outside the cylinder. Accordingly, the expected number of cases ($μ$) within the scan window under the null hypothesis is

$$\mu = p \times \frac{N}{P} \tag{2}$$

with $p$ the population in the cylinder, $N$ the total number of cases, and $P$ the total population of the study area. Thus, the maximum likelihood ratio used to identify space-time clusters within the space-time scan window is defined as:

$$\frac{L (Z)}{L_{0}} = \frac{\left(\frac{n_{Z}}{\mu (Z)}\right)^{n_{Z}} \left(\frac{N - n_{Z}}{N - \mu (Z)}\right)^{N - n_{Z}}}{\left(\frac{N}{\mu (T)}\right)^{N}} \tag{3}$$

where $L (Z)$ is the likelihood function for the cylinder Z, and $L_{0}$ is the likelihood for the null hypothesis H0, $n_{Z}$ is the number of cases in the cylinder Z, $μ (Z)$ is the number of expected cases in cylinder Z, and $μ (T)$ is the total number of expected cases within all time periods within the scan window.
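A short sketch may help to read Equations (2) and (3) together. It evaluates the likelihood ratio on the log scale for a single candidate cylinder. The function names are hypothetical, and two choices are assumptions rather than statements from the text: the ratio is set to zero when the cylinder shows no excess of cases, and the total expected count defaults to the total observed count. The Monte Carlo step that turns the maximum ratio into a p-value is not shown.

```python
import math


def expected_cases(pop_in_cylinder, total_cases, total_pop):
    """Equation (2): expected number of cases under the null hypothesis."""
    return pop_in_cylinder * total_cases / total_pop


def log_likelihood_ratio(n_z, mu_z, total_cases, mu_total=None):
    """Natural logarithm of Equation (3) for one cylinder Z.

    Assumption: returns 0.0 when the cylinder has no excess risk
    (n_z <= mu_z), because only high-risk clusters are of interest.
    """
    if mu_total is None:
        # Assumption: total expected cases equal total observed cases.
        mu_total = total_cases
    if n_z <= mu_z:
        return 0.0
    inside = n_z * math.log(n_z / mu_z)
    rest = total_cases - n_z
    outside = rest * math.log(rest / (total_cases - mu_z)) if rest > 0 else 0.0
    null_term = total_cases * math.log(total_cases / mu_total)
    return inside + outside - null_term
```

Scanning then amounts to evaluating this quantity for every candidate cylinder and keeping the one with the largest value.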

We report 1) the relative risk for each location (RR of the location), defined as the estimated risk (observed/expected) within the location divided by the estimated risk outside the location, and 2) the relative risk for the cluster that the location belongs to (RR of the cluster), defined as the estimated risk within the cluster divided by the estimated risk outside of the cluster. For instance, if the RR of a county is 1.4 and the RR of its cluster is 2.5, then the estimated risk within this county is 1.4 times the risk outside the county, and the county belongs to a cluster whose estimated risk is 2.5 times the risk outside the cluster.
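The relative risk defined above can be written out as a one-line calculation. The sketch below is illustrative only; the function name is hypothetical, and it assumes that the expected cases outside the location (or cluster) are the total cases minus the expected cases inside.

```python
def relative_risk(observed, expected, total_cases):
    """Risk inside a location or cluster divided by the risk outside it.

    Assumption: expected cases outside equal total_cases - expected.
    """
    risk_inside = observed / expected
    risk_outside = (total_cases - observed) / (total_cases - expected)
    return risk_inside / risk_outside
```

For example, a location with 10 observed and 5 expected cases out of 50 in total has a risk of 2 inside and 40/45 outside, giving a relative risk of 2.25.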

In contrast to the cross-sectional LISA statistic, the space-time scan statistic “scans” the data using a cylindrical window in both space and time (Kulldorff et al., Citation2005). Also, unlike the LISA statistic, space-time scan statistics are not restricted by administrative boundaries (Naish & Tong, Citation2014), because the scan statistic searches for clusters beyond the so-called “adjacency matrix,” which is central to the LISA statistic.

Implementation of the algorithms

The LISA and space-time scan statistics are automated and run every night when the JHU data are updated and retrieved. The LISA algorithm is run repeatedly within the system to detect geographic clusters for each day. We use pygeoda, an open-source, cross-platform Python library of spatial analysis functions that includes LISA, for this temporal repetition of the LISA statistic. We conduct LISA on the incidence rate (7-day average cases divided by population) using a queen contiguity matrix. For the space-time cluster detection, we run SaTScan in batch mode using a discrete Poisson prospective test with a maximum spatial cluster size of 50% of the population at risk and a maximum temporal cluster size of 50 days. The maximum spatial cluster size (50%) is the default setting; the maximum temporal cluster size was calibrated by testing different values (n = 1–59 days) on the dataset from October 2021 to December 2021 and selecting the value that maximizes the log likelihood ratio. Input files, parameter files, and batch files for the analysis in SaTScan were generated using Python scripts. The outputs from LISA and from the space-time scan statistic are stored in separate databases.
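To make the role of the maximum temporal cluster size more concrete, the sketch below enumerates the temporal windows that a prospective analysis would consider on a given day. It is a simplified illustration and not SaTScan's code: it assumes, as is usual for prospective scans, that every candidate window ends on the current day, and the function name is hypothetical.

```python
from datetime import date, timedelta


def prospective_windows(current_day, max_days=50):
    """Return (start, end) pairs for every temporal window ending today.

    Assumption: a prospective scan only evaluates windows that are still
    "alive", i.e. that end on the day of the analysis.
    """
    return [
        (current_day - timedelta(days=length - 1), current_day)
        for length in range(1, max_days + 1)
    ]
```

With `max_days=50`, the longest window evaluated on 15 January 2022 would start on 27 November 2021.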

b) Visualization module

In the YuTu system, several visualizations are implemented. One of them is the animated bivariate map, which displays two variables simultaneously (see Figure 3). This visualization displays results from the space-time scan statistic, using animated bivariate maps to visualize different cluster detection results (Lan et al., Citation2021). The two variables presented are the relative risk of the cluster to which a location belongs and the relative risk of the location itself on that day. In this interactive system, the values of each variable can be displayed by hovering over the county.

Figure 3. The animated bivariate map of space-time cluster using the space-time scan statistics.

We also complement our system with LISA results (see Figure 4). The two variables for the LISA map are the p-value and the cluster group to which a county belongs. The p-value is classified using the thresholds 0.05, 0.01, 0.001, and 0.0001, and the cluster group is one of Low-Low, Low-High, High-Low, or High-High. Comparing the two maps, some areas are detected as clusters in both, while some regions are detected on only one map.
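The two variables of the LISA map can be combined into a single bivariate class, as in the following sketch. The thresholds and group names come from the text; the function name, the integer encoding of the classes, and the treatment of p-values above 0.05 as "not significant" are assumptions made for illustration.

```python
P_BREAKS = [0.05, 0.01, 0.001, 0.0001]
GROUPS = ["Low-Low", "Low-High", "High-Low", "High-High"]


def bivariate_class(p_value, group):
    """Return (significance_level, group_index), or None if not significant.

    significance_level runs from 1 (p <= 0.05) to 4 (p <= 0.0001).
    """
    if p_value > P_BREAKS[0]:
        return None  # assumption: counties above 0.05 are left unclassified
    level = sum(1 for threshold in P_BREAKS if p_value <= threshold)
    return level, GROUPS.index(group)
```

Each of the 16 possible pairs can then be mapped to one color of a 4 × 4 bivariate legend.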

Figure 4. The animated bivariate map of space-time cluster using space-time scan statistic (left) and LISA (right) around 11 August 2022.

Although the animated bivariate map has the advantage of showing the dynamics of the cluster distribution each day, it is hard to memorize the overall patterns. To compensate for this, other visualizations are incorporated to present the data in various ways: the spiral map, the time chart, and the 3D space-time cube.

In the spiral map (Weber et al., Citation2001), each bar represents the average daily relative risk at the state level (see Figure 5), encoded by size (length) and reinforced by color (the darker and longer the bar, the larger the average daily relative risk). When a county is selected in the bivariate map, the spiral map switches to the spiral of the state to which that county belongs.

Figure 5. The animated bivariate map of space-time cluster using the space-time scan statistic (left) and a spiral map reflecting the average relative risk for each conterminous US state (right).

We use the TimeChart to show the results of the bivariate map in a static and linear fashion (Figure 6). When one or more counties are selected on the bivariate map, the TimeChart displays the chart for the selected counties. The first chart in red represents the county’s relative risk (RR of the location), while the second chart in blue represents 1) the RR of the cluster that the county belongs to, and 2) the 7-day average cases for this county. In this way, animated results are linked with static and linear results to help discover the dynamic patterns in space and time.

Figure 6. The animated bivariate map of space-time cluster using space-time scan statistic (top) and the TimeChart of different variables (bottom).

We also develop a 3D web-based geovisualization (Figure 7) using multiple JavaScript libraries (3D Scatter Plots and Data-Driven Documents (D3) (Bostock et al., Citation2011)). In this 3D plot, the x- and y-axes represent the latitude and longitude of the centroid of each county, while the z-axis represents time. Each dot is color-coded to reflect the value of its relative risk. The system also incorporates a filter on the relative risk that masks points below a chosen value, so that more or fewer points are shown and the user can focus on regions of interest.
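The relative-risk filter of the space-time cube can be summarized by the following sketch, written in Python for consistency with the other examples even though the actual visualization runs in JavaScript. The function name and the dictionary keys (`lon`, `lat`, `day`, `rr`) are hypothetical.

```python
def apply_rr_filter(points, rr_threshold, hidden_opacity=0.0):
    """Attach an opacity to each point of the space-time cube.

    points: list of dicts with keys 'lon', 'lat', 'day', and 'rr'.
    Points whose relative risk is below the threshold are made transparent.
    """
    styled = []
    for point in points:
        opacity = 1.0 if point["rr"] >= rr_threshold else hidden_opacity
        styled.append({**point, "opacity": opacity})
    return styled
```

Keeping the masked points in the list, rather than dropping them, lets the threshold be changed interactively without reloading the data.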

Figure 7. The 3D space-time cubes of clusters displaying the relative risk of the cluster (left) and the relative risk of the county (right).

c) Data processing module

The data processing module contains daily data retrieving and processing, data analysis, and storing, and these steps are connected to the WebGIS environment. All the data are stored in databases created and managed using PostGIS, an open-source software program that supports geographic objects.

Daily retrieved data are processed and imported into the database on the server. Python scripts are used for daily data retrieval, processing, and space-time cluster detection for all counties. Population and boundary data are stable over the years and are stored in separate databases.
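The daily processing step, which keeps only the attributes needed by the system, might look like the sketch below. It is not the authors' script: the column names follow the layout of the public JHU CSSE daily report files as an assumption, the function name is hypothetical, and the import into PostGIS is omitted.

```python
import csv
import io

# Assumed JHU column names for the attributes listed in the Data section.
KEEP = ["FIPS", "Admin2", "Province_State", "Lat", "Long_", "Confirmed", "Deaths"]


def clean_rows(csv_text, report_date):
    """Drop unneeded attributes and tag each county record with its date."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        if not record.get("FIPS"):
            continue  # skip records that are not tied to a county
        row = {key: record[key] for key in KEEP}
        row["FIPS"] = row["FIPS"].zfill(5)  # restore leading zeros
        row["date"] = report_date
        rows.append(row)
    return rows
```

The returned dictionaries can then be inserted into the database with any PostgreSQL client library.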

Case study

We illustrate our system to monitor the variation of COVID-19 cases across the conterminous US. As multiple visual components display different results, we introduce several case studies as examples to show potential ways to use this system by combining visualizations. The animated bivariate map is intended to indicate the daily relative risks, which are the basic information for all other visuals.

Four waves of COVID-19 outbreaks

Four waves are identifiable from Figure 8, which shows the 7-day average cases in the US since the beginning of the pandemic. We select four time intervals around the peak of each wave, that is 21 July 2020 (peak 1), 4 January 2021 (peak 2), 3 September 2021 (peak 3), and 15 January 2022 (peak 4).

Figure 8. The four waves and their estimated peak dates using the data from WHO coronavirus (COVID-19) dashboard (World Health Organization, Citation2020).

Figure 9 shows the results of a) the SaTScan algorithm and b) the LISA statistic for the first peak. In the SaTScan results, one large cluster covered several counties in the south and center of the US. In the LISA results, however, high-high and high-low clusters were found in the south, southeast, and southwest, while many counties in the central US belonged to low-low clusters. Also, several counties in Washington state and Idaho were classified as high-high clusters by the LISA method, yet SaTScan did not detect these counties.

Figure 9. The animated bivariate maps at peak 1 using a) the prospective space-time scan statistics and b) LISA.

Figure 10 shows the results of a) SaTScan and b) LISA for the second peak. The SaTScan results show that the relative risk of the clusters is not as high as during the first peak (Figure 9). One cluster with higher relative risk was in the southwestern section of the US, including the southern part of California, the western part of Arizona, and several counties close to the Nevada border. Most counties in the high-high LISA clusters coincided with those from SaTScan.

Figure 10. The animated bivariate maps at peak 2 using a) the prospective space-time scan statistics and b) LISA.

Interestingly, we also found clusters with higher relative risk about one month before the second peak. On 26 November 2020, SaTScan (Figure 11a) detected one very large cluster covering several counties in the northern and central parts of the US. Most counties within this cluster had higher relative risks (colored in dark purple) compared with the rest of the counties (colored in pink). In the LISA results (Figure 11b), high-high and high-low clusters were also found in the north and center of the US.

Figure 11. The animated bivariate maps on November 26th, 2020, 40 days before the second peak using a) the prospective space-time scan statistics and b) LISA.

At the third peak (Figure 12), multiple small clusters were detected with the SaTScan algorithm (Figure 12a), and the cluster with the highest relative risk on that day included the entire state of Florida and many counties from neighboring states. Other clusters were found in the western, central, and southern sections of the US. Many counties that were detected using SaTScan were also classified as high-high clusters with the LISA algorithm (Figure 12b). Clusters with higher relative risk were detected one month before the third peak (Figure 12c and d).

Figure 12. The animated bivariate maps a) & b) during the third peak, and c) & d) the maps on July 24th, 2021.

During the interval that covered peak 4, only one cluster, covering many states in the east of the US, was detected by SaTScan (Figure 13a), while high-high clusters were distributed across the US according to the LISA results (Figure 13b). One month before peak 4, both SaTScan and LISA (Figure 13c and d) suggested clusters of higher relative risk or high-high values in the northeast of the US.

Figure 13. Animated bivariate maps at a) & b) peak 4 and c) & d) the maps on December 16th, 2021, 31 days before peak 4.

Since the two bivariate maps are generated from fundamentally different algorithms, it is not uncommon for their results to differ. The SaTScan map on the left displays true space-time clusters that consider data from the previous 50 days, while the LISA map on the right displays spatial clusters based on the current date only (a process that is repeated on a daily basis). We argue that the clusters obtained using the SaTScan algorithm better reflect the true pattern in space and time, while the LISA algorithm may lack this characteristic. However, clusters obtained by the LISA algorithm can provide a signal of the clusters exhibited by SaTScan. Furthermore, the left map can also be used to illustrate the emergence or decline of a cluster, whereas the right map may suddenly fail to display a previously existing cluster.

Comparing situations among counties

We picked three counties with relatively high population density in three different states: Los Angeles County in California, Miami-Dade County in Florida, and Queens County in New York (see Figure 14). According to the TimeChart, all of them experienced a very high number of cases on 11 January 2022, around the peak day of the fourth wave. The relative risk for each county was 1.1 with 38,007 cases (Los Angeles County), 2.54 with 15,777 cases (Miami-Dade County), and 2.28 with 11,896 cases (Queens County), respectively. Two counties (Miami-Dade and Queens County) belonged to the same cluster with an average cluster relative risk of 3.7.

Figure 14. a) & b) The animated bivariate maps of selected three counties on the date that all of them reported most cases c) the TimeChart of 7-day average cases, and d) the timechart of relative risk of clusters.

Interpreting waves using the 3D space-time cube

We also examined the third wave from June 2020 to December 2020 using a 3D space-time cube (Figure 15). The left figures show the extent of the space-time clusters by displaying the relative risk of each cluster, while their right counterparts illustrate the relative risk distribution of counties within clusters in space and time. The value can be filtered to show more or fewer counties (here, each represented as a point at its centroid) according to a relative risk threshold (counties with an RR lower than the threshold are made transparent).

Figure 15. The 3D space-time cubes during the third wave from June 2020 to December 2020 with different threshold of relative risk.

When the relative risk is equal to or larger than 2, it is clear that there is a shift from the center and some counties in the east to the northwest of the US, and this change happened around September, which was around the peak time of the third wave. By increasing the threshold to higher values, the results are clearer, suggesting that the relative risk of clusters and counties was higher before September.

Scalability to different levels of granularity

To explore the merits of our system and its scalability, we tested it as a case study for the state of Wisconsin on four different scales. According to the spiral map, there were two periods when the average relative risk of Wisconsin was high (Figure 16). We selected three months from one of these two periods, from 1 December 2021 to 28 February 2022, when the value peaked and then decreased.

Figure 16. The animated bivariate (left) and spiral map (right) of Wisconsin for 20 January 2022 (the state of Wisconsin is highlighted by a red circle).

To compare the results at different levels of granularity, we ran the analysis at the county level with all other counties in the continental US, at the county level within the state of Wisconsin only, at the zip code level within the state, and at the census tract level within the state (see Figures 17 to 19).

Figure 17. The animated bivariate map of Wisconsin at multiple levels on 1 December 2021. It included scales from a) the county levels with all other states, b) the county level with one state, c) the zip code level, and d) the census tract level.

Figure 18. The animated bivariate map of Wisconsin at multiple levels on 5 January 2022. It included scales from a) the county levels with all other states, b) the county level with one state, c) the zip code level, and d) the census tract level.

Figure 19. The animated bivariate map of Wisconsin at multiple levels on 28 February 2022. It included scales from a) the county levels with all other states, b) the county level with one state, c) the zip code level, and d) the census tract level.

Most counties in Wisconsin were detected as clusters on 1 December 2021 when the counties of the entire US were used as input for SaTScan. Applying SaTScan at the different levels of granularity within the state gave similar results, with most regions detected as clusters. On 5 January 2022 (Figure 18), however, clusters were detected only when using data solely from Wisconsin. On 28 February 2022, clusters were detected only at the zip code level. The last two examples suggest that SaTScan is sensitive not only to the granularity of the input data (zip code, census tract or county) but also to its spatial extent (continental versus state level).

Discussion and conclusion

US COVID-19 YuTu is a health surveillance system based on space-time cluster detection analysis and visualization. It is implemented with near real-time monitoring and novel visual features. The system emphasizes spatiotemporal analysis and representation in various ways. To illustrate the framework, the prospective space-time scan statistic and the LISA algorithm are applied to detect space-time clusters on a daily basis; the approach is flexible, so other statistical methods can be integrated. For visualization, the animated bivariate map of those two methods presents all the available space-time clusters each day. It is complemented by the TimeChart, which gives a linear representation, and by the spiral map, which gives a higher-level summary, to compensate for the difficulty of processing a large set of dynamic information. The 3D space-time cube offers another way to explore the same results, with the ability to filter the data by both time and the relative risk of space-time clusters. Our framework can handle multiple levels of granularity, as illustrated in our case study.

Despite these strengths, our system has some limitations. First, the calibration of the parameters used in the space-time analysis may require fine-tuning with epidemiologists. Different spatial and temporal search radii in SaTScan, or different definitions of spatial contiguity, may lead to the detection of otherwise hidden clusters; as such, the choice of parameters may affect the hypotheses that are built when discovering these clusters. For instance, when attempting to identify space-time clusters, we only took the last 50 days into account. This choice attempted to balance stability in the results against the computational demand of the SaTScan clustering algorithm (too small a window may not allow a signal to be detected before an outbreak). Second, we did not consider the uncertainty that is inherent to the ACS population data. Most COVID-19 dashboards in the US used 5-year rolling averages from the 2019 ACS to estimate disease rates, since the collection of the 2020 census was severely delayed. Third, along the same lines, we did not apply Bayesian mapping to the rates, which are known to be unstable in sparsely populated areas (Delmelle et al., 2022; Kirby et al., 2017). Fourth, because few open-access datasets were available, the system only took cases and deaths into account. However, our framework is flexible enough to integrate other sources of data (e.g. hospitalization or vaccination rates). Fifth, all modules are currently contained within a single server. Future research should scale up the system to handle a large number of end users and distribute the modules across multiple servers. This approach would also help ensure data security, particularly in situations where users need to upload their own data sources (Delmelle et al., 2012). By separating the modules into independent servers, the framework can provide a more secure and scalable solution for managing infectious disease data. Finally, we did not evaluate the system, which is necessary to understand how it can be used practically by health policymakers in a decision-making context.
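
The 50-day look-back mentioned above can be illustrated with a short Python sketch. It only shows how a fixed-length study window might be cut from a daily series before it is passed to a cluster detection routine; it does not reproduce the SaTScan configuration of the system. The function names `study_window` and `cases_in_window`, the dictionary layout of the input, and the assumption that the window includes the run date itself are all hypothetical.

```python
from datetime import date, timedelta


def study_window(run_date, n_days=50):
    """Return the inclusive (start, end) dates of the n_days ending on run_date."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    return run_date - timedelta(days=n_days - 1), run_date


def cases_in_window(daily_cases, run_date, n_days=50):
    """Keep only the entries of {date: count} that fall inside the study window."""
    start, end = study_window(run_date, n_days)
    return {d: c for d, c in daily_cases.items() if start <= d <= end}


# Made-up daily counts covering 60 consecutive days, for illustration only.
first_day = date(2021, 11, 22)
daily_cases = {first_day + timedelta(days=i): i for i in range(60)}

window = cases_in_window(daily_cases, run_date=date(2022, 1, 20))
print(len(window), min(window), max(window))
```

A shorter `n_days` reduces the amount of data the scan has to process on each daily run, at the cost of less history against which an emerging signal can be compared, which is the trade-off described in the paragraph above.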

This study presents a framework for novel surveillance systems that automatically detect space-time clusters on a daily basis. It is illustrated using COVID-19 cases at the county level for the 48 conterminous US states. This COVID-19 surveillance system integrates two popular methods and a variety of novel interactive visualization features in 2D and 3D within an open-source web-based GIS environment. Although the prototype focused on the 48 conterminous states using the space-time scan statistic and the LISA algorithm, because it is open source it can be applied to other countries or regions at multiple scales, and can integrate other clustering algorithms. As expected, the results from one clustering algorithm (e.g. SaTScan) may differ from those of another (e.g. LISA); as such, caution should be applied when using these maps in policy decision-making. A solid understanding of the algorithms embedded in our system (SaTScan, LISA) and an awareness of data uncertainty are important for making accurate decisions. This underscores the importance of training public health specialists in the mechanisms of spatial cluster detection algorithms. We hope that systems like this one can provide a framework to assist health policymakers in making decisions, such as how to slow the spread of COVID-19 in the US. We also hope to inspire other researchers to develop health surveillance systems that incorporate spatiotemporal data and analysis.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The data and code that support the findings of this study are available on GitHub at https://github.com/YuLanGeoHealth/US-Covid-19-YuTu.

Notes

1. As of 10 March 2023, the source has stopped collecting and reporting global COVID-19 data.

References

Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
Barbazza, E., Ivanković, D., Wang, S., Gilmore, K. J., Poldrugovac, M., Willmington, C., Larrain, N., Bos, V., Allin, S., Klazinga, N., & Kringos, D. (2021). Exploring changes to the actionability of COVID-19 dashboards over the course of 2020 in the Canadian context: Descriptive assessment and expert appraisal study. Journal of Medical Internet Research, 23(8), e30200. https://doi.org/10.2196/30200
Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3 data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2301–2309. https://doi.org/10.1109/TVCG.2011.185
Bureau, U. S. C. (2019). Total population, American Community Survey 5-year estimates. Retrieved March 19, 2023, from https://data.census.gov/table?q=2019+Community+Survey+(ACS)+5-year+estimates+&t=Populations+and+People&g=010XX00US$0400000&d=ACS+5-Year+Estimates+Detailed+Tables&tid=ACSDT5Y2019.B01003
Bureau, U. S. C. (2020). TIGER/line shapefiles and TIGER/line files. Retrieved March 19, 2023, from https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
Delmelle, E., Delmelle, E. C., Casas, I., & Barto, T. (2011). HELP: A GIS-based health exploratory analysis tool for practitioners. Applied Spatial Analysis and Policy, 4(2), 113–137. https://doi.org/10.1007/s12061-010-9048-2
Delmelle, E. M., Desjardins, M. R., Jung, P., Owusu, C., Lan, Y., Hohl, A., & Dony, C. (2022). Uncertainty in geospatial health: Challenges and opportunities ahead. Annals of Epidemiology, 65, 15–30.
Desjardins, M. R., Hohl, A., & Delmelle, E. M. (2020). Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters. Applied Geography, 118, 102202. https://doi.org/10.1016/j.apgeog.2020.102202
Desjardins, M., Hohl, A., Delmelle, E., & Casas, I. (2022). Identifying and visualizing space-time clusters of vector-borne diseases. In Geospatial technology for human well-being and health (pp. 203–217). Springer International Publishing. https://doi.org/10.1007/978-3-030-71377-5_11
Dong, E., Du, H., & Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases, 20(5), 533–534. https://doi.org/10.1016/S1473-3099(20)30120-1
Fareed, N., Swoboda, C. M., Chen, S., Potter, E., Wu, D. T., & Sieck, C. J. (2021). US COVID-19 state government public dashboards: An expert review. Applied Clinical Informatics, 12(2), 208–221. https://doi.org/10.1055/s-0041-1723989
Getis, A., & Ord, J. K. (2010). The analysis of spatial association by use of distance statistics. In L. Anselin & S. J. Rey (Eds.), Perspectives on spatial data analysis (pp. 127–145). Springer Berlin Heidelberg.
Ghosh, P., & Cartone, A. (2020). A Spatio‐temporal analysis of COVID‐19 outbreak in Italy. Regional Science Policy & Practice, 12(6), 1047–1062. https://doi.org/10.1111/rsp3.12376
Gomes, D.-S., Andrade, L. A., Ribeiro, C. J. N., Peixoto, M., Lima, S., Duque, A., Cirilo, T. M., Góes, M., Lima, A., Santos, M., Araújo, K. C. G. M., & Santos, A. D. (2020). Risk clusters of COVID-19 transmission in northeastern Brazil: Prospective space–time modelling. Epidemiology & Infection, 148. https://doi.org/10.1017/S0950268820001843
Guchhait, S., Das, S., Das, N., & Patra, T. (2023). Mapping of space–time patterns of infectious disease using spatial statistical models: A case study of COVID-19 in India. Infectious Diseases, 55(1), 27–43. https://doi.org/10.1080/23744235.2022.2129778
Halpern, D., Lin, Q., Wang, R., Yang, S., Goldstein, S., & Kolak, M. (2021). Dimensions of uncertainty: A spatiotemporal review of five COVID-19 datasets. Cartography and Geographic Information Science, 1–22. https://doi.org/10.1080/15230406.2021.1975311
Harrower, M. (2007). The cognitive limits of animated maps. Cartographica: The International Journal for Geographic Information and Geovisualization, 42(4), 349–357. https://doi.org/10.3138/carto.42.4.349
Hohl, A., Delmelle, E. M., Desjardins, M. R., & Lan, Y. (2020). Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States. Spatial and Spatio-Temporal Epidemiology, 34, 100354. https://doi.org/10.1016/j.sste.2020.100354
Hohl, A., Tang, W., Casas, I., Shi, X., & Delmelle, E. (2022). Detecting space–time patterns of disease risk under dynamic background population. Journal of Geographical Systems, 24(3), 389–417. https://doi.org/10.1007/s10109-022-00377-7
Ivanković, D., Barbazza, E., Bos, V., Fernandes, Ó. B., Gilmore, K. J., Jansen, T., Kara, P., Larrain, N., Lu, S., Meza-Torres, B., Mulyanto, J., Poldrugovac, M., Rotar, A., Wang, S., Willmington, C., Yang, Y., Yelgezekova, Z., Allin, S., Klazinga, N., & Kringos, D. (2021). Features constituting actionable COVID-19 dashboards: Descriptive assessment and expert appraisal of 158 public web-based COVID-19 dashboards. Journal of Medical Internet Research, 23(2), e25682. https://doi.org/10.2196/25682
Kan, Z., Kwan, M. P., Huang, J., Wong, M. S., & Liu, D. (2021). Comparing the space‐time patterns of high‐risk areas in different waves of COVID‐19 in Hong Kong. Transactions in GIS, 25(6), 2982–3001. https://doi.org/10.1111/tgis.12800
Kirby, R. S., Delmelle, E., & Eberth, J. M. (2017). Advances in spatial epidemiology and geographic information systems. Annals of Epidemiology, 27(1), 1–9. https://doi.org/10.1016/j.annepidem.2016.12.001
Kolak, M., Li, X., Lin, Q., Wang, R., Menghaney, M., Yang, S., & Anguiano, V., Jr. (2021). The US COVID atlas: A dynamic cyberinfrastructure surveillance system for interactive exploration of the pandemic. Transactions in GIS, 25(4), 1741–1765. https://doi.org/10.1111/tgis.12786
Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(1), 61–72. https://doi.org/10.1111/1467-985X.00186
Kulldorff, M., Athas, W. F., Feurer, E. J., Miller, B. A., & Key, C. R. (1998). Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health, 88(9), 1377–1380. https://doi.org/10.2105/ajph.88.9.1377
Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R., Mostashari, F., & Blower, S. M. (2005). A space–time permutation scan statistic for disease outbreak detection. PLoS Medicine, 2(3), e59. https://doi.org/10.1371/journal.pmed.0020059
Kulldorff, M., & Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Statistics in Medicine, 14(8), 799–810. https://doi.org/10.1002/sim.4780140809
Lan, Y., & Delmelle, E. (2022). Space-time cluster detection techniques for infectious diseases: A systematic review. Spatial and Spatio-Temporal Epidemiology, 44, 100563. https://doi.org/10.1016/j.sste.2022.100563
Lan, Y., Desjardins, M. R., Hohl, A., & Delmelle, E. (2021). Geovisualization of COVID-19: State of the art and opportunities. Cartographica: The International Journal for Geographic Information and Geovisualization, 56(1), 2–13. https://doi.org/10.3138/cart-2020-0027
Martines, M. R., Ferreira, R. V., Toppa, R. H., Assunção, L., Desjardins, M. R., & Delmelle, E. M. (2021). Detecting space–time clusters of COVID-19 in Brazil: Mortality, inequality, socioeconomic vulnerability, and the relative risk of the disease in Brazilian municipalities. Journal of Geographical Systems, 23(1), 7–36. https://doi.org/10.1007/s10109-020-00344-0
Masrur, A., Yu, M., Luo, W., & Dewan, A. (2020). Space-time patterns, change, and propagation of COVID-19 risk relative to the intervention scenarios in Bangladesh. International Journal of Environmental Research and Public Health, 17(16), 5911. https://doi.org/10.3390/ijerph17165911
Naish, S., & Tong, S. (2014). Hot spot detection and spatio-temporal dynamics of dengue in Queensland, Australia. In Proceedings of the ISPRS Technical Commission VIII Symposium [International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences-ISPRS Archives, Volume XL-8] (pp. 197–204). International Society of Photogrammetry and Remote Sensing (ISPRS).
Pfeiffer, D., Robinson, T. P., Stevenson, M., Stevens, K. B., Rogers, D. J., & Clements, A. C. (2008). Spatial analysis in epidemiology (Vol. 142). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198509882.001.0001
Purwanto, P., Utaya, S., Handoyo, B., Bachri, S., Astuti, I. S., Utomo, K. S. B., & Aldianto, Y. E. (2021). Spatiotemporal analysis of COVID-19 spread with emerging hotspot analysis and space–time cube models in East Java, Indonesia. ISPRS International Journal of Geo-Information, 10(3), 133. https://doi.org/10.3390/ijgi10030133
Rosillo, N., Del-Águila-Mejía, J., Rojas-Benedicto, A., Guerrero-Vadillo, M., Peñuelas, M., Mazagatos, C., Segú-Tell, J., Ramis, R., & Gómez-Barroso, D. (2021). Real time surveillance of COVID-19 space and time clusters during the summer 2020 in Spain. BMC Public Health, 21(1), 961. https://doi.org/10.1186/s12889-021-10961-z
Siljander, M., Uusitalo, R., Pellikka, P., Isosomppi, S., & Vapalahti, O. (2022). Spatiotemporal clustering patterns and sociodemographic determinants of COVID-19 (SARS-CoV-2) infections in Helsinki, Finland. Spatial and Spatio-Temporal Epidemiology, 41, 100493. https://doi.org/10.1016/j.sste.2022.100493
Sugumaran, R., Larson, S. R., & DeGroote, J. P. (2009). Spatio-temporal cluster analysis of county-based human West Nile virus incidence in the continental United States. International Journal of Health Geographics, 8(1), 1–19. https://doi.org/10.1186/1476-072X-8-43
Weber, M., Alexa, M., & Müller, W. (2001). Visualizing time-series on spirals. Infovis.
World Health Organization. (2020). WHO coronavirus (COVID-19) dashboard. Retrieved October 10, 2022, from https://covid19.who.int/
