Research Article

A multivariate hierarchical regionalization method to discovering spatiotemporal patterns

Article: 2176704 | Received 19 Oct 2022, Accepted 01 Feb 2023, Published online: 21 Feb 2023

ABSTRACT

In GIScience, the regionalization method is widely used for geographical data mining, spatiotemporal pattern discovery, and regional studies. An ideal regionalization method should consider spatial contiguity, temporal contiguity, and attribute similarity. Existing regionalization approaches mostly focus on spatial contiguity and attribute similarity while ignoring the temporal contiguity characteristics of geographic phenomena. We propose a multivariate spatiotemporal regionalization (STR) method that considers spatiotemporal contiguity and attribute similarity. We design a bottom-up unsupervised multivariate hierarchical clustering algorithm with constraints using spatiotemporal proximity rules, enabling the automatic regionalization of spatiotemporal data. To test the performance of the STR method, we applied it to a synthetic dataset and a real-world dataset (Chinese air pollutant data) and achieved ideal results. Such a method offers a spatiotemporal perspective to address regionalization or clustering problems, potentially supporting other applications in spatiotemporal data analysis, remote sensing, urban planning, and social science.


1 Introduction

Spatial clustering or regionalization has been one of the most common problems in GIScience, spatial statistics, and spatiotemporal data analysis (Helbich et al. 2013; Wolf 2021). Given a set of geographical objects (e.g. Chinese cities) with univariate or multivariate information, a regionalization method attempts to aggregate the geographical objects into a number of spatiotemporally contiguous regions while optimizing an objective function, which is normally a measure of the attribute similarity in each region (Aydin et al. 2021; Wang et al. 2022). Regionalization has been applied to many fields, such as map generalization, climatic zoning, housing markets, social sensing, and health-related analysis (Ferreira, Campbell, and Matwin 2022; Wang et al. 2022; Zhou and Matyas 2021).

In previous studies, researchers have proposed many regionalization methods. Regionalization has two definitions, and both are well accepted: partitioning, which assigns a unique label to every location in the study area, and non-partitioning, which does not need to assign labels to regions but instead obtains cluster centers (Zhu et al. 2021). This paper focuses on partitioning regionalization, but some discussions also use the term partitioning clustering, following the convention in the literature. These methods can be divided into three categories by data source: regionalization of rasters, regionalization of polygons (or points), and regionalization of origin-destination flows or networks (Kowe et al. 2020; Shu et al. 2020; Wei, Rey, and Knaap 2020). Raster regionalization algorithms are closely related to remote sensing and image processing (Bogucka and Jahnke 2018; Cowpertwait 2011; Lin et al. 2021). They are often used to identify geographic objects in remote sensing images, thereby providing a basis for object-oriented remote sensing algorithms.

However, a large amount of real-world socio-economic data is distributed over administrative regions, which have more complex neighborhood relationships than grid cells (Daraganova et al. 2012; Mohammadpour et al. 2021). Therefore, polygon regionalization has gradually developed to address problems in social science analysis. For instance, some researchers use data-driven regionalization algorithms to analyze the housing market, whereas others propose a mixed-level regionalization method to study national health data (Govorov et al. 2019; Helbich et al. 2013). Some scholars focus on improving regionalization methods from the algorithm perspective, so that the regionalization results exhibit spatial autocorrelation and other characteristics that are consistent with the real world (Liu et al. 2015; Niesterowicz, Stepinski, and Jasiewicz 2016). With the gradual development of sensor networks in recent years, plenty of socio-economic data that are dynamically distributed in geographic space are being generated (Doreian and Conti 2012; Pereira, Segurado, and Neves 2011). Even so, population data at different geographic boundaries are still valuable and continue to be generated despite the development of sensors to monitor population flows.

Subsequently, a large number of complex-network algorithms have been used in the analysis of geographic flows and geographic networks because of the similarity between network community detection and regionalization (Berline et al. 2014; Ghawi and Pfeffer 2022; Poorthuis 2018; Shen, Liu, and Chen 2017). However, community detection algorithms in complex networks often consider only topological relationships. Thus, some researchers have proposed network community algorithms with spatial constraints, such as the spatially constrained Louvain and Leiden algorithms, the minimum spanning tree (MST)-regionalization method, the network Max-P model, and the spatially encouraged clustering and dynamically constrained clustering methods (He et al. 2017; Mahmood and Gloaguen 2013; She, Duque, and Ye 2016). They are widely used in research on social network structure, urban structure, regionalization uncertainty, and neighborhood unit detection (Yu et al. 2016; Zhong et al. 2014).

With the in-depth study of spatiotemporal data, scholars have found that spatiotemporal data exhibit not only spatial autocorrelation but also temporal autocorrelation (Kristensson et al. 2009; Shaddick, Lee, and Wakefield 2013). As a result, a regionalization method that considers only spatial constraints has difficulty achieving the expected effect when applied to spatiotemporal data (Krivoruchko and Gribov 2019; Xi et al. 2019). Scholars have proposed solutions to this issue. For example, Hoffman et al. proposed the multivariate spatio-temporal clustering (MSTC) method for environmental applications. Inspired by multivariate geographic clustering (MGC), this method transformed KNN clustering from data space to geographic feature space and used a parallel principal component analysis (PCA) tool to improve operational efficiency on large datasets. MSTC does not take spatiotemporal contiguity, especially temporal contiguity, as a strong constraint, but emphasizes cluster similarity in geographic feature space (Hoffman et al. 2008). Assunção et al. proposed a regionalization method based on MST to solve multivariate spatial clustering for socio-economic geographic units (Assunção et al. 2006). It adopts top-down clustering: it uses an MST to link all geographic units and then breaks specific links according to modularity and link strength to form clusters. This method is widely used in geographic pattern discovery and was integrated into ArcGIS and ArcGIS Pro after optimization using a genetic algorithm (Mahmood and Gloaguen 2013).

ClustR and SatScan model multivariate units from the perspective of complex networks and then transform spatial clustering into community detection on a spatial complex network (Kriventseva et al. 2001; Coleman et al. 2009; Xie et al. 2022). When faced with a spatiotemporal clustering task, the MST-regionalization method, ClustR, and SatScan often split the time dimension into layers, cluster the space dimension within each layer, and then integrate the time and space dimensions to complete the task. This improves the efficiency of the algorithm, but the error probability between different time layers increases accordingly. Therefore, this study aims to propose a new method that considers both spatial and temporal relationships to address the insufficient response to temporal autocorrelation in existing regionalization methods. Thus, we realize a regionalization-based spatial clustering method that considers spatiotemporal contiguity and attribute similarity.

The rest of this paper is organized as follows. In Section 2, we introduce the multiple-variable constrained spatiotemporal regionalization (STR) method proposed in this paper. The section has four parts: spatiotemporal proximity rules (Section 2.1), feature vector distance measurement (Section 2.2), spatiotemporal constraint regionalization algorithm (Section 2.3), and evaluation of regionalization results (Section 2.4). Then, we use a synthetic dataset (Section 3.1) and a real-world dataset (Section 3.2) as case studies to verify the STR method. Finally, in Sections 4 and 5, we discuss and conclude.

2 Methodology

This paper proposes a bottom-up unsupervised hierarchical clustering algorithm, which can realize the automatic regionalization of spatiotemporal data in consideration of spatiotemporal contiguity and multivariate similarity. We call it the STR method, which comprises four steps. First, we define the spatiotemporal proximity rule and neighborhood merging rule of the spatiotemporal cube (STC). It can ensure that the spatiotemporal cluster formed by the merger of STCs has spatiotemporal contiguity. Second, because geographic entities in spatiotemporal data have multiple attributes, we convert them into feature vectors of geographic entities. Then, this paper uses the Euclidean distance of the multivariate center of gravity as the calculation method of similarity between STC and its neighbor cubes. Third, we use the idea of unsupervised hierarchical clustering to search spatiotemporal neighborhoods of each STC. We then calculate the similarity between STC and its neighbor cubes and select the largest one to merge. Fourth, we evaluate and analyze the regionalization results and detect three basic types of clusters: cylindrical, pie-shaped, and spherical. Various cluster types represent different spatiotemporal characteristics. This method fully considers spatial autocorrelation and temporal autocorrelation and is an attempt to extend the regionalization algorithm from the spatial dimension to the temporal dimension.

2.1 Data structure and spatiotemporal proximity rules

To represent spatiotemporal data, this study adopts NetCDF to construct the STC. An STC is a spatiotemporal cubic region comprising homogeneous pixels in a certain continuous spatial domain. Mathematically, an STC $Cube_i(S_i, T_i)$ comprises the spatial attribute data $S_i$ and the temporal attribute data $T_i$. $S_i$ refers to the spatial domain, which can have a regular or irregular shape. $T_i$ is the temporal domain of the STC. Figure 1 illustrates the data structure of the STC in sample data at three scales. Figure 1(a) is the data structure at the spatiotemporal scale, which contains the three dimensions X, Y, and T. Figure 1(b) shows a layer of the spatiotemporal scale, which can be the XY, XT, or YT domain. Figure 1(c) is the data structure of a single STC, which stores multiple values.

Figure 1. Three-level data structure of spatiotemporal cube (STC). (a) and (b) represent the STC structure in the spatiotemporal dimension and spatial dimension. (c) shows that each STC unit consists of multiple variables.


To ensure the spatiotemporal contiguity of regionalization, this study stipulates the spatiotemporal proximity rules of the STC. The rules comprise three parts: the spatiotemporal proximity rule of (I) a single cube, (II) multiple cubes, and (III) common neighborhood cubes. For a single cube $A_1$, this study stipulates that any cube sharing a common part (point, polyline, or polygon) with $A_1$ is a neighborhood cube ($N_{A_1}$) of $A_1$ (Figure 2(a)). For multiple cubes ($A_1, A_2, A_3$), the neighbor cubes ($N_{A_1,A_2,A_3}$) are defined as the intersection of $N_{A_1}$, $N_{A_2}$, and $N_{A_3}$ ($N_{A_1,A_2,A_3} = N_{A_1} \cap N_{A_2} \cap N_{A_3}$). For cubes of different classes, the spatiotemporal proximity rules are more complicated. When the neighbor intersection is empty ($N_{A_1} \cap N_{B_1} = \emptyset$), the spatiotemporal proximity rules follow I or II. When the neighbor intersection is not empty ($N_{A_1} \cap N_{B_1} = CN_{A_1B_1}$), $CN_{A_1B_1}$ is the set of common neighbor cubes of $A_1$ and $B_1$. $CN_{A_1B_1}$ participates in the neighborhood calculation of both $A_1$ and $B_1$, and its attribution is decided by its attribute similarity.

Figure 2. Schematic illustration of the spatiotemporal proximity rules. (a) Spatiotemporal proximity rules of single STC units. (b) Spatiotemporal proximity rules of multiple STC units of the same class. (c) Spatiotemporal proximity rules of multiple STC units of different classes.
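As a hedged illustration of rules I and III, the following minimal Python sketch assumes a regular grid in which every cube is addressed by an integer index (i, j, k). The function names `cube_neighbors` and `common_neighbors` and the `shape` argument are hypothetical and are not taken from the paper's implementation. Two cubes are treated as neighbors when they share a point, an edge, or a face, which on a regular grid corresponds to an index offset of at most one along each axis.

```python
from itertools import product


def cube_neighbors(index, shape):
    """Rule I: cubes sharing a point, edge, or face with `index`.

    `index` is an (i, j, k) grid position and `shape` is the grid size
    (ni, nj, nk); both names are illustrative assumptions.
    """
    neighbors = set()
    for offset in product((-1, 0, 1), repeat=3):
        if offset == (0, 0, 0):
            continue  # a cube is not its own neighbor
        candidate = tuple(c + d for c, d in zip(index, offset))
        # Keep only candidates that lie inside the grid.
        if all(0 <= c < n for c, n in zip(candidate, shape)):
            neighbors.add(candidate)
    return neighbors


def common_neighbors(a, b, shape):
    """Rule III: cubes that are neighbors of both `a` and `b`."""
    return cube_neighbors(a, shape) & cube_neighbors(b, shape)


corner = cube_neighbors((0, 0, 0), (6, 6, 6))
interior = cube_neighbors((2, 2, 2), (6, 6, 6))
shared = common_neighbors((0, 0, 0), (0, 0, 2), (6, 6, 6))
```

Under these assumptions a corner cube has 7 neighbors and an interior cube has 26, and an empty result from `common_neighbors` corresponds to the case in which rules I or II apply unchanged.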


2.2 Feature vector distance measurement

The spatiotemporal proximity rule (Section 2.1) guarantees the spatiotemporal contiguity between the central cube and its neighbor cubes. Thus, this section calculates the feature vector distance between the central cube and neighbor cubes.

Each cube constructed in this study stores multiple values that form a feature vector ($Cube_{i,j,k} = (V^1_{i,j,k}, V^2_{i,j,k}, V^3_{i,j,k}, \ldots, V^n_{i,j,k})$). To achieve the STR with spatiotemporal constraints and attribute similarity, measuring the feature vector distance between the central cube and its neighbor cubes is necessary to realize hierarchical clustering under the premise of spatiotemporal contiguity. As shown in Figure 3, assuming that $Cube_{i,j,k}$ is located at a corner of the STC, its neighbor cubes comprise $Cube_{i,j\pm1,k}$, $Cube_{i,j\pm1,k\pm1}$, $Cube_{i,j,k\pm1}$, $Cube_{i\pm1,j\pm1,k}$, $Cube_{i\pm1,j,k}$, $Cube_{i\pm1,j\pm1,k\pm1}$, and $Cube_{i\pm1,j,k\pm1}$, where the sign of each offset is fixed by the corner that the cube occupies. The attribute values in a cube form an n-dimensional feature vector. Given that the order of the attribute values in the cube does not affect the similarity among cubes, this study uses Euclidean distance to measure it (Formula 1).

Figure 3. Cube neighborhood and its attributes distance calculation. Each cube contains dual information: one is spatiotemporal position information, and the other is its attribute information.


(1) $Dist = \left[ \sum_{i=1, j=1, k=1}^{n} \left( Cube_{i,j,k} - Cube_{i',j',k'} \right)^{2} \right]^{1/2}$

where $Dist$ is the feature vector distance between the central cube and its neighbor cubes, $Cube_{i,j,k}$ is the feature vector of the central cube, and $Cube_{i',j',k'}$ is the feature vector of the neighbor cube.
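A minimal sketch of how Formula 1 might be evaluated is given below, assuming each cube's attributes are held in a plain list of numbers. The function name `feature_distance` and the sample values are hypothetical. The second half shows how the most similar neighbor could be picked as the one with the smallest distance.

```python
import math


def feature_distance(center, neighbor):
    """Euclidean distance between two n-dimensional feature vectors."""
    if len(center) != len(neighbor):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(center, neighbor)))


# Illustrative values only: one central cube and two candidate neighbors.
center = [10.0, 20.0, 30.0]
candidates = {"n1": [10.0, 20.0, 31.0], "n2": [50.0, 20.0, 30.0]}

# The most similar neighbor is the one with the smallest distance.
closest = min(candidates, key=lambda name: feature_distance(center, candidates[name]))
```

Because the distance sums squared differences over all components, reordering the attributes consistently in both vectors leaves the result unchanged, which matches the remark above that attribute order does not affect similarity.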

2.3 Spatiotemporal constraint regionalization algorithm

Figure 4 illustrates the implementation of the spatiotemporal constraint regionalization algorithm, which contains a starting-point selection followed by three steps and one iteration rule. Every cube in the study area is a starting point of the algorithm: each cube looks for its neighbors and determines the most similar cube to merge. As shown in Figure 4(a), we draw two cubes ($Cube_A$ and $Cube_B$) as starting points for illustration (all cubes can be used as starting points of the algorithm). $Cube_A$ and $Cube_B$ belong to cube sets $A = \{Cube_A\}$ and $B = \{Cube_B\}$, respectively. First, $Cube_A$ and $Cube_B$ find their neighborhood cubes based on the spatiotemporal proximity rules, thereby forming two neighborhood cube sets $N_A = \{Cube_{NA_1}, Cube_{NA_2}, \ldots, Cube_{NA_n}\}$ and $N_B = \{Cube_{NB_1}, Cube_{NB_2}, \ldots, Cube_{NB_m}\}$ (Figure 4(b)). Second, we calculate the feature vector distance between the cubes in $A$ and the cubes in $N_A$ by using Formula 1 (Figure 4(c)). The same is done for $B$ and its neighbor set $N_B$. Third, the cubes with the smallest distance are moved from $N_A$ and $N_B$ to $A$ and $B$, respectively (Figure 4(d)). At this point, $A = \{Cube_A, Cube_{NA_1}\}$ and $B = \{Cube_B, Cube_{NB_1}\}$. In the next iteration, the cubes in $A$ are regarded as a whole, and the neighborhood of $A$ is the union of the neighborhood sets of all its cubes (Formula 2 and Figure 4(e)–(g)). The algorithm continues to iterate until it reaches the number of clusters required by the user (Figure 4(h)).

Figure 4. Framework of STR algorithm. (a) Cubes are randomly selected. (b) Neighborhood search. (c) Finding the most similar cube. (d) Neighborhood merge. (e)–(g) Looping the above progress. (h) Completed regionalization.


(2) $N_A = N_{cube_1} \cup N_{cube_2} \cup N_{cube_3} \cup \cdots \cup N_{cube_m}$

where $N_A$ is the neighborhood cube set of $A$, and $N_{cube_1}$ to $N_{cube_m}$ are the neighborhood cube sets of the individual cubes in $A$.
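The iteration described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes a regular grid indexed by (i, j, k), it compares clusters by the Euclidean distance between their mean feature vectors (the multivariate center of gravity mentioned in Section 2), and at each pass it merges the single closest pair of contiguous clusters. The names `neighbors_of`, `centroid`, and `str_sketch` are hypothetical.

```python
import math
from itertools import product


def neighbors_of(index, shape):
    """Cubes sharing a point, edge, or face with `index` inside the grid."""
    result = set()
    for offset in product((-1, 0, 1), repeat=3):
        candidate = tuple(c + d for c, d in zip(index, offset))
        if candidate != index and all(0 <= c < n for c, n in zip(candidate, shape)):
            result.add(candidate)
    return result


def centroid(members, features):
    """Mean feature vector (center of gravity) of a set of cubes."""
    n_vars = len(next(iter(features.values())))
    return [sum(features[m][v] for m in members) / len(members) for v in range(n_vars)]


def str_sketch(features, shape, k):
    """Merge contiguous clusters bottom-up until k remain; returns {index: label}."""
    label = {idx: n for n, idx in enumerate(sorted(features))}
    clusters = {n: {idx} for idx, n in label.items()}
    while len(clusters) > k:
        centers = {n: centroid(m, features) for n, m in clusters.items()}
        best = None
        for n, members in clusters.items():
            # Formula 2: the neighborhood of a cluster is the union of its
            # members' neighborhoods (the members themselves are excluded).
            hood = set().union(*(neighbors_of(m, shape) for m in members)) - members
            for other in {label[h] for h in hood if h in label}:
                d = math.dist(centers[n], centers[other])
                if best is None or d < best[0]:
                    best = (d, n, other)
        if best is None:
            break  # no contiguous pair is left to merge
        _, keep, drop = best
        for idx in clusters[drop]:
            label[idx] = keep
        clusters[keep] |= clusters.pop(drop)
    return label


# Illustrative 4 x 1 x 1 grid with two obvious groups of attribute values.
demo = {(0, 0, 0): [0.0], (1, 0, 0): [1.0], (2, 0, 0): [100.0], (3, 0, 0): [101.0]}
demo_labels = str_sketch(demo, (4, 1, 1), k=2)
```

The sketch recomputes all centroids and neighborhoods in every pass, which keeps it short but would be slow on large grids; a practical version would presumably cache these and update them incrementally. Because only contiguous clusters are ever merged, two cubes with identical attributes stay in separate clusters unless a chain of neighboring cubes connects them.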

2.4 Evaluation and analysis of regionalization results

This section evaluates the shape and variable distribution of regionalization results. The shape of the regionalization results can reflect their spatiotemporal characteristics, while the internal variable distribution can reflect their attribute characteristics. Figure 5 represents three basic types of STC regionalization results: (a) cylindrical clusters, (b) pie-shaped clusters, and (c) spherical clusters. Other cluster shapes can be composed of these three basic shapes. Cylindrical clusters have a smaller range in the X and Y directions but a larger range in the T direction (Figure 5(a)). This represents a phenomenon that is relatively stable in space and lasts for a long time. Pie-shaped clusters have a larger range in the X and Y directions but a smaller range in the T direction (Figure 5(b)). This represents a phenomenon that has a wide distribution in space but a short duration. Spherical clusters have a relatively uniform distribution in the three directions of X, Y, and T (Figure 5(c)).

Figure 5. Three basic types of STC regionalization results. (a) Cylindrical cluster. (b) Pie-shaped cluster. (c) Spherical cluster.
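One possible way to label a cluster with these three basic types is to compare its extent along T with its extent along X and Y. The sketch below is a heuristic under stated assumptions: the function name `classify_shape` and the threshold `ratio=2.0` are hypothetical choices made for illustration, and the paper does not specify a numerical criterion.

```python
def classify_shape(cubes, ratio=2.0):
    """Label a cluster as cylindrical, pie-shaped, or spherical.

    `cubes` is an iterable of (x, y, t) grid indices. `ratio` is a
    hypothetical threshold chosen for illustration, not a value from the paper.
    """
    xs, ys, ts = zip(*cubes)
    x_extent = max(xs) - min(xs) + 1
    y_extent = max(ys) - min(ys) + 1
    t_extent = max(ts) - min(ts) + 1
    spatial_extent = (x_extent + y_extent) / 2
    if t_extent >= ratio * spatial_extent:
        return "cylindrical"  # narrow in X and Y, long in T
    if spatial_extent >= ratio * t_extent:
        return "pie-shaped"  # wide in X and Y, short in T
    return "spherical"  # comparable extent along X, Y, and T


# Illustrative clusters built from index ranges.
column = [(x, y, t) for x in range(2) for y in range(2) for t in range(10)]
disc = [(x, y, t) for x in range(10) for y in range(10) for t in range(2)]
ball = [(x, y, t) for x in range(3) for y in range(3) for t in range(3)]
shapes = [classify_shape(c) for c in (column, disc, ball)]
```

Bounding-box extents ignore holes and irregular outlines, so this heuristic is only a coarse first pass; clusters composed of several basic shapes would need a finer description.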


Figure 6 shows the difference in variable distributions of various clusters. The left-hand part of Figure 6 is the regionalization result, and the middle part shows the three clusters in the regionalization result: Clusters I, II, and III. Panels (a)–(c) in the right-hand part are the variable distributions of these three clusters. The method comprehensively considers multivariate information in the process of calculation. Each cluster has a similar internal variable distribution, which reflects its attribute characteristics. For example, the cubes in Cluster I have relatively high V2 and V4, as well as low V1, V3, and V5 (Figure 6(a)). Cubes in other clusters show the same phenomenon. When the variables have practical significance, each cluster can be interpreted in terms of specific geographical meanings according to its variable distribution.

Figure 6. Variable distribution of regionalization results. a, b, and c in the right-hand figures correspond to the variable distribution characteristics of Clusters I, II, and III, respectively.


3 Case study

3.1 Experiment of synthetic dataset

In this study, a synthetic dataset is simulated to evaluate the rationality of the STR method. It is designed as a regular cube containing 216 STCs. Each STC includes eight attributes, as follows: “Object ID” is the identifier of the cube; “V1,” “V2,” and “V3” are the three attributes of the cube; “X,” “Y,” and “T” are the spatiotemporal information of the cube; and “Cluster ID” is the regionalization result of the cube. Table 1 presents the list of attributes. We preset the attributes of the synthetic dataset to ensure its internal spatiotemporal heterogeneity, that is, block-wise random numbers are assigned to the attributes of the synthetic dataset. We assume that the attributes of a cube $i$ constitute a three-dimensional vector $V_i = (V1_i, V2_i, V3_i)$. Then, the difference in the distribution of attributes between different cubes can be measured by the Euclidean distance between the vectors $V_i$. The Euclidean distance is small within the same cluster and large between different clusters. Specifically, for each cube of the synthetic dataset, the range of attribute values (V1, V2, V3) is 0–200. Inside the same preset cluster, the Euclidean distance between the $V_i$ is within 2.5. The Euclidean distance between different clusters tends to be over 50, guaranteeing global heterogeneity and local homogeneity within the same cluster. Several outliers are added to the synthetic dataset to better simulate real-world situations. The synthetic dataset therefore already has an apparent STR structure, which can be used to evaluate the rationality of the STR method results.

Table 1. Data structure of synthetic dataset (Object ID, V1, V2, V3, X, Y, T and Cluster ID).
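A dataset with the properties described above could be generated along the following lines. This sketch is illustrative only and rests on several assumptions that are not stated in the paper: the 216 STCs are arranged as a 6 × 6 × 6 grid, the preset clusters are eight 3 × 3 × 3 blocks, block centers are placed at 50 or 150 along each attribute, and no outliers are added. The function name `make_synthetic` and its parameters are hypothetical.

```python
import random


def make_synthetic(side=6, block=3, jitter=0.7, seed=0):
    """Build rows with the Table 1 fields for a side x side x side grid.

    Assumed layout: eight 3 x 3 x 3 blocks, each with its own base vector.
    A jitter of 0.7 keeps within-block distances below 2.5, because the
    largest possible gap is 1.4 per attribute and 1.4 * sqrt(3) < 2.5.
    """
    rng = random.Random(seed)
    rows = []
    positions = [(x, y, t) for x in range(side) for y in range(side) for t in range(side)]
    for object_id, (x, y, t) in enumerate(positions):
        bx, by, bt = x // block, y // block, t // block
        # Block centers are 100 apart, so different blocks differ by well over 50.
        base = (50 + 100 * bx, 50 + 100 * by, 50 + 100 * bt)
        v1, v2, v3 = (b + rng.uniform(-jitter, jitter) for b in base)
        rows.append({
            "Object ID": object_id,
            "V1": v1, "V2": v2, "V3": v3,
            "X": x, "Y": y, "T": t,
            "Cluster ID": bx * 4 + by * 2 + bt,
        })
    return rows


rows = make_synthetic()
```

With the default arguments the attribute values stay inside the 0–200 range; other values of `side` or `block` would need the base-vector rule adjusted to keep them there.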

The synthetic dataset is converted into STCs, and the STR method is used to regionalize its spatiotemporal clusters. Given that the STR method is an unsupervised regionalization method, the number of regions k is determined by research needs. This section shows eight regionalization results, where the same color represents the same spatiotemporal cluster (Figure 7). As shown in the figure, the results have three basic characteristics. First, the spatiotemporal contiguity of each spatiotemporal cluster is guaranteed, so that the cubes within the same cluster are connected to one another as spatiotemporal neighbors. Second, the spatiotemporal structure preset in the synthetic dataset has been reasonably expressed. Third, the three basic structures of cylindrical, pie-shaped, and spherical clusters have also been mined. Thus, the method can express the spatiotemporal structure of the synthetic dataset and complete the regionalization.

Figure 7. STR method results of synthetic dataset: (a)–(h) results under different cluster numbers.

After completing the initial experiment on the small random synthetic dataset, we test the accuracy of the STR method on a larger random synthetic dataset. The dataset contains 8000 STCs, each consisting of five random variables. Before each clustering test, we pre-divided the entire dataset into a specific number of clusters and set labels. The results of the STR method are compared with the labels to obtain the misclassified STCs and the accuracy. Figure 8 presents the comparison of theoretical and actual results on the random dataset. Each subgraph consists of two parts: the theoretical and actual results in the upper and lower parts, respectively. From the figure, we find that the actual results of the STR method are consistent with the theoretical results. At the same time, the STR results also maintain spatiotemporal contiguity. When the cluster number is 3 (Figure 8), the STR results appear to be discontinuous in the spatiotemporal dimension. Upon inspection, the apparent outliers are connected to the main community through STCs at the bottom of the random dataset. A slight misalignment appears in the time dimension of the STR results when the cluster number is 5 and 7 (Figure 8). However, in actual research, the spatial dimension of the study area is much larger than the temporal dimension. Thus, slight errors in the temporal dimension do not affect the analysis and mining of specific geographic patterns.

Figure 8. Accuracy comparison test of random synthetic dataset: (a)–(i) comparison between theoretical and actual results under different cluster numbers of the STR method.


Figure 9 presents the numerical characteristics of each cluster generated by the STR method. The X-axis in this figure represents attributes of the synthetic dataset, and the Y-axis is the mean value of the attributes of each cluster. Each line represents the numerical distribution of attributes within one cluster. The figure shows that the STR method can also discover differences in the feature space of STCs. For example, Clusters I and IV are not only independent of each other in the spatiotemporal dimension but also exhibit opposite numerical distribution patterns in feature space. If this phenomenon occurs in real-world datasets, it often reveals a specific geographic spatiotemporal pattern. This shows that the STR method is effective in performing regionalization-based clustering and mining spatiotemporal patterns.

Figure 9. Numerical characteristics of each cluster when the number of regionalization is 10.


The misclassified STCs and the algorithm accuracy also reveal interesting phenomena. As the cluster number increases, the number of misclassified STCs decreases from 381 to 294. At the same time, the accuracy of the algorithm on the random dataset increases from 95.2375% to 96.325%. When there are more than 10 clusters, the misclassification number of the STR method tends to be stable and remains at about 295 (Figure 10). The reason is related to the elimination of error as the cluster number increases. In theory, for the same algorithm, its accuracy should be stable on any random dataset, and thus the number of misclassified STCs is also stable. As such, when the number of clusters is small, a large number of misclassified STCs is concentrated in the same cluster, resulting in a slightly higher error rate. We also applied the MSC, MST, and SKATER methods to the synthetic dataset. The result shows that the STR method has a relative advantage in regionalization-based clustering tasks for spatiotemporal data. The reason why the other methods work poorly with spatiotemporal data is also clear. Given that these methods consider only spatial contiguity and not spatiotemporal contiguity, they can only process each temporal layer separately and then integrate the layers. Discontinuous clusters are then generated between different temporal layers, which raises the error probability.
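The accuracy values quoted above follow directly from the misclassification counts and the dataset size of 8000 STCs. The short check below, with a hypothetical helper named `accuracy`, reproduces the arithmetic.

```python
def accuracy(total, misclassified):
    """Share of correctly labeled STCs."""
    return 1 - misclassified / total


# 8000 STCs in the larger random synthetic dataset.
low = accuracy(8000, 381)   # fewest clusters, most misclassified STCs
high = accuracy(8000, 294)  # more clusters, fewest misclassified STCs

print(f"{low:.4%}")   # 95.2375%
print(f"{high:.4%}")  # 96.3250%
```

The difference of 87 misclassified STCs therefore corresponds to a gain of roughly 1.09 percentage points in accuracy.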

Figure 10. Misclassification number of MSC, MST, SKATER, and STR methods under different cluster numbers.


Table 2 presents the performance of the STR method for each cluster at different regionalization thresholds. This table has 16 rows and 17 columns. Column 1 is the number of regions. Columns 2–16 show the misclassification numbers of each cluster (Roman numerals indicate the cluster IDs). Column 17 is the total misclassification generated by each experiment, which corresponds to Figure 10. From Table 2, we find that the error of the STR method gradually decreases as the number of regions increases and tends to be stable beyond 10 clusters. The reason can be obtained from the performance of the STR method for individual clusters. Each cluster contributes to the total error and outliers. A portion of the errors and outliers becomes part of new clusters as the number of regions increases. However, too many clusters often lead to small attribute differences between clusters. That is, multiple clusters might reflect the same spatiotemporal pattern. Therefore, we consider it reasonable to regionalize spatiotemporal data using the STR method and set the cluster number to 10 in the real-world dataset experiment.

Table 2. Misclassification of STR method for each cluster under different cluster numbers.

3.2 Experiment of real-world dataset

3.2.1 Study area and data description

Air pollution is one of the important issues affecting the quality of human life and global environmental change (Cooper et al. 2022; Jbaily et al. 2022). The flourishing development of high spatiotemporal resolution and multivariate air pollutant datasets also provides a basis for the expansion of regionalization algorithms from the spatial to the spatiotemporal dimension. STR of air pollutants also helps us to study their distribution and attribute patterns from the natural environment and socioeconomic aspects. Therefore, in this section, we apply the STR method to delineate the air pollutant structure of China. The data used are derived from the China High Air Pollutants dataset co-produced by the National Earth System Science Data Center and the University of Maryland (Jing Wei et al. 2021). The dataset comprises the distribution of seven major pollutants (PM1, PM2.5, PM10, O3, NO2, SO2, and CO) from 2000 to 2020. The dataset is generated from big data (e.g. ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence by considering the spatiotemporal heterogeneity of air pollution (Wei et al. 2021). The PM2.5, PM10, O3, and NO2 datasets have a spatial resolution of 1 km and a temporal resolution of 1 day. The spatial and temporal resolutions of the other pollutant datasets are 10 km and 1 month. In our experiment, we select the four pollutant datasets of PM2.5, PM10, O3, and NO2, which share the same spatiotemporal resolution, for 2018. Figure 11 shows the distribution of the four pollutants in March, June, September, and December 2018. The scope of these pollutant datasets covers the whole of China and the whole year of 2018.

Figure 11. Real-world pollutant dataset: (a)–(d) spatial distributions of PM2.5 in March, June, September, and December 2018 in China; (e)–(h) the spatial distribution of PM10 at these four time points; (i)–(l) distribution of O3; and (m)–(p) distribution of NO2.


3.2.2 Spatiotemporal regionalization result of real-world dataset

In this section, we first map the four pollutant datasets according to their spatial and temporal resolutions. Thus, each grid point of the newly formed dataset stores the emissions of the four pollutants at that location. Then, according to the temporal attributes of the new dataset, the temporal association between layers is constructed such that each grid point in the new dataset has neighbors in the spatiotemporal dimension. For readability, the results are shown in an STC and are divided into two parts: the spatiotemporal clusters and the numerical distribution of their attributes. In Section 3.1, we conducted stability and precision experiments and showed that when the number of clusters exceeds 10, the error of the algorithm tends to be stable (Figure 10 and Table 2). Thus, we regionalized the real-world dataset into 10 clusters using the STR method. Figure 12 presents the spatiotemporal distribution of the Chinese air pollutant regionalization results calculated through the STR method. Figure 13 shows the distribution of each cluster in the vertical (temporal) dimension, which indicates its duration, and in the horizontal (spatial) dimension. Figure 14 shows clear differences in the numerical distributions of pollutants in each cluster, indicating that the STR method realizes clustering not only in the spatiotemporal dimension but also in the attribute dimension. Furthermore, in Figure 14, the violin plot of each cluster's attribute distribution shows a flat shape. This indicates that the numerical distribution of attributes within the same cluster is similar, and it also shows the internal consistency of the clusters generated by the STR method. Combining the spatiotemporal distribution of clusters with the distribution of attribute values reveals many interesting phenomena.

Figure 12. STR method result of air pollutant dataset showing 10 clusters calculated by STR method.

Figure 13. Spatiotemporal distribution of each cluster calculated by STR method.

Figure 14. Plots of the numerical distribution of pollutants in each cluster: (a)–(j) pollutant distributions from Clusters 1 to 10. A violin plot’s width represents the amount of data, and its height represents the distribution range.

Figure 13 shows that the distribution of each cluster on the temporal scale is continuous, with no cluster showing a break. This result reveals unique and stable patterns of pollutant emissions across clusters and cross-validates the distinct attribute distributions displayed in Figure 14. It also demonstrates the necessity of using the STR method to study multivariate distributions in spatiotemporal clusters. The spatial distribution of clusters corresponds closely to China’s natural geographical and socioeconomic factors. Specifically, it correlates strongly with physical geographic constraints in western China and with socioeconomic development patterns in southeastern China.

In detail, Cluster 1 is persistent in time, and its spatial extent gradually decreases over time. At the beginning of 2018, Cluster 1 covered Northeast China, northern Inner Mongolia, Hebei Province, and Shandong Province. It then gradually shrank to Northeast China (Figure 13). Northeast China is the earliest industrialized area in the country, and its economic structure is dominated by heavy industry and agriculture. Thus, the numerical distribution of O3 is concentrated at high values with few outliers (Figure 14). The values of NO2, PM2.5, and PM10 are concentrated at lower levels, indicating that vehicle exhaust emissions are lower in Northeast China. This finding may indirectly reflect the low urbanization rate and relatively sparse distribution of big cities (Wauchope et al. 2022).

Cluster 2’s spatiotemporal coverage is stable. It has no breaks in the temporal dimension, and its spatial coverage is concentrated in the North China Plain (Figure 13). The North China Plain is the most densely populated area in China. The values of the four pollutants are significantly higher than those in the other clusters (Figure 14). A notable feature of its numerical distribution is that the median value of PM2.5 is high and the values near the peak are markedly elevated. This finding is related to the many cities with large populations in the region and their severe domestic waste gas emissions (Sala et al. 2021). In addition, Cluster 6 has a numerical distribution similar to that of Cluster 2 (Figure 14), indicating that the Loess Plateau and North China have similar industrial structures and development patterns. However, the O3 of Cluster 6 is more concentrated at high values than that of Cluster 2, and the emissions of NO2 and PM2.5 are slightly lower. This finding suggests that urban residents in the Loess Plateau have lower emissions than those in North China.

The spatial range of Cluster 3 consists of the Taklimakan Desert, Gansu Gobi, and the Qaidam Basin. Its temporal distribution is persistent. In the spatial dimension, its coverage is stable throughout the year and expands to central Gansu Province at the end of the year (Figure 13). This area is the most sparsely populated and most desertified in the country. Widespread deserts and frequent dust storms result in the extremely high PM10 value in this cluster, which peaks at around 1200 µg/m³, the highest among the 10 clusters (Figure 14). Owing to the sparse population and few industrial facilities, the emissions of the other three pollutants are extremely low, and the value of NO2 is even distributed around 0. Clusters 5 and 8 are also distributed in Northwest China and have attributes and numerical distributions similar to those of Cluster 3 (Figure 14). Cluster 5 is mainly distributed in the grasslands of northern Xinjiang and Inner Mongolia. Its contiguity is strong in the temporal dimension, but its spatial range gradually decreases over time (Figure 13). Cluster 8 covers almost all the towering mountains and plateaus in western China, including the Qinghai-Tibet Plateau, Himalayas, Kunlun Mountains, Pamirs, Tianshan Mountains, and the Qinling Mountains extending to central China. Its spatial extent is also stable, with little change over time (Figure 13). These two clusters are also sparsely populated areas. The PM10 values are lower in both clusters than in Cluster 3 because they are far from the deserts, suggesting that regions with similar natural and social environments have similar emission patterns of pollutants.

Clusters 4, 6, 7, 9, and 10 are distributed in the central area of China. Among them, Clusters 6, 7, and 10 change little in time and space (Figure 13). This illustrates that the emission patterns underlying the three clusters have obvious spatiotemporal autocorrelation. Clusters 4 and 9 are distributed along China’s southeast coast (Figure 13). Their spatial positions overlap, and their temporal distributions intersect, showing that the two have similar pollutant emission patterns. Clusters 4, 7, and 9 exhibit similar patterns in the numerical distribution of attributes (Figure 14). The four pollutants show little difference in values, and each pollutant has a relatively uniform distribution. Geographically, Cluster 4 is mainly located in the coastal area of Southern China. Cluster 7 covers China’s central urban agglomeration. The range of Cluster 9 corresponds to the plain of the lower reaches of the Yangtze River, Taiwan Island, and Hainan Island (Figure 13). These regions are among the most economically developed and commercialized in China. Cluster 9’s prosperous economic activity has also created a pattern of pollutant emissions that differs from those of other clusters. Cluster 10 contains Southern Tibet and Yunnan and exhibits a unique numerical distribution of attributes. The median value of O3 is significantly higher than those of the other pollutants, which distinguishes it from the other clusters (Figure 14). The values of PM2.5 and PM10 are clearly lower than those of other clusters. This emission pattern may be linked to high forest cover, sufficient precipitation, and less human activity (Z. Liu et al. 2015).

4 Discussion

4.1 Comparison with other methods

In the real world, most geographic phenomena follow two dimensions of properties: attribute similarity and spatial or spatiotemporal contiguity. Thus, numerous spatial clustering methods have been developed around these two properties and form three major categories: attribute-based methods, regionalization-based methods, and methods that balance the first two (Kim et al. 2015; Yuvaraj et al. 2021). Attribute-based clustering focuses on the similarity of variables and ignores spatial relationships. This category includes k-means, BIRCH, DBSCAN, OPTICS, CURE, SOM, and ADCN (Assunção et al. 2006; Guo 2008; Nowosad and Stepinski 2018). Regionalization-based clustering, such as SKATER and AUTOCLUST, emphasizes the importance of spatial contiguity (Aydin et al. 2018; Kim et al. 2015). In terms of its clustering purpose, the STR method belongs to this category because it also considers attribute similarity (R. Wei et al. 2021; Zhang et al. 2021). However, unlike traditional regionalization-based clustering, this study regards temporal contiguity as an important factor. Thus, the STR method results have spatial contiguity, temporal contiguity, and attribute similarity. Researchers can then jointly consider the spatiotemporal distribution and the numerical distribution of attributes to find the geographic patterns within the spatiotemporal data. On this basis, we analyzed the spatiotemporal characteristics and the numerical distribution of attributes of the STR results and achieved promising results on the synthetic and real-world datasets. The results effectively reveal China’s pollutant emission pattern under the dual control of natural and socioeconomic factors.

4.2 Shortcomings and future directions

Although our proposed STR method has shown promising results on the synthetic and real-world datasets, we acknowledge three limitations, which we discuss here. (1) The regionalization results of STR are affected by the dataset shape. The STR method is a bottom-up spatiotemporal hierarchical regionalization that produces a phenomenon known as “region growing” during the aggregation process. That is, when approaching the edge of the dataset, each cluster is inevitably affected by the shape of the dataset. For example, when the temporal dimension is small (the T direction is too short), the region is forced to extend in the spatial domain, and vice versa. (2) The algorithm is relatively inefficient. For the hierarchical clustering method, the time complexity is $O(n^3)$. Thus, the computational cost is high for large-scale datasets, restricting the analysis of ultra-long time series data. (3) The scale effect and the modifiable areal unit problem (MAUP) exist in the STR method. In previous studies, scholars generally believed that MAUP is a source of geo-statistical bias and can significantly affect the results of statistical hypothesis tests (Cressie 1996; Holt et al. 1996). That is, when the smallest geo-statistical unit changes, the distribution of statistical variables of each unit also changes, thereby affecting the spatial analysis results of geographical variables (Kwan 2012; Menon 2012). Therefore, if geographical units of other scales were selected in this study, the multivariate values within the units would change, altering the distances between units in the feature space and thereby leading to clustering results different from those of the original experiment. If the range of geographic units increases, the original features will be smoothed. Similarly, if the range of geographic units is too small, many units will contain extreme values, destroying the spatiotemporal contiguity of the clustering results.
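To make limitation (2) concrete, the following is a minimal sketch of a naive bottom-up clustering with a contiguity constraint. It is an illustration under stated assumptions, not the STR algorithm itself: the function name `constrained_agglomerative`, the centroid linkage, and the Euclidean feature distance are all choices made here for brevity. Each round scans the adjacent cluster pairs for the closest one and merges it, so with roughly n rounds and up to O(n²) candidate pairs per round when the adjacency is dense, the cost grows cubically in the worst case.

```python
import math
from typing import Dict, Iterable, List, Sequence


def constrained_agglomerative(
    features: Sequence[Sequence[float]],
    adjacency: Dict[int, Iterable[int]],
    k: int,
) -> List[int]:
    """Merge adjacent clusters with the closest centroids until k remain.

    features:  one attribute vector per unit.
    adjacency: unit index -> neighboring unit indices (treated as symmetric).
    Returns one cluster label per unit.
    """
    n = len(features)
    members = {i: [i] for i in range(n)}
    centroid = {i: list(f) for i, f in enumerate(features)}
    neigh = {i: set() for i in range(n)}
    for i, js in adjacency.items():
        for j in js:
            if i != j:  # ignore self-loops, store both directions
                neigh[i].add(j)
                neigh[j].add(i)

    while len(members) > k:
        # Full scan of adjacent cluster pairs: the expensive step.
        best = None
        for a in members:
            for b in neigh[a]:
                if a < b:
                    d = math.dist(centroid[a], centroid[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:  # remaining clusters are not adjacent to each other
            break
        _, a, b = best
        # Merge b into a; the new centroid is the size-weighted mean.
        na, nb = len(members[a]), len(members[b])
        centroid[a] = [
            (na * x + nb * y) / (na + nb) for x, y in zip(centroid[a], centroid[b])
        ]
        members[a].extend(members.pop(b))
        del centroid[b]
        # Neighbors of b become neighbors of a.
        for c in neigh.pop(b):
            neigh[c].discard(b)
            if c != a:
                neigh[c].add(a)
                neigh[a].add(c)

    labels = [0] * n
    for label, key in enumerate(sorted(members)):
        for unit in members[key]:
            labels[unit] = label
    return labels
```

In a chain of three units with attribute values 0, 10, and 0, the two identical end units are never merged directly because they are not adjacent, which is the contiguity constraint at work. The same mechanism explains the "region growing" behavior near dataset edges: a cluster can only expand through the neighbors it actually has.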

The selection of “distance” also affects the clustering results. The distance in this study can be divided into two aspects: geographical space distance and feature space distance. The selection of geographical distance determines the spatiotemporal proximity rules. When geographical distance is commensurable, users can use a traditional distance algorithm. When it is not commensurable, users can use an adjacency matrix instead. The selection of feature space distance determines the similarity between units. For numerical geographic variables, users can choose Euclidean distance or weighted Euclidean distance. For categorical (character) geographic variables, users can use Jaccard distance to calculate similarity. On this basis, we summarize three future directions of this study. First, we can use intelligent optimization algorithms, such as genetic algorithms, particle swarm optimization, and artificial neural networks, to improve the method’s efficiency. Second, we can consider a top-down approach to avoid the influence of the dataset shape on the clustering results in the bottom-up clustering process. Last, we can develop a multiscale-stable multivariate spatial clustering method to achieve relatively stable patterns and clusters at different scales.
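The feature space distances named above can be written in a few lines. The sketch below uses the standard definitions of Euclidean, weighted Euclidean, and Jaccard distance; the function names and the example weights are illustrative assumptions and do not come from the STR code.

```python
import math
from typing import Iterable, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean distance between two numerical attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def weighted_euclidean(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Euclidean distance with a non-negative weight per attribute."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w)))


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of categorical values."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)
```

For example, `euclidean((0, 0), (3, 4))` gives 5.0, while `weighted_euclidean((0, 0), (3, 4), (1, 0))` gives 3.0 because the second attribute is weighted out. Weights of this kind are one way to keep a pollutant with a large numerical range, such as PM10, from dominating the similarity.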

5 Conclusion

This study proposes a multivariate hierarchical regionalization method (STR method) to find spatiotemporal patterns. The method effectively realizes regionalization-based clustering and pattern discovery for spatiotemporal data and achieves ideal results on the small synthetic dataset, the large synthetic dataset, and the real-world air pollution dataset. Compared with other regionalization methods, STR also shows advantages in the spatiotemporal contiguity and attribute similarity of clusters. Through the experiment on a real-world dataset, we found three characteristics of the spatiotemporal patterns of pollutant emission in China: global heterogeneity, local homogeneity, and temporal intersectionality on the southeast coast of China. These characteristics are consistent with Tobler’s first law and the second law of geography (Tobler 1970; Goodchild 2004b). The similarity of attributes within the same cluster supports the third law of geography that has emerged in recent years (Zhu et al. 2018). Although the STR method achieves ideal results, it also has limitations: dataset shape effects, inefficiency, and MAUP. These limitations point to future directions for the STR method: (1) embedding efficiency optimization algorithms, (2) applying a top-down approach, and (3) researching multiscale-stable clustering methods.

The major novel contribution of this study lies in providing a new framework for spatial-temporal comparisons and a spatiotemporal perspective on regionalization. This also enhances the efficiency of spatiotemporal heterogeneity analysis and enriches the means for spatial pattern discovery. The STR method is a universal method. Any data conforming to the input structure of the algorithm can be used for spatiotemporal pattern discovery, greatly increasing the scope of its application. Therefore, the STR method can support remote sensing, urban planning, and social science applications given the abundance of multi-source spatiotemporal data.

CRediT authorship contribution statement

Haoran Wang: Methodology, Data analysis, Visualization, Writing - original draft. Haiping Zhang: Methodology, Visualization. Hui Zhu: Methodology, Supervision. Fei Zhao: Methodology, Supervision. Guoan Tang: Methodology, Funding acquisition. Liyang Xiong: Supervision, Funding acquisition. Shangjing Jiang: Methodology.

Data and code availability

The code developed in this work is available at https://github.com/AidenWang0309/Spatiotemporal-Regionaliaztion-Algorithm.

Acknowledgments

The work presented in this paper is supported by the National Natural Science Foundation of China (No. 42201455, No. 41930102) and the Postgraduate Research and Innovation Program of Jiangsu Province, China (No. KYCX22_1564).

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [42201455]; National Natural Science Foundation of China [41930102]; Postgraduate Research and Innovation Program of Jiangsu Province [KYCX22_1564].

References

  • Assunção, R. M., M. C. Neves, G. Câmara, and C. Da Costa Freitas. 2006. “Efficient Regionalization Techniques for Socio‐economic Geographical Units Using Minimum Spanning Trees.” International Journal of Geographical Information Science 20 (7): 797–811. doi:10.1080/13658810600665111.
  • Aydin, O., M. V. Janikas, R. Assunção, and T. -H. Lee. 2018. “SKATER-CON.” In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery.
  • Aydin, O., M. V. Janikas, R. M. Assunção, and T. -H. Lee. 2021. “A Quantitative Comparison of Regionalization Methods.” International Journal of Geographical Information Science 35 (11): 2287–2315. doi:10.1080/13658816.2021.1905819.
  • Berline, L., A. M. Rammou, A. Doglioli, A. Molcard, and A. Petrenko. 2014. “A Connectivity-Based Eco-Regionalization Method of the Mediterranean Sea.” PLoS One 9 (11): e111978. doi:10.1371/journal.pone.0111978.
  • Bogucka, E., and M. Jahnke. 2018. “Feasibility of the Space–Time Cube in Temporal Cultural Landscape Visualization.” ISPRS International Journal of Geo-Information 7 (6): 209. doi:10.3390/ijgi7060209.
  • Coleman, M., M. Coleman, A. M. Mabuza, G. Kok, M. Coetzee, and D. N. Durrheim. 2009. “Using the SaTScan Method to Detect Local Malaria Clusters for Guiding Malaria Control Programmes.” Malaria Journal 8 (1): 1–6. doi:10.1186/1475-2875-8-68.
  • Cooper, M. J., R. V. Martin, M. S. Hammer, P. F. Levelt, P. Veefkind, L. N. Lamsal, and C. A. McLinden. 2022. “Global Fine-Scale Changes in Ambient NO2 During COVID-19 Lockdowns.” Nature 601 (7893): 380–387. doi:10.1038/s41586-021-04229-0.
  • Cowpertwait, P. S. P. 2011. “A Regionalization Method Based on a Cluster Probability Model.” Water Resources Research 47 (11). doi:10.1029/2011wr011084.
  • Cressie, N. A. 1996. “Change of Support and the Modifiable Areal Unit Problem.”
  • Daraganova, G., P. Pattison, J. Koskinen, B. Mitchell, A. Bill, M. Watts, and S. Baum. 2012. “Networks and Geography: Modelling Community Network Structures as the Outcome of Both Spatial and Network Processes.” Social Networks 34 (1): 6–17. doi:10.1016/j.socnet.2010.12.001.
  • Doreian, P., and N. Conti. 2012. “Social Context, Spatial Structure and Social Network Structure.” Social Networks 34 (1): 32–46. doi:10.1016/j.socnet.2010.09.002.
  • Ferreira, M. D., J. N. A. Campbell, and S. Matwin. 2022. “A Novel Machine Learning Approach to Analyzing Geospatial Vessel Patterns Using AIS Data.” GIScience & Remote Sensing 59 (1): 1473–1490. doi:10.1080/15481603.2022.2118437.
  • Ghawi, R., and J. Pfeffer. 2022. “A Community Matching Based Approach to Measuring Layer Similarity in Multilayer Networks.” Social Networks 68: 1–14. doi:10.1016/j.socnet.2021.04.004.
  • Goodchild, M. F. 2004. “The Validity and Usefulness of Laws in Geographic Information Science and Geography.” Annals of the Association of American Geographers 94 (2): 300–303. doi:10.1111/j.1467-8306.2004.09402008.x.
  • Govorov, M., G. Beconytė, G. Gienko, and V. Putrenko. 2019. “Spatially Constrained Regionalization with Multilayer Perceptron.” Transactions in GIS 23 (5): 1048–1077. doi:10.1111/tgis.12557.
  • Guo, D. 2008. “Regionalization with Dynamically Constrained Agglomerative Clustering and Partitioning (REDCAP).” International Journal of Geographical Information Science 22 (7): 801–823. doi:10.1080/13658810701674970.
  • Helbich, M., W. Brunauer, J. Hagenauer, and M. Leitner. 2013. “Data-Driven Regionalization of Housing Markets.” Annals of the Association of American Geographers 103 (4): 871–889. doi:10.1080/00045608.2012.707587.
  • He, W., H. Ling, Z. Zhang, and C. Gong. 2017. “Multi-Objective Spatially Constrained Clustering for Regionalization with Particle Swarm Optimization.” International Journal of Geographical Information Science 32 (4): 827–846. doi:10.1080/13658816.2017.1418363.
  • Hoffman, F. M., W. W. Hargrove, R. T. Mills, S. Mahajan, D. J. Erickson, and R. J. Oglesby. 2008. “Multivariate Spatio-Temporal Clustering (MSTC) as a Data Mining Tool for Environmental Applications.” 4th International Congress on Environmental Modelling and Software, July 2008, Barcelona, Catalonia, Spain.
  • Holt, D., D. G. Steel, M. Tranmer, and N. Wrigley. 1996. “Aggregation and Ecological Effects in Geographically Based Data.” Geographical Analysis 28 (3): 244–261. doi:10.1111/j.1538-4632.1996.tb00933.x.
  • Jbaily, A., X. Zhou, J. Liu, T. H. Lee, L. Kamareddine, S. Verguet, and F. Dominici. 2022. “Air Pollution Exposure Disparities Across US Population and Income Groups.” Nature 601 (7892): 228–233. doi:10.1038/s41586-021-04190-y.
  • Kang, Y., K. Wu, S. Gao, I. Ng, J. Rao, S. Ye, and T. Fei. 2022. “STICC: A Multivariate Spatial Clustering Method for Repeated Geographic Pattern Discovery with Consideration of Spatial Contiguity.” International Journal of Geographical Information Science 36 (8): 1518–1549. doi:10.1080/13658816.2022.2053980.
  • Kim, K., D. J. Dean, H. Kim, and Y. Chun. 2015. “Spatial Optimization for Regionalization Problems with Spatial Interaction: A Heuristic Approach.” International Journal of Geographical Information Science 30 (3): 451–473. doi:10.1080/13658816.2015.1031671.
  • Kowe, P., O. Mutanga, J. Odindi, and T. Dube. 2020. “A Quantitative Framework for Analysing Long Term Spatial Clustering and Vegetation Fragmentation in an Urban Landscape Using Multi-Temporal Landsat Data.” International Journal of Applied Earth Observation and Geoinformation 88: 102057. doi:10.1016/j.jag.2020.102057.
  • Kristensson, P. O., N. Dahlback, D. Anundi, M. Bjornstad, H. Gillberg, J. Haraldsson, and J. Stahl. 2009. “An Evaluation of Space Time Cube Representation of Spatiotemporal Patterns.” IEEE Transactions on Visualization and Computer Graphics 15 (4): 696–702. doi:10.1109/TVCG.2008.194.
  • Kriventseva, E. V., W. Fleischmann, E. M. Zdobnov, and R. Apweiler. 2001. “CluSTr: A Database of Clusters of SWISS-PROT+TrEMBL Proteins.” Nucleic Acids Research 29 (1): 33–36. doi:10.1093/nar/29.1.33.
  • Krivoruchko, K., and A. Gribov. 2019. “Evaluation of Empirical Bayesian Kriging.” Spatial Statistics 32: 100368. doi:10.1016/j.spasta.2019.100368.
  • Kwan, M. P. 2012. “The Uncertain Geographic Context Problem.” Annals of the Association of American Geographers 102 (5): 958–968. doi:10.1080/00045608.2012.687349.
  • Lin, Y., L. Li, J. Yu, Y. Hu, T. Zhang, Z. Ye, and J. Li. 2021. “An Optimized Machine Learning Approach to Water Pollution Variation Monitoring with Time-Series Landsat Images.” International Journal of Applied Earth Observation and Geoinformation 102: 102370. doi:10.1016/j.jag.2021.102370.
  • Liu, Z., D. Guan, W. Wei, S. J. Davis, P. Ciais, J. Bai, and K. He. 2015. “Reduced Carbon Emission Estimates from Fossil Fuel Combustion and Cement Production in China.” Nature 524 (7565): 335–338. doi:10.1038/nature14677.
  • Liu, X., S. Wang, Y. Zhou, F. Wang, W. Li, and W. Liu. 2015. “Regionalization and Spatiotemporal Variation of Drought in China Based on Standardized Precipitation Evapotranspiration Index (1961–2013).” Advances in Meteorology 2015: 1–18. doi:10.1155/2015/950262.
  • Mahmood, S. A., and R. Gloaguen. 2013. “Analyzing Spatial Autocorrelation for the Hypsometric Integral to Discriminate Neotectonics and Lithologies Using DEMs and GIS.” GIScience & Remote Sensing 48 (4): 541–565. doi:10.2747/1548-1603.48.4.541.
  • Menon, C. 2012. “The Bright Side of MAUP: Defining New Measures of Industrial Agglomeration.” Papers in Regional Science 91 (1): 3–28. doi:10.1111/j.1435-5957.2011.00350.x.
  • Mohammadpour, K., M. Sciortino, M. Saligheh, T. Raziei, and A. Darvishi Boloorani. 2021. “Spatiotemporal Regionalization of Atmospheric Dust Based on Multivariate Analysis of MACC Model Over Iran.” Atmospheric Research 249: 105322. doi:10.1016/j.atmosres.2020.105322.
  • Niesterowicz, J., T. F. Stepinski, and J. Jasiewicz. 2016. “Unsupervised Regionalization of the United States into Landscape Pattern Types.” International Journal of Geographical Information Science 30 (7): 1450–1468. doi:10.1080/13658816.2015.1134796.
  • Nowosad, J., and T. F. Stepinski. 2018. “Spatial Association Between Regionalizations Using the Information-Theoretical V-Measure.” International Journal of Geographical Information Science 32 (12): 2386–2401. doi:10.1080/13658816.2018.1511794.
  • Peng, D., Z. Gui, D. Wang, Y. Ma, Z. Huang, Y. Zhou, and H. Wu. 2022. “Clustering by Measuring Local Direction Centrality for Data with Heterogeneous Density and Weak Connectivity.” Nature Communications 13 (1): 5455. doi:10.1038/s41467-022-33136-9.
  • Pereira, M., P. Segurado, and N. Neves. 2011. “Using Spatial Network Structure in Landscape Management and Planning: A Case Study with Pond Turtles.” Landscape and Urban Planning 100 (1–2): 67–76. doi:10.1016/j.landurbplan.2010.11.009.
  • Poorthuis, A. 2018. “How to Draw a Neighborhood? The Potential of Big Data, Regionalization, and Community Detection for Understanding the Heterogeneous Nature of Urban Neighborhoods.” Geographical Analysis 50 (2): 182–203. doi:10.1111/gean.12143.
  • Sala, E., J. Mayorga, D. Bradley, R. B. Cabral, T. B. Atwood, A. Auber, and J. Lubchenco. 2021. “Protecting the Global Ocean for Biodiversity, Food and Climate.” Nature 592 (7854): 397–402. doi:10.1038/s41586-021-03371-z.
  • Shaddick, G., D. Lee, and J. Wakefield. 2013. “Incorporating Spatial Variability Within Epidemiological Studies of Environmental Exposures.” International Journal of Applied Earth Observation and Geoinformation: ITC Journal 22: 65–74. doi:10.1016/j.jag.2012.03.011.
  • She, B., J. C. Duque, and X. Ye. 2016. “The Network-Max-P-Regions Model.” International Journal of Geographical Information Science 31 (5): 962–981. doi:10.1080/13658816.2016.1252987.
  • Shen, J., X. Liu, and M. Chen. 2017. “Discovering Spatial and Temporal Patterns from Taxi-Based Floating Car Data: A Case Study from Nanjing.” GIScience & Remote Sensing 54 (5): 617–638. doi:10.1080/15481603.2017.1309092.
  • Shu, H., T. Pei, C. Song, X. Chen, S. Guo, Y. Liu, and C. Zhou. 2020. “L-Function of Geographical Flows.” International Journal of Geographical Information Science 35 (4): 689–716. doi:10.1080/13658816.2020.1749277.
  • Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (sup1): 234–240. doi:10.2307/143141.
  • Wang, H., H. Zhang, S. Jiang, G. Tang, X. Zhang, and L. Zhou. 2022. “City Association Pattern Discovery: A Flow Perspective by Using Cultural Semantic Similarity of Place Name.” Applied Geography 139: 102629. doi:10.1016/j.apgeog.2021.102629.
  • Wang, H., H. Zhang, G. Tang, L. Zhou, and S. Jiang. 2022. “Inter‐city Association Pattern Recognition by Constructing Cultural Semantic Similarity Network.” Transactions in GIS 26 (5): 2225–2243. doi:10.1111/tgis.12957.
  • Wauchope, H. S., J. P. G. Jones, J. Geldmann, B. I. Simmons, T. Amano, D. E. Blanco, and W. J. Sutherland. 2022. “Protected Areas Have a Mixed Impact on Waterbirds, but Management Helps.” Nature 605 (7908): 103–107. doi:10.1038/s41586-022-04617-0.
  • Wei, J., Z. Li, A. Lyapustin, L. Sun, Y. Peng, W. Xue, and M. Cribb. 2021. “Reconstructing 1-Km-Resolution High-Quality PM2.5 Data Records from 2000 to 2018 in China: Spatiotemporal Variations and Policy Implications.” Remote Sensing of Environment 252: 112136. doi:10.1016/j.rse.2020.112136.
  • Wei, J., Z. Li, W. Xue, L. Sun, T. Fan, L. Liu, and M. Cribb. 2021. “The ChinaHighPM10 Dataset: Generation, Validation, and Spatiotemporal Variations from 2015 to 2019 Across China.” Environment International 146: 106290. doi:10.1016/j.envint.2020.106290.
  • Wei, R., S. Rey, and T. H. Grubesic. 2021. “A Probabilistic Approach to Address Data Uncertainty in Regionalization.” Geographical Analysis 54 (2): 405–426. doi:10.1111/gean.12282.
  • Wei, R., S. Rey, and E. Knaap. 2020. “Efficient Regionalization for Spatially Explicit Neighborhood Delineation.” International Journal of Geographical Information Science 35 (1): 135–151. doi:10.1080/13658816.2020.1759806.
  • Wolf, L. J. 2021. “Spatially Encouraged Spectral Clustering: A Technique for Blending Map Typologies and Regionalization.” International Journal of Geographical Information Science 35 (11): 2356–2373. doi:10.1080/13658816.2021.1934475.
  • Xi, W., S. Du, Y. -C. Wang, and X. Zhang. 2019. “A Spatiotemporal Cube Model for Analyzing Satellite Image Time Series: Application to Land-Cover Mapping and Change Detection.” Remote Sensing of Environment 231: 111212. doi:10.1016/j.rse.2019.111212.
  • Xie, Y., J. Zhang, Y. Xia, A. V. D. Hengel, and Q. Wu. 2022. “ClusTr: Exploring Efficient Self-Attention via Clustering for Vision Transformers.” arXiv preprint arXiv:2208.13138. doi:10.48550/arXiv.2208.13138.
  • Yuvaraj, M., A. K. Dey, V. Lyubchich, Y. R. Gel, and H. V. Poor. 2021. “Topological Clustering of Multilayer Networks.” Proceedings of the National Academy of Sciences of the United States of America 118 (21). doi:10.1073/pnas.2019994118.
  • Yu, T., W. Wang, P. Ciren, and Y. Zhu. 2016. “Assessment of Human Health Impact from Exposure to Multiple Air Pollutants in China Based on Satellite Observations.” International Journal of Applied Earth Observation and Geoinformation 52: 542–553. doi:10.1016/j.jag.2016.07.020.
  • Zhang, Y., X. Liu, M. Liu, X. Zou, Q. Zhang, and T. Peng. 2021. “Multi-Scale Spatiotemporal Change Characteristics Analysis of High-Frequency Disturbance Forest Ecosystem Based on Improved Spatiotemporal Cube Model.” Remote Sensing 13 (13). doi:10.3390/rs13132537.
  • Zhong, C., S. M. Arisona, X. Huang, M. Batty, and G. Schmitt. 2014. “Detecting the Dynamics of Urban Structure Through Spatial Network Analysis.” International Journal of Geographical Information Science 28 (11): 2178–2199. doi:10.1080/13658816.2014.914521.
  • Zhou, Y., and C. J. Matyas. 2021. “Regionalization of Precipitation Associated with Tropical Cyclones Using Spatial Metrics and Satellite Precipitation.” GIScience & Remote Sensing 58 (4): 542–561. doi:10.1080/15481603.2021.1908675.
  • Zhu, D., Y. Liu, X. Yao, and M. M. Fischer. 2021. “Spatial Regression Graph Convolutional Neural Networks: A Deep Learning Paradigm for Spatial Multivariate Distributions.” GeoInformatica. doi:10.1007/s10707-022-00461-6.
  • Zhu, A. X., G. Lu, J. Liu, C. Z. Qin, and C. Zhou. 2018. “Spatial Prediction Based on Third Law of Geography.” Annals of GIS 24 (4): 225–240. doi:10.1080/19475683.2018.1534890.