Research Article

Strength-weighted flow cluster method considering spatiotemporal contiguity to reveal interregional association patterns

Article: 2252923 | Received 13 Feb 2023, Accepted 24 Aug 2023, Published online: 05 Sep 2023

ABSTRACT

One of the most crucial topics in spatial interaction studies is mining patterns from extensive origin-destination (OD) flow data to capture interregional associations. However, prevailing methodologies tend to disregard the importance of using the relative closeness of interregional connections as weights, treat spatial and temporal dimensions independently, or overlook the temporal dimension completely. Consequently, the identified patterns are susceptible to inaccuracies, and the precise identification of pattern occurrence time and duration, despite their fundamental importance, remains elusive. In light of these challenges, this study proposes a strategy to calculate and combine the strength of weighted spatiotemporal flows, and develops a clustering method and evaluation metrics based on this framework. Compared to alternative density-based methods, the strength-based calculation approach demonstrates a capacity to identify flow patterns characterized by relatively high interregional closeness. Thus, the identification of flow patterns expands beyond density-based approaches, encompassing strength-based considerations and a shift from absolute to relative closeness between regions. Experiments using synthetic datasets conducted in this research demonstrate the effectiveness, efficiency, and extraction accuracy of the proposed method. Furthermore, a case study using real Chinese population migration data demonstrates the efficacy of the method in revealing implicit spatiotemporal association patterns between regions. The present study implements an interaction strength-based flow clustering and evaluation method that considers spatiotemporal continuity, making it applicable to spatial flow data analysis involving interaction volume and time attributes. As a result, this method holds promise for facilitating the modeling of intricate spatial flows within various contexts of study.

1. Introduction

Advances in information and communication technologies and the Internet of Things have generated diverse spatiotemporal flow data, and geographers have become increasingly interested in the spatial interactions recorded in such social flow data (Emch et al. 2012; Lu et al. 2016). The analysis of spatial flows, including human, vehicular, logistic, and information flows, plays an increasingly significant role in examining geospatial phenomena such as regional association patterns, spatial network structures, and spatial diffusion processes (Andris, Liu, and Ferreira 2018; Giordano, Cole, and Le Noc 2022; Ye and Andris 2021). In studying spatial interaction within social flows, geographic information scholars particularly emphasize developing methods that extract comprehensive insights from flow data. One key research area is the identification of regional association patterns by clustering origin-destination (OD) flow data.

OD flow pattern mining methods take diverse forms, including clustering methods based on one or more attributes of origin, destination, and flow direction (Bogataj, Bogataj, and Drobne 2019; Guo et al. 2021); clustering methods for identifying network communities (Chen, Xu, and Xu 2015; Wang, Wang, and Onega 2021); and methods for extracting spatial interaction flow patterns between regions (Kim et al. 2014). Among these, interregional movement pattern clustering is the most distinctive: it can be used to identify regional association patterns, recognize functional areas from a flow-space perspective, and more. Various interregional movement pattern clustering methods and applications have been proposed and developed.

However, these methods primarily reflect flow density, overlooking the importance of using the relative closeness of interregional connections as weights. No interregional pattern extraction method incorporates the time dimension based on weighted flows. Weighted flow patterns reflect the strong heterogeneity of interaction strength between regions rather than interaction density heterogeneity. Currently, density-based flow pattern extraction methods cannot directly extend to extracting spatial flow patterns from weighted flows or spatiotemporal flow patterns. Therefore, this study proposes a spatiotemporally contiguous clustering approach for origin-destination flows weighted by interaction strength, to address the gap in existing research.

The primary innovations and contributions of this study are: (1) We propose a weighted origin-destination flow model to measure interregional interaction strength and an identification algorithm that discovers patterns formed by regions with strong relative flow associations. This represents a new perspective for identifying meaningful flow patterns, contrasting prevailing density-based techniques that rely solely on absolute density thresholds to detect regions of high-flow frequency. (2) Our flow pattern identification method accounts for spatiotemporal continuity, identifying exact flow pattern occurrence time and duration.

In this paper, the related concepts of spatiotemporal flows weighted by interaction strength are first explained (Section 3.1), as are the challenges in extracting regional association patterns from density-based to strength-based and spatial to spatiotemporal OD flows, and the problems addressed (Section 3.2). The algorithm logic and specific implementation of interaction strength-based spatiotemporal flow pattern identification are then focused on (Section 4). The algorithm logic for identifying flow patterns is described (Section 4.1). Various characteristic statistical variables are constructed to elucidate multiple meanings of the flow pattern (Section 4.2). Flow pattern evaluation indicators from multiple perspectives are proposed (Section 4.3). Finally, the validity, accuracy, and application value of the proposed method are verified using synthetic and real data sets (Section 5).

2. Literature review

2.1. OD flow-based spatial interaction analysis algorithm

OD flow pattern extraction methods have long played an important role in spatial interaction studies. Compared with other spatial feature types in GIS, such as points and areas, OD flow data are richer and structurally more complex, so their pattern extraction methods are harder to implement and more diverse in form (Andrienko et al. 2017; Andris, Liu, and Ferreira 2018). Depending on the target object of clustering, OD flow pattern extraction methods fall into three groups: those based on the origins or destinations of OD flows, those based on OD flow units, and those based on networks composed of OD flows.

  1. The clustering method based on OD flow origins and destinations operates on the endpoints of flows. One approach clusters origins and destinations separately, then analyzes the association between the resulting origin and destination clusters (Pei et al. 2009, 2015; Wan et al. 2012). Another approach clusters origins and destinations simultaneously, effectively identifying multiple cluster types formed by combining origin and destination regions of varying densities (Luo, Cats, and van Lint 2017; Randriamanamihaga et al. 2014). However, such methods disrupt the overall OD flow structure and weaken or ignore relationships between the origins and destinations of individual flows.

  2. The clustering method based on OD flow units treats each OD flow unit as a whole, identifying direction-based patterns such as convergence, diffusion, and co-direction (Guo et al. 2020; Van Nuffel 2007). These methods also group flows with similar directions, origins, and destinations into flow clusters. Aggregating and visualizing raw OD flow units using map generalization techniques provides another approach to flow analysis (Graser et al. 2019; Koylu, Tian, and Windsor 2023). However, some methods, although they treat each OD flow unit as a whole, ignore location, time, and other attributes inherent to flows during clustering. These should not be considered spatial clustering methods for OD flows (Nie et al. 2015; Zhang et al. 2016).

  3. The network analysis method based on OD flows treats all flow units as a single analytical object to construct a spatial complex network. Non-spatial or spatial clustering of the flow network is then achieved using association partitioning methods without or with spatial constraints (Crivellari et al. 2022; Gao et al. 2013; Xu, Santi, and Ratti 2022). This divides the nodes of the flow network into multiple categories. These methods effectively explain the structure of regions (Louail et al. 2015). However, the diverse associations between regions remain difficult to ascertain.

2.2. Interregional flow pattern in flow pattern mining algorithm family

In the last decade, identifying flow patterns by clustering interregional OD flows from large OD datasets has gained attention. These methods analyze raw flows using clustering, optimization, statistics, and other algorithms or map synthesis, yielding origin and destination regions of arbitrary shapes with directional interrelationships. Kim et al. (2014) pioneered this clustering method, termed MZP. Chen et al. proposed an improved algorithm, MPFZ, to cluster subway, taxi, and other flow data and extract interregional movement patterns (Chen et al. 2022; Liu et al. 2022). Subsequently, interregional movement pattern algorithms based on intelligent optimization and probabilistic methods were developed and applied to analyzing residential mobility patterns in cities.

Based on probability calculation and clustering, Zhou et al. completed two studies. First, they introduced road network constraints into the mining model to identify interregional movement patterns (Zhou et al. 2019). Second, they developed a flow pattern identification method that accounts for variable OD flow densities (Zhou et al. 2019). Yao et al. (2018) added road network and K-function constraints to recognize interregional movement patterns subject to road network and proximity constraints. Song and Liu proposed an interregional flow pattern mining approach based on intelligent optimization and incorporating shared nearest neighbors (Liu et al. 2022; Song et al. 2019). These methods differ in implementation but are similar in form: they all extract interregional movement patterns from unweighted OD flow sets.

2.3. Flow pattern mining algorithm with introduction of weight or time

Characteristics other than OD flow origins, destinations, and directions, such as flow weights, temporal dimensions, and other attributes, have gained attention and been incorporated into interregional interaction pattern mining. Regarding flow pattern extraction considering weight, Zhang et al. introduced OD flow weight so that results reflected flow strength rather than density (Zhang et al. 2018). However, their flow unit merging method was flawed, compromising result accuracy. Tao and Thill (2016) used spatial statistical methods to construct an empirical spatial flow weight matrix, identifying anomalous interaction regions as clusters of very high or low flow values. Although their method still relied on flow unit counts, its key contribution was obtaining statistically significant flow patterns through spatial statistics. For flow pattern mining considering time, time was introduced as a flow attribute, and spatiotemporal flow pattern extraction methods reflecting flow density over space and time were proposed (Zhou et al. 2019; Yao et al. 2018). Other approaches include origin-destination-time (ODT) matrices (Andris et al. 2018), time series-based origin-destination flow prediction (Hasanpour Jesri and Shirazi 2022), and areas of interest over time (Zhang, Liu, and Wang 2019). However, these methods ignored flow locations and were not spatial flow pattern extraction methods. Work on extracting interregional flow patterns that account for weight or time remains limited.

3. Related concepts and problem definition

3.1. Related concepts

Definition 1 (Weighted spatiotemporal OD flow): A weighted spatiotemporal origin-destination flow unit $f_i = (f_{i\_o}, t_{i\_o}, f_{i\_d}, t_{i\_d}, f_{i\_w})$ contains five basic attributes, where $i$ denotes the $i$th flow unit, and $f_{i\_o}$ and $t_{i\_o}$ are the origin and the corresponding starting time of the flow unit. Similarly, $f_{i\_d}$ and $t_{i\_d}$ are the destination and the corresponding reaching time of the flow unit. $f_{i\_w}$ is the weight of the flow unit, that is, the number of people who depart from $f_{i\_o}$ at time $t_{i\_o}$ and reach $f_{i\_d}$ at time $t_{i\_d}$. Figure 1(a) shows three examples of weighted spatiotemporal flow units. The weights of OD flows can represent different meanings, such as interaction volumes or interaction strengths. By default, the weight refers to the interaction volume unless otherwise specified.

Based on the above definition, OD flows whose $t_{i\_o}$ and $t_{i\_d}$ are None are called spatial OD flows. Flow units with $f_{i\_w}$ = None are called unweighted spatiotemporal OD flows, as shown in Figure 1(b). Unweighted OD flows are not OD flows with a weight of 1; rather, all flows with weights greater than or equal to 1 are treated as equivalent. In other words, their weights are all set to None.

Figure 1. The basic form of weighted and unweighted spatiotemporal OD flow. (a) weighted OD flows and (b) unweighted OD flows.


The spatial distance between flow units can be defined in various ways. The spatial distance defined here aims to determine the spatial proximity relationship between polygon elements and whether flow units are spatial neighbors. For polygon elements, spatial proximity can be determined directly from topological relationships or by other means, such as k-nearest neighbors and distance-based affected regions. This study adopts the principle that two polygon elements sharing a border are spatial neighbors, and a simple example illustrates the resulting proximity between two flow units. Because the origins $f_{1\_o}$ and $f_{2\_o}$ are spatial neighbors, and the destinations $f_{1\_d}$ and $f_{2\_d}$ are also spatial neighbors, the flow units $f_1$ and $f_2$ are spatial neighbors, as shown in Figure 2(a). Similarly, flow units $f_3$ and $f_4$, and $f_5$ and $f_6$, are spatial neighbors. In Figure 2(b), by contrast, no pair of flow units is spatially proximal, because $f_{1\_d}$ and $f_{2\_d}$, $f_{3\_o}$ and $f_{4\_o}$, $f_{3\_d}$ and $f_{4\_d}$, and $f_{5\_o}$ and $f_{6\_o}$ are not spatial neighbors. Therefore, in this study, we define the spatial distance between OD flows as follows:

Figure 2. Spatial distance measurement between flow units. (a) flow unit pairs that satisfy the spatial adjacency relationship, (b) flow unit pairs that do not satisfy the spatial adjacency relationship.


Definition 2

(Spatial distance between OD flows) For any two OD flows $f_i$ and $f_j$, the spatial distance between them is defined as:

$$SD(f_i, f_j) = \max\big(dist(f_{i\_o}, f_{j\_o}),\ dist(f_{i\_d}, f_{j\_d})\big), \tag{1}$$

where $dist(f_{i\_}, f_{j\_})$ is the spatial distance between the corresponding endpoints (origins or destinations) of $f_i$ and $f_j$. When the origins $f_{i\_o}$ and $f_{j\_o}$ share a border, $dist(f_{i\_o}, f_{j\_o}) = 0$; otherwise, $dist(f_{i\_o}, f_{j\_o}) = \infty$. For two flow units $f_i$ and $f_j$, $SD(f_i, f_j) = 0$ only when $dist(f_{i\_o}, f_{j\_o}) = 0$ and $dist(f_{i\_d}, f_{j\_d}) = 0$; otherwise, $SD(f_i, f_j) = \infty$. In practical applications, other principles for spatial proximity, such as k-nearest neighbors, may be used; this study introduces only the shared-border principle.
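Equation (1) can be illustrated with a minimal Python sketch. The region identifiers, the `SHARED_BORDERS` set, and the function names are hypothetical and chosen only for illustration; the sketch also assumes that a region counts as proximal to itself, which the definition above does not state explicitly.

```python
import math
from typing import NamedTuple


class Flow(NamedTuple):
    o: str  # origin region id (f_i_o)
    d: str  # destination region id (f_i_d)


# Hypothetical adjacency: unordered pairs of regions that share a border.
SHARED_BORDERS = {frozenset(pair) for pair in [("A", "B"), ("C", "D")]}


def region_dist(r1: str, r2: str) -> float:
    """Return 0 if two regions share a border, infinity otherwise.

    Assumption: a region is treated as proximal to itself.
    """
    if r1 == r2 or frozenset((r1, r2)) in SHARED_BORDERS:
        return 0
    return math.inf


def spatial_distance(fi: Flow, fj: Flow) -> float:
    """Equation (1): the larger of the origin and destination distances."""
    return max(region_dist(fi.o, fj.o), region_dist(fi.d, fj.d))


# Both endpoints adjacent, so the flows are spatial neighbors.
print(spatial_distance(Flow("A", "C"), Flow("B", "D")))
# Destinations C and E do not share a border, so the distance is infinite.
print(spatial_distance(Flow("A", "C"), Flow("B", "E")))
```

Taking the maximum means that a single non-adjacent endpoint is enough to make two flows spatially unreachable.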

Definition 3

(Temporal distance between OD flows) Similar to the definition of spatial distance, the temporal distance (time interval) between any two OD flows $f_i$ and $f_j$ is defined as

$$TD(f_i, f_j) = \max\big(dist(t_{i\_o}, t_{j\_o}),\ dist(t_{i\_d}, t_{j\_d})\big), \tag{2}$$

where $dist(t_{i\_o}, t_{j\_o}) = |t_{i\_o} - t_{j\_o}|$ and $dist(t_{i\_d}, t_{j\_d}) = |t_{i\_d} - t_{j\_d}|$. For flow units $f_i$ and $f_j$, $TD(f_i, f_j) = 0$ only when $dist(t_{i\_o}, t_{j\_o}) < \tau$ and $dist(t_{i\_d}, t_{j\_d}) < \tau$; otherwise, $TD(f_i, f_j) = \infty$. Here $\tau$ is the threshold value of temporal distance.
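The thresholded temporal distance can be sketched in a few lines of Python. The function name, the argument order, and the use of minutes since midnight as the time unit are assumptions made for this illustration.

```python
import math


def temporal_distance(ti_o, ti_d, tj_o, tj_d, tau):
    """Return 0 if both time gaps are strictly below tau, infinity otherwise."""
    gap_o = abs(ti_o - tj_o)  # gap between starting times
    gap_d = abs(ti_d - tj_d)  # gap between reaching times
    return 0 if (gap_o < tau and gap_d < tau) else math.inf


# Times are minutes since midnight (an assumption of this sketch).
print(temporal_distance(370, 395, 375, 400, tau=10))  # both gaps are 5
print(temporal_distance(370, 395, 385, 400, tau=10))  # starting gap is 15
```

Because the comparison is strict, a gap exactly equal to $\tau$ does not satisfy the condition.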

The spatial distance and time interval between any two OD flows and the spatiotemporal distance of any two flow units can be determined when the principle of the above definition is followed.

Definition 4

(Spatiotemporal proximity between OD flows) Given the time-interval threshold $\tau$, and with the spatial proximity of origins and destinations defined on the basis of a shared border, any two flow units $f_i$ and $f_j$ are spatiotemporally proximate only if they meet two conditions: $SD(f_i, f_j) = 0$ and $TD(f_i, f_j) = 0$.

Two weighted flow units that satisfy the spatiotemporal proximity of Definition 4 are spatially "colocated" and temporally "synchronous," which is what spatiotemporal proximity means in this study. The merging models proposed by Kim and Zhang cannot be applied to the strength-weighted spatiotemporal flows considered here: the former is oriented to density, the latter is oriented to spatial rather than spatiotemporal flows, and their merging accuracy is low (Hasanpour Jesri and Shirazi 2022; Kim et al. 2014; Zhang et al. 2018).

Definition 5

(Spatiotemporal neighborhood of OD flow) For a certain flow unit fi, the set of flow units with a spatiotemporal proximity relation to this flow unit can be defined as:

$$NF(f_i) = \{\, f_j \mid f_j \in F,\ SD(f_i, f_j) = 0 \text{ and } TD(f_i, f_j) = 0 \,\} \tag{3}$$

All flow units in $NF(f_i)$ are neighbors of $f_i$.
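As an illustrative sketch, the neighborhood of Equation (3) can be computed by filtering the flow set with the two proximity conditions. The `Flow` tuple, the `adjacent` dictionary, and the sample values are hypothetical; the sketch further assumes that a flow is not its own neighbor and that a region is proximal to itself.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    o: str      # origin region
    t_o: float  # starting time
    d: str      # destination region
    t_d: float  # reaching time
    w: float    # weight (interaction volume)


def neighborhood(fi, flows, adjacent, tau):
    """Return NF(f_i): flows that are spatiotemporally proximate to fi."""
    def near(r1, r2):
        # Shared border, or the same region (assumption of this sketch).
        return r1 == r2 or r2 in adjacent.get(r1, ())

    return [
        fj for fj in flows
        if fj is not fi
        and near(fi.o, fj.o) and near(fi.d, fj.d)                     # SD = 0
        and abs(fi.t_o - fj.t_o) < tau and abs(fi.t_d - fj.t_d) < tau  # TD = 0
    ]


# Hypothetical layout: a borders c, and b borders d.
adjacent = {"a": {"c"}, "c": {"a"}, "b": {"d"}, "d": {"b"}}
f1 = Flow("a", 0, "b", 30, 10)
f2 = Flow("c", 5, "d", 33, 8)   # adjacent at both ends and close in time
f3 = Flow("c", 50, "d", 80, 4)  # adjacent at both ends but too late
f4 = Flow("a", 2, "e", 31, 7)   # destination e does not border b
print(neighborhood(f1, [f1, f2, f3, f4], adjacent, tau=10))
```

Only `f2` passes both tests, so it is the sole member of the neighborhood of `f1` in this toy example.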

Definition 6

(Strength-weighted OD flow) OD flow strength is used to describe the closeness of the association between the origin and destination of a flow. Strength is related not only to the interaction volume of the OD flow but also to other flows leaving the origin fi_o at time ti_o and all other flows arriving at the destination fi_d at time ti_d. It is calculated as follows:

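$$ST(f_i) = \frac{f_{i\_w}}{O_{i\_w}} \cdot \frac{f_{i\_w}}{D_{i\_w}} = \frac{f_{i\_w}^{2}}{O_{i\_w}\, D_{i\_w}} \tag{4}$$

$$O_{i\_w} = \sum_{f_x \in F} f_{x\_w}, \quad f_{x\_o} = f_{i\_o} \text{ and } t_{x\_o} = t_{i\_o} \tag{5}$$

$$D_{i\_w} = \sum_{f_x \in F} f_{x\_w}, \quad f_{x\_d} = f_{i\_d} \text{ and } t_{x\_d} = t_{i\_d} \tag{6}$$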

The flow density between origin and destination regions reflects the absolute closeness of their association. For example, in the flows shown in Figure 3(a) and (b), the number of flow units from region 1 to region 2 is 200, while that from region 3 to region 4 is 500, so the latter density is greater and, in absolute terms, the association between regions 3 and 4 is closer. However, in Figure 3(c) and (d), if the flow volume between origin and destination regions is regarded as the weight of a single flow unit, the flow strengths of $f_1$ and $f_2$ can be calculated using Equation (4). In Figure 3(c), the weighted flow $f_1$ is far more important to origin region 1 (larger weight) than the other flows leaving this region, accounting for $f_{1\_w}/O_{1\_w} = 0.7143$ of the importance. Its importance to destination region 2 is also far greater than that of other inflows, accounting for $f_{1\_w}/D_{1\_w} = 0.6897$. Similarly, the weighted flow $f_2$ accounts for $f_{2\_w}/O_{2\_w} = 0.4762$ and $f_{2\_w}/D_{2\_w} = 0.5882$ of the importance for its origin and destination, respectively. Finally, the strengths of $f_1$ and $f_2$ are calculated as $ST(f_1) = 0.4926$ and $ST(f_2) = 0.2801$, respectively. Although the density (absolute closeness) of $f_2$ is far greater than that of $f_1$, the flow strength (relative closeness) of $f_1$ is greater than that of $f_2$. The origin and destination regions of $f_1$ are therefore more closely associated in relative terms: this is the strongest association in the local region where the flow unit is located.
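The strength values above can be reproduced with a short Python sketch of Equations (4) to (6). Only the volumes 200 and 500 and the reported ratios come from the text; the additional flows to and from regions "x" and "y" (80, 90, 550, and 350) are hypothetical values chosen so that the outflow and inflow totals yield those ratios.

```python
from collections import defaultdict


def flow_strengths(flows):
    """Equation (4) for each flow given as (origin, t_o, destination, t_d, weight)."""
    out_total = defaultdict(float)  # O_i_w, keyed by (origin, starting time)
    in_total = defaultdict(float)   # D_i_w, keyed by (destination, reaching time)
    for o, t_o, d, t_d, w in flows:
        out_total[(o, t_o)] += w
        in_total[(d, t_d)] += w
    return [
        w * w / (out_total[(o, t_o)] * in_total[(d, t_d)])
        for o, t_o, d, t_d, w in flows
    ]


flows = [
    ("1", 0, "2", 1, 200),  # f1
    ("1", 0, "x", 1, 80),   # other outflow from region 1 (hypothetical)
    ("y", 0, "2", 1, 90),   # other inflow to region 2 (hypothetical)
    ("3", 0, "4", 1, 500),  # f2
    ("3", 0, "x", 1, 550),  # other outflow from region 3 (hypothetical)
    ("y", 0, "4", 1, 350),  # other inflow to region 4 (hypothetical)
]
strengths = flow_strengths(flows)
print(round(strengths[0], 4))  # ST(f1) -> 0.4926
print(round(strengths[3], 4))  # ST(f2) -> 0.2801
```

With these totals, $f_1$ carries 200/280 of the outflow of region 1 and 200/290 of the inflow of region 2, whereas $f_2$ carries only 500/1050 and 500/850 of its regions' totals, which is why the smaller flow ends up with the higher strength.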

Figure 3. The density and strength of OD flows respectively characterize the absolute and relative closeness of association between their origin and destination. (a) Smaller number of unweighted flows between regions 1 and 2, (b) larger number of unweighted flows between regions 3 and 4, (c) weighted OD flow between regions 1 and 2 shows low interaction volume but high interaction strength, (d) weighted OD flow between regions 3 and 4 shows high interaction volume but low interaction strength.


Definition 7

(Strength-reachability of flow unit pairs) Any flow unit $f_j$ in $NF(f_i)$ can be part of a strength-weighted spatiotemporal flow pattern (WST-FP) only when the flow strength $P_{i,j}$ between $f_j$ and $f_i$ reaches the user-defined threshold value $\delta$. By extending existing density-based flow pair merging methods (Kim et al. 2014; Zhang et al. 2018), we propose a strength-based flow pair merging method. The flow strength between $f_i$ and $f_j$ is calculated as follows:

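$$P_{i,j} = \frac{(f_{i\_w} + f_{j\_w})^{2}}{\big[TW_{t_{i\_o},t_{j\_o}}(f_{i\_o} \to f_{\_d}) + TW_{t_{i\_o},t_{j\_o}}(f_{j\_o} \to f_{\_d})\big]\,\big[TW_{t_{i\_d},t_{j\_d}}(f_{\_o} \to f_{i\_d}) + TW_{t_{i\_d},t_{j\_d}}(f_{\_o} \to f_{j\_d})\big]} \tag{7}$$

$$TW_{t_{i\_o},t_{j\_o}}(f_{i\_o} \to f_{\_d}) = \sum_{\substack{f_k \in F,\ f_{k\_o} = f_{i\_o} \\ \min(t_{i\_o},t_{j\_o}) \le t_{k\_o} \le \max(t_{i\_o},t_{j\_o})}} f_{k\_w} \tag{8}$$

$$TW_{t_{i\_d},t_{j\_d}}(f_{\_o} \to f_{i\_d}) = \sum_{\substack{f_k \in F,\ f_{k\_d} = f_{i\_d} \\ \min(t_{i\_d},t_{j\_d}) \le t_{k\_d} \le \max(t_{i\_d},t_{j\_d})}} f_{k\_w} \tag{9}$$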

where $F$ is the flow unit dataset and $f_{i\_w} + f_{j\_w}$ is the sum of the interaction volumes of $f_i$ and $f_j$. $TW_{t_{i\_o},t_{j\_o}}(f_{i\_o} \to f_{\_d})$ is the total interaction volume of the flows that leave the origin $f_{i\_o}$ with a starting time within the period $[\min(t_{i\_o}, t_{j\_o}), \max(t_{i\_o}, t_{j\_o})]$. Similarly, $TW_{t_{i\_d},t_{j\_d}}(f_{\_o} \to f_{i\_d})$ is the total interaction volume of the flows whose destination is $f_{i\_d}$ with a reaching time within the period $[\min(t_{i\_d}, t_{j\_d}), \max(t_{i\_d}, t_{j\_d})]$. With $\delta$ denoting the user-defined strength threshold, flow units $f_i$ and $f_j$ are considered strength-reachable when $P_{i,j} > \delta$.
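A minimal Python sketch of the pairwise strength follows, under the reading given in the explanation above: outflow totals are windowed by starting time and inflow totals by reaching time. The `Flow` tuple, the helper names `tw_out` and `tw_in`, and all sample values are hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    o: str      # origin region
    t_o: float  # starting time
    d: str      # destination region
    t_d: float  # reaching time
    w: float    # interaction volume


def tw_out(flows, origin, t1, t2):
    """Total volume leaving `origin` with a starting time inside the window."""
    lo, hi = min(t1, t2), max(t1, t2)
    return sum(f.w for f in flows if f.o == origin and lo <= f.t_o <= hi)


def tw_in(flows, dest, t1, t2):
    """Total volume reaching `dest` with a reaching time inside the window."""
    lo, hi = min(t1, t2), max(t1, t2)
    return sum(f.w for f in flows if f.d == dest and lo <= f.t_d <= hi)


def pair_strength(fi, fj, flows):
    """Pairwise strength: squared joint volume over windowed out/in totals."""
    out_sum = (tw_out(flows, fi.o, fi.t_o, fj.t_o)
               + tw_out(flows, fj.o, fi.t_o, fj.t_o))
    in_sum = (tw_in(flows, fi.d, fi.t_d, fj.t_d)
              + tw_in(flows, fj.d, fi.t_d, fj.t_d))
    return (fi.w + fj.w) ** 2 / (out_sum * in_sum)


fi = Flow("a", 0, "b", 30, 60)
fj = Flow("c", 5, "d", 35, 40)
flows = [
    fi, fj,
    Flow("a", 3, "e", 20, 20),    # leaves a inside the starting-time window
    Flow("g", 1, "d", 32, 20),    # reaches d inside the reaching-time window
    Flow("a", 50, "e", 90, 999),  # outside both windows, so it is ignored
]
p = pair_strength(fi, fj, flows)
print(p, p > 0.5)  # compare against a hypothetical threshold delta = 0.5
```

In this toy dataset the windowed outflow and inflow totals are both 120, so the pair strength is 100² / (120 × 120), and the large flow outside the time windows has no effect on the result.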

3.2. Problem definition

This paper aims to address two problems, with the relationship between them being that solving the first problem provides the basis for the second one. Existing interregional association patterns reveal the absolute closeness between origin and destination regions through the density of OD flows but fail to capture potential association patterns with relatively high closeness between origin and destination regions despite relatively low absolute closeness. Therefore, the first problem to be solved can be defined as:

(1) How to identify interregional association patterns with relatively high closeness from a set of volume-weighted flows?

Figure 4 illustrates Problem 1, with each region group numbered from RG-1 to RG-4. In Figure 4(a), there are many low volume-weighted flow units between RG-1 and RG-2. However, RG-1 sends more high volume-weighted flow units to other regions, so flows from RG-1 to RG-2 make up a small proportion of the outflow of RG-1. Likewise, more high volume-weighted flow units from other regions flow into the destination region RG-2, so flows from RG-1 to RG-2 also make up a small proportion of the inflow of RG-2. This results in a small flow strength from RG-1 to RG-2. In contrast, most flow units between RG-3 and RG-4 have a large flow strength, and these flow units are adjacent to each other, resulting in a large flow strength between RG-3 and RG-4. This forms an interregional association pattern, as shown in Figure 4(b). One of the objectives of this paper is to identify such interregional association patterns with a relatively high degree of closeness through algorithms.

Figure 4. Interregional association patterns with a relatively high degree of closeness. (a) visualization of flow dataset, (b) flow pattern with relatively high degree of closeness.


Spatiotemporal OD flows contain not only spatial position information but also temporal information. Space and time are inseparable and should be viewed as an integral whole. Interregional association patterns extracted from spatiotemporal flows should also have spatiotemporal attributes, that is, it is necessary to obtain not only the set of flow units and the sets of origin and destination regions included in the interregional association pattern, but also the duration of the origin and destination region sets of the flow pattern. Achieving this goal constitutes the second major challenge of this paper, namely,

(2) how to capture interregional association patterns with arbitrary spatial aggregation shapes and arbitrary durations by viewing the spatial and temporal dimensions of flow units as an integral whole?

To further clarify Problem 2, Figure 5 provides a specific explanation. We first set aside the complexity and challenges of mining interregional association patterns from massive OD flow data. Instead, we use a simple example to illustrate the complexity and importance of considering both the spatial and temporal proximity of flow units, as well as the serious drawbacks of existing crude treatments.

Figure 5. Spatiotemporal flow pattern. (a) traditional flow patterns with predefined spatial and temporal constraints, (b) flow patterns considering spatiotemporal continuity.


As shown in Figure 5(a), Layer 1 contains multiple flow units, and the entire region is divided into two classes of areal units (fine-grained and coarse-grained regions). To identify a flow pattern with this approach, the first task is to determine the origin and destination regions and provide a time range constraint. Since the origin-destination regions and hourly time intervals are predefined, this method has limitations in identifying more precise spatial regions and temporal periods of flow patterns. For example, FP1 is only known to occur between 6:00 and 7:00, although it may actually occur from 6:10 to 6:30. Similarly, the exact timing of FP2 within the 8:00 to 9:00 period is uncertain. Using other predefined time ranges cannot overcome this limitation, whether the ranges are small (e.g. 30 minutes) or large (e.g. 1 day). In summary, predefining spatial regions and temporal intervals inherently limits the precision in detecting origin-destination locations and timing of flow patterns.

The example in Figure 5(b) exposes the problems of spatiotemporal flow patterns mined through predefined spatial regions and time periods. Layer 1 in Figure 5(b) is the same as Layer 1 in Figure 5(a) and contains the same set of flow units partitioned into many subregions. For the real FP1 and FP2, the spatial extent may resemble Layer 2 in Figure 5(b), and the time period may resemble Layer 3 in Figure 5(b). The origin or destination regions of a flow pattern may thus be smaller or larger than those in Figure 5(a), and the duration may be a subperiod of the current hourly period or may extend beyond it.

This study proposes a spatiotemporally continuous clustering method to quickly and accurately detect strength-weighted spatiotemporal flow patterns (WST-FP) between OD regions of arbitrary shape over flexible time periods, without predetermining specific regions or periods.

The modifiable areal unit problem (MAUP) persists, as cluster results vary based on spatial unit selection. The introduced time dimension brings a modifiable temporal unit problem (MTUP). Different time intervals affect analysis outcomes. To mitigate, spatial units and time intervals should align with data traits and analysis goals. For example, in urban settings, flow cluster areas bounded by roads suit functionally homogeneous blocks better than grid cells. For time intervals, analysis objectives should guide selection. A 10-minute interval may sufficiently capture 24-hour taxi passenger flow aggregation patterns without excess sparsity or generalization. In summary, thoughtful spatial and temporal unit selection, tailored to the problem context, helps address MAUP and MTUP limitations. Concrete analysis of specific problems is needed, as real situations are complex.

4. Methodology

4.1. Algorithm description

This section briefly walks through how all WST-FPs are mined from a massive set of flow units. As shown in Figure 6(a), the flow unit with the maximum strength, $f_1$ ($f_{1\_o} = a$, $f_{1\_d} = b$), is first selected as the seed flow unit from among all the flow units and marked as visited. Then, all the polygon elements adjacent to the origin $f_{1\_o}$ and destination $f_{1\_d}$ of flow unit $f_1$ are found. As shown in Figure 6(b), c is a polygon element adjacent to the origin a ($f_{1\_o}$), and d is a polygon element adjacent to the destination b ($f_{1\_d}$). No flow unit exists between some of the neighbors of the origin and destination of $f_1$. Such polygon elements are eliminated from the neighbor set, which yields the set of flow units that connect polygon elements adjacent to the origin a ($f_{1\_o}$) with polygon elements adjacent to the destination b ($f_{1\_d}$), as shown in Figure 6(c)-(d). $f_1$ and the retained flow units meet the spatial distance condition defined in Section 3.1.

Figure 6. Recognition of strength-weighted spatiotemporal flow patterns from massive flow units: (a) select a seed flow unit f1, (b) find the adjacent regions of the seed flow unit f1, (c)-(d) retain the adjacent regions that contain the flow unit, (e) calculate whether each adjacent flow unit can be merged with the seed unit, (f) collections of flow units that can be merged and filtered, (g) randomly select a flow unit from the set of flow units that can be merged as the next seed unit and repeat steps (b)–(f), (h) the final extracted two WST-FPs.


Then, each flow in the flow unit set above are checked, and only the flow units whose time distance from f1 meets the threshold τ are retained, which are called NFf1, as shown in .

Subsequently, flow unit f1 and other flow units that exhibit a spatiotemporal proximity relationship with f1 are combined, and flow strength reachability of each combination are determined on the basis of the rules defined in section 3.1, as shown in . If the set of flow units of a new WST-FP is marked as FP, then f1 and all other flow units that meet the accessible spatiotemporal distance and flow strength with f1 are placed in set FP as flow unit members of this pattern.

Lastly, any flow unit that meets the spatiotemporal proximity and flow reachability is selected as the next seed flow unit and marked as visited, and the process shown in is repeated. As an example, f2f2_o=c,f2_d=d is used as the seed flow unit and marked as visited. The process in is entered, which returns to the process similar to that in . The iteration is continued until each flow unit in set WST-FP is marked as visited. Then, the flow units in FP jointly constitute a new WST-FP, as shown in , which is the schematic of the two obtained WST-FP.

The pseudo-code for strength-weighted spatiotemporal flow pattern mining is shown in Figure 7, and the variables in the pseudo-code are consistent with those used in this study.
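The seed-expansion procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the pseudo-code of Figure 7: the `FlowUnit` fields, the one-dimensional region ids, and both predicates are hypothetical, and `toy_can_merge` is only a placeholder for the strength reachability rule of Section 3.1, which is not reproduced here. The sketch also takes the next seed in queue order instead of randomly and assumes both predicates are symmetric.

```python
from collections import deque
from typing import NamedTuple


class FlowUnit(NamedTuple):
    name: str
    o: int      # origin region id (toy 1-D ids, hypothetical)
    d: int      # destination region id
    t: int      # time stamp in minutes
    w: float    # interaction volume


def mine_patterns(flows, is_st_neighbor, can_merge, min_size=2):
    """Grow patterns from seed flow units; keep groups with >= min_size members."""
    visited = set()
    patterns = []
    for seed in flows:
        if seed in visited:
            continue
        visited.add(seed)
        members = [seed]
        queue = deque([seed])          # seeds still to be expanded
        while queue:
            current = queue.popleft()
            for other in flows:
                if other in visited:
                    continue
                # Merge only if proximal in space-time AND reachable in strength.
                if is_st_neighbor(current, other) and can_merge(current, other):
                    visited.add(other)
                    members.append(other)
                    queue.append(other)
        if len(members) >= min_size:
            patterns.append(members)
    return patterns


def toy_neighbor(f, g, tau=30):
    # Adjacent origins, adjacent destinations, and time distance within tau.
    return abs(f.o - g.o) <= 1 and abs(f.d - g.d) <= 1 and abs(f.t - g.t) <= tau


def toy_can_merge(f, g, delta=5.0):
    # Placeholder only: NOT the Section 3.1 rule, just a stand-in threshold test.
    return min(f.w, g.w) >= delta


flows = [
    FlowUnit("f1", 0, 10, 0, 9.0),
    FlowUnit("f2", 1, 11, 10, 8.0),
    FlowUnit("f3", 1, 10, 20, 7.0),
    FlowUnit("f8", 0, 10, 300, 9.0),   # spatially adjacent, temporally distant
    FlowUnit("f9", 50, 90, 0, 9.0),    # distant in space
]
patterns = mine_patterns(flows, toy_neighbor, toy_can_merge)
pattern_names = [[f.name for f in p] for p in patterns]
print(pattern_names)
```

The inner scan over all flow units keeps the sketch short; a spatiotemporal index would replace it in practice.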

Figure 7. Pseudo-code for strength-weighted spatiotemporal flow pattern mining algorithm.

4.2. Characteristic variables of flow pattern

A complete WST-FP contains at least two flow units, and the origin or destination of the flow pattern consists of at least two proximal regions. A WST-FP has many basic attribute variables, which are crucial for measuring the pattern and calculating the evaluation metrics. A WST-FP can be represented as WST-FPi = {f1, f2, …, fn} = O→D, where n is the number of flow units in the WST-FP, and O and D represent the spatiotemporal attributes of the origin and destination region groups, respectively. For any flow unit fi ∈ WST-FPi, at least one other flow unit fj ∈ WST-FPi satisfies the spatiotemporal proximity condition and the threshold condition (Pi,j > δ). The characteristics of a pattern are analyzed at two levels, space and time.

Figure 8. Spatial proximity under different rules.

At the spatial level, the set of origin regions of WST-FPi can be represented as O_R = {f1_o, f2_o, f3_o, f4_o}, and the set of destination regions can be represented as D_R = {f1_d, f2_d, f3_d, f4_d}. O_R and D_R jointly constitute the basic spatial characteristic quantities of WST-FPi.

At the temporal level, the origin region of each flow unit corresponds to a starting moment ti_o, and the destination of each flow unit corresponds to a reaching moment ti_d. The time period of the origin regions of WST-FPi can be expressed as O_T = [O_T1, O_T2], where O_T1 = min{t1_o, t2_o, t3_o, t4_o} is the earliest starting moment among all starting moments in the origin regions and O_T2 = max{t1_o, t2_o, t3_o, t4_o} is the latest starting moment. Likewise, the time period of the destination regions can be expressed as D_T = [D_T1, D_T2], where D_T1 = min{t1_d, t2_d, t3_d, t4_d} is the earliest reaching moment among all reaching moments in the destination regions and D_T2 = max{t1_d, t2_d, t3_d, t4_d} is the latest reaching moment. The duration of the origin regions of WST-FPi is Δt_o = O_T2 − O_T1, and that of its destination regions is Δt_d = D_T2 − D_T1. The duration of the entire WST-FPi is Δt_od = D_T2 − O_T1. O_T1, O_T2, D_T1, and D_T2, together with Δt_o, Δt_d, and Δt_od, constitute the basic temporal characteristic quantities of WST-FPi.
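The temporal quantities above reduce to a few minimum and maximum operations. The following sketch shows this under the assumption that each flow unit carries a starting moment `t_o` and a reaching moment `t_d` as `datetime` values; the `FlowUnit` type, the dictionary keys, and the sample times are illustrative and not taken from the paper.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class FlowUnit(NamedTuple):
    t_o: datetime   # starting moment at the origin region
    t_d: datetime   # reaching moment at the destination region


def temporal_characteristics(pattern):
    """Return O_T, D_T and the three durations of a flow pattern."""
    o_t1 = min(f.t_o for f in pattern)   # earliest starting moment
    o_t2 = max(f.t_o for f in pattern)   # latest starting moment
    d_t1 = min(f.t_d for f in pattern)   # earliest reaching moment
    d_t2 = max(f.t_d for f in pattern)   # latest reaching moment
    return {
        "O_T": (o_t1, o_t2),
        "D_T": (d_t1, d_t2),
        "dt_o": o_t2 - o_t1,     # duration of the origin regions
        "dt_d": d_t2 - d_t1,     # duration of the destination regions
        "dt_od": d_t2 - o_t1,    # duration of the entire pattern
    }


day = datetime(2018, 1, 1)
pattern = [
    FlowUnit(day.replace(hour=12, minute=0), day.replace(hour=12, minute=30)),
    FlowUnit(day.replace(hour=12, minute=10), day.replace(hour=12, minute=40)),
    FlowUnit(day.replace(hour=12, minute=20), day.replace(hour=12, minute=50)),
    FlowUnit(day.replace(hour=12, minute=30), day.replace(hour=13, minute=10)),
]
chars = temporal_characteristics(pattern)
print(chars["dt_o"], chars["dt_d"], chars["dt_od"])
```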

4.3. Statistical metrics of result evaluation

The coverage, closeness, and composite metrics for OD flow patterns between regions were first proposed by Kim et al. (2014) for evaluating the clustering results of flow density. They were later applied to the evaluation of clustering results of strength-weighted flow by Zhang et al. (2018). In this paper, they are further refined and extended to make them applicable to the evaluation of clustering results of spatiotemporal strength-weighted flow patterns.

(1) Coverage rate

The coverage rate is the ratio of the sum of the interaction volumes of the flow units in WST-FPi to the sum of the interaction volumes of all flow units whose starting time or reaching time falls within the starting or reaching time period of WST-FPi. It reflects how important the flow volume of a target flow pattern is within the entire flow data for the specified time period. The coverage rate of WST-FPi can be represented as:

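$$v(\text{WST-FP}_i) = \Pr(O_i \rightarrow D_i) = \frac{\sum_{f_i \in O_i \rightarrow D_i} f_{i\_w}}{M} \tag{10}$$

$$M = \sum_{O_{i\_T1} < t_{j\_o} < O_{i\_T2} \ \text{or}\ D_{i\_T1} < t_{j\_d} < D_{i\_T2}} f_{j\_w} \tag{11}$$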

where Pr(Oi→Di) is the probability that a movement from the origin regions Oi_R in time period [Oi_T1, Oi_T2] to the destination regions Di_R in time period [Di_T1, Di_T2] is observed in the set of flow units whose starting time falls in [Oi_T1, Oi_T2] or whose reaching time falls in [Di_T1, Di_T2]. The sum of fi_w over the flow units fi in Oi→Di is the total interaction volume of the flow units in WST-FPi. M is the total interaction volume of the flow units whose starting time falls in [Oi_T1, Oi_T2] or whose reaching time falls in [Di_T1, Di_T2].

(2) Closeness rate

The set of origin regions of any WST-FPi is Oi_R, and the total flow from Oi_R is denoted |Oi|. The set of destination regions is Di_R, and the total flow to Di_R is denoted |Di|. For any WST-FPi, the s-value represents the interaction closeness of the flow pattern, that is, the strength of the correlation between its origin and destination regions. The calculation formula is

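$$s(\text{WST-FP}_i) = \frac{\Pr(O \rightarrow D_i \mid O_i \rightarrow D)}{\Pr(O \rightarrow D_i)} = \frac{\Pr(O_i \rightarrow D \mid O \rightarrow D_i)}{\Pr(O_i \rightarrow D)} = \frac{\Pr(O_i \rightarrow D_i)}{\Pr(O_i \rightarrow D)\,\Pr(O \rightarrow D_i)} = \frac{M \sum_{f_i \in O_i \rightarrow D_i} f_{i\_w}}{|O_i|\,|D_i|} \tag{12}$$

$$|O_i| = \sum_{f_j \in F:\ f_{j\_o} \in O_{i\_R}\ \text{and}\ t_{j\_o} \in O_{i\_T}} f_{j\_w} \tag{13}$$

$$|D_i| = \sum_{f_j \in F:\ f_{j\_d} \in D_{i\_R}\ \text{and}\ t_{j\_d} \in D_{i\_T}} f_{j\_w} \tag{14}$$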

where |Oi| is the total interaction volume of the flow units whose origin region belongs to Oi_R and whose starting time falls in time period Oi_T, and |Di| is the total interaction volume of the flow units whose destination region belongs to Di_R and whose reaching time falls in time period Di_T. |Oi||Di| is the product of the total flow from the origin regions and the total flow to the destination regions. The greater the value of s(WST-FPi), the stronger the correlation between the origin and destination regions of the flow pattern; the smaller the value, the weaker the correlation.

(3) Composite value

The coverage rate reflects the scope of the pattern in terms of the flow itself, and the closeness rate reflects the strength of the pattern through the correlation between the origin Oi and destination Di of the flow pattern. Each index evaluates the strength of a WST-FP from a partial perspective and is therefore limited. In this study, a composite value of the coverage rate and the closeness rate is adopted to comprehensively reflect the strength of a pattern. The specific formula is

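$$c(\text{WST-FP}_i) = \sqrt{v(\text{WST-FP}_i)\, s(\text{WST-FP}_i)} = \sqrt{\frac{\sum_{f_i \in O_i \rightarrow D_i} f_{i\_w}}{M} \cdot \frac{M \sum_{f_i \in O_i \rightarrow D_i} f_{i\_w}}{|O_i|\,|D_i|}} = \frac{\sum_{f_i \in O_i \rightarrow D_i} f_{i\_w}}{\sqrt{|O_i|\,|D_i|}} \tag{15}$$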
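The three metrics can be computed directly from a list of flow units. The sketch below is one possible reading of Equations (10)–(15), not the authors' implementation: the `FlowUnit` fields, the integer minute time stamps, and the sample data are hypothetical. It treats the time periods as closed intervals, following the bracket notation in the text, and takes the composite value as the square root of the product of coverage and closeness.

```python
from math import sqrt, isclose
from typing import NamedTuple


class FlowUnit(NamedTuple):
    o: str      # origin region
    d: str      # destination region
    t_o: int    # starting moment (minutes)
    t_d: int    # reaching moment (minutes)
    w: float    # interaction volume


def evaluate_pattern(pattern, all_flows):
    """Return (v, s, c): coverage rate, closeness rate, composite value."""
    o_r = {f.o for f in pattern}
    d_r = {f.d for f in pattern}
    o_t1, o_t2 = min(f.t_o for f in pattern), max(f.t_o for f in pattern)
    d_t1, d_t2 = min(f.t_d for f in pattern), max(f.t_d for f in pattern)

    volume = sum(f.w for f in pattern)
    # M: volume of all flows starting in O_T or arriving in D_T (closed intervals).
    m = sum(f.w for f in all_flows
            if o_t1 <= f.t_o <= o_t2 or d_t1 <= f.t_d <= d_t2)
    # |O_i|: volume leaving the origin regions during O_T.
    o_total = sum(f.w for f in all_flows
                  if f.o in o_r and o_t1 <= f.t_o <= o_t2)
    # |D_i|: volume reaching the destination regions during D_T.
    d_total = sum(f.w for f in all_flows
                  if f.d in d_r and d_t1 <= f.t_d <= d_t2)

    v = volume / m
    s = m * volume / (o_total * d_total)
    c = sqrt(v * s)
    return v, s, c


pattern = [FlowUnit("a", "x", 0, 30, 40.0), FlowUnit("b", "y", 10, 40, 20.0)]
others = [
    FlowUnit("a", "z", 5, 35, 10.0),       # leaves the origin regions
    FlowUnit("c", "x", 8, 38, 30.0),       # reaches the destination regions
    FlowUnit("c", "z", 3, 33, 100.0),      # unrelated regions, same time window
    FlowUnit("c", "z", 100, 130, 500.0),   # outside both time windows
]
v, s, c = evaluate_pattern(pattern, pattern + others)
print(round(v, 3), round(s, 3), round(c, 3))
```

In this toy case M = 200, |Oi| = 70, and |Di| = 90, so the composite value equals 60 / sqrt(70 × 90) and does not depend on M.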

5. Experiments

5.1. Test with synthetic data

To verify the effectiveness of the WST-FP mining method proposed in this paper, two synthetic datasets are designed, one simple and one complex (Figures 9 and 10). Figure 9(a) shows a set of volume-weighted spatiotemporal OD flow units, and Table 1 shows the basic attributes of all OD flow units in Figure 9(a). There are 9 OD flow units in total, named f1, f2, … , f9. Figure 9 shows the spatiotemporal distribution of all OD flows, while the study area and the basic areal units with their number codes are shown in Figure 9(c). The number codes of the areal units at the origin and destination of each OD flow correspond to the O_ID and D_ID fields in Table 1. O_DATE and D_DATE indicate the occurrence time of each flow in the origin and destination units. The VAL field is the interaction volume of the flow. In addition, comparative experiments between the proposed method and existing density-based spatial flow and spatiotemporal flow clustering methods are provided, as shown in Figures 11 and 12.

Figure 9. Mining flow patterns from a small set of volume-weighted spatiotemporal flow units: (a) a set of flow units, (b) whether flow units are contained in flow patterns or not, (c) experimental area, and (d) two mined flow patterns.

Figure 10. Massive synthetic spatiotemporal OD flow data and labeled flow patterns. (a) volume-weighted spatiotemporal OD flow data, (b) labeled spatiotemporal OD flow, (c) four mined OD flow patterns.

Figure 11. Comparative experiment 1. (a) volume-weighted spatiotemporal OD flow data, (b) labeled spatiotemporal OD flow, (c) two-dimension map of OD flows, (d) result of SpatialflowL, (e) result of flow ST-DBSCAN, (f) result of WST-FP.

Figure 12. Comparative experiment 2. (a) volume-weighted spatiotemporal OD flow data, (b) labeled spatiotemporal OD flow, (c) two-dimension map of OD flows, (d) result of SpatialflowL, (e) result of flow ST-DBSCAN, (f) result of WST-FP.

Table 1. Attributes of flow units.

5.1.1. Small-scale synthetic dataset

If the shared edge or corner rule is used for spatial proximity and half an hour is used as the time interval, we can see from Figure 9 that {f1, f2, f3, f4} is detected as a flow pattern, denoted as WST-FP1, and {f5, f6, f7} is detected as another flow pattern, denoted as WST-FP2. In Figure 9, f8 is spatially adjacent to the flow units in WST-FP1 but not temporally adjacent, and f9 is adjacent to neither pattern in space or time. Therefore, f8 and f9 are contained in neither WST-FP1 nor WST-FP2. Even when other OD flow units meet the proximity rule in space and time, the interaction volumes of some may contribute little to the regions where the pattern is located. In that case, these OD flow units also cannot be regarded as part of the pattern. In the end, we expect to obtain the flow pattern result shown in Figure 9(d).

In this experiment, the time interval is set to 30 min, the merging threshold of interaction strength is set to 0.6, and the spatial proximity relationship adopts the shared edge or corner rule. The analysis result is consistent with the expected result. The resulting parameters of WST-FP1 and WST-FP2 are shown in Table 2. Other time interval thresholds and interaction strength thresholds can also be used for pattern detection, and they will influence the results.
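The proximity settings used here can be expressed as a small predicate. The sketch below assumes, for illustration only, that areal units are cells of a regular grid addressed by (row, column), so that the shared edge or corner rule becomes a check on neighboring cells; it also assumes that a cell counts as proximal to itself and that the 30 min threshold applies to both the starting and the reaching moments. None of these names or choices come from the paper.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Tuple

Cell = Tuple[int, int]   # (row, column) of a grid cell; hypothetical layout


class FlowUnit(NamedTuple):
    o: Cell
    d: Cell
    t_o: datetime
    t_d: datetime


def cells_adjacent(a: Cell, b: Cell) -> bool:
    """Shared edge or corner on a regular grid (same cell included)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 1


def st_proximal(f: FlowUnit, g: FlowUnit,
                tau: timedelta = timedelta(minutes=30)) -> bool:
    """Origins adjacent, destinations adjacent, and both times within tau."""
    return (cells_adjacent(f.o, g.o)
            and cells_adjacent(f.d, g.d)
            and abs(f.t_o - g.t_o) <= tau
            and abs(f.t_d - g.t_d) <= tau)


def at(hour, minute):
    return datetime(2018, 1, 1, hour, minute)


f1 = FlowUnit((0, 0), (5, 5), at(12, 0), at(12, 30))
f2 = FlowUnit((1, 1), (5, 6), at(12, 20), at(12, 50))   # near in space and time
f8 = FlowUnit((0, 1), (5, 5), at(14, 0), at(14, 30))    # near in space only
f9 = FlowUnit((9, 9), (0, 0), at(12, 0), at(12, 30))    # far in space

print(st_proximal(f1, f2), st_proximal(f1, f8), st_proximal(f1, f9))
```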

Table 2. Result parameters of flow pattern based on a small-scale dataset (time period unit: minute).

5.1.2. Large-scale synthetic dataset

To further verify the algorithm proposed in this paper, a spatiotemporal network with 40 rows, 30 columns, and time periods from 12:00 to 13:40 is designed. A large-scale set of OD flow units containing random values of interaction volume is generated, as shown in Figure 10(a). This dataset also contains some labeled OD flow units that constitute the preset flow patterns, as shown in Figure 10(b). The proposed algorithm is used to discover spatiotemporal flow patterns from this dataset. The merge threshold is set to 0.0009, and the time step is set to 10 min. The analysis results are shown in Figure 10(c). The mining results are similar to the preset flow patterns; the partially inconsistent grids result from the choice of time step and merging threshold. This experiment demonstrates the effectiveness of the algorithm in mining flow patterns from large-scale volume-weighted spatiotemporal OD flows.

The evaluation parameters of the flow pattern are key to understanding and interpreting the characteristics of each pattern. Table 3 lists the evaluation parameters of the four patterns included in the dataset in Figure 10. The v-rate indicates the degree of coverage of each flow pattern. The results show that WST-FP2 accounts for the largest proportion of the total interaction volume in the entire study area during the pattern's time period, and WST-FP3 accounts for the smallest proportion. The weight in this case is the total number of OD flows between the two grids. The s-rate reflects the closeness of the association between the origin region and the destination region of the flow pattern. Among the four patterns, the order of closeness from strong to weak is WST-FP4, WST-FP2, WST-FP1, and WST-FP3.

Table 3. Result parameters of flow pattern based on a large-scale dataset (time period unit: minute).

According to the evaluation metrics defined in Section 4.3, the v-rate and s-rate values of the same flow pattern tend to be inversely correlated. Since the c-value is the square root of the product of v-rate and s-rate, it characterizes the balance between coverage and closeness of a single flow pattern. Here, WST-FP4 has the largest balance value, while WST-FP3 has the smallest. O_Duration and D_Duration represent the duration in the origin and destination regions of each flow pattern, and O_Count and D_Count represent the corresponding counts of areal units.
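The inverse tendency can be read off the metric definitions. The short derivation below uses V as shorthand (introduced here, not in the paper) for the total interaction volume of the pattern, and assumes that V, |Oi|, and |Di| are held fixed while the window total M varies.

```latex
\begin{align*}
v &= \frac{V}{M}, \qquad
s = \frac{M\,V}{|O_i|\,|D_i|}, \qquad
V = \sum_{f_i \in O_i \rightarrow D_i} f_{i\_w} \\
v \cdot s &= \frac{V^{2}}{|O_i|\,|D_i|} = c^{2}
\quad\Longrightarrow\quad
s = \frac{c^{2}}{v}
\end{align*}
```

Because M cancels in the product, a larger M lowers v and raises s by the same factor, while c is unchanged.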

5.1.3. Comparison with two state-of-the-art clustering methods

To validate the efficacy of the proposed WST-FP algorithm, we compare it with two state-of-the-art clustering methods: the approach based on the spatial flow L-function (SpatialflowL) and the method based on flow ST-DBSCAN (Birant and Kut 2007; Rus et al. 2022). Whereas SpatialflowL (Shu et al. 2021) can only identify interregional association patterns in space, the flow ST-DBSCAN algorithm identifies interregional associations in the temporal and spatial dimensions concurrently. Therefore, the latter approach is capable of detecting spatiotemporal interregional association patterns.

Figure 11(a) visualizes the first comparative dataset, where a darker color of an OD flow indicates greater interaction volume. As shown in Figure 11(b), this dataset also contains labeled OD flows that constitute predetermined flow patterns. Figure 11(b) contains two clusters in total, with Cluster 1 exhibiting a higher density than Cluster 2. Figure 11(c) presents the flow patterns of Figure 11(b) in a two-dimensional view. Combining Figure 11(b) and (c), it can be observed that the two clusters overlap spatially but are separated temporally.

Given that SpatialflowL cannot incorporate the temporal dimension, the two clusters, which are separated temporally yet located closely in space, are identified as a single cluster together with certain noise flows in spatial proximity, as illustrated in Figure 11(d). Although the ST-DBSCAN algorithm can discern the temporal and spatial dimensions concurrently, only the higher-density Cluster 1 is detected in the present experiment because of the subjectivity in threshold estimation, as shown in Figure 11(e). Figure 11(f) shows the outcomes extracted by the WST-FP algorithm proposed in this paper. Since the interaction volumes of the OD flows in Cluster 1 and Cluster 2 are substantial and constitute a large proportion of the total volume of all OD flows departing from the origin region or ending in the destination region, their strengths can be inferred to be strong according to Definition 6. Therefore, the WST-FP algorithm is capable of identifying both clusters simultaneously.

Similarly, Figure 12(a) visualizes the second comparative dataset, where a darker color indicates greater interaction volume of a flow. As shown in Figure 12(b) and (c), although Cluster 1 exhibits higher density, numerous flows other than those in Cluster 1 depart from the origin region of Cluster 1, and many flows other than those in Cluster 1 end in the destination region of Cluster 1. As a result, Cluster 1 has high density yet low strength. In contrast, Cluster 2 has high strength, since only a small number of other flows depart from the origin region of Cluster 2 to other regions and only a few flows from other regions end in the destination region of Cluster 2.

Given that SpatialflowL cannot incorporate the temporal dimension, the Cluster 1 and Cluster 2 it identifies contain some noise flows in close spatial proximity, as illustrated in Figure 12(d). In the present experiment, the ST-DBSCAN algorithm again detects only the higher-density Cluster 1 because of the subjectivity in threshold estimation, as shown in Figure 12(e). The WST-FP algorithm primarily takes into account the relative strength of association between origins and destinations. Therefore, it is able to identify Cluster 2, which exhibits lower density yet higher strength, as shown in Figure 12(f).

Table 4 evaluates the patterns identified by the three methods in the two comparative experiments. The patterns identified by SpatialflowL have the highest v-rate values compared to the flow ST-DBSCAN and WST-FP algorithms. This is because SpatialflowL only considers the spatial continuity between flows and thus includes flows that are not temporally continuous. The s-rate, on the other hand, considers not only the volume between the origin and destination regions but also the total volume of the origin and destination regions. This is highly similar to the notion of flow strength, and WST-FP identifies patterns based on flow strength. Therefore, compared to SpatialflowL and flow ST-DBSCAN, the patterns identified by WST-FP have the highest s-rate values. The c-value considers both v-rate and s-rate. Table 4 shows that the patterns identified by WST-FP have the maximum c-value.

Table 4. Evaluation results of three methods in Figures 11 and 12.

5.2. Test with migration flow data in real world

5.2.1. Study area and data description

In this study, the Chinese mainland is selected as the study area, as shown in Figure 13. There are approximately 300 prefecture-level cities, which are tertiary administrative regions. The data used are daily flows of people traveling by plane within China, aggregated with prefecture-level cities as the basic spatial units, so that prefecture-level cities are the units of origins and destinations. The data are obtained from the Tencent Open Platform of Location Big Data. Daily human mobility data for the whole of 2018 are used: the flow units shown as OD lines in Figure 13 are the visualized statistics for one day of 2018, and approximately 2.5 million OD flow records were obtained for the year. Each record contains the occurrence date, origin, destination, and interaction volume of the flow unit. Statistics from the China Aviation Administration Company indicate that the country's total passenger flow by air in 2018 was 610 million person-times, including outbound and inbound tourists. That is, the data accounted for 27.87% of the total. Figure 13 is cited from Zhang et al. (2018), and the same dataset as in Zhang's paper is used to validate the algorithm in this paper.

Figure 13. Study area and flow data visualization during one day (each flow unit contains origin city, destination city, and passenger count).

5.2.2. Result and evaluation

A topological proximity rule was also used to model the spatial relationship between regions in the real-world dataset. Two regions were marked as spatial neighbors when they shared an edge or corner. The duration threshold was set to 2 days, that is, flows were regarded as temporally proximal when they occurred within 2 days before or after the current time. The flow merging threshold was set to 0.009. Approximately 50 flow patterns were found under these threshold constraints, and 21 of them are visualized on the map, as shown in Figures 14 and 15. For comparative analysis, flow patterns with shorter distances between origin and destination regions are placed in Figure 14, and those with longer distances in Figure 15.

Figure 14. Results of strength-weighted spatiotemporal flow patterns in a medium-close distance. (a) Mapping of flow patterns, (b) duration of each flow pattern.

Figure 15. Results of strength-weighted spatiotemporal flow patterns in a medium-long distance. (a) Mapping of flow patterns, (b) duration of each flow pattern.

The 11 flow patterns shown in Figure 14 can be analyzed based on the spatial distribution and spatial relationship of the origin and destination regions of a single flow pattern. The overall distribution characteristics of all flow patterns can also be analyzed. They can be further described by combining the evaluation parameters of each flow pattern in Table 5. Table 5 shows that there is rarely a large difference in the number of spatial units between the origin and destination regions of a single flow pattern. For example, if oc and dc denote the number of spatial units constituting the origin and destination regions of a single flow pattern, oc and dc in WST-FP1 are 7 and 3, respectively. The oc and dc in WST-FP2 are 5 and 5, and those in WST-FP3 are 4 and 5. Furthermore, the origin or destination of most flow patterns is near the frontier. In terms of distance, flow patterns may form even between two regions that are very close to each other, e.g. WST-FP3, WST-FP4, WST-FP7, and WST-FP11 have very small distances between their origin and destination regions.

Table 5. Evaluation results of WST-FPs in Figure 14.

In terms of spatial distribution, more flow patterns formed in border areas and fewer in inland areas. The flow patterns are fewest in the central interior region and most numerous in the southern frontier region. In terms of duration, most flow patterns have a relatively short duration, and only a few have a particularly long time span. For example, WST-FP4, located on the northern side of the study area, lasted from 16 February 2018 to 7 July 2018. WST-FP6, located on the southern side of the study area, lasted from 1 January 2018 to 8 October 2018. The time spans of these two patterns are very long, lasting approximately 5 and 10 months, respectively. Another interesting phenomenon is that most of these short- and medium-distance flow patterns occur in the first half of the year.

The advantage of the evaluation indicators is that they reflect, in a quantitative statistical way, characteristics of the flow pattern that are difficult to visualize directly on the map. Table 5 shows that the coverage degree of flow patterns 1, 2, 3, and 8 is much higher than that of the other flow patterns. Among them, WST-FP8 has the highest coverage degree (v-rate). Combined with the spatial location of this pattern in Figure 14, the origin and destination regions of WST-FP8 are located in the capital Beijing and the Pearl River Delta Economic Development Zone, respectively. The patterns with the highest closeness degree (s-rate) are flow patterns 9 and 10, which have similar spatial locations and are interactions between East China and Northeast China, but in opposite directions. The balance (c-value) of the 11 flow patterns shown in Figure 14 does not differ much.

The spatial distribution, spatial relationships, and time periods of the flow patterns in Figure 15 have many similarities with those in Figure 14. These medium- and long-distance flow patterns are also mainly distributed near China's national border, and the central region still lacks flow patterns. Throughout the western and southern sides of the study area, most flow patterns are east-west oriented, e.g. flow patterns 1, 2, 4, and 5. On the east side of the study area, the flow patterns are mainly north-south oriented, such as flow patterns 6, 8, 9, and 10. In terms of flow pattern duration, most still appear in the first half of the year, and longer-lasting flow patterns are still the minority. For example, the only flow patterns appearing in the second half of the year are WST-FP6 and WST-FP10, and the only longer-lasting flow patterns are WST-FP3, WST-FP5, and WST-FP10. Table 6 shows the evaluation results of the WST-FPs in Figure 15. Except for WST-FP10, the duration of all these flow patterns is short. Most patterns in Table 6 have larger s-rate values, indicating a stronger interaction between the origin and destination of these flow patterns.

Table 6. Evaluation results of WST-FPs in Figure 15.

The northwest part of the study area belongs to a low population density region, while the southeast part belongs to a high population density region. Based on the definitions of flow density and flow strength, it can be found that flow pattern extraction methods based on flow density tend to identify flow patterns from high population density regions. In contrast, flow pattern extraction methods based on flow strength are not limited by population density, because flow strength is a relative measure, while flow density is absolute. To demonstrate the characteristics of the method proposed in this paper, SpatialflowL, flow ST-DBSCAN, and the WST-FP method proposed in this paper are used to extract flow patterns from the real dataset.

Some results are shown in Figure 16. In low population density regions, the WST-FP method can uncover flow patterns, as shown in Figure 16(c), whereas the other two methods fail to detect flow patterns in the same area, as shown in Figure 16(a) and (b). In high population density regions, the WST-FP method can extract flow patterns such as those shown in Figure 16(f). The SpatialflowL method identifies similar patterns, as in Figure 16(d), but these are spatial rather than spatiotemporal flow patterns, so the time periods in which the pattern occurs cannot be obtained. The flow ST-DBSCAN method obtains two patterns, as shown in Figure 16(e), and their time periods can be determined. However, compared to the pattern in Figure 16(f), their implications are completely different. The flow patterns obtained by the method proposed in this paper have large flow strength, while those obtained by the other two methods have large flow density.

Figure 16. Comparative experiments based on real-world datasets. (a) result of SpatialflowL, (b) result of flow ST-DBSCAN and (c) result of WST-FP in low-density population regions. (d) result of SpatialflowL, (e) result of flow ST-DBSCAN and (f) result of WST-FP in high-density population regions.

6. Discussion and conclusion

6.1. Discussion

6.1.1. Revisiting the distinction between interaction volume and strength

It is important to understand the difference between interregional flow density and flow strength. The following example illustrates the critical role of flow strength in solving practical problems and how it differs from flow density. Suppose there are three regions, Region A, Region B, and Region C, as shown in Figure 17, where panel (b) can be seen as a spatially networked abstraction of the real-world supply and demand relationships in panel (a). Region C provides many vital resources to both Region A and Region B. Specifically, the resource flow from Region C to Region A is 500 (this value can be regarded as approximating the flow density), while the total resource flow to Region A from all other regions is only 100. In comparison, the resource flow from Region C to Region B is higher, at 600, but the total resource flow to Region B from other regions is far higher, at 5000. If an accident in Region C cuts it off so that it can no longer supply resources to other regions, the impact on Region B is relatively minor, since the resources it receives from Region C are only 10.7% of its total (600 of 5600). The same accident would have an enormous impact on production and livelihoods in Region A, since the resources it receives from Region C make up 83.3% of its total (500 of 600). This highlights how flow strength reflects the importance of, and dependency on, a certain interaction for a region, as distinct from the magnitude of flow density alone. Grasping flow strength accurately is critical for analyzing regional networks and responding to contingencies.
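The percentages in this example follow from a simple share calculation, sketched below. The function name `inflow_share` is illustrative, and the share is only the dependency ratio used in this example, not the full flow strength definition given earlier in the paper.

```python
def inflow_share(flow_from_source: float, flow_from_others: float) -> float:
    """Fraction of a region's total inflow that comes from one source region."""
    return flow_from_source / (flow_from_source + flow_from_others)


# Region A receives 500 from Region C and 100 from all other regions.
share_a = inflow_share(500, 100)
# Region B receives 600 from Region C and 5000 from all other regions.
share_b = inflow_share(600, 5000)

print(f"Region A depends on C for {share_a:.1%} of its inflow")
print(f"Region B depends on C for {share_b:.1%} of its inflow")
```

Although the absolute flow into Region B is larger (600 versus 500), its share is far smaller, which is the distinction between density and strength drawn above.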

Figure 17. An example demonstrating the practical application of flow strength. (a) interregional supply and demand networks in the real world, (b) interregional flow networks following abstraction.

6.1.2. Application prospect of the algorithm

The conventional clustering results based on unweighted OD flows primarily reflect density characteristics, whereas the clustering results derived from weighted OD flows in this study capture the strength of association between regions. Unlike flow density, which reveals absolute closeness between regions in existing methods, flow strength in our method reveals local relative closeness between regions. This study introduces a flow cluster algorithm that progresses from traditional unweighted clustering to weighted clustering and from spatial clustering to spatiotemporal clustering, presenting a novel approach for integrated spatiotemporal flow pattern mining. Furthermore, this method can be viewed as a new map generalization technique for spatial OD flow data. The primary objective of this method is to address questions such as identifying regions with strong interactions or associations, determining the boundaries of origin and destination in flow patterns with significant associations, and understanding when these associations occur. This approach holds considerable potential in urban planning, transportation analysis, and regional planning, which involve spatial flow elements, such as human flow, logistics, traffic flow, and information flow from a broader flow space perspective.

The notion of strength-based local relative closeness between regions enables the revelation of the priority and importance of connections between regions. In practical decision-making scenarios, where limited resources are available for regional development, it becomes crucial to ascertain which connections between regions are the most significant and should be prioritized for strengthening or protection. Local relative closeness facilitates the comparison of the closeness between different regions and their adjacent regions, enabling the identification of the most critical regional associations and offering decision-makers a basis for informed choices. For example, in urban transportation planning, determining the direction of new routes requires understanding which areas have the closest links. Local relative closeness can identify these areas, guiding the establishment of connectivity between them. Similarly, in disease prevention and control, determining the order of isolating high-risk areas necessitates considering the closeness of their connections to infected areas. Local relative closeness can guide the prioritization of isolating areas with the closest links to infected regions, effectively curbing the spread of the disease.

In summary, compared to simple absolute closeness measures, local relative closeness provides more accurate and realistic results for analyzing inter-regional relationships. In practice, this approach can assist planners in adopting more scientifically grounded and targeted strategies, such as implementing isolation measures in disease control or formulating transportation network plans in urban planning.

6.1.3. Limitations and future directions

The strength-weighted spatiotemporal flow clustering method proposed in this study exhibits notable advantages in terms of efficiency, accuracy, and robustness. However, certain limitations remain and warrant exploration in future research. Firstly, the method is subject to the modifiable areal (and temporal) unit problem when mining patterns. The spatial areal unit and the temporal interval unit used for analysis need to be predefined, which can affect the interpretation and generalizability of the findings. Secondly, the current definition of flow patterns does not account for cases where the origin and destination regions of a single flow pattern partially or completely overlap. Including such patterns within the scope of flow patterns would be a valuable avenue for future investigation. Thirdly, different flow patterns may spatially overlap to some extent, making it difficult to visualize them effectively on the same map. Developing an effective flow pattern visualization method to address this challenge remains an open research problem.

7. Conclusion

Spatial interaction often manifests itself through associations between different regions. Analyzing it involves examining key attributes of interaction events, such as their start time, end time, duration, and interaction strength, which are crucial for understanding spatial interaction patterns. The closeness between regions is commonly evaluated in two ways: absolute closeness and local relative closeness. Absolute closeness refers to the direct connection between two regions, without taking into account the relationship between a region and its adjacent regions. Local relative closeness, in contrast, considers the closeness of the connection between a region and its surrounding regions relative to other connections in the area. To reveal the laws governing regional interactions, it is important to consider the interaction strength and time associated with OD flows. These attributes serve as fundamental indicators in clustering and analyzing OD data, enabling a more accurate understanding of flow patterns and deeper insight into the spatial interaction dynamics between regions.
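The difference between the two notions of closeness can be sketched with a toy example. The flows and the normalization used here (the volume of an OD pair divided by the mean volume of all flows leaving the same origin) are assumptions chosen for illustration; they are not the strength formula defined in this study.

```python
from collections import defaultdict

# Hypothetical OD flows: (origin, destination, interaction volume).
flows = [
    ("A", "B", 900),
    ("A", "C", 800),
    ("A", "D", 850),
    ("E", "F", 120),
    ("E", "G", 10),
    ("E", "H", 15),
]

def absolute_closeness(flows):
    """Absolute closeness: the raw interaction volume of each OD pair."""
    return {(o, d): v for o, d, v in flows}

def local_relative_closeness(flows):
    """Assumed illustration: volume of each OD pair divided by the mean
    volume of all flows that leave the same origin."""
    by_origin = defaultdict(list)
    for o, _, v in flows:
        by_origin[o].append(v)
    mean_out = {o: sum(vs) / len(vs) for o, vs in by_origin.items()}
    return {(o, d): v / mean_out[o] for o, d, v in flows}

abs_c = absolute_closeness(flows)
rel_c = local_relative_closeness(flows)

top_abs = max(abs_c, key=abs_c.get)  # pair with the largest raw volume
top_rel = max(rel_c, key=rel_c.get)  # pair that stands out most locally
print(top_abs, top_rel)
```

Here the pair with the largest raw volume is A to B, whereas E to F stands out most relative to the other flows from its origin, which is the kind of connection a purely volume-based or density-based view would tend to miss.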

This paper proposes algorithms for efficiently mining flow patterns with spatiotemporal continuity based on interaction strength from large-scale OD flows. It also introduces metrics such as coverage, closeness, and tradeoff, which serve as means to evaluate the effectiveness and accuracy of flow patterns. The primary focus of the research is on addressing challenges related to the merging rule of spatiotemporal OD flow pairs weighted by interaction strength, the calculation of flow strength during the merging process, and the evaluation and interpretability of flow patterns using indicators.

Absolute closeness is commonly measured by the simple flow volume or the number of interactions between regions. Local relative closeness, however, must account for the neighborhood environment and the interaction range of a region, which requires more complex indicators, such as flow strength, to assess the closeness between regions. The proposed algorithm is efficient: its time complexity is below O(n²) when a spatiotemporal index is constructed. It is also applicable to various types of OD flow data, provided they include interaction volume and time attributes. The algorithm requires only two input parameters: the spatiotemporal proximity rule and the strength reachability threshold. The robustness and practicality of the method are demonstrated through experiments on both synthetic and real datasets, which validate its effectiveness and applicability in real-world scenarios.
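Why an index helps avoid comparing every pair of flows can be shown with a minimal sketch of a uniform grid index. The flow layout `(ox, oy, dx, dy, t)`, the proximity rule (origins and destinations within `eps`, times within `dt`), and all function names are assumptions for illustration; the index and proximity rule used in the study's implementation may differ.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot

def build_index(flows, eps, dt):
    """Hash each flow into a grid cell keyed by its origin and time."""
    index = defaultdict(list)
    for i, (ox, oy, _, _, t) in enumerate(flows):
        index[(floor(ox / eps), floor(oy / eps), floor(t / dt))].append(i)
    return index

def is_proximate(a, b, eps, dt):
    """Assumed rule: origins and destinations within eps, times within dt."""
    return (hypot(a[0] - b[0], a[1] - b[1]) <= eps
            and hypot(a[2] - b[2], a[3] - b[3]) <= eps
            and abs(a[4] - b[4]) <= dt)

def neighbours(i, flows, index, eps, dt):
    """Check only the 27 grid cells around flow i instead of all flows."""
    ox, oy, _, _, t = flows[i]
    cx, cy, ct = floor(ox / eps), floor(oy / eps), floor(t / dt)
    found = []
    for ddx, ddy, ddt in product((-1, 0, 1), repeat=3):
        for j in index.get((cx + ddx, cy + ddy, ct + ddt), ()):
            if j != i and is_proximate(flows[i], flows[j], eps, dt):
                found.append(j)
    return sorted(found)

# Hypothetical flows: (origin x, origin y, destination x, destination y, time).
flows = [
    (0.0, 0.0, 10.0, 10.0, 0.0),
    (0.5, 0.2, 10.3, 9.8, 1.0),   # close to flow 0 in space and time
    (0.4, 0.1, 10.1, 10.2, 5.0),  # close in space, too late in time
    (0.3, 0.3, 20.0, 20.0, 0.5),  # destination too far away
    (8.0, 8.0, 10.0, 10.0, 0.0),  # origin too far away
]
eps, dt = 1.0, 2.0
index = build_index(flows, eps, dt)
print(neighbours(0, flows, index, eps, dt))
```

Because the cell size equals the thresholds, any flow that satisfies the rule must lie in one of the adjacent cells, so the query returns the same neighbours as a brute-force scan while inspecting far fewer candidates when flows are spread out.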

Data and codes availability statement

The data and code (Python) that support the findings of this study are available on GitHub at the private link: https://github.com/gissuifeng/WeightedSpatiotemporalFlowCluster

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [Grant No. 42201455] and the Lvyangjinfeng Excellent Doctoral Program of Yangzhou [Grant No. YZLYJFJH2021YXBS103].

References

  • Andrienko, G., N. Andrienko, G. Fuchs, and Wood, J. 2017. “Revealing Patterns and Trends of Mass Mobility Through Spatial and Temporal Abstraction of Origin-Destination Movement Data.” IEEE Transactions on Visualization and Computer Graphics 23 (9): 2120–25. https://doi.org/10.1109/TVCG.2016.2616404.
  • Andris, C., X. Liu, and J. Ferreira. 2018. “Challenges for Social Flows.” Computers Environment And Urban Systems 70:197–207.
  • Birant, D., and A. Kut. 2007. “ST-DBSCAN: An Algorithm for Clustering Spatial-Temp Oral Data.” Data & Knowledge Engineering 60 (1): 208–221. https://doi.org/10.1016/j.datak.2006.01.013.
  • Bogataj, D., M. Bogataj, and S. Drobne. 2019. “Interactions Between Flows of Human Resources in Functional Regions and Flows of Inventories in Dynamic Processes of Global Supply Chains.” International Journal Of Production Economics 209:215–225.
  • Chen, Y., H. Z. Qian, X. Wang, D. Wang, and L. Han. 2022. “A GloVe Model for Urban Functional Area Identification Considering Nonlinear Spatial Relationships Between Points of Interest.” ISPRS International Journal of Geo-Information 11 (10): 498. https://doi.org/10.3390/ijgi11100498.
  • Chen, Y., J. Xu, and M. Z. Xu. 2015. “Finding Community Structure in Spatially Constrained Complex Networks.” International Journal of Geographical Information Science 29 (6): 889–911. https://doi.org/10.1080/13658816.2014.999244.
  • Crivellari, A., and B. Resch. 2022. “Investigating Functional Consistency of Mobility-Related Urban Zones via Motion-Driven Embedding Vectors and Local POI-Type Distributions.” Computational Urban Science 2 (1): 19. https://doi.org/10.1007/s43762-022-00049-8.
  • Emch, M., E. D. Root, S. Giebultowicz, M. Ali, C. Perez-Heydrich and M. Yunus. 2012. “Integration of Spatial and Social Network Analysis in Disease Transmission Studies.” Annals of the Association of American Geographers 102 (5): 1004–1015. https://doi.org/10.1080/00045608.2012.671129.
  • Gao, S., Y. Liu, Y. L. Wang, and X. J. Ma. 2013. “Discovering Spatial Interaction Communities from Mobile Phone Data.” Transactions in GIS 17 (3): 463–481. https://doi.org/10.1111/tgis.12042.
  • Giordano, A., T. Cole, and M. Le Noc. 2022. “Spatial Social Networks for the Humanities: A Visualization and Analytical Model.” Transactions in Gis 26 (4): 1683–1702. https://doi.org/10.1111/tgis.12938.
  • Graser, A., J. Schmidt, F. Roth, and N. Brandle. 2019. “Untangling Origin-Destination Flows in Geographic Information Systems.” Information Visualization 18 (1): 153–172. https://doi.org/10.1177/1473871617738122.
  • Guo, S. H., T. Pei, S. Y. Xie, C. Song, J. Chen, Y. Liu, and, and H. Shu . 2021. “Fractal Dimension of Job-Housing Flows: A Comparison Between Beijing and Shenzhen.” Cities 112:103120. https://doi.org/10.1016/j.cities.2021.103120.
  • Guo, X. G., Z. J. Xu, J. Q. Zhang, J. Lu, and H. Zhang. 2020. “An OD Flow Clustering Method Based on Vector Constraints: A Case Study for Beijing Taxi Origin-Destination Data.” ISPRS International Journal of Geo-Information 9 (2): 128. https://doi.org/10.3390/ijgi9020128.
  • Hasanpour Jesri, S. O., and M. A. Shirazi. 2022. “Predicting Dynamic Origin-Destination Matrix by Time Series Pattern Recognition.” International Journal of Transportation Engineering 10 (2): 999–1013.
  • Kim, K., K. Oh, Y. K. Lee, S. Kim, and J. Y. Jung. 2014. “An Analysis on Movement Patterns Between Zones Using Smart Card Data in Subway Networks.” International Journal of Geographical Information Science 28 (9): 1781–1801. https://doi.org/10.1080/13658816.2014.898768.
  • Koylu, C., G. Tian, and M. Windsor. 2023. “Flowmapper.Org: A Web-Based Framework for Designing Origin-Destination Flow Maps.” Journal of Maps 19 (1). https://doi.org/10.1080/17445647.2021.1996479.
  • Liu, F., W. Bi, J. J. Tang, and W. Hao. 2022. “Understanding the Correlation Between Destination Distribution and Urban Built Environment from Taxi GPS Data.” Transactions in Gis 26 (4): 1821–1846. https://doi.org/10.1111/tgis.12908.
  • Liu, Q. L., J. Yang, M. Deng, C. Song, and W. K. Liu. 2022. “Snn_flow: A Shared Nearest-Neighbor-Based Clustering Method for Inhomogeneous Origin-Destination Flows.” International Journal of Geographical Information Science 36 (2): 253–279. https://doi.org/10.1080/13658816.2021.1899184.
  • Louail, T., M. Lenormand, M. Picornell, O. García Cantú, R. Herranz, E. Frias-Martinez, J. J. Ramasco, and M. Barthelemy. 2015. “Uncovering the Spatial Structure of Mobility Networks.” Nature Communications 6 (1). https://doi.org/10.1038/ncomms7007.
  • Luo, D., O. Cats, and H. van Lint. 2017. “Constructing Transit Origin–Destination Matrices with Spatial Clustering.” Transportation Research 2652 (1): 39–49. https://doi.org/10.3141/2652-05.
  • Lu, Z., Q. L. Zhang, X. R. Du, D. S. Wu, and F. Gao. 2016. “A Fuzzy Social Network Centrality Analysis Model for Interpersonal Spatial Relations”, Knowledge-Based Systems, 105: 206–213.
  • Nie, L., and D. D. Jiang. 2015. “A Compressive Sensing-Based Network Tomography Approach to Estimating Origin-Destination Flow Traffic in Large-Scale Backbone Networks.” International Journal of Communication Systems 28 (5): 889–900. https://doi.org/10.1002/dac.2713.
  • Pei, T., A. Jasra, D. J. Hand, A. X. Zhu, and C. H. Zhou. 2009. “DECODE: A New Method for Discovering Clusters of Different Densities in Spatial Data.” Data Mining and Knowledge Discovery 18 (3): 337–369. https://doi.org/10.1007/s10618-008-0120-3.
  • Pei, T., W. Y. Wang, H. C. Zhang, T. Ma, Y. Y. Du, and C. H., Zhou. 2015. “Density-Based Clustering for Data Containing Two Types of Points.” International Journal of Geographical Information Science 29 (2): 175–193. https://doi.org/10.1080/13658816.2014.955027.
  • Randriamanamihaga, A. N., E. Come, L. Oukhellou, and G. Govaert. 2014. “Clustering the Velib’ Dynamic Origin/Destination Flows Using a Family of Poisson Mixture Models”, Neurocomputing, 141:124–138.
  • Rus, A. M. M., Z. A. Othman, A. Abu Bakar, and S. Zainudin. 2022. “A Hierarchical ST-DBSCAN with Three Neighborhood Boundary Clustering Algorithm for Clustering Spatio–Temporal Data.” International Journal of Advanced Computer Science & Applications 13 (12): 614–626. https://doi.org/10.14569/IJACSA.2022.0131274.
  • Shu, H., T. Pei, C. Song, X. Chen, S. Guo, Y. Liu, J. Chen, X. Wang, and C. H. Zhou. 2021. “L-Function of Geographical Flows.” International Journal of Geographical Information Science 35 (4): 689–716. https://doi.org/10.1080/13658816.2020.1749277.
  • Song, C., T. Pei, T. Ma, Y. Y. Du, H. Shu, S. H. Guo, and Z. D. Fan. 2019. “Detecting Arbitrarily Shaped Clusters in Origin-Destination Flows Using Ant Colony Optimization.” International Journal of Geographical Information Science 33 (1): 134–154. https://doi.org/10.1080/13658816.2018.1516287.
  • Tao, R., and J.-C. Thill. 2016. “Spatial Cluster Detection in Spatial Flow Data.” Geographical Analysis 48 (4): 355–372. https://doi.org/10.1111/gean.12100.
  • Van Nuffel, N. 2007. “Determination of the Number of Significant Flows in Origin-Destination Specific Analysis: The Case of Commuting in Flanders.” Regional Studies 41 (4): 509–524. https://doi.org/10.1080/00343400701281808.
  • Wang, C. Z., F. H. Wang, and T. Onega. 2021. “Network Optimization Approach to Delineating Health Care Service Areas: Spatially Constrained Louvain and Leiden Algorithms.” Transactions in GIS 25 (2): 1065–1081. https://doi.org/10.1111/tgis.12722.
  • Wan, Y., T. Pei, C. H. Zhou, Y. Jiang, C. X. Qu, and Y. L. Qiao. 2012. “ACOMCD: A Multiple Cluster Detection Algorithm Based on the Spatial Scan Statistic and Ant Colony Optimization.” Computational Statistics & Data Analysis 56 (2): 283–296. https://doi.org/10.1016/j.csda.2011.08.001.
  • Xu, Y., P. Santi, and C. Ratti. 2022. “Beyond Distance Decay: Discover Homophily in Spatially Embedded Social Networks.” Annals of the American Association of Geographers 112 (2): 505–521. https://doi.org/10.1080/24694452.2021.1935208.
  • Yao, X., D. Zhu, Y. Gao, L. Wu, P. C. Zhang, and Y. Liu. 2018. “A Stepwise Spatio-Temporal Flow Clustering Method for Discovering Mobility Trends.” IEEE Access, 6:44666–44675.
  • Ye, X. Y., and C. Andris. 2021. “Spatial Social Networks in Geographic Information Science.” International Journal of Geographical Information Science 35 (12): 2375–2379. https://doi.org/10.1080/13658816.2021.2001722.
  • Zhang, Q., and T. G. Chu. 2016. “Structure Regularized Traffic Monitoring for Traffic Matrix Estimation and Anomaly Detection by Link-Load Measurements.” IEEE Transactions on Instrumentation and Measurement 65 (12): 2797–2807. https://doi.org/10.1109/TIM.2016.2599426.
  • Zhang, Y. P., L. Liu, and H. Wang, (2019). “A New Perspective on the Temporal Pattern of Human Activities in Cities: The Case of Shanghai”, Cities, 87:196–204.
  • Zhang, H. P., X. X. Zhou, X. Gu, L. Zhou, G. Ji, and G. Tang. 2018. “Method for the Analysis and Visualization of Similar Flow Hotspot Patterns Between Different Regional Groups.” ISPRS International Journal of Geo-Information 7 (8): 328. https://doi.org/10.3390/ijgi7080328.
  • Zhou, X. X., H. P. Zhang, G. L. Ji, and G. A. Tang. 2019. “A Multi-Density Clustering Algorithm Based on Similarity for Dataset with Density Variation.” IEEE Access, 7:186004–186016.