Research Article

Modelling and application of a spectral clustering method for shared bicycle trajectories

Article: 2326008 | Received 27 Nov 2023, Accepted 27 Feb 2024, Published online: 18 Mar 2024

Abstract

Geographic flow clustering analysis can effectively reveal human behavioral patterns in movement. Traditional methods for studying human movement patterns are mostly based on first-order quantity analyses of point data, such as hotspots, density or clustering. Currently, relatively few second-order spatial analysis methods based on geographic flows exist. Thus, we developed a new geographic flow method based on spectral clustering and applied it to trajectory data analysis. Using bike-sharing trajectory data from Shanghai in August 2016, we conducted spectral clustering analysis of the group flow patterns before, during and after rainfall, on weekdays and weekends, and in the morning and evening peaks. A comparison of the clustering indices of different clustering methods verified that spectral clustering exhibits a better clustering effect. This study enriches the methods available for analysing geographic flows, and the human mobility patterns it reveals can provide references for formulating urban green travel policies.

1. Introduction

Geographic flows, also known as origin–destination (OD) flows, are defined as meaningful interactions between geographic entities across different spatial locations (Tao and Thill Citation2019), such as migration (LeSage and Fischer Citation2010), daily commutes (Guo et al. Citation2012) and goods transportation (Zhu and Guo Citation2014). Existing studies typically quantify flows using conceptual models featuring origins (O) and destinations (D) (Fotheringham Citation1984), which are often visually represented by directed line segments connecting OD points. Current OD flow data are predominantly categorized into point- and area-based OD flows (Graser et al. Citation2019). The former only records the OD points of the trajectories without considering the actual route, such as GPS tracking data. By contrast, the latter is restricted to movements within areas, such as trade volumes between countries, representing only a conceptual link. This study focuses on point-based OD flow data.

With the advancements in multimedia, social networks and global positioning systems, the accessibility and comprehensiveness of human mobility geographic flow data have significantly improved. However, increasing data redundancy and noise pose challenges in identifying spatial patterns within existing flow data. To analyse large-scale geographic flows among various locations and entities and reveal the hidden patterns of human mobility, it is imperative to engage in spatiotemporal pattern recognition research on geographic flows (Pei et al. Citation2020). These studies on flow patterns primarily address two aspects: relatedness and heterogeneity of flows. Relatedness refers to the spatial proximity of origins or destinations, while heterogeneity denotes the randomness of the flow’s origins and destinations. Metrics for assessing flow relatedness include Moran’s I statistic for flows (Liu et al. Citation2015); and flow heterogeneity is indicated by metrics, such as the K-function (Tao and Thill Citation2016), local K-function (Berglund and Karlstrom Citation1999), and L-function (Shu et al. Citation2021). Both lines of research are intimately associated with cluster analysis of geographic flows.

Cluster analysis is an exploratory method that does not require pre-defined human classification criteria; samples are categorized automatically based on the data themselves (Qiao et al. Citation2022). Cluster analysis has been applied in various fields, such as geographic information systems (GIS) (Charreire et al. Citation2012; Zhao et al. Citation2019; Fang et al. Citation2021; Zhou et al. Citation2023), bioinformatics (Zou et al. Citation2020; Karim et al. Citation2021) and the financial industry (Lorenzo and Arroyo Citation2022, Citation2023; Ma and Tanizaki Citation2022). Clustering geographic flow data can effectively reveal the spatiotemporal patterns of human mobility behavior behind the data. From a geospatial correlation perspective, cluster analysis is an effective method for revealing spatial heterogeneity. Traditional cluster analysis methods have a well-established framework and can quickly and effectively cluster based on the characteristics of entities. Cluster analysis methods are mainly divided into hierarchical, density-based, grid-based and model-based clustering. Hierarchical clustering methods include k-means (Huang Citation1998), the FCM (fuzzy c-means) algorithm (Dunn Citation1974) and EM (expectation–maximization) algorithms (Dempster et al. Citation1977); density-based methods include kernel density (Zhou et al. Citation2022), density-based spatial clustering of applications with noise (DBSCAN) (Bäcklund et al. Citation2011) and fuzzy clustering (Yang Citation1993); grid-based methods include statistical information grid (STING) (Gallego et al. Citation2011) and clustering in quest (CLIQUE) (Bureva et al. Citation2021); model-based clustering methods commonly involve probabilistic models (Smyth Citation2000) and neural networks (Du Citation2010); other types of clustering include kernel clustering (Lu et al. Citation2014) and quantum clustering (Aïmeur et al. Citation2007).
These methods are primarily based on a first-order variable analysis of point patterns, mostly modeling within 2D network spaces and Euclidean spaces. In this process, the origins of the flows are abstracted as points in space, and the flows are abstracted as connections between point pairs, leading to blurred spatial information. Geographic flow data, which are second-order variables, require different approaches. To study the big data of geographic flows between different locations and entities, scholars have treated geographic flows as a whole in a 4D space, enabling a cluster analysis of flows (Pei et al. Citation2020). For instance, Tao et al. (Citation2017) proposed a density-based hierarchical spatial flow-clustering method (flowHDBSCAN) that effectively extracts flow clusters and reveals hierarchical data structures. Yan et al. (Citation2023) introduced global and local versions of the spatiotemporal flow L-function, showing good performance in identifying spatiotemporal flow clusters with arbitrary shapes and densities. Song et al. (Citation2019) developed a spatial scan statistic method based on ant colony optimization (ACO) to detect OD flow clusters of arbitrary shapes, known as ‘AntScan_flow’. Fang et al. (Citation2021) introduced a new OD flow clustering detection method based on the OPTICS algorithm, capable of identifying OD flow clusters at different aggregation scales. Although the methods described above have demonstrated significant effectiveness in the clustering analysis of flow data, their performance may be suboptimal when dealing with flow data samples with irregular shapes. This limitation can be attributed to the fact that these algorithms typically operate under certain assumptions regarding the shape and distribution of data, assumptions that irregularly shaped data may not meet (Oyelade et al. Citation2019).
Moreover, in the pursuit of an optimal clustering solution, there is a possibility of encountering what is known as a ‘local optimum.’ This is because these algorithms are based on heuristics and tend to converge rapidly towards local optima during the search process, overlooking the global distribution of the data (Birant and Kut Citation2007; Kim and Jung Citation2017). Additionally, methods, such as BiFlowLISA (Tao and Thill Citation2020), flow centrality quotient (Zhou et al. Citation2023), cross K-function (Tao and Thill Citation2019), FCLPs (flow-centric local patterns) (Cai and Kwan Citation2022) and trend surface analysis (Guo et al. Citation2023) have been used for the spatial clustering of binary geographic flows, helping to reveal flow patterns, spatial dependencies and interactions in geographical spaces. However, these methods have limitations in revealing global flow patterns and fail to fully consider the spatial structure and complexity of flows.

Spectral clustering is a method capable of effectively revealing complex global structures and patterns. Fundamentally, it constitutes a clustering algorithm based on graph theory and linear algebra. Recently, it has been widely applied in various fields owing to its excellent clustering performance (Jia et al. Citation2014). This technique principally involves representing a dataset as a graph, followed by a clustering analysis through an eigenvalue decomposition of the graph that relies on eigenvectors and eigenvalues of the data similarity matrix. Compared to traditional clustering methods, spectral clustering can converge to a global optimum and is more inclusive of data, thereby addressing some of the deficiencies of traditional clustering approaches (Nascimento and de Carvalho Citation2011). Inspired by this, this study employed randomly sampled Mobike bicycle trajectory data from Shanghai in August 2016 to propose a spectral clustering method based on geographic flow. The proposed method extends the first-order spectral clustering approach to the second-order variable space of geographic flow data, thereby enhancing cluster identification, particularly in scenarios in which flow clusters are irregularly shaped or overlapping. This significantly improves the capability to identify flow clusters in complex flow datasets. The remainder of this article is organized as follows. Section 2 discusses the data and methodology. Section 3 presents the findings of the study. Section 4 presents related discussions and Section 5 concludes the article.

2. Materials and methods

2.1. Study area and data

Shanghai is the largest economic center in China and one of the first cities to launch and operate bicycle-sharing systems. As of November 2022, there were approximately 890,000 shared bicycles in Shanghai, and the city is implementing a digital management programme to improve compliance with road traffic laws. It is therefore important to understand the cycling patterns of bike-share users in Shanghai, and a large dataset of bike-share rides was available for this study. The 13 urban areas with the highest bike-share usage in Shanghai (excluding Chongming District, Qingpu District, Fengxian District, Songjiang District and Jinshan District) were selected as the study area. These areas contain urban centers, commercial districts and high-tech production areas and are thus representative of a wide range of urban spaces.

The data used in this study were divided into two parts, namely Mobike data and impact factor data. Bike-share trajectory data were derived from publicly available data from a previous study (Li et al. Citation2020). This study selected data from 1 August 2016 to 16 August 2016, comprising 39,467 valid entries (Figure 1). The data included fields such as the order number, user identification number, vehicle identification number, start and end times of the ride, coordinates of the starting and ending points (latitude and longitude) and cycling route. Map matching was performed on the data prior to clustering analysis. Historical weather information for Shanghai was obtained from the Weather Network website (http://lishi.tianqi.com/). During the study period, rain occurred from August 2 to 5 and August 8 to 9, and the weather was mainly cloudy and sunny at other times. Land use data were obtained from Landsat 8 remote sensing image data from the Geospatial Data website (http://www.gscloud.cn/).

Figure 1. The study area of (a) Shanghai, showing its location within China and (b) the research area within Shanghai. (c) The region bounded by the polygon shows the area with highly dense trajectories.


2.2. Methodology

The proposed algorithm comprises the following main modules (Figure 2): (1) distance measurement of flows, (2) flow clustering based on spectral clustering and (3) evaluation metrics. The first module constructs the distance and adjacency matrices of the geographic flows from the preprocessed flow data. In the second module, the flows are clustered using the spectral clustering method adopted in this study. The third module compares the clustering results with those of other methods and analyses human cycling patterns according to four aspects: morning and evening peaks, weekdays and holidays, before and after rainy days and land use.

Figure 2. Workflow of the spatio-temporal pattern analysis of Mobike data.


2.2.1. Geographic flow distance

Flow distance is commonly used to measure the proximity between flows. Most clustering techniques rely on a distance measure, and the choice of measure affects the final clustering results. Existing flow-distance measures fall primarily into the following categories. The first and most commonly used flow-distance measure was proposed by Tao and Thill (Citation2016). Here, a flow is defined as a four-dimensional object expressed as $(x_i, y_i, u_i, v_i)$, where $x_i$ and $y_i$ represent the latitude and longitude of point O and $u_i$ and $v_i$ represent the latitude and longitude of point D. The distance between two flows is then calculated using formula (1), with $\alpha$ and $\beta$ as constants ($\alpha > 0$, $\beta > 0$, $\alpha + \beta = 2$; by default, $\alpha = \beta = 1$). The effect of flow length on the results was also explored. The second flow distance measure combines flow data with traffic section data (Liu et al. Citation2021). Each starting point was matched with the nearest section data for the flow length, and the network distance between the flows was calculated as the similarity distance. The third measure uses the Hausdorff distance to treat flow data as a trajectory and calculate the similarity between flows (Qiao et al. Citation2022).

$$d = \sqrt{\alpha\left[(x_i - x_j)^2 + (y_i - y_j)^2\right] + \beta\left[(u_i - u_j)^2 + (v_i - v_j)^2\right]} \quad (1)$$
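As an illustration of formula (1), the following minimal sketch computes the distance between two flows given as (x, y, u, v) tuples. The function name `flow_distance` is hypothetical, and the square root is an assumption that follows the usual form of the Tao and Thill measure.

```python
import math


def flow_distance(flow_i, flow_j, alpha=1.0, beta=1.0):
    """Distance between two OD flows, each given as (x, y, u, v).

    (x, y) is the origin and (u, v) the destination. alpha weights the
    origin term and beta the destination term (alpha + beta = 2).
    The square root is an assumption; see the lead-in.
    """
    xi, yi, ui, vi = flow_i
    xj, yj, uj, vj = flow_j
    origin_term = (xi - xj) ** 2 + (yi - yj) ** 2
    destination_term = (ui - uj) ** 2 + (vi - vj) ** 2
    return math.sqrt(alpha * origin_term + beta * destination_term)


# Two flows that share a destination and whose origins are 5 units apart.
print(flow_distance((0, 0, 10, 0), (3, 4, 10, 0)))
```

Raising alpha relative to beta makes the measure more sensitive to differences between origins than between destinations.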

Based on the research mentioned above, this study explored a new metric applied in the field of geospatial flows, namely the Haversine formula, to quantify the similarity between flows (Markou and Kassomenos Citation2010). In cases where only latitude and longitude were available, geospatial distances were calculated from coordinates expressed in radians using the following formula:

$$S_O = R \arccos\left(\sin y_i \sin y_j + \cos y_i \cos y_j \cos \Delta\lambda\right) \quad (2)$$

First, Equation (2) is applied to compute the distances between all starting points of the flows, where $S_O$ represents the matrix of distances between all starting points, $y_i$ and $y_j$ denote the latitudes of the starting points of flows $F_i$ and $F_j$, respectively, $\Delta\lambda$ is the difference in longitude between the starting points of the two flows and $R$ is the Earth’s radius. The same formula was used to calculate the distance matrix $S_D$ between all endpoints of the flows:

$$f_{ij} = (S_O + S_D)/K \quad (3)$$

Second, we calculated the sum of the squares of the Euclidean distances from all flow origins to their destinations to generate the flow length matrix $K$. Finally, using Equation (3), we considered the distances between origins, the distances between destinations and the lengths of the flows to obtain the inter-flow distance matrix $f_{ij}$. To mitigate the impact of extreme values, this study arranges the $f$ values corresponding to each flow in descending order and then selects the $n$th largest value from each column to be stored in matrix $M$, which is used for the subsequent construction of the geospatial flow proximity matrix. Repeated experiments showed that the clustering effect was best when $n = 7$.
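The distance in Equation (2) can be sketched as follows. As printed, the expression is the spherical law of cosines form of the great-circle distance. The function names, the input in degrees and the Earth radius of 6371 km are assumptions made for illustration; the flow length matrix used in Equation (3) is not reproduced here because its exact construction is not specified in enough detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance following Equation (2); inputs in degrees."""
    y1, y2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(y1) * math.sin(y2)
                 + math.cos(y1) * math.cos(y2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


def pairwise_distance_matrix(points):
    """Matrix of great-circle distances between (lat, lon) points.

    Applied to all flow origins this gives S_O; applied to all flow
    destinations it gives S_D.
    """
    return [[great_circle_km(a[0], a[1], b[0], b[1]) for b in points]
            for a in points]


# Hypothetical origins of three flows (latitude, longitude in degrees).
origins = [(31.23, 121.47), (31.24, 121.49), (31.20, 121.44)]
S_O = pairwise_distance_matrix(origins)
```

For points that are very close together, the haversine form is numerically more stable than the arccosine form, which is why the clamp is included here.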

2.2.2. Spectral clustering of flow

Spectral clustering, which evolved from graph theory and is widely applied in data clustering because of its superior performance, forms the basis for applying spectral theory to the field of geographic flows. By integrating the definition of geographic flows proposed by Pei et al. with the spectral clustering method (Pei et al. Citation2020), we constructed the adjacency matrix of flows as follows:

$$A_{ij} = \exp\left(-\frac{(f_{ij})^2}{2 M_i M_j}\right) \quad (4)$$

where $f_{ij}$ represents the distance matrix between geospatial flows and $M_i$ and $M_j$ are the $i$th and $j$th elements in matrix $M$, respectively. This calculation method considers the relative magnitude and distribution of flow distances, allowing nodes that are more closely connected (i.e. those with smaller flow distances) to receive greater weight in the adjacency matrix.

From this adjacency matrix, the degree matrix $D$ of the geospatial flows can be computed. Finally, a Laplacian matrix for the geospatial flows was calculated. The Laplacian matrix has normalized and unnormalized forms; this study uses the normalized Laplacian matrix, calculated as follows:

$$L = D^{-1/2}(D - A)D^{-1/2} \quad (5)$$

where $A$ is the adjacency matrix with entries $A_{ij}$.

In this study, the eigenvectors corresponding to the first K largest eigenvalues of matrix L form matrix P. Each row of the matrix is normalized to derive matrix G. Each row of G is considered as a point in a K-dimensional space. Finally, the k-means clustering algorithm was applied for classification.
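The chain from the flow distance matrix to the embedded points can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the study: the function name `spectral_embedding` is hypothetical, the degree matrix is taken as the row sums of the adjacency matrix, and the eigenvectors of the smallest eigenvalues of L are selected, which is the usual convention for a Laplacian of the form in Equation (5) and is equivalent to taking the largest eigenvalues of $D^{-1/2} A D^{-1/2}$.

```python
import numpy as np


def spectral_embedding(f, n_clusters, n=7):
    """Row-normalised spectral embedding G of a symmetric distance matrix f."""
    f = np.asarray(f, dtype=float)
    # M: n-th largest value in each column (columns sorted in descending order).
    M = np.sort(f, axis=0)[::-1][n - 1]
    # Equation (4): locally scaled Gaussian adjacency.
    A = np.exp(-(f ** 2) / (2.0 * np.outer(M, M)))
    # Degree of each flow, assumed to be the row sum of A.
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Equation (5): D^{-1/2} (D - A) D^{-1/2} = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(len(f)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the first n_clusters
    # eigenvectors (assumption: smallest eigenvalues, see lead-in).
    _, eigenvectors = np.linalg.eigh(L)
    P = eigenvectors[:, :n_clusters]
    # Normalise each row to unit length to obtain G.
    return P / np.linalg.norm(P, axis=1, keepdims=True)


# Toy example: eight one-dimensional "flows" forming two tight groups.
x = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3])
f = np.abs(x[:, None] - x[None, :])
G = spectral_embedding(f, n_clusters=2, n=6)
```

Each row of G can then be passed to any k-means implementation. In the toy example, rows belonging to the same group coincide and rows from different groups are nearly orthogonal, which is what makes the final k-means step straightforward.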

The logic behind this sequence of matrix transformations is to capture the characteristics of the data through the structure of the graph and the similarity between data points. The degree matrix D represents the degree of each node, and the Laplacian matrix L contains the topological and similarity information of the graph. By analysing the eigenvectors of the Laplacian matrix, the data points can be divided into different clusters because the eigenvectors encode the position and similarity of the data points in the graph.

2.2.3. Evaluation index

To compare the clustering performance of the spectral clustering of geographic flows with that of other clustering methods, the Silhouette coefficient (SC) (Bagirov et al. Citation2023), Davies-Bouldin index (DBI) (Ros et al. Citation2023) and Calinski-Harabasz index (CHI) (Kermani et al. Citation2015) were used as clustering indices.

The SC method combines both cohesion and separation to evaluate the performance of clusters and is widely used in cluster evaluations (Rousseeuw Citation1987). Therefore, we used SC as an evaluation index to compare the performance of the spectral clustering of geographic flows with that of other methods, which was calculated as follows:

$$SC = \frac{b - a}{\max(a, b)} \quad (6)$$

where $a$ is the average distance from a sample to the other samples in the same category and $b$ is the average distance from the sample to the samples in the nearest other category. The range of the silhouette coefficient is [–1, 1], with larger values representing better clustering performance and more compact clusters. Clustering is unreasonable when the silhouette coefficient is equal to −1.
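The calculation in Equation (6) can be sketched in plain Python as follows. The function name is hypothetical, Euclidean distance is assumed, and singleton clusters are given a score of 0, which is a common convention rather than something stated in the text. At least two clusters are required.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all samples (Equation (6)), Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other samples of the same cluster
        # (the distance of the point to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the samples of the nearest other cluster.
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # compact, separated
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # mixed-up assignment
```

In this example the first labelling yields a value close to 1 and the second a negative value, matching the interpretation given above.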

DBI is an index used to evaluate cluster quality that measures the tightness and separation of clusters, whose calculation formula is as follows:

$$DBI = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right) \quad (7)$$

where $n$ is the number of clusters, $c_i$ is the center of class $i$, $d(c_i, c_j)$ is the distance between the cluster centers of $i$ and $j$, and $\sigma_i$ is the average distance between all points in class $i$ and the center. The smaller the DBI value, the higher the tightness of the clusters and the better the degree of separation between clusters, indicating a better clustering effect.
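A minimal sketch of Equation (7) is given below, assuming Euclidean distance and cluster centers computed as coordinate means; the function name is hypothetical.

```python
import math


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index (Equation (7)) for points given as tuples."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    # c_i: coordinate-wise mean of each cluster.
    centers = {label: tuple(sum(coord) / len(members) for coord in zip(*members))
               for label, members in groups.items()}
    # sigma_i: mean distance of the cluster members to their center.
    sigma = {label: sum(math.dist(p, centers[label]) for p in members) / len(members)
             for label, members in groups.items()}

    total = 0.0
    for i in groups:
        # Worst-case (largest) similarity ratio of cluster i to any other cluster.
        total += max((sigma[i] + sigma[j]) / math.dist(centers[i], centers[j])
                     for j in groups if j != i)
    return total / len(groups)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin_index(points, [0, 0, 1, 1]))
```

Here each cluster has a spread of 0.5 and the centers are 10 units apart, so the ratio for each cluster is (0.5 + 0.5) / 10; a mixed-up labelling of the same points gives a much larger value.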

CHI measures the tightness within each class by calculating the sum of squares of the distances between each point in the class and the center of the class, and measures the separation of the dataset by calculating the sum of squares of the distances between the class centers and the center of the dataset. CHI was calculated as the ratio of the degree of separation to the degree of tightness using the following formula:

$$CHI = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \cdot \frac{m - k}{k - 1} \quad (8)$$

where $m$ is the number of training samples, $k$ is the number of classes, $B_k$ is the covariance matrix between classes, $W_k$ is the covariance matrix of the data within classes and $\mathrm{tr}$ denotes the matrix trace. The larger the CHI, the tighter each class is and the more dispersed the classes are from one another; that is, the better the clustering results.
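Equation (8) can be sketched as follows. The sketch assumes the usual scatter-based definitions: the trace of $B_k$ is the size-weighted sum of squared distances between class centers and the overall center, and the trace of $W_k$ is the sum of squared distances between points and their class center. The function name is hypothetical.

```python
import math


def calinski_harabasz_index(points, labels):
    """Calinski-Harabasz index (Equation (8)) for points given as tuples."""
    m = len(points)
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)

    overall_center = tuple(sum(coord) / m for coord in zip(*points))
    trace_between = 0.0  # tr(B_k)
    trace_within = 0.0   # tr(W_k)
    for members in groups.values():
        center = tuple(sum(coord) / len(members) for coord in zip(*members))
        trace_between += len(members) * math.dist(center, overall_center) ** 2
        trace_within += sum(math.dist(p, center) ** 2 for p in members)
    return (trace_between / trace_within) * (m - k) / (k - 1)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(calinski_harabasz_index(points, [0, 0, 1, 1]))
```

For this example the between-class trace is 100 and the within-class trace is 1, so with m = 4 and k = 2 the index is 200.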

2.3. Synthetic test

To validate the capabilities of the proposed method, it was tested using a synthetic dataset, shown in Figure 3. Specifically, 500 OD flows were randomly generated within a 100 × 100 2D space, as shown in Figure 3(a). These flows were distributed around several cluster centers and included completely random OD flows to simulate the randomness present in real-world data. In addition, this study randomly generated 500 irregularly distributed OD flows covering the entire 100 × 100 2D space, as depicted in Figure 3(b).

Figure 3. Synthetic flows. (a) Randomly generated flows around clustering centers; (b) irregularly generated flows.


To achieve better clustering results for the different cases, the elbow method was employed to determine the optimal number of clusters (Shi et al. Citation2021). The core idea is to identify the point beyond which increasing the number of clusters (K-value) no longer leads to a significant decrease in the total within-cluster variance (sum of squared errors [SSE]). The SSE is defined as follows:

$$SSE = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \lVert x_i - c_j \rVert^2 \quad (9)$$

where $n$ is the total number of geospatial flows, $k$ is the number of clustering centers, $x_i$ is a data point, $c_j$ is a clustering center, $w_{ij}$ is an indicator variable that equals 1 if data point $x_i$ is assigned to clustering center $c_j$ and 0 otherwise, and $\lVert x_i - c_j \rVert$ is the Euclidean distance between point $x_i$ and clustering center $c_j$. In the elbow method, as $k$ increases, the SSE typically decreases, and after a certain point, adding more clustering centers leads to a markedly slower decrease in the SSE. This point is analogous to the ‘elbow’ of the arm and is often considered a reasonable choice for the optimal number of clusters. By applying the elbow method to the aforementioned two cases, as shown by the red box in Figure 4, it was found that the optimal number of clusters for both was 5.
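The elbow procedure built on Equation (9) can be sketched as follows. The small k-means routine with deterministic farthest-point initialisation is an illustrative stand-in, not the implementation used in the study, and all function names are hypothetical.

```python
import math


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=50):
    """Plain Lloyd iterations; points must be tuples of coordinates."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-point initialisation (illustrative)
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def sse(points, centers, labels):
    """Equation (9): squared distance of each point to its assigned center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def elbow_curve(points, k_values):
    """SSE for each candidate number of clusters."""
    return {k: sse(points, *kmeans(points, k)) for k in k_values}


# Three tight groups: the SSE drops sharply up to k = 3 and flattens afterwards.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
curve = elbow_curve(points, [1, 2, 3, 4])
```

Plotting the values of `curve` against k gives the kind of curve in which the elbow is read off visually; in this toy example the bend is at k = 3.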

Figure 4. Trends in SSE for different cases.


In this study, the spectral clustering of geospatial flows was used to cluster the two cases mentioned above, and the results are shown in Figure 5. Case 1 exhibits five distinct clusters, and Case 2 also shows certain aggregation characteristics, indicating that the method is effective. Therefore, we employed the spectral clustering method for geospatial flows (see Section 3.2) to analyse the clustering of Mobike bicycle OD flow data in Shanghai to reveal the patterns of human mobility behavior.

Figure 5. Clustering results for different cases.


3. Results

In this section, we analyse the Mobike data in Shanghai from three perspectives: time, space and land use. The global aspects of bike-share mobility patterns were mined to reveal the underlying regularities of human travel.

3.1. Characteristics of geographic flows with different time granularity

Bike-sharing rides showed an overall trend towards short durations and distances, with rainy days affecting both ride frequency and duration. As shown in Table 1, on rainy days from August 2 to 5, 88.59% of the rides lasted less than 30 min, with 62.22% lasting less than 15 min. Only 11.40% of the rides lasted between 30 and 120 min. On cloudy days from August 9 to 12, 89.29% of the rides lasted less than 30 min, with 63.20% lasting less than 15 min and 10.70% lasting 30–120 min. During the weekend of August 13–14, 86.32% of the rides lasted less than 30 min, 58.14% lasted less than 15 min and 13.67% lasted between 30 and 120 min. Interestingly, although the total number of rides decreased by 43.30% on rainy days compared to cloudy days, the percentage of long rides was greater, and the percentage of short rides was smaller, on rainy days than on cloudy days. The weekend rides from August 13 to 14 tended to last longer than the weekday rides from August 2 to 5 and 9 to 12.

Table 1. Statistics for the number of rides by riding duration.

From Monday to Friday, there were two peaks, one in the morning (between 07:00 and 09:00) and one in the evening (between 17:00 and 19:00), compared with the more evenly distributed rides on Saturdays and Sundays (Figure 6). This difference is likely to result from weekday working schedules, which give rise to high morning and evening demands. People were generally active starting at 06:00 h, with a significantly higher level of riding activity at 17:00 h than in the morning. Fewer people rode during midday, possibly because of the short lunch break.

Figure 6. Distribution of the number of rides at different times for the study period.


3.2. Spatial characteristics of geographic flows based on spectral clustering

3.2.1. Human movement patterns before, during and after a rainy day

This study analysed human biking behavior on a rainy day (August 5th) and two cloudy days (August 1st and August 9th), utilizing a spectral clustering approach based on geographic flow data. The purpose was to explore patterns in human travel origins and destinations before, during and after a rainy day. Initially, the elbow method was employed to determine the optimal number of clusters for the three periods. As illustrated in Figure 7 (indicated by the red boxes), the optimal number of OD flow clusters was five for the pre-rainy day (August 1st), four for the rainy day (August 5th) and six for the post-rainy day (August 9th).

Figure 7. Trends in OD flow SSE before, during and after a rainy day.


The clustering results are shown in Figure 8. To provide a more intuitive visualization of human movement patterns, we highlighted the most prominent flows within each cluster, illustrated by the black arrows in Figure 8. These flows were chosen based on the characteristic features of spectral clustering, that is, the flows with the highest eigenvalues. This method was consistently applied in subsequent case studies to illustrate human movement patterns. On the cloudy day of August 1st (Figure 8(a)), the overall human movement pattern was predominantly between the inner ring areas. The clustering results indicate a preference for destinations including Huangpu, Jing’an and Putuo Districts. This can be attributed to August 1st being a Monday, the first day of the workweek, and to these districts hosting numerous corporate and educational institutions. On the rainy day of August 5th (Figure 8(b)), biking decreased significantly, with an 18.02% reduction compared to August 1st and a 70.21% decrease compared to August 9th. The clustering results show movements mainly within the inner ring, with Xuhui, Changning and Jing’an Districts being common destinations. On the cloudy day of August 9th (Figure 8(c)), the general movement trend shifted from the inner ring to the middle ring, with a preference for Jing’an, Changning and Hongkou Districts.

Figure 8. Clustering results of cycling patterns the day (a) before rain (August 1; cloudy weather), (b) during rain (August 5) and (c) after rain (August 9; cloudy weather).


Specifically, from Figure 8(a), we can see that on August 1st, Cluster 1 primarily encompassed the Putuo District. Human movement was mainly from Shanghai Open University (Zhabei Campus) towards Shanghai West Railway Station, and from Shanghai Railway Station towards Shanghai West Railway Station. Noticeable clustering occurred near the Qilianshan South Road, Qilianshan Road and Zhongtan Road Subway Stations. Cluster 2 was concentrated in Zhabei and Hongkou Districts, with movements predominantly from the Dabai Tree Subway Station towards Yixian Road and from Wujiaochang Town towards Changzhong Road, forming distinct clusters near Yingao West Road Station, Shanghai Circus City and Wenshui Road. Cluster 3 focused on the Yangpu District, with significant clusters near Jiangwan Stadium, Shiguang Road and Jiangpu Road Subway Stations. Cluster 4 was centered in the Huangpu District, where people primarily moved from the People’s Square Subway Station towards the Laoximen Subway Station, with most subway stations in this cluster showing noticeable clustering. Cluster 5 encompassed Jing’an and Xuhui Districts, with the main movement from Jing’an District People’s Government towards Xuhui District People’s Government, and significant clustering around Xujiahui and Shanghai Stadium Subway Stations.

In Figure 8(b), for the rainy day on August 5th, Cluster 1 mainly covered the Putuo and Changning Districts. Human movements were predominantly from Xinjing Town towards the Changning District People’s Government and from Railway New Village towards Shanghai West Railway Station, with prominent clustering near the Liziyuan, Zhenxin New Village and Dahua San Road Subway Stations. Cluster 2 was centered in Zhabei and Hongkou Districts, with significant clusters around Gongkang Road, Chifeng Road, Zhongshan North Road and Hongkou Football Stadium Subway Stations. Cluster 3 in Yangpu District saw the main movements from Tongji University towards the Nenjiang Road Subway Station, with noticeable clusters around Shiguang Road, Nenjiang Road and Wujiaochang Subway Stations. Cluster 4, concentrated in Jing’an, Huangpu and Xuhui Districts, showed movements predominantly from Huangpu District towards Jing’an and Xuhui Districts, with significant clustering near Madang Road, Jing’an Temple, Xujiahui and Laoximen Subway Stations.

For the post-rainy day on August 9th, as seen in Figure 8(c), Cluster 1 mainly involved Putuo, Changning and Jing’an Districts. Movements were primarily from the Zhenping Road Subway Station towards the Putuo District People’s Government and from the Xingzhi Road Subway Station towards Dachang Town, with noticeable clustering near the Songhong Road, Liziyuan and Changshou Road Subway Stations. Cluster 2 focused on the Zhabei District, forming significant clusters near Lasegang, Xingzhi Road Subway Station, Yingao West Road and Jiangwan Town Subway Stations. Cluster 3, concentrated in Yangpu District, saw human movement mainly from Jiangwan Stadium towards Shiguang Road and from Shanghai University of Finance and Economics towards Jiangwan Stadium. Cluster 4 in Hongkou District had noticeable clustering near the Hongkou Football Stadium, Chifeng Road, Dabai Tree Road and Tongji University Subway Stations. Cluster 5, involving Huangpu and Jing’an Districts, showed human movement mainly from Jing’an Temple towards the Natural History Museum and from the Bund towards People’s Square Subway Station, with significant clustering near East Nanjing Road and Laoximen Subway Stations. Cluster 6, primarily in the Xuhui District, saw movements from Xujiahui towards the Shanghai South Railway Station.

Through these cluster analyses, it was observed that rainy conditions led to a decrease in the number, distance and duration of bike rides. However, the overall direction of the OD flows remains largely unchanged, with most daily travel clusters within the same regional areas. Jing’an District emerged as a popular area, with a significant number of bike rides heading there before, during and after the rainy day.

3.2.2. Movement patterns during weekdays and weekends

This study analysed the clustering of human movement on rest days (August 13th and 14th) and workdays (August 11th and 12th and August 15th and 16th) using a spectral clustering method based on geographic flow. The elbow method helped determine the optimal number of OD flow clusters for these periods. As shown by the red boxes in Figure 9, the best cluster number for weekend OD flows was five, while for the workdays before and after the weekend, the optimal numbers were five and four, respectively.

Figure 9. Trends in OD flow SSE before, during and after the weekend.
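
The elbow selection described above can be illustrated with a minimal Python sketch. It assumes that each OD flow is a 4-tuple (origin x, origin y, destination x, destination y) and that the SSE is the squared Euclidean distance to the cluster mean in that four-dimensional space; the article does not spell out either detail in this section. The function names `flow_sse` and `elbow_k`, the second-difference rule (the article reads the elbow visually from the red boxes), and the SSE values are all hypothetical.

```python
from collections import defaultdict


def flow_sse(flows, labels):
    """Sum of squared distances from each OD flow to the mean of its cluster."""
    groups = defaultdict(list)
    for flow, label in zip(flows, labels):
        groups[label].append(flow)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - c) ** 2 for m in members for v, c in zip(m, centroid))
    return total


def elbow_k(ks, sse):
    """Return the k at which the SSE curve bends most (largest second difference)."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = sse[i - 1] - 2 * sse[i] + sse[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical SSE curve for k = 2..8 (not taken from the article).
candidate_ks = [2, 3, 4, 5, 6, 7, 8]
sse_curve = [1000.0, 700.0, 480.0, 300.0, 270.0, 250.0, 235.0]
chosen_k = elbow_k(candidate_ks, sse_curve)
```

In practice, `flow_sse` would be evaluated once per candidate cluster count to build the curve that `elbow_k` inspects; an automatic rule of this kind is only a convenience and may disagree with a visual reading of the plot.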

The clustering results are presented in Figure 10. As shown in Figure 10(a), on August 11th and 12th, Cluster 1 was mainly concentrated in the Putuo District. Human movement was primarily from Jinshajiang Road Subway Station to Qingjian Garden and from Shanghai West Station to Central Ring Hu-Jia Expressway Bridge, with significant clustering near Qilianshan Road, Qilianshan South Road, Zhenru, and Liziyuan Subway Stations. Cluster 2 focused on Zhabei District, northern Hongkou District, northwestern Yangpu District and the junction of these districts with Baoshan District. Movements in Cluster 2 predominantly spanned from the Wenshui Road Subway entrance of the Central Ring-Gonghexin Road Overpass to the vicinity of the Town West Residence Committee in Jiangwan Town and from Gongkang Road Subway Station to Yingao West Road Subway Station, forming noticeable clusters near Changjiang South Road, Shanghai Circus City, Yanchang Road and Dabai Tree Subway Stations. Cluster 3 was mainly located in the southern part of Hongkou District and most of Yangpu District, where people moved northward from the Quyang Road Subway Station and from around Hailun Road to Jiangpu Road Subway Station, with significant clustering near the North Bund, Ningguo Road Subway Station and Yanji Middle Road Subway Station. Cluster 4, focused on Huangpu District and Jing’an District, saw movements from the Great World and the First National Congress of the Communist Party of China vicinity to Tianzifang and from Jing’an Temple to Shanghai Station, with notable clusters near Xujiahui, People’s Square, Damuqiao Road Subway Station and Madang Road Subway Station. Cluster 5 was primarily in Xuhui District, with movements from Shanghai Stadium Subway Station to Shanghai South Station and from Wuzhong Road to Caohejing Development Zone Subway Station, forming significant clusters near the Lianhua Road, Shuicheng Road, and Guilin Road Subway Stations.

Figure 10. Clustering results of cycling patterns on (a) the weekdays of August 11 and 12, (b) the weekend of August 13 and 14 and (c) the weekdays of August 15 and 16.

Figure 10(b) shows that on August 13th and 14th, Cluster 1 was mainly in Putuo District, where human movement was primarily from Liziyuan Subway Station to Qian’an Road Subway Station and from Dahua First Road to Shanghai Station, with significant clustering near Zhenping Road, Changshou Road Subway Stations, and Xianghe Park. Cluster 2 focused on Zhabei District, the northern part of Hongkou District and the junction with Baoshan District, with movements from Tonghexincun Subway Station to Beijiao Station and from Liangcheng Park to Jiangwan Town Subway Station, forming noticeable clusters near the Pengpu Xincun Subway Station, Yindu Second Village and Danning Lingshi Park. Cluster 3, concentrated in the southern part of Hongkou District and Yangpu District, saw movements from Wujiaochang Town to Guowei Road Overpass, Tongji University to Sichuan North Road Subway Station and Huangxing Road to Wujiaochang, with significant clusters near the Ningguo Road Subway Station, Jiangpu Road Subway Station and Lu Xun Park. Cluster 4 was mainly in Huangpu District, with movements from the Bund to Jing’an District, from Damuqiao Road Subway Station to the First National Congress of the Communist Party of China, and from Laoximen to Nanpu Bridge, forming noticeable clusters near Huaihai Middle Road, Luban Road and West Nanjing Road. Cluster 5, primarily in Xuhui District, saw human movement from Donghua University towards Xujiahui, with significant clustering near Zhongshan Park, Fudan University and Shanghai Stadium.

From Figure 10(c), it is apparent that on August 15th and 16th, Cluster 1 was mainly in Putuo District, with movement from Shanghai Station to Dahua Road and from Fengqiao Road to Jiading District, forming noticeable clusters near Nanxiang Town, Zhenping Road Subway Station and Tongchuan Road. Cluster 2, concentrated in Zhabei District, the northern part of Hongkou District, and Yangpu District, saw movements from the vicinity of Shanghai University of Finance and Economics to Jiangwan Town Subway Station and from the Xingyuan Community to the Central Ring-Gonghexin Road Overpass, with significant clustering near Shanghai Circus City, Pengpu Xincun Subway Station and Jiangwan Stadium. Cluster 3, which focused on the junction of Huangpu, Zhabei and Hongkou Districts, as well as the southern part of Yangpu District, involved human movement from the vicinity of Sichuan North Road to the Jiangpu Road Subway Station and from Huangxing Park to Nenjiang Road, with notable clustering near Tibet Road, Yangpu Park and Jiangpu Park. Cluster 4 was mainly in Xuhui District and its junction with Huangpu, Jing’an and Changning Districts. Human movement in this cluster was predominantly from the Dapuqiao Subway Station vicinity to Caojiadu, from Xujiahui to Zhongshan Park, and from Shanghai South Station to Luoxiu Road Subway Station, with significant clustering near Hongqiao Road Subway Station, Jing’an Temple and Xujiahui.

The final results revealed that during weekends, people tend to opt for longer and more distant bike rides, with the overall OD flow direction moving from the inner ring to the outer ring. Conversely, on workdays, movement was predominantly within the inner ring, heading towards major employment areas and subway stations. This pattern reflects a distinct variation in human mobility preferences and behaviors between rest and workdays.

3.2.3. Movement patterns during the morning and evening peaks

This study utilized a spectral clustering method based on geographic flow and analysed the clustering situation during the morning and evening peak hours of two workweeks: August 2nd–5th and August 9th–12th. The elbow method was employed to determine the optimal number of OD flow clusters for the four periods. As indicated by the red boxes in Figure 11, the optimal cluster numbers for the morning and evening peaks from August 2nd to 5th were four and five, respectively, and the same values were obtained for August 9th to 12th.

Figure 11. Morning and evening peak OD flow SSE trends.

The clustering results, as depicted in Figure 12, show that there were four prominent clusters during the morning peak periods (Figure 12(a) and (c)). Specifically, Cluster 1, predominantly in the Putuo District, saw riders preferring shorter bike rides, with significant clustering near the Qilianshan Road, Liziyuan, Caoyang Road, Qilianshan South Road and Xingzhi Road Subway Stations. In Cluster 2, mainly covering Zhabei District and the northern parts of Hongkou and Yangpu Districts, riders moved from Jiangwan Stadium towards Jiangwan Town and from Henggao Home towards Yingao West Road Subway Station, with notable clusters around Zhongshan North Road, Hongkou Football Stadium, Yingao East Road, Dabai Tree and Xinjiangwancheng Subway Stations. Cluster 3 focused on the southern parts of Yangpu and Hongkou Districts and involved movements from the vicinity of Guowei Road Overpass to Shiguang Road Subway Station and from Fengcheng Third Village to Tongji University, with significant clustering near Yanji Middle Road and Jiangpu Road Subway Stations. Cluster 4, concentrated in Huangpu District, Jing’an District and Xuhui District, saw human movement from Donghua University of Technology towards Shanghai South Station and from Ruijin South Road towards Shanghai Stadium, with clustering near Tibet South Road, Zhongshan Park and Jing’an Temple Subway Stations.

Figure 12. Clustering results of cycling during morning and evening peaks for (a) the morning peak during the first weekday (August 1–5), (b) the evening peak during the first weekday (August 1–5), (c) the morning peak during the second weekday (August 8–12) and (d) the evening peak during the second weekday (August 8–12).

Five distinct clusters emerged during the evening peak period (Figure 12(b) and (d)). Cluster 1 in Putuo District, as in the morning, indicated a preference for short-distance biking, with clustering near Taopu Village, Qilianshan Road, the Liziyuan Subway Station and Dahua Wanjia. Cluster 2, spanning Zhabei District and the northern parts of Hongkou and Yangpu Districts, showed movements from Jinyuan Community towards Hongkou Football Stadium and from Jiangwan Town towards Wujiaochang, with clustering around the Changjiang South Road, Xinjiangwancheng, and Pengpu Xincun Subway Stations. Cluster 3 in Yangpu District involved movements from Wujiaochang Town towards the Guowei Road Overpass, with clustering near Shiguang Road, Nenjiang Road, Jiangwan Stadium and Huangxing Road Subway Stations. Cluster 4, covering Huangpu District, southern parts of Zhabei District and Hongkou District, showed movement from the Natural History Museum towards People’s Square and from Luban Road towards the First National Congress of the Communist Party of China, with clustering near the Great World, Baoshan Road and Xiaonanmen Subway Stations. Cluster 5, mainly in Xuhui District and the eastern part of Changning District, saw movements from Zhongshan Park towards Jing’an District and from Xujiahui towards Fudan University Medical College, with significant clustering near Shanghai Stadium, Hongqiao Road and Zhongshan Park Subway Stations.

In summary, during the morning peak hours, human movement tended to be directed towards the city center and within the inner ring, whereas during the evening peak, the flow direction was generally from the inner ring to the middle and outer rings. In the morning, people predominantly moved from residential areas to commercial buildings or subway stations, whereas in the evening, the destinations were mostly residential areas, areas near subway stations, and cultural and entertainment venues. This pattern aligns with real-world circumstances and is likely related to traffic congestion during peak hours: to save travel time, people prefer using Mobikes or a combination of biking and subways as their mode of transportation.

3.3. Movement patterns across land use types

According to a study by Shen et al. (2020), a strong link exists between land-use types and human travel activities. Therefore, in this study, the land-use types in the study area were classified and combined with the origin points of Mobike trips to explore the influence of land-use types on human movement patterns in Shanghai. Following the land-use rules proposed by Gong et al. (2020), we classified land use into five types: residential, business, industrial, transportation, and public administration and services.

The trip origin points were generally consistent over time in terms of land-use types, with most bicycle use in residential areas, a moderate number of trips in business and public administration and services areas and the fewest origin points at transportation locations (). However, clusters within the same category exhibited different spatial and temporal characteristics. For example, for the business and residential types, the clusters exhibited different sizes and peaks. The overall proportion of trips to industrial and transportation areas was significantly higher during the morning peak than during the evening peak, whereas trips to residential areas were higher during the evening peak than during the morning peak. Travel was most common between residential areas, business areas and public administration and services areas, with fewer trips between industrial areas. This could be due to the long working hours in industrial areas, which reduce trips there, and to people choosing to ride short distances between neighborhoods or business districts for shopping. As shown in Figure 13, clusters A and G were trips between public administration and services areas and residential areas, clusters B and C were trips between different residential areas, clusters D and F were trips between business and residential areas, and cluster E depicted trips between residential areas, business areas, and public administration and services areas.
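
The land-use breakdown described above amounts to tallying trips by the land-use type of their endpoints. The following minimal sketch shows one way such shares could be computed. It assumes each trip has already been labelled with the land-use type of its origin and destination (for example by a point-in-polygon lookup, which is not shown); the function names and the sample trips are hypothetical and are not taken from the article's data.

```python
from collections import Counter

# The five land-use types used in this study.
LAND_USE_TYPES = (
    "residential",
    "business",
    "industrial",
    "transportation",
    "public administration and services",
)


def share_by_type(origin_types):
    """Percentage of trip origins falling in each land-use type."""
    counts = Counter(origin_types)
    total = sum(counts.values())
    return {t: 100.0 * counts[t] / total for t in LAND_USE_TYPES}


def pair_counts(trips):
    """Count trips per unordered pair of origin and destination land-use types."""
    return Counter(tuple(sorted(pair)) for pair in trips)


# Hypothetical trips: (origin land-use type, destination land-use type).
trips = [
    ("residential", "business"),
    ("business", "residential"),
    ("residential", "residential"),
    ("residential", "transportation"),
    ("public administration and services", "residential"),
]
origin_share = share_by_type(origin for origin, _ in trips)
od_pairs = pair_counts(trips)
```

Treating the origin and destination types as an unordered pair is a simplification; keeping the direction would be needed to separate, for instance, residential-to-business trips from the reverse.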

Figure 13. Distribution map of riding patterns in different land-use types.

Table 2. Percentage of rides encompassing different land-use types during the morning and evening peak.

4. Discussion

4.1. Consistency comparison

To demonstrate the consistency and differences in the clustering results more clearly, this study selected data from August 15th as the subject for comparison. A comparison experiment was conducted between flowHDBSCAN and the spectral clustering method for geographic flows proposed in this research. Because flowHDBSCAN is a density-based clustering algorithm that does not require pre-specification of the number of clusters, the elbow method cannot be utilized to determine its optimal cluster count. Instead, we continually adjusted and optimized the parameters of flowHDBSCAN, ultimately selecting a ‘min_sample’ parameter of 15 and a ‘min_cluster_size’ parameter of 20. The min_sample parameter represents the minimum number of points required within a specific radius (defined by the ‘max_eps’ parameter, which was set to infinity in this study), including the point itself. The min_cluster_size parameter, on the other hand, defines the minimum number of samples that a cluster must contain. These settings resulted in an optimal cluster count of 57. For a more accurate comparison, 57 clusters were also selected for the spectral clustering of geographic flows. The clustering results are shown in Figure 14, which presents the flowHDBSCAN results alongside the spectral clustering results and indicates that the clusters detected by flowHDBSCAN are also represented in the spectral clustering of geographic flows. In addition, spectral clustering identified clusters that flowHDBSCAN did not detect; these are highlighted by the red circles in Figure 14. The clustering results also reflected the predominance of short-distance bike rides, particularly those concentrated in central Shanghai. To further check the validity of the clustering results, we compared our experimental outcomes with those of Li et al. (2020). In Li et al.’s research, hotspots for bicycle riding were primarily located in the core areas of Shanghai, which is consistent with the findings of this study and supports the reliability of the clustering conducted in this research.
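
The comparison above is made visually from the maps. As a rough numerical complement, one could check how well each cluster from one method is matched by a cluster from the other. The sketch below does this with the Jaccard overlap of member indices. It is an illustration under stated assumptions, not the procedure used in this study: the function name `best_matches` is hypothetical, the label `-1` is assumed to mark noise flows (a common convention for density-based methods), and the label lists are made up.

```python
def best_matches(labels_a, labels_b, noise=-1):
    """For each cluster in labels_a, return (best cluster in labels_b, Jaccard overlap)."""

    def members(labels):
        # Map each cluster label to the set of flow indices it contains, skipping noise.
        out = {}
        for idx, lab in enumerate(labels):
            if lab != noise:
                out.setdefault(lab, set()).add(idx)
        return out

    clusters_a, clusters_b = members(labels_a), members(labels_b)
    result = {}
    for key_a, set_a in clusters_a.items():
        key_b, set_b = max(
            clusters_b.items(),
            key=lambda kv: len(set_a & kv[1]) / len(set_a | kv[1]),
        )
        result[key_a] = (key_b, len(set_a & set_b) / len(set_a | set_b))
    return result


# Hypothetical labels for six flows from two clustering methods.
density_labels = [0, 0, 0, 1, 1, -1]
spectral_labels = [5, 5, 5, 5, 7, 7]
matches = best_matches(density_labels, spectral_labels)
```

A high overlap for every density-based cluster would support the statement that those clusters are also represented in the other result; clusters of the second method that are never chosen as a best match would correspond to the additional clusters it detects.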

Figure 14. Comparison of spectral clustering and flowHDBSCAN clustering results.

4.2. Comparison of clustering methods

Existing studies on flow performance comparisons are mainly divided into two categories: comparisons with real-world results and comparisons with other methods. Common comparison methods include the SC, QMeasure (Lee et al. 2007), clustering correctness (CC) index (Tao et al. 2017) and computational complexity (Zhu et al. 2019). In comparisons with real-world outcomes, scholars typically analyse derived travel patterns and integrate them with other datasets, such as different transportation data, road networks, mobile phone data and land use. If the experimental results align with common knowledge, the feasibility of the experiment is considered confirmed. Comparisons between different methods are also common, with new research approaches often compared to existing methods in terms of algorithmic time, complexity and performance.

This study highlighted the superior performance of the spectral clustering algorithm for geographic flows through comparisons with different methods. Using data from August 15th as an example, this section begins with a comparison with traditional clustering methods, including k-means and kernel density. The optimal number of clusters for these methods was determined by the elbow method, as illustrated by the red boxes in Figure 15(a), which shows that the optimal number of clusters for the spectral clustering of geographic flows is six, whereas both the k-means and kernel density methods have an optimal number of seven clusters. Because the Gaussian mixture model (GMM) is a probability-based clustering method without a direct ‘internal distance’ measure like the SSE in k-means, the traditional elbow method (commonly applied to k-means) is not well suited to determining its optimal number of clusters. Therefore, this study employed the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to evaluate the optimal number of clusters for the GMM, as shown by the red boxes in Figure 15(b), indicating an optimal number of seven clusters for the GMM.
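
For readers unfamiliar with the two criteria, the sketch below shows how AIC and BIC trade off fit against model size when choosing the number of GMM components. The formulas are the standard definitions (AIC = 2p - 2 ln L, BIC = p ln n - 2 ln L, with p free parameters and n samples). Everything else is an assumption made for illustration: a full-covariance GMM on four-dimensional flow vectors, the function names, and the log-likelihood values, which are invented and do not come from this study.

```python
from math import log


def gmm_param_count(k, d):
    """Free parameters of a k-component, full-covariance GMM in d dimensions."""
    # k*d means, k*d*(d+1)/2 covariance entries, and k-1 mixing weights.
    return k * d + k * d * (d + 1) // 2 + (k - 1)


def aic(log_lik, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_samples):
    """Bayesian Information Criterion; lower is better."""
    return n_params * log(n_samples) - 2 * log_lik


# Hypothetical maximised log-likelihoods for k = 5..8 fitted to 1000 flows.
log_liks = {5: -5200.0, 6: -5100.0, 7: -5000.0, 8: -4995.0}
n_samples, dim = 1000, 4

scores = {
    k: (aic(ll, gmm_param_count(k, dim)), bic(ll, gmm_param_count(k, dim), n_samples))
    for k, ll in log_liks.items()
}
best_by_aic = min(scores, key=lambda k: scores[k][0])
best_by_bic = min(scores, key=lambda k: scores[k][1])
```

Because BIC penalises each extra parameter by ln n rather than 2, it tends to favour smaller models than AIC on large samples; when the two criteria disagree, the choice between them is a modelling decision.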

Figure 15. (a) Trend of SSE (sum of squared errors) for ‘flow_spectral,’ k-means, kernel density and ‘flow_dissimilarity’; (b) trend of AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) for Gaussian mixture models.

Building on this foundation, this study compared different clustering methods at their optimal number of clusters using the SC, Davies–Bouldin index and CHI, as shown in Table 3. The SC for the spectral clustering method of geographic flows was significantly higher than that of the traditional methods, and its Davies–Bouldin index was lower, indicating higher compactness within clusters and better separation between clusters. However, the CHI for the kernel density clustering method was higher than that for the spectral clustering method of geographic flows because of its ability to adapt to different density regions within the data (Tran et al. 2006), effectively handling situations with varying density gradients. Overall, the spectral clustering method for geographic flows outperformed the kernel density clustering method on two of the three indices.
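
To make the three indices concrete, the following sketch implements their textbook definitions in plain Python. It assumes that SC denotes the mean silhouette coefficient and CHI the Calinski-Harabasz index, and it uses Euclidean distance between points; the study may instead evaluate the indices with a flow-specific distance, so this is an illustration rather than the evaluation code used here. The helper names and the four sample points are hypothetical.

```python
from collections import defaultdict
from math import dist


def _groups(points, labels):
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def _centroid(members):
    return tuple(sum(col) / len(members) for col in zip(*members))


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in groups.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means better separation relative to scatter."""
    groups = _groups(points, labels)
    cents = {k: _centroid(m) for k, m in groups.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in m) / len(m) for k, m in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in groups if j != i) for i in groups]
    return sum(worst) / len(worst)


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(len(m) * dist(_centroid(m), overall) ** 2 for m in groups.values())
    within = sum(dist(p, _centroid(m)) ** 2 for m in groups.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated hypothetical clusters of two points each.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
labs = [0, 0, 1, 1]
sc = silhouette(pts, labs)
dbi = davies_bouldin(pts, labs)
chi = calinski_harabasz(pts, labs)
```

All three functions require at least two clusters. Because the indices reward different properties, a method can rank first on one and not on another, which is the situation reported for the CHI above.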

Table 3. Comparison of clustering results between ‘flow_spectral’ clustering method and traditional clustering methods.

Furthermore, comparisons were conducted with other geographic flow clustering methods. In the realm of geographic flow clustering, techniques such as BiFlowLISA, flow quotient, cross k-function and FCLPs are utilized for spatial clustering of multivariate geographic flows. However, our research focused solely on univariate geographic flow spatial clustering, rendering a comparison with these methods challenging. Additionally, the global and local versions of the spatiotemporal flow L-function incorporate temporal information (Yan et al. 2023), whereas methods such as the ACO-based spatial scanning statistical method and ‘SNN_flow’ emphasize exploring the local features of spatial data (Song et al. 2019; Liu et al. 2021). By contrast, the geographic flow spectral clustering method proposed herein, which is grounded in graph theory and matrix eigenvalues, concentrates on the decomposition and clustering of the overall data structure, making comparisons with these methods inappropriate. Consequently, flowHDBSCAN (Tao et al. 2017), OPTICS (Fang et al. 2021) and the flow dissimilarity clustering method proposed by Tao and Thill (2016) were selected for comparison with the proposed geographic flow spectral clustering method. As illustrated by the red box in Figure 15(a), the optimal cluster count for the ‘flow_dissimilarity’ method was six, whereas for flowHDBSCAN, as determined in Section 4.1, it was 57. For the OPTICS method, continuous parameter adjustment and optimization led to the selection of a min_sample parameter of 5, a min_cluster_size parameter of 0.1 and an xi of 0.3, with xi utilized to define the stability of clustering boundaries, resulting in an optimal cluster count of 121. Based on this, these clustering methods were compared at their optimal cluster counts in terms of the SC, Davies–Bouldin index and CHI. As shown in Table 4, the SC and CHI of the geographic flow spectral clustering method are significantly higher than those of the other three flow clustering methods, and its Davies–Bouldin index is lower, indicating superior clustering performance compared to flowHDBSCAN, flow_dissimilarity and OPTICS.

Table 4. Comparison of clustering results between ‘flow_spectral’ clustering method and other flow clustering methods.

4.3. Limitations

Although this study revealed daily bike-share riding patterns to a certain extent, such patterns cannot be fully predicted, and the study has several limitations. First, only the Mobike trajectory data and land-use data were used, which falls short of a comprehensive understanding of travel behavior. Future studies should seek to include information on road networks, road conditions, cell phone navigation and smart card data, as these can affect riding distances and routes. Second, with the development of information technology, methods such as machine learning (Zheng et al. 2021), graph neural networks (Wang et al. 2022) and topology (Alemayehu and Bitsuamlak 2022) are constantly advancing in the field of flow analysis. Thus, subsequent studies should include performance comparisons with these methods. Third, only location information was considered in the clustering; however, the attributes of the flow data also affect the clustering results. Future studies should examine these aspects in depth to improve the accuracy of the clustering results.

5. Conclusion

In this study, we propose a new method of geographic flow clustering, that is, spectral clustering of geographic flows, designed to detect patterns of geographic flows in the field of geographic information. Utilizing the SC, Davies–Bouldin index and CHI as clustering metrics, this study compares the clustering performance of the new method with traditional clustering methods and related geographic flow clustering approaches. The results indicate that the proposed method demonstrates a robust clustering performance. Additionally, this study analyses bike-sharing data in Shanghai by applying flow spectral clustering to explore urban mobility patterns. The specific experimental conclusions are as follows.

  1. Cluster analysis of the flow data for the three days before, during, and after heavy rain showed that rain markedly affects ride frequency, duration, and distance. There were 52.41% fewer rides on rainy days than on cloudy days, and 89.04% of these rides lasted less than 30 min. Additionally, spectral clustering of the geographic flows revealed that rain had a minimal impact on the direction of travel, as the trend throughout the day continued to be between residential areas, commercial areas, and subway stations.

  2. Morning and evening peak clustering analyses revealed opposing trends. Clustering during the morning peak showed that 49.99% of cyclists traveled from residential to commercial areas or subway stations, whereas during the evening peak, 28.57% of users traveled from subway stations to residential areas. The overall distance traveled during the evening peak was lower than that during the morning peak, which may be related to fatigue after a long working day.

  3. On weekdays, 71.42% of the users moved between residential or commercial areas and subway stations. On weekends, 28.57% of the users moved from schools to public administration and services land used for recreation and culture.
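
The percentages quoted in the conclusions above are simple ratios. The sketch below shows how a relative decrease in ride counts and a share of short rides could be computed. The function names and all input numbers are hypothetical and are chosen only to keep the arithmetic easy to follow; they are not the study's data.

```python
def percent_decrease(baseline, observed):
    """Relative drop from baseline to observed, in percent."""
    return 100.0 * (baseline - observed) / baseline


def share_below(durations_min, limit=30):
    """Percentage of rides whose duration in minutes is strictly below the limit."""
    short = sum(1 for d in durations_min if d < limit)
    return 100.0 * short / len(durations_min)


# Hypothetical daily ride counts and ride durations (minutes).
cloudy_rides, rainy_rides = 2000, 1000
rainy_durations = [5, 12, 29, 30, 45]

drop = percent_decrease(cloudy_rides, rainy_rides)
short_share = share_below(rainy_durations)
```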

Author contributions

Wenwen Xing and Youjun Tu contributed equally to this article. Writing – original draft, Visualization, Writing – review and editing, Wenwen Xing; Methodology and Software, Youjun Tu; Data curation, Yuxing Gao; Funding acquisition, Supervision and Conceptualization, Junli Li and Zongyi He; Funding acquisition, Junli Li.

Acknowledgements

The data in this study were obtained from publicly available data in Wenwen Li’s article, ‘Understanding intra-urban human mobility through an exploratory spatiotemporal analysis of bike-sharing trajectories’, and we would like to thank the authors of this article for providing the data. The support of Cheng Zhou for the code of this study is also appreciated.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the Key Natural Science Research Project of Higher Education Institutions in Anhui Province (No. 2023AH051013), the Natural Science Foundation of Anhui Province (No. 2108085MD129), the Anhui New Era Education Quality Engineering Project (Graduate Education) (No. 2022zyxwjxalk039) and the Education Quality Engineering Project of Anhui Province (No. 2021xxkc038).

References

  • Aïmeur E, Brassard G, Gambs S. 2007. Quantum clustering algorithms. Proceedings of the 24th International Conference on Machine Learning; June 20–24; Oregon State University. New York, NY: Association for Computing Machinery; p. 1–8. doi: 10.1145/1273496.1273497.
  • Alemayehu TF, Bitsuamlak GT. 2022. Autonomous urban topology generation for urban flow modelling. Sustain Cities Soc. 87:104181. doi: 10.1016/j.scs.2022.104181.
  • Bäcklund H, Hedblom A, Neijman N. 2011. A density-based spatial clustering of application with noise. Data Min TNM033. 33:11–30.
  • Bagirov AM, Aliguliyev RM, Sultanova N. 2023. Finding compact and well-separated clusters: clustering using silhouette coefficients. Pattern Recognit. 135:109144. doi: 10.1016/j.patcog.2022.109144.
  • Berglund S, Karlstrom A. 1999. Identifying local spatial association in flow data. J Geograph Syst. 1(3):219–236. doi: 10.1007/s101090050013.
  • Birant D, Kut A. 2007. ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowledge Eng. 60(1):208–221. doi: 10.1016/j.datak.2006.01.013.
  • Bureva V, Traneva V, Zoteva D, Tranev S. 2021. Generalized net model simulation of cluster analysis using CLIQUE: clustering in quest. In: Dimov I, Fidanova S, editors. Advances in high performance computing. Berlin, Germany: Springer International Publishing; p. 48–60.
  • Cai J, Kwan MP. 2022. Discovering co-location patterns in multivariate spatial flow data. Int J Geograph Inform Sci. 36(4):720–748. doi: 10.1080/13658816.2021.1980217.
  • Charreire H, Weber C, Chaix B, Salze P, Casey R, Banos A, Badariotti D, Kesse-Guyot E, Hercberg S, Simon C, et al. 2012. Identifying built environmental patterns using cluster analysis and GIS: relationships with walking, cycling and body mass index in French adults. Int J Behav Nutr Phys Act. 9(1):1. doi: 10.1186/1479-5868-9-59.
  • Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B (Methodological). 39(1):1–22. doi: 10.1111/j.2517-6161.1977.tb01600.x.
  • Du KL. 2010. Clustering: a neural network approach. Neural Netw. 23(1):89–107. doi: 10.1016/j.neunet.2009.08.007.
  • Dunn JC. 1974. Well-separated clusters and optimal fuzzy partitions. J Cybernet. 4(1):95–104. doi: 10.1080/01969727408546059.
  • Fang M, Tang L, Kan Z, Yang X, Pei T, Li Q, Li C. 2021. An adaptive origin-destination flows cluster-detecting method to identify urban mobility trends. arXiv preprint arXiv:2106.05436. doi: 10.48550/arXiv.2106.05436.
  • Fotheringham AS. 1984. Spatial flows and spatial patterns. Environ Plan A. 16(4):529–543. doi: 10.1068/a160529.
  • Gallego FJ, Batista F, Rocha C, Mubareka S. 2011. Disaggregating population density of the European Union with CORINE land cover. Int J Geograph Inform Sci. 25(12):2051–2069. doi: 10.1080/13658816.2011.583653.
  • Gong P, Chen B, Li X, Liu H, Wang J, Bai Y, Chen J, Chen X, Fang L, Feng S, et al. 2020. Mapping essential urban land use categories in China (EULUC-China): preliminary results for 2018. Sci Bull. 65(3):182–187. doi: 10.1016/j.scib.2019.12.007.
  • Graser A, Schmidt J, Roth F, Brändle N. 2019. Untangling origin-destination flows in geographic information systems. Inf Vis. 18(1):153–172. doi: 10.1177/1473871617738122.
  • Guo B, Pei T, Song C, Shu H, Wu M, Guo S, Jiang J, Du P. 2023. Trend surface analysis of geographic flows. Int J Geograph Inform Sci. 37(1):118–137. doi: 10.1080/13658816.2022.2129660.
  • Guo D, Zhu X, Jin H, Gao P, Andris C. 2012. Discovering spatial patterns in origin-destination mobility data. Trans GIS. 16(3):411–429. doi: 10.1111/j.1467-9671.2012.01344.x.
  • Huang Z. 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowledge Discov. 2(3):283–304. doi: 10.1023/A:1009769707641.
  • Jia H, Ding S, Xu X, Nie R. 2014. The latest research progress on spectral clustering. Neural Comput & Applic. 24(7–8):1477–1486. doi: 10.1007/s00521-013-1439-2.
  • Karim MR, Beyan O, Zappa A, Costa IG, Rebholz-Schuhmann D, Cochez M, Decker S. 2021. Deep learning-based clustering approaches for bioinformatics. Brief Bioinform. 22(1):393–415. doi: 10.1093/bib/bbz170.
  • Kermani S, Samadzadehaghdam N, EtehadTavakol M. 2015. Automatic color segmentation of breast infrared images using a Gaussian mixture model. Optik. 126(21):3288–3294. doi: 10.1016/j.ijleo.2015.08.007.
  • Kim J, Jung I. 2017. Evaluation of the gini coefficient in spatial scan statistics for detecting irregularly shaped clusters. PLoS One. 12(1):e0170736. doi: 10.1371/journal.pone.0170736.
  • Lee JG, Han J, Whang KY. 2007. Trajectory clustering: a partition-and-group framework. Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data; June 11–14; Beijing, China. New York, NY: Association for Computing Machinery; p. 593–604. doi: 10.1145/1247480.1247546.
  • LeSage JP, Fischer MM. 2010. Spatial econometric methods for modeling origin-destination flows. In: Fischer MM, Getis A, editors. Handbook of applied spatial analysis: software tools, methods and applications. Berlin, Germany: Springer; p. 409–433.
  • Li W, Wang S, Zhang X, Jia Q, Tian Y. 2020. Understanding intra-urban human mobility through an exploratory spatiotemporal analysis of bike-sharing trajectories. Int J Geograph Inform Sci. 34(12):2451–2474. doi: 10.1080/13658816.2020.1712401.
  • Liu Q, Yang J, Deng M, Song C, Liu W. 2021. SNN_flow: a shared nearest-neighbor-based clustering method for inhomogeneous origin-destination flows. Int J Geograph Inform Sci. 36(2):253–279. doi: 10.1080/13658816.2021.1899184.
  • Liu Y, Tong D, Liu X. 2015. Measuring spatial autocorrelation of vectors. Geog Anal. 47(3):300–319. doi: 10.1111/gean.12069.
  • Lorenzo L, Arroyo J. 2022. Analysis of the cryptocurrency market using different prototype-based clustering techniques. Financ Innov. 8(1):7. doi: 10.1186/s40854-021-00310-9.
  • Lorenzo L, Arroyo J. 2023. Online risk-based portfolio allocation on subsets of crypto assets applying a prototype-based clustering algorithm. Financ Innov. 9(1):25. doi: 10.1186/s40854-022-00438-2.
  • Lu Y, Wang L, Lu J, Yang J, Shen C. 2014. Multiple kernel clustering based on centered kernel alignment. Pattern Recogn. 47(11):3656–3664. doi: 10.1016/j.patcog.2014.05.005.
  • Ma D, Tanizaki H. 2022. Intraday patterns of price clustering in Bitcoin. Financ Innov. 8(1):4. doi: 10.1186/s40854-021-00307-4.
  • Markou MT, Kassomenos P. 2010. Cluster analysis of five years of back trajectories arriving in Athens, Greece. Atmos Res. 98(2–4):438–457. doi: 10.1016/j.atmosres.2010.08.006.
  • Nascimento MCV, de Carvalho ACPLF. 2011. Spectral methods for graph clustering – A survey. Eur J Oper Res. 211(2):221–231. doi: 10.1016/j.ejor.2010.08.012.
  • Oyelade J, Isewon I, Oladipupo O, Emebo O, Omogbadegun Z, Aromolaran O, Uwoghiren E, Olaniyan D, Olawole O. 2019. Data clustering: algorithms and its applications. 2019 19th International Conference on Computational Science and Its Applications (ICCSA); July 1–4; St. Petersburg, Russia. Piscataway, NJ: Institute of Electrical and Electronics Engineers; p. 71–81. doi: 10.1109/ICCSA.2019.000-1.
  • Pei T, Shu H, Guo S, Song C, Chen J, Liu Y, Wang X. 2020. The concept and classification of spatial patterns of geographical flow. J Geo-Inform Sci. 22(1):30–40. doi: 10.12082/dqxxkx.2020.190736.
  • Qiao D, Yang X, Liang Y, Hao X. 2022. Rapid trajectory clustering based on neighbor spatial analysis. Pattern Recog Lett. 156:167–173. doi: 10.1016/j.patrec.2022.03.010.
  • Ros F, Riad R, Guillaume S. 2023. PDBI: a partitioning Davies-Bouldin index for clustering evaluation. Neurocomputing. 528:178–199. doi: 10.1016/j.neucom.2023.01.043.
  • Rousseeuw PJ. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 20:53–65. doi: 10.1016/0377-0427(87)90125-7.
  • Shen P, Ouyang L, Wang C, Shi Y, Su Y. 2020. Cluster and characteristic analysis of Shanghai metro stations based on metro card and land-use data. Geo-Spat Inf Sci. 23(4):352–361. doi: 10.1080/10095020.2020.1846463.
  • Shi C, Wei B, Wei S, Wang W, Liu H, Liu J. 2021. A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm. J Wireless Com Netw. 2021(1):31. doi: 10.1186/s13638-021-01910-w.
  • Shu H, Pei T, Song C, Chen X, Guo S, Liu Y, Chen J, Wang X, Zhou C. 2021. L-function of geographical flows. Int J Geograph Inform Sci. 35(4):689–716. doi: 10.1080/13658816.2020.1749277.
  • Smyth P. 2000. Model selection for probabilistic clustering using cross-validated likelihood. Stat Comput. 10(1):63–72. doi: 10.1023/A:1008940618127.
  • Song C, Pei T, Ma T, Du Y, Shu H, Guo S, Fan Z. 2019. Detecting arbitrarily shaped clusters in origin-destination flows using ant colony optimization. Int J Geograph Inform Sci. 33(1):134–154. doi: 10.1080/13658816.2018.1516287.
  • Tao R, Thill JC. 2019. Flow Cross K-function: a bivariate flow analytical method. Int J Geograph Inform Sci. 33(10):2055–2071. doi: 10.1080/13658816.2019.1608362.
  • Tao R, Thill JC. 2020. BiFlowLISA: measuring spatial association for bivariate flow data. Comput Environ Urban Syst. 83:101519. doi: 10.1016/j.compenvurbsys.2020.101519.
  • Tao R, Thill JC, Depken C, Kashiha M. 2017. FlowHDBSCAN: a hierarchical and density-based spatial flow clustering method. Proceedings of the 3rd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics; November 7–10; Redondo Beach, CA, USA. New York, NY: Association for Computing Machinery; p. 1–8.
  • Tao R, Thill JC. 2016. Spatial cluster detection in spatial flow data. Geog Anal. 48(4):355–372. doi: 10.1111/gean.12100.
  • Tran TN, Wehrens R, Buydens LMC. 2006. KNN-kernel density-based clustering for high-dimensional multivariate data. Comput Stat Data Anal. 51(2):513–525. doi: 10.1016/j.csda.2005.10.001.
  • Wang T, Ni S, Qin T, Cao D. 2022. TransGAT: a dynamic graph attention residual networks for traffic flow forecasting. Sustain Comput Inform Syst. 36:100779. doi: 10.1016/j.suscom.2022.100779.
  • Yan X, Pei T, Shu H, Song C, Wu M, Fang Z, Chen J. 2023. Spatiotemporal flow L-function: a new method for identifying spatiotemporal clusters in geographical flow data. Int J Geograph Inform Sci. 37(7):1615–1639. doi: 10.1080/13658816.2023.2204345.
  • Yang MS. 1993. A survey of fuzzy clustering. Math Comput Modell. 18(11):1–16. doi: 10.1016/0895-7177(93)90202-A.
  • Zhao P, Liu X, Shen J, Chen M. 2019. A network distance and graph-partitioning-based clustering method for improving the accuracy of urban hotspot detection. Geocarto Int. 34(3):293–315. doi: 10.1080/10106049.2017.1404140.
  • Zheng Z, Ling X, Wang P, Xiao J, Zhang F. 2020. Hybrid model for predicting anomalous large passenger flow in urban metros. IET Intell Transp Syst. 14(14):1987–1996. doi: 10.1049/iet-its.2020.0054.
  • Zhou M, Yang M, Chen Z. 2023. Flow colocation quotient: measuring bivariate spatial association for flow data. Comput Environ Urban Syst. 99:101916. doi: 10.1016/j.compenvurbsys.2022.101916.
  • Zhou Z, Si G, Sun H, Qu K, Hou W. 2022. A robust clustering algorithm based on the identification of core points and KNN kernel density estimation. Expert Syst Appl. 195:116573. doi: 10.1016/j.eswa.2022.116573.
  • Zhu X, Guo D. 2014. Mapping large spatial flow data with hierarchical clustering. Trans GIS. 18(3):421–435. doi: 10.1111/tgis.12100.
  • Zhu X, Guo D, Koylu C, Chen C. 2019. Density-based multi-scale flow mapping and generalization. Comput Environ Urban Syst. 77:101359. doi: 10.1016/j.compenvurbsys.2019.101359.
  • Zou Q, Lin G, Jiang X, Liu X, Zeng X. 2020. Sequence clustering in bioinformatics: an empirical study. Brief Bioinform. 21(1):1–10. doi: 10.1093/bib/bby090.