Research Article

Identifying city communities in China by fusing multisource flow data

Pages 4247-4264 | Received 13 Apr 2023, Accepted 04 Oct 2023, Published online: 11 Oct 2023

ABSTRACT

The identification of city communities is essential for the regional planning and management of metropolitan areas. City communities can be identified from the perspective of mobile flows, and integrating the advantages of multisource flow data is essential for measuring intercity connectivity. In this research, a multisource flow fusion method, which avoids the one-sidedness of a single flow, is proposed to combine the characteristics of population flow, traffic flow, and information flow. The silhouette coefficient and a hierarchical clustering algorithm are then used to determine the number of city communities and the final clustering result. The results show that (1) although population flow, traffic flow, and information flow are positively correlated, there are also some differences among them; (2) the fusion flow between cities in China presents an obvious diamond structure, with Beijing, Shanghai, Guangzhou, and Chengdu at the four vertices and Wuhan almost at the centre; and (3) the city communities identified from multisource flow with the hierarchical clustering algorithm satisfy the principles that closely connected cities belong to the same community and loosely connected cities belong to different communities.

1. Introduction

Communities are usually groups of objects that are more likely to be connected to each other than to members of other groups (Fortunato and Hric 2016). A city community is a highly integrated group of cities in a specific geographical region, formed around one or more core cities and a number of small and medium-sized cities and linked by dense transportation networks such as highways, subways and urban rail transit. In recent years, with the continuous progress of economic globalisation, city communities have become the main drivers of world economic development and the spatial units that represent countries in global competition and the international division of labour (Fang and Yu 2017). Hence, identifying city communities not only helps to understand the relationships between cities but also helps to delimit the geographic extent of each community, which is a prerequisite for research on the regional planning and management of metropolitan areas.

In recent years, there has been much research on the identification of city communities. City communities can be defined from the perspective of functional linkages rather than morphological dimensions (Ding et al. 2022), and intercity flows provide new perspectives for evaluating the spatial interactions of cities (Gong et al. 2021). Population flow (Kraft et al. 2022), traffic flow (Zhang et al. 2023), and information flow (Krings et al. 2009; South et al. 2022) are the mobile flows most commonly used to measure intercity connectivity. These flows usually come from different data sources, and each reflects a different kind of connection between cities. Multisource flows have been used in many fields, such as estimating regional economic development (Li et al. 2020a), analysing patterns of urban development (Zhen et al. 2019), and analysing the coordinated development of cities (Yin et al. 2023). Integrating the advantages of multisource data is therefore essential for measuring intercity connectivity (Lin, Wu, and Li 2019).

By the end of 2020, the urbanisation rate of China had reached 63.89%, according to the Seventh National Census Bulletin released by the National Bureau of Statistics of China. With rapid urbanisation, a series of city communities has developed in China, and many scholars have explored them. For example, twenty-three city communities were identified from passenger travel data during ordinary times (Xu et al. 2017), seven city communities were identified from population flow during the Spring Festival (Wei et al. 2018), nineteen urban economic regions were identified from highway passenger flow (Chen et al. 2018), nine groups were delineated based on inbound tourist flows in China (Qin et al. 2019), four sub-communities were delineated based on energy flows in China (Tang et al. 2019), 21 underlying megaregions were identified based on highway flows (Chen 2021), and five communities were delineated based on carbon emission association flows in China during the regulation period (2015–2017) (Cai, Wang, and Zhu 2022). These studies identified city communities in China from a single type of flow data, so their results are somewhat one-sided. Different data sources capture different aspects of intercity connections, and combining them helps to gain a more nuanced and in-depth understanding of city communities in China.

In this study, multisource data are used to identify city communities in China. The main contribution of this article is a multisource flow data fusion method for identifying city communities that avoids the one-sidedness of a single flow. The paper proceeds as follows. Section 2 reviews the related literature. Section 3 introduces the study area and data sources. Section 4 describes the multisource flow data fusion method and the city community detection method. Section 5 presents the distribution of intercity multisource flows and the results of city community detection. Section 6 compares the different flows, the city community divisions they produce, and different detection algorithms. Finally, Section 7 concludes the paper.

2. Literature review

The emergence of social media data has made population flows easier to acquire. Social media data generally include a short text message, a photo, and the time and location at which the message was posted (Liu et al. 2014). Social media platforms with rich, timely, and accessible spatial information (Jing et al. 2023), such as Twitter (Bakillah, Lia, and Liang 2015; Hasnat and Hasan 2018), Zhihu Weibo (Zhu et al. 2020), and Sina Weibo (Zhang et al. 2021), have been used to detect population flows. However, population biases vary substantially across social media platforms (Ruths and Pfeffer 2014). Cell phone signalling data can also be used to detect mobile flows (Gao et al. 2013). Baidu migration data use changes in the positioning data of hundreds of millions of mobile phones to map and visualise people's migration trajectories. By analysing LBS (location-based service) big data, Baidu migration data provide a full, dynamic, and timely visualisation of the trajectories and characteristics of population flows (Shen et al. 2022). Their main advantage is that they directly identify the origins and destinations of population flows in China.

Field surveys (Li et al. 2020b; Román et al. 2014) are an important way to obtain traffic flow and can capture traffic conditions accurately, but their cost and time requirements limit the scope of a study or the frequency of data collection. Timetable data, by contrast, give a clear schedule of transportation services. The numbers of scheduled flights (Liu, Derudder, and Garcia 2013), scheduled high-speed trains (Chen 2017), and scheduled intercity coaches (Wang, Du, and Huang 2020) have been used to explore travel flows. Timetable data can thus provide valuable insights into the scheduling and patterns of transportation activities.

In the information age, well-developed internet networks facilitate information flow (Li et al. 2022a). Numerous studies have explored uncovering information flow from web search engines, measuring it with Google Trends (Devriendt et al. 2011; Wu and Deng 2015) or the Baidu Index (Zhen et al. 2019; Zheng et al. 2022). Both Google Trends and the Baidu Index provide data for different regions and time ranges, showing how search interest varies geographically. Therefore, both can reflect the information connections among cities.

In recent years, various community detection algorithms have been proposed, such as the Girvan-Newman algorithm (Girvan and Newman 2002), the label propagation algorithm (Jokar and Mosleh 2019), the Louvain method (Blondel et al. 2008), the hierarchical clustering algorithm (Chowdhary 2017), the Infomap method (Hong and Yao 2019; Toth, Helic, and Geiger 2022), and the spectral clustering algorithm (Lierde, Chow, and Chen 2020; Ng, Jordan, and Weiss 2001). Of these algorithms, hierarchical clustering is deterministic, which means its results are reproducible. Hierarchical clustering is therefore effective and widely used for non-overlapping community detection (Chen et al. 2021; Lu and Dong 2023). However, because it builds a hierarchy of clusters rather than a single partition, a method is needed to determine the number of clusters.

3. Study area and data sources

3.1. Study area

In this research, we take China as the study area. China is located in eastern Asia on the west coast of the Pacific Ocean and is generally divided into three regions: eastern, central and western. Prefecture-level cities, provincial municipalities or counties, and municipalities directly under the central government were selected as research units. Because data for Hong Kong, Macao and Taiwan were missing, a total of 367 cities were selected for this study (Figure 1).

Figure 1. Study Area.

Over the last two decades, China's population has continued to grow, reaching 1.4126 billion by the end of 2021 (http://www.stats.gov.cn) and making China the world's most populous country. In 2021, China built 4208 km of new railways, 9028 km of new expressways, and 7 new civil transport airports (http://www.stats.gov.cn). The development of transportation and the economy has promoted population movement and the growth of logistics. By the end of 2021, China had 1.032 billion internet users, and the internet penetration rate had reached 73.0% (http://www.stats.gov.cn). The growing number of internet users and the rising penetration rate have intensified information flows.

3.2. Data sources

Population, traffic and information flows are important components of intercity connections. Therefore, multisource flow data covering these three flows are used to detect city communities.

Population flow was calculated from Baidu migration data (http://qianxi.baidu.com/). Because Baidu does not provide data for every day of a complete year, we used the data for November 2021: there are no public holidays in November, and the impact of COVID-19 was small in that month, so population flow was at a normal level.

Traffic flow was calculated from flight, train, and bus schedules. Flight schedule data were downloaded from Trip.com Group (www.ctrip.com), one of the largest online travel agencies in the world, which provides flight booking services. Railway schedule data were downloaded from 12306.cn (https://www.12306.cn/index/), the official website of China Railway Corporation for booking train tickets in China. Bus schedule data were downloaded from keyunzhan (https://www.keyunzhan.com/), a popular bus ticket website in China.

Information flow was calculated from the Baidu Index (https://index.baidu.com/v2/index.html#/), a tool provided by Baidu. Users enter keywords in the Baidu Index search bar and select a time range and region to obtain the daily search volume of those keywords within that period and region (Wang et al. 2023). For every city, we downloaded its search volume for each of the other cities.

4. Methods

4.1. Multisource flow data fusion method

4.1.1. Intercity population flow calculation method

In this study, Baidu migration data were used to measure intercity population flow. Baidu migration data record the daily population outflow of each city and the proportion of that outflow going to each other city. For two cities i and j, the population outflow from city i to city j is calculated by Equation (1):

$$PFOut_{ij} = \sum_{k=1}^{n} PFOutSum_{k} \times Per_{k}(ij) \qquad (1)$$

where $PFOutSum_{k}$ is the total population outflow from city i on Day k, $Per_{k}(ij)$ is the percentage of that outflow going from city i to city j on Day k, and n is the number of days in the study period.

For two cities i and j, the population flow between them is calculated by Equation (2):

$$PF_{ij} = PFOut_{ij} + PFOut_{ji} \qquad (2)$$

where $PFOut_{ij}$ is the population outflow from city i to city j, and $PFOut_{ji}$ is the population outflow from city j to city i.
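Equations (1) and (2) can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes the Baidu migration records have already been arranged into an array `out_total[k, i]` (PFOutSum for city i on Day k) and an array `share[k, i, j]` holding Per_k(ij) as a fraction rather than a percentage; both array names and the toy numbers are hypothetical.

```python
import numpy as np


def population_outflow(out_total: np.ndarray, share: np.ndarray) -> np.ndarray:
    """Equation (1): PFOut[i, j] = sum over days k of PFOutSum[k, i] * Per_k(i -> j).

    out_total: shape (days, cities), total outflow of each city on each day.
    share:     shape (days, cities, cities); share[k, i, j] is the fraction of
               city i's outflow on day k that goes to city j.
    """
    # Broadcast each day's total over destinations, then sum over days.
    return (out_total[:, :, None] * share).sum(axis=0)


def population_flow(pf_out: np.ndarray) -> np.ndarray:
    """Equation (2): PF[i, j] = PFOut[i, j] + PFOut[j, i], a symmetric matrix."""
    return pf_out + pf_out.T


# Toy example: 2 days, 3 cities (hypothetical numbers).
out_total = np.array([[100.0, 50.0, 20.0],
                      [120.0, 40.0, 30.0]])
share = np.zeros((2, 3, 3))
share[:, 0] = [0.0, 0.6, 0.4]  # city 0 sends 60% to city 1 and 40% to city 2
share[:, 1] = [0.5, 0.0, 0.5]
share[:, 2] = [1.0, 0.0, 0.0]

pf = population_flow(population_outflow(out_total, share))
print(pf)
```

Here PF between cities 0 and 1 is (100 + 120) × 0.6 + (50 + 40) × 0.5 = 177.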

4.1.2. Intercity traffic flow calculation method

In this research, flight, train, and bus schedules were used to calculate traffic flow. For two cities i and j, the traffic flow from city i to city j is calculated by Equation (3):

$$TFOut_{ij} = W_{FS} \times \frac{FS_{ij}}{\sum_{r=1}^{n}\sum_{s=1}^{n} FS_{rs}} + W_{TS} \times \frac{TS_{ij}}{\sum_{r=1}^{n}\sum_{s=1}^{n} TS_{rs}} + W_{BS} \times \frac{BS_{ij}}{\sum_{r=1}^{n}\sum_{s=1}^{n} BS_{rs}} \qquad (3)$$

where $FS_{ij}$, $TS_{ij}$, and $BS_{ij}$ are the numbers of scheduled flights, trains, and buses from city i to city j, respectively; $W_{FS}$, $W_{TS}$, and $W_{BS}$ are the weights of flights, trains, and buses; and n is the number of cities. In 2021, civil aviation, railways, and highways together carried 98.03% of total passenger traffic, with individual shares of 5.31%, 31.46%, and 61.27%, respectively. These shares are therefore used as the weights of flights (5.31%), trains (31.46%), and buses (61.27%).

For two cities i and j, the traffic flow between them is calculated by Equation (4):

$$TF_{ij} = TFOut_{ij} + TFOut_{ji} \qquad (4)$$

where $TFOut_{ij}$ is the traffic flow from city i to city j, and $TFOut_{ji}$ is the traffic flow from city j to city i.
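A comparable sketch of Equations (3) and (4), assuming the schedules have already been aggregated into hypothetical city-by-city count matrices `fs`, `ts` and `bs` (row = origin, column = destination). The modal weights are the 2021 passenger-traffic shares quoted above; each mode's counts are divided by that mode's total over all city pairs before weighting.

```python
import numpy as np

# Modal weights: 2021 passenger-traffic shares of aviation, rail and highway.
W_FLIGHT, W_TRAIN, W_BUS = 0.0531, 0.3146, 0.6127


def traffic_outflow(fs: np.ndarray, ts: np.ndarray, bs: np.ndarray) -> np.ndarray:
    """Equation (3): weighted sum of each mode's share of all scheduled services.

    fs, ts, bs: (cities x cities) counts of scheduled flights, trains and buses
    from city i (row) to city j (column).
    """
    return (W_FLIGHT * fs / fs.sum()
            + W_TRAIN * ts / ts.sum()
            + W_BUS * bs / bs.sum())


def traffic_flow(tf_out: np.ndarray) -> np.ndarray:
    """Equation (4): TF[i, j] = TFOut[i, j] + TFOut[j, i]."""
    return tf_out + tf_out.T


# Toy example with two cities (hypothetical schedule counts).
fs = np.array([[0, 2], [2, 0]], dtype=float)
ts = np.array([[0, 10], [10, 0]], dtype=float)
bs = np.array([[0, 30], [10, 0]], dtype=float)

tf = traffic_flow(traffic_outflow(fs, ts, bs))
print(tf)
```

Because the three weights sum to 0.9804 rather than 1, the outflows summed over all ordered city pairs equal 0.9804 (with two cities, TF between them is exactly that). A constant scale factor like this has no effect on the later min-max normalisation in Equation (7).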

4.1.3. Intercity information flow calculation method

In this research, the Baidu Index was used to calculate information flow. For two cities i and j, the information flow between them is calculated by Equation (5):

$$IF_{ij} = IFOut_{ij} + IFOut_{ji} \qquad (5)$$

where $IFOut_{ij}$ is the search volume for city j within city i, and $IFOut_{ji}$ is the search volume for city i within city j.

4.1.4. Population flow-traffic flow-information flow fusion method

The purpose of multisource flow fusion is to combine and optimise the individual data sources to output more effective information. To make the different flows comparable, each flow is first min-max normalised. The fusion flow is then calculated from the normalised flows and the flow weights by Equation (6):

$$MF_{ij} = \frac{w_1\,Norm(PF_{ij}) + w_2\,Norm(TF_{ij}) + w_3\,Norm(IF_{ij})}{w_1 + w_2 + w_3} \qquad (6)$$

$$Norm(F_{ij}) = \frac{F_{ij} - \min\{F_{11}, F_{12}, F_{13}, \ldots, F_{nn}\}}{\max\{F_{11}, F_{12}, F_{13}, \ldots, F_{nn}\} - \min\{F_{11}, F_{12}, F_{13}, \ldots, F_{nn}\}} \qquad (7)$$

where $w_1$, $w_2$, and $w_3$ are the weights of population flow, traffic flow, and information flow, respectively, and $MF_{ij}$ is the fusion of the multisource flows between city i and city j. $Norm(PF_{ij})$, $Norm(TF_{ij})$, and $Norm(IF_{ij})$ are calculated by Equation (7), and their values lie between 0 and 1. Because $MF_{ij}$ is a weighted average of these normalised values, it also lies between 0 and 1 and measures how closely the two cities are connected: the larger the value, the more closely connected the cities; the smaller the value, the less connected they are. Because the entropy weight method derives the weights from the variability of the data rather than from subjective judgement, it was used to determine the flow weights in this study.
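Equations (6) and (7) and the entropy weighting can be sketched together. The text does not spell out which variant of the entropy weight method was used, so the code assumes one common formulation: with m samples (here, city pairs) and the normalised flows as indicators, compute each sample's share p of an indicator's column total, the entropy e = −(1/ln m) Σ p ln p, and weights proportional to 1 − e. The flow values are hypothetical.

```python
import numpy as np


def min_max(f: np.ndarray) -> np.ndarray:
    """Equation (7): rescale the pairwise values of one flow to [0, 1]."""
    lo, hi = f.min(), f.max()
    return (f - lo) / (hi - lo)


def entropy_weights(x: np.ndarray) -> np.ndarray:
    """Entropy weight method (a common formulation, assumed here).

    x: shape (m samples, k indicators), non-negative; one row per city pair,
       one column per normalised flow type.
    """
    m = x.shape[0]
    p = x / x.sum(axis=0)                       # share of each sample per indicator
    safe_p = np.where(p > 0, p, 1.0)            # avoid log(0)
    plogp = np.where(p > 0, p * np.log(safe_p), 0.0)  # treat 0 * ln(0) as 0
    e = -plogp.sum(axis=0) / np.log(m)          # entropy, between 0 and 1
    d = 1.0 - e                                 # degree of divergence
    return d / d.sum()


def fuse(norm: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Equation (6): weighted average of the normalised flows for each pair."""
    return norm @ w / w.sum()


# Hypothetical flows for four city pairs.
pf = np.array([100.0, 50.0, 10.0, 0.0])     # population flow
tf = np.array([0.9, 0.5, 0.2, 0.1])         # traffic flow
info = np.array([30.0, 28.0, 25.0, 20.0])   # information flow

norm = np.column_stack([min_max(pf), min_max(tf), min_max(info)])
w = entropy_weights(norm)
mf = fuse(norm, w)
print(w, mf)
```

Indicators whose values are spread more unevenly across city pairs have lower entropy and therefore receive larger weights, which is why, in this toy input, the skewed population flow outweighs the fairly even information flow.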

4.2. City community detection method

Hierarchical clustering divides the dataset at different levels to form a tree-like clustering structure. AGNES (AGglomerative NESting) is a hierarchical clustering algorithm with a bottom-up strategy: it first treats each sample in the dataset as a separate cluster and then, at each step, merges the two nearest clusters. This process is repeated until the preset number of clusters is reached.

The key to the AGNES algorithm is determining the distance between clusters and the number of clusters (k for short). Because the average distance reflects the overall similarity between two clusters, the distance between two clusters is defined as the average distance from each city in one cluster to every city in the other (Equation (8)):

$$d_{avg}(C_i, C_j) = \frac{1}{n_i \times n_j} \sum_{x \in C_i} \sum_{y \in C_j} dist(x, y) \qquad (8)$$

where $n_i$ and $n_j$ are the numbers of cities in clusters $C_i$ and $C_j$, respectively, and $dist(x, y)$ is the flow volume between a city x in cluster $C_i$ and a city y in cluster $C_j$.
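The AGNES procedure with the average-linkage distance of Equation (8) can be sketched as a naive loop. One assumption needs flagging: AGNES merges the pair of clusters with the smallest distance, whereas a larger fusion flow means a closer connection, so the sketch converts flows to distances as 1 − MF. That conversion, the toy matrix, and the function name `agnes_average` are illustrative assumptions, not details taken from the text.

```python
import numpy as np


def agnes_average(dist: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Naive AGNES with average linkage (Equation (8)), for illustration only.

    dist: symmetric (n x n) distance matrix.
    Returns the clusters (lists of city indices) once n_clusters remain.
    """
    clusters = [[i] for i in range(dist.shape[0])]  # every city starts alone
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean of all n_a * n_b pairwise distances between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        clusters[best_a] += clusters[best_b]  # merge the two nearest clusters
        del clusters[best_b]                  # best_b > best_a, so best_a is unaffected
    return clusters


# Hypothetical fusion-flow matrix for 4 cities: two tightly linked pairs.
mf = np.array([[1.0, 0.9, 0.1, 0.2],
               [0.9, 1.0, 0.2, 0.1],
               [0.1, 0.2, 1.0, 0.8],
               [0.2, 0.1, 0.8, 1.0]])
dist = 1.0 - mf  # assumed flow-to-distance conversion

print(agnes_average(dist, 2))  # [[0, 1], [2, 3]]
```

The nested loop rescans every cluster pair at each merge and is meant only to make Equation (8) concrete; for 367 cities, a library routine such as SciPy's `linkage(..., method="average")` would be the practical choice.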

The silhouette coefficient is widely used to evaluate the quality of clustering. We therefore calculate the silhouette coefficient for different values of k to select a relatively optimal k. The silhouette value of an object measures how similar the object is to its own cluster compared with other clusters (Belyadi and Haghighat 2021). The silhouette value of the i-th object, $s_i$, is defined by Equation (9):

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (9)$$

where $a_i$ is the average distance from the i-th object to the other objects in its own cluster, and $b_i$ is the smallest average distance from the i-th object to the objects of any other cluster. The silhouette coefficient ranges from −1 to 1; a high value indicates that the object is well matched to its own cluster and poorly matched to neighbouring clusters.
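Equation (9) can be computed directly from a precomputed distance matrix. In this sketch, objects in singleton clusters get s_i = 0, the usual convention, and the mean of s_i over all objects serves as the silhouette coefficient of a whole partition; the averaging is standard practice but is our assumption about how the per-step values were aggregated. The distance matrix is a toy example.

```python
import numpy as np


def silhouette_samples(dist: np.ndarray, labels) -> np.ndarray:
    """Equation (9) for every object, from a precomputed distance matrix.

    Requires at least two clusters. Objects in singleton clusters get
    s_i = 0, the usual convention.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False                       # exclude the object itself
        if not same.any():
            continue                          # singleton cluster: s_i = 0
        a = dist[i, same].mean()              # mean distance within own cluster
        b = min(dist[i, labels == c].mean()   # nearest other cluster
                for c in clusters if c != own)
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s


dist = np.array([[0.0, 1.0, 5.0, 5.0],
                 [1.0, 0.0, 5.0, 5.0],
                 [5.0, 5.0, 0.0, 1.0],
                 [5.0, 5.0, 1.0, 0.0]])

# Well-separated partition: every object has s_i = (5 - 1) / 5 = 0.8.
print(silhouette_samples(dist, [0, 0, 1, 1]).mean())
```

Swapping to the poor partition `[0, 1, 0, 1]` gives s_i = (3 − 5) / 5 = −0.4 for every object, showing how the coefficient penalises clusters that mix distant objects.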

5. Results

5.1. The distribution of intercity multisource flows

To reveal the spatial patterns of multisource flows between cities, we map the spatial distributions of population flow, traffic flow, information flow, and the fusion flow. The weights of population flow, traffic flow and information flow, calculated with the entropy weight method, are 0.521759, 0.395262, and 0.082979, respectively. Each line on the map represents a flow connecting two cities. Because almost every pair of cities is linked by population, traffic and information flows, drawing all lines would clutter the map, so only the top 2000 flows of each type are shown. Although these 2000 flows account for only 2.98% of the 67,161 city pairs (367 × 366 / 2), they account for 88.22% of total population flow, 71.82% of total traffic flow, 17.18% of total information flow, and 26.42% of total fusion flow. Figure 2(a)–(d) shows the distributions of the top 2000 population, traffic, information, and fusion flows, respectively.
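The share figures above rest on two simple computations: the number of undirected pairs among 367 cities, and the fraction of a flow's total carried by its k largest pairwise values. A minimal sketch follows; `top_k_share` is a hypothetical helper that would be applied to the vector of pairwise values for each flow type, and the reported percentages cannot be reproduced without the underlying data.

```python
import numpy as np


def n_city_pairs(n_cities: int) -> int:
    """Number of undirected city pairs, n(n - 1) / 2."""
    return n_cities * (n_cities - 1) // 2


def top_k_share(pair_flows: np.ndarray, k: int) -> float:
    """Fraction of the total flow carried by the k largest pairwise flows."""
    top = np.sort(pair_flows)[::-1][:k]
    return float(top.sum() / pair_flows.sum())


pairs = n_city_pairs(367)
print(pairs)                          # 67161
print(round(100 * 2000 / pairs, 2))   # 2.98 (% of pairs drawn on the map)
```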

Figure 2. The distribution of intercity (a) population flows, (b) traffic flows, (c) information flows, and (d) fusion flows.

From Figure 2, the following conclusions can be drawn. (1) Both the single flows and the fusion flow between cities in China present an obvious diamond structure (the four solid red lines in Figure 3). The four vertices of the diamond are Beijing, Shanghai, Guangzhou, and Chengdu, the core cities of the Beijing-Tianjin-Hebei region (including Beijing, Tianjin and Hebei) (Jing-Jin-Ji), the Yangtze River Delta Region (including Shanghai, Jiangsu, Zhejiang and Anhui) (YRD), the Guangdong-Hong Kong-Macao Greater Bay Area (including Hong Kong, Macao and 9 cities in Guangdong) (GBA), and the Chengdu-Chongqing Economic Circle (including 15 cities in Sichuan Province and 27 districts (counties) in Chongqing) (CCEC), respectively. (2) Wuhan is located almost at the centre of the diamond and maintains close ties with these four major urban agglomerations. (3) The spatial distributions of the four flows show similar characteristics. Overall, the flows are spatially imbalanced: they are concentrated inside the diamond, whereas ties outside it are weaker. (4) The four types of flows are not completely equivalent and show some differences. The disparity between strong and weak flows is largest for population flow and smallest for information flow, consistent with the top 2000 flows accounting for 88.22% of total population flow but only 17.18% of total information flow. (5) The fusion flow, which synthesises the three single flows, avoids the one-sidedness of any single flow and provides a more reasonable perspective for studying intercity flows.

Figure 3. Diamond structure formed by the flows.

5.2. The results of city community detection

Since 367 cities were selected for this study and agglomerative hierarchical clustering is a bottom-up approach, we calculated the silhouette coefficient at every step from 1 to 366 to select a relatively optimal k. Figure 4 shows the silhouette coefficient at each step. A higher silhouette coefficient indicates greater separation between clusters and greater cohesion within each cluster; however, the silhouette coefficient is not the sole criterion for evaluating clustering results. According to Figure 4, grouping all cities into two clusters yields the highest silhouette coefficient, but such a grouping has little practical significance. To better understand the city community detection process and obtain more useful clustering results, we selected four points where the curve in Figure 4 changes significantly. The results of city community detection at steps 50, 270, 300, and 350 are illustrated in Figure 5.
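Each AGNES step merges exactly two clusters, so after step s the 367 cities form k = 367 − s clusters. The sketch below makes that bookkeeping explicit. It also encodes a standard property of the silhouette coefficient: it is only informative for 2 ≤ k ≤ n − 1 (with a single cluster there is no neighbouring cluster from which to compute b_i), so step 366, which leaves one cluster, yields no meaningful value. The function names are illustrative.

```python
N_CITIES = 367


def clusters_after_step(step: int, n_cities: int = N_CITIES) -> int:
    """AGNES starts from n_cities singletons; every step merges two clusters."""
    if not 0 <= step <= n_cities - 1:
        raise ValueError("step must lie between 0 and n_cities - 1")
    return n_cities - step


def silhouette_defined(step: int, n_cities: int = N_CITIES) -> bool:
    """The silhouette is informative only for 2 <= k <= n_cities - 1 clusters."""
    k = clusters_after_step(step, n_cities)
    return 2 <= k <= n_cities - 1


for step in (50, 270, 300, 350):
    print(step, clusters_after_step(step))
# prints: 50 317 / 270 97 / 300 67 / 350 17
```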

Figure 4. Silhouette coefficient of each step.

Figure 5. The results of city community detection based on the fusion flow.

From Figure 5, the following conclusions can be drawn. (1) At step 50, many city communities, generally consisting of few cities, are identified, most of them in eastern China. (2) At step 270, four large city communities are identified whose boundaries are basically consistent with those of the four national-level urban agglomerations: Jing-Jin-Ji, YRD, GBA, and CCEC. In the eastern and central regions of China, a number of city communities are identified, essentially within provincial boundaries. In the western region, especially Xinjiang, Qinghai, and Tibet, some small city communities are identified. (3) At step 300, a large community including Jing-Jin-Ji, most cities of the YRD, and Shandong is identified. Except for cities in Xinjiang, Qinghai, and Tibet, cities in other regions are basically included in large communities. (4) At step 350, cities are merged into 17 (= 367 − 350) city communities. Six large communities are identified: most cities in Xinjiang, most cities in Qinghai, all cities in Tibet, all cities in Gansu and Ningxia, most cities in Hainan, and all remaining regions.

Generally, cities in the same province are merged into a city community earlier, which shows that most cities in the same province are relatively closely connected. Provincial capitals are usually the economic and political centres of their provinces and strongly attract other cities in the same province. At the same time, a city community may span multiple provinces. It is therefore important to further break down provincial administrative barriers and promote regional integration in these city communities.

The results of city community detection may therefore help address challenges related to population growth, transportation, and other urban issues, so as to achieve coordinated urban development in these city communities. First, population movement within city communities should be encouraged to optimise the allocation of human resources and promote the common development of cities within each urban agglomeration. Second, efficient and interconnected transportation systems should be further developed to facilitate the flow of factors within each city community. Third, the sharing of infrastructure and services, such as water supply and energy distribution, should be encouraged to optimise resource use.

6. Discussion

6.1. Comparison of population flow, traffic flow, information flow, and fusion flow

To reveal the correlations among population flow, traffic flow, information flow, and fusion flow, Pearson's correlation coefficients were calculated. Table 1 shows the coefficients; in Table 1, '*' denotes p < 0.01.

Table 1. Correlation between population flow, traffic flow, information flow, and fusion flow.

From Table 1, we draw the following conclusions. (1) Among the single flows, population flow, traffic flow and information flow are pairwise positively correlated. However, the correlations of information flow with population flow and with traffic flow are much weaker than the correlation between population flow and traffic flow, indicating that information flow differs substantially from the other two. Therefore, any single flow may not fully reflect the connections between cities. (2) Fusion flow is strongly positively correlated with population flow, traffic flow, and information flow, indicating that it integrates the three flows well and better reflects the connections between cities.
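Pearson's r between two flow types is computed over the vectors of pairwise values, one entry per city pair. The sketch below uses made-up values and checks the hand-written formula against NumPy's `np.corrcoef`. The p-values behind the '*' marks in Table 1 would additionally require a significance test, for example `scipy.stats.pearsonr`, which returns both r and a p-value.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson's r between two vectors of pairwise flows."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Hypothetical population and traffic flows for five city pairs.
pf = np.array([120.0, 80.0, 40.0, 10.0, 5.0])
tf = np.array([0.9, 0.7, 0.3, 0.2, 0.1])

r = pearson(pf, tf)
print(np.isclose(r, np.corrcoef(pf, tf)[0, 1]))  # True
```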

6.2. Comparison of city community detection based on different flows

Although there is no unique and widely accepted definition of community, a community can generally be defined as a group of entities that are closer to each other than to other entities in the dataset (Bagrow and Bollt 2005; Bedi and Sharma 2016; Fortunato and Hric 2016). Based on this definition, we propose two principles to evaluate the rationality of city community division results. The first principle is that closely connected cities should be in the same community. The second principle is that loosely connected cities should be in different communities; in particular, cities that are far apart and weakly connected should fall into different clusters.

To test the rationality of the division results, we compare the city communities detected from population flow, traffic flow, information flow, and fusion flow. The results of city community detection at steps 300 and 350 based on population flow, traffic flow, and information flow are illustrated in Figures 6, 7, and 8, respectively.

Five city communities spanning three or more provinces are identified at step 350 in Figure 6: one including Jilin, Liaoning and eastern Inner Mongolia; one including Shanxi, Shaanxi, Ningxia, eastern Gansu, and western Inner Mongolia; one including Jing-Jin-Ji, the YRD, Shandong, and Henan; one including Sichuan, Chongqing, and Guizhou; and one including Guangdong, Guangxi and Hunan.

Figure 6. The results of city community detection based on population flow.

Six city communities spanning three or more provinces are identified at step 300 in Figure 7: one including Jilin, Liaoning and eastern Inner Mongolia; one including northern Shaanxi, northern Ningxia, and central Inner Mongolia; one including southern Shaanxi, southern Ningxia, and southeastern Gansu; one including Jing-Jin-Ji, Shandong, Henan, Jiangsu, northern Zhejiang, and Shanghai; one including eastern Sichuan, Chongqing, Guizhou, and northeastern Yunnan; and one including Guangdong, Guangxi, Hunan, Jiangxi, Fujian, and southern Zhejiang.

Figure 7. The results of city community detection based on traffic flow.

A dominant city community is identified at step 300 in Figure 8. It contains one city in Shaanxi, one city in Sichuan, two cities in Liaoning, Chongqing, and most of the eastern and central regions, excluding Heilongjiang, Jilin, most cities in Liaoning, and Hainan.

Figure 8. The results of city community detection based on information flow.

Comparing Figures 5–8 reveals both similarities and differences among the city communities. (1) City communities are often first identified in the eastern region, and large city communities usually emerge earlier in the four major urban agglomerations. (2) Population flow may be more strongly affected by provincial boundaries and travel distances. Traffic flow may be less constrained by provincial boundaries, possibly because highways, railways, and flights are interconnected across the whole country. Compared with population flow and traffic flow, information flow is the least restricted by region, possibly because people can exchange information without leaving their homes. (3) Comparing the four flows, we conclude that the fusion flow, which encompasses a wide range of aspects, is the best choice for city community detection.

6.3. Comparison of city community detection based on different algorithms

We also compare the city communities detected by different algorithms. The Louvain and spectral clustering algorithms are widely used to identify communities of nodes in a graph based on the edges connecting them, so they were also applied to identify city communities. Figure 9 shows the 17 city communities identified by (a) the Louvain algorithm and (b) the spectral clustering algorithm.

Figure 9. The results of city community detection based on (a) Louvain algorithm, (b) spectral clustering algorithm.

By comparing and , the following conclusions can be drawn. (1) In (a), cities in the same community are widely scattered in geographical space. (2) Cities in the same community are more geographically concentrated in (b) than in (a); however, none of the four major urban agglomerations forms a city community in (b). (3) Comparing the three algorithms, we conclude that hierarchical clustering is more suitable for identifying city communities in China.
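One way to make the visual comparison of geographical concentration quantitative, offered here as our own illustration rather than a measure used in this study, is to average the great-circle distance between cities assigned to the same community; scattered communities, such as those produced by Louvain, score higher. The sketch assumes city coordinates given as (latitude, longitude) pairs in degrees and a mean Earth radius of 6371 km; the function names are hypothetical.

```python
import math
from itertools import combinations


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def mean_intra_distance(coords, community):
    """Average, over communities with at least two cities, of the mean
    pairwise distance between their members (lower = more compact)."""
    members = {}
    for city, label in community.items():
        members.setdefault(label, []).append(city)
    per_community = []
    for cities in members.values():
        pairs = list(combinations(cities, 2))
        if pairs:  # singleton communities have no internal distance
            per_community.append(
                sum(haversine_km(coords[a], coords[b]) for a, b in pairs) / len(pairs))
    return sum(per_community) / len(per_community) if per_community else 0.0
```

With approximate coordinates, grouping Beijing with Tianjin and Shanghai with Suzhou gives a mean intra-community distance of roughly 100 km, whereas pairing Beijing with Shanghai and Tianjin with Suzhou gives roughly 1,000 km.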

7. Conclusions

In this study, multisource flow data, including population, traffic, and information flows, are used to detect city communities in China. Although the three flows are positively correlated, there are also some differences among them. A novel fusion method is constructed to combine and optimise each individual information source and output more effective information. The fusion flow between cities in China presents an obvious diamond structure, with Beijing, Shanghai, Guangzhou, and Chengdu at the four points of the diamond and Wuhan located almost at its centre. A hierarchical clustering algorithm was used to divide the 367 cities into communities. A comparison of the resulting divisions shows that combining multisource flow with hierarchical clustering performs better for city community detection than the single-flow and alternative-algorithm results. The results of our research on the identification of city communities may be helpful for regional planning and management of metropolitan areas.
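The fuse-then-cluster pipeline summarised above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each flow is min-max normalised and the flows are combined by a weighted sum (the weights are a placeholder, e.g. equal thirds), fused strength is turned into a dissimilarity as 1 minus strength, cities are merged with average linkage (our choice of linkage), and the silhouette coefficient, which the study uses to choose the number of communities, scores a candidate partition. All function names are hypothetical.

```python
from itertools import combinations


def normalise(flow):
    """Min-max scale one flow dict {(a, b): value} to [0, 1]."""
    lo, hi = min(flow.values()), max(flow.values())
    span = (hi - lo) or 1.0
    return {pair: (v - lo) / span for pair, v in flow.items()}


def fuse(flows, weights):
    """Assumed fusion rule: weighted sum of normalised flows sharing the same pairs."""
    scaled = [normalise(f) for f in flows]
    return {pair: sum(w * s[pair] for w, s in zip(weights, scaled))
            for pair in scaled[0]}


def distance(fused, a, b):
    """Dissimilarity in [0, 1]: strongly connected cities are close."""
    return 1.0 - fused.get((a, b), fused.get((b, a), 0.0))


def average_linkage(cities, fused, k):
    """Agglomerative clustering (average linkage) until k clusters remain."""
    clusters = [[c] for c in cities]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(distance(fused, a, b) for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)      # j > i, so index i still points at the same cluster
        clusters[i].extend(merged)
    return clusters


def silhouette(clusters, fused):
    """Mean silhouette coefficient (needs >= 2 clusters); singletons score 0."""
    scores = []
    for idx, cluster in enumerate(clusters):
        for c in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            a = sum(distance(fused, c, o) for o in cluster if o != c) / (len(cluster) - 1)
            b = min(sum(distance(fused, c, o) for o in other) / len(other)
                    for jdx, other in enumerate(clusters) if jdx != idx)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

As a toy check with four hypothetical cities in which A-B and C-D carry strong flows in all three layers, `average_linkage(["A", "B", "C", "D"], fused, 2)` returns `[["A", "B"], ["C", "D"]]` with a silhouette above 0.9. The naive loop is roughly cubic in the number of cities; for all 367 cities, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` with `fcluster(..., criterion="maxclust")` and scikit-learn's `silhouette_score(..., metric="precomputed")` are faster equivalents.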

A few limitations of our study should be addressed in future studies. Because of the limited availability of population, traffic, and information flow data, Baidu migration data were used to calculate intercity population flow; flight, train, and bus schedules were used to calculate intercity traffic flow; and the Baidu Index was used to calculate intercity information flow. Although these data represent population, traffic, and information flows reasonably well, they may still deviate from the real flows. In addition, although this research involves multisource flow data, including population, traffic, and information flows, other flows depicting the connections between cities are not considered.

Moreover, the division of city communities is very complex. Not only should mobile flows be considered, but other factors, such as environmental (Hashmi et al. Citation2021) and policy factors (Li et al. Citation2022b), should also be taken into account in the process of city community division. In the future, we will explore a more reasonable division scheme based on the above research. Furthermore, cities will continue to develop, and their development may be uneven. Therefore, the division of city communities also needs continuous optimisation to adapt to changes in urban development.

Acknowledgments

We would like to thank the editors and anonymous referees for their constructive comments.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are openly available in Figshare (https://figshare.com/articles/dataset/Multisource_flow_data/22598074; DOI: 10.6084/m9.figshare.22598074).

Additional information

Funding

This work was supported by the Natural Science Foundation of Chongqing (No. CSTB2022NSCQ-MSX0336) and the Special Fund for Youth Team of the Southwest University (No. SWU-XJPY202307).

References

  • Bagrow, J. P., and E. M. Bollt. 2005. “Local Method for Detecting Communities.” Physical Review E 72 (4): 046108. https://doi.org/10.1103/PhysRevE.72.046108.
  • Bakillah, M., R. Lia, and S. Liang. 2015. “Geo-located Community Detection in Twitter with Enhanced Fast-Greedy Optimization of Modularity: The Case Study of Typhoon Haiyan.” International Journal of Geographical Information Science 29 (2): 258–279. https://doi.org/10.1080/13658816.2014.964247.
  • Bedi, P., and C. Sharma. 2016. “Community Detection in Social Networks.” Wiley Interdisciplinary Reviews Data Mining & Knowledge Discovery 6 (3): 115–135. https://doi.org/10.1002/widm.1178.
  • Belyadi, H., and A. Haghighat. 2021. Machine Learning Guide for Oil and Gas Using Python: A Step-by-Step Breakdown with Data, Algorithms, Codes, and Applications. Oxford: Gulf Professional Publishing.
  • Blondel, V., J. Guillaume, R. Lambiotte, and E. Lefebvre. 2008. “Fast Unfolding of Communities in Large Networks.” Journal of Statistical Mechanics: Theory and Experiment 2008 (10): P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
  • Cai, H., Z. Wang, and Y. Zhu. 2022. “Understanding the Structure and Determinants of Intercity Carbon Emissions Association Network in China.” Journal of Cleaner Production 352:131535. https://doi.org/10.1016/j.jclepro.2022.131535.
  • Chen, Z. 2017. “Impacts of High-Speed Rail on Domestic Air Transportation in China.” Journal of Transport Geography 62:184–196. https://doi.org/10.1016/j.jtrangeo.2017.04.002.
  • Chen, W. 2021. “Delineating the Spatial Boundaries of Megaregions in China: A City Network Perspective.” Complexity 2021:2574025. https://doi.org/10.1155/2021/2574025.
  • Chen, J., Y. Li, X. Yang, S. Zhao, and Y. Zhang. 2021. “VGHC: A Variable Granularity Hierarchical Clustering for Community Detection.” Granular Computing 6 (1): 37–46. https://doi.org/10.1007/s41066-019-00195-1.
  • Chen, W., W. Liu, W. Ke, and N. Wang. 2018. “Understanding Spatial Structures and Organizational Patterns of City Networks in China: A Highway Passenger Flow Perspective.” Journal of Geographical Sciences 28 (4): 477–494. https://doi.org/10.1007/s11442-018-1485-x.
  • Chowdhary, A. 2017. “Community Detection: Hierarchical Clustering Algorithms.” International Journal of Creative Research Thoughts 5 (4): 2320–2882.
  • Devriendt, L., A. Boulton, S. Brunn, B. Derudder, and F. Witlox. 2011. “Searching for Cyberspace: The Position of Major Cities in the Information Age.” Journal of Urban Technology 18 (1): 73–92. https://doi.org/10.1080/10630732.2011.578410.
  • Ding, S., M. Zhang, Y. Xing, and J. Lu. 2022. “Revealing Urban Community Structures by Fusing Multisource Transportation Data.” Journal of Transportation Engineering Part A-Systems 148 (9): 04022060. https://doi.org/10.1061/JTEPBS.0000704.
  • Fang, C., and D. Yu. 2017. “Urban Agglomeration: An Evolving Concept of an Emerging Phenomenon.” Landscape and Urban Planning 162:126–136. https://doi.org/10.1016/j.landurbplan.2017.02.014.
  • Fortunato, S., and D. Hric. 2016. “Community Detection in Networks: A User Guide.” Physics Reports 659:1–44. https://doi.org/10.1016/j.physrep.2016.09.002.
  • Gao, S., Y. Liu, Y. Wang, and X. Ma. 2013. “Discovering Spatial Interaction Communities from Mobile Phone Data.” Transactions in GIS 17 (3): 463–481. https://doi.org/10.1111/tgis.12042.
  • Girvan, M., and M. Newman. 2002. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences 99 (12): 7821–7826. https://doi.org/10.1073/pnas.122653799.
  • Gong, J., S. Li, X. Ye, Q. Peng, and S. Kudva. 2021. “Modelling Impacts of High-Speed Rail on Urban Interaction with Social Media in China's Mainland.” Geo-spatial Information Science 24 (4): 16. https://doi.org/10.1080/10095020.2021.1972771.
  • Hashmi, S., H. Fan, Z. Fareed, and F. Shahzad. 2021. “Asymmetric Nexus Between Urban Agglomerations and Environmental Pollution in Top Ten Urban Agglomerated Countries Using Quantile Methods.” Environmental Science and Pollution Research 28 (11): 13404–13424. https://doi.org/10.1007/s11356-020-10669-4.
  • Hasnat, M., and S. Hasan. 2018. “Identifying Tourists and Analyzing Spatial Patterns of Their Destinations from Location-Based Social Media Data.” Transportation Research Part C 96:38–54. https://doi.org/10.1016/j.trc.2018.09.006.
  • Hong, Y., and Y. Yao. 2019. “Hierarchical Community Detection and Functional Area Identification with OSM Roads and Complex Graph Theory.” International Journal of Geographical Information Science 33 (8): 1569–1587. https://doi.org/10.1080/13658816.2019.1584806.
  • Jing, F., Z. Li, S. Qiao, J. Zhang, B. Olatosi, and X. Li. 2023. “Using Geospatial Social Media Data for Infectious Disease Studies: A Systematic Review.” International Journal of Digital Earth 16 (1): 130–157. https://doi.org/10.1080/17538947.2022.2161652.
  • Jokar, E., and M. Mosleh. 2019. “Community Detection in Social Networks Based on Improved Label Propagation Algorithm and Balanced Link Density.” Physics Letters A 383 (8): 718–727. https://doi.org/10.1016/j.physleta.2018.11.033.
  • Kraft, S., M. Halás, P. Klapka, and V. Blažeka. 2022. “Functional Regions as a Platform to Define Integrated Transport System Zones: The Use of Population Flows Data.” Applied Geography 144:102732. https://doi.org/10.1016/j.apgeog.2022.102732.
  • Krings, G., F. Calabrese, C. Ratti, and V. D. Blondel. 2009. “Urban Gravity: A Model for Inter-City Telecommunication Flows.” Journal of Statistical Mechanics: Theory and Experiment 2009 (7): L07003. https://doi.org/10.1088/1742-5468/2009/07/l07003.
  • Li, P., Y. Feng, X. Tong, R. Wang, S. Zhai, Y. Tang, and W. Liu. 2022a. “Spatial Planning-Constrained Modeling of Urban Growth in the Yangtze River Delta Considering the Element Flows.” GIScience & Remote Sensing 59 (1): 1491–1508. https://doi.org/10.1080/15481603.2022.2118345.
  • Li, B., S. Gao, Y. Liang, Y. Kang, T. Prestby, Y. Gao, and R. Xiao. 2020a. “Estimation of Regional Economic Development Indicator from Transportation Network Analytics.” Scientific Reports 10 (1): 2647. https://doi.org/10.1038/s41598-020-59505-2.
  • Li, L., S. Ma, Y. Zheng, and X. Xiao. 2022b. “Integrated Regional Development: Comparison of Urban Agglomeration Policies in China.” Land Use Policy 114:105939. https://doi.org/10.1016/j.landusepol.2021.105939.
  • Li, X., J. Tang, X. Hu, and W. Wang. 2020b. “Assessing Intercity Multimodal Choice Behavior in a Touristy City: A Factor Analysis.” Journal of Transport Geography 86:102776. https://doi.org/10.1016/j.jtrangeo.2020.102776.
  • Lierde, H., T. Chow, and G. Chen. 2020. “Scalable Spectral Clustering for Overlapping Community Detection in Large-Scale Networks.” IEEE Transactions on Knowledge and Data Engineering 32 (4): 754–767. https://doi.org/10.1109/TKDE.2019.2892096.
  • Lin, J., Z. Wu, and X. Li. 2019. “Measuring Inter-City Connectivity in an Urban Agglomeration Based on Multi-Source Data.” International Journal of Geographical Information Science 33 (5): 1062–1081. https://doi.org/10.1080/13658816.2018.1563302.
  • Liu, X., B. Derudder, and C. Garcia. 2013. “Exploring the Co-evolution of the Geographies of Air Transport Aviation and Corporate Networks.” Journal of Transport Geography 30 (3): 26–36. https://doi.org/10.1016/j.jtrangeo.2013.02.002.
  • Liu, Y., Z. Sui, C. Kang, and Y. Gao. 2014. “Uncovering Patterns of Inter-Urban Trip and Spatial Interaction from Social Media Check-in Data.” PLoS ONE 9 (1): e86026. https://doi.org/10.1371/journal.pone.0086026.
  • Lu, Z., and Z. Dong. 2023. “A Gravitation-Based Hierarchical Community Detection Algorithm for Structuring Supply Chain Network.” International Journal of Computational Intelligence Systems 16 (1): 110. https://doi.org/10.1007/s44196-023-00290-x.
  • Ng, A., M. Jordan, and Y. Weiss. 2001. “On Spectral Clustering: Analysis and an Algorithm.” Advances in Neural Information Processing Systems 14:849–856.
  • Qin, J., C. Song, M. Tang, Y. Zhang, and J. Wang. 2019. “Exploring the Spatial Characteristics of Inbound Tourist Flows in China Using Geotagged Photos.” Sustainability 11 (20): 5822. https://doi.org/10.3390/su11205822.
  • Román, C., J. Martin, R. Espino, E. Cherchi, J. Ortuzar, L. Rizzi, R. González, and F. Amador. 2014. “Valuation of Travel Time Savings for Intercity Travel: The Madrid-Barcelona Corridor.” Transport Policy 36:105–117. https://doi.org/10.1016/j.tranpol.2014.07.007.
  • Ruths, D., and J. Pfeffer. 2014. “Social Media for Large Studies of Behavior.” Science 346 (6213): 1063–1064. https://doi.org/10.1126/science.346.6213.1063.
  • Shen, J., Z. Huang, W. Zhou, and D. Zhao. 2022. “Revealing Population Flow Patterns in the Sichuan-Chongqing Region, China, During the COVID-19 Epidemic in 2020.” Annals of GIS 28 (4): 533–545. https://doi.org/10.1080/19475683.2022.2090435.
  • South, T., B. Smart, M. Roughan, and L. Mitchell. 2022. “Information Flow Estimation: A Study of News on Twitter.” Online Social Networks and Media 31:100231. https://doi.org/10.1016/j.osnem.2022.100231.
  • Tang, M., J. Hong, G. Liu, and G. Shen. 2019. “Exploring Energy Flows Embodied in China's Economy from the Regional and Sectoral Perspectives via Combination of Multi-Regional Input-Output Analysis and a Complex Network Approach.” Energy 170:1191–1201. https://doi.org/10.1016/j.energy.2018.12.164.
  • Toth, C., D. Helic, and B. C. Geiger. 2022. “Synwalk: Community Detection via Random Walk Modelling.” Data Mining and Knowledge Discovery 36 (2): 739–780. https://doi.org/10.1007/s10618-021-00809-w.
  • Wang, J., D. Du, and J. Huang. 2020. “Inter-City Connections in China: High-Speed Train vs. Inter-City Coach.” Journal of Transport Geography 82:102619. https://doi.org/10.1016/j.jtrangeo.2019.102619.
  • Wang, L., L. Xin, Y. Zhu, Y. Fang, and L. Zhu. 2023. “Associations Between Temperature Variations and Tourist Arrivals: Analysis Based on Baidu Index of Hot-Spring Tourism in 44 Cities in China.” Environmental Science and Pollution Research 30:43641–43653. https://doi.org/10.1007/s11356-023-25404-y.
  • Wei, Y., W. Song, C. Xiu, and Z. Zhao. 2018. “The Rich-Club Phenomenon of China's Population Flow Network During the Country's Spring Festival.” Applied Geography 96:77–85. https://doi.org/10.1016/j.apgeog.2018.05.009.
  • Wu, J., and Y. Deng. 2015. “Intercity Information Diffusion and Price Discovery in Housing Markets: Evidence from Google Searches.” The Journal of Real Estate Finance and Economics 50 (3): 289–306. https://doi.org/10.1007/s11146-014-9493-9.
  • Xu, J., A. Li, D. Li, Y. Liu, Y. Du, T. Pei, M. Ting, and C. Zhou. 2017. “Difference of Urban Development in China from the Perspective of Passenger Transport Around Spring Festival.” Applied Geography 87:85–96. https://doi.org/10.1016/j.apgeog.2017.07.014.
  • Yin, H., Z. Zhang, Y. Wan, Z. Gao, Y. Guo, and R. Xiao. 2023. “Sustainable Network Analysis and Coordinated Development Simulation of Urban Agglomerations from Multiple Perspectives.” Journal of Cleaner Production 413 (10): 137378. https://doi.org/10.1016/j.jclepro.2023.137378.
  • Zhang, B., S. Cheng, Y. Zhao, and F. Lu. 2023. “Inferring Intercity Freeway Truck Volume from the Perspective of the Potential Destination City Attractiveness.” Sustainable Cities and Society 98:104834. https://doi.org/10.1016/j.scs.2023.104834.
  • Zhang, Z., A. Fang, L. Cui, Z. Pan, W. Zhang, C. Tan, and C. Wang. 2021. “Towards Exploring the Influence of Community Structures on Information Dissemination in Sina Weibo Networks.” Discrete Dynamics in Nature and Society 2021:8325302. https://doi.org/10.1155/2021/8325302.
  • Zhen, F., X. Qin, X. Ye, H. Sun, and Z. Luosang. 2019. “Analyzing Urban Development Patterns Based on the Flow Analysis Method.” Cities 86:178–197. https://doi.org/10.1016/j.cities.2018.09.015.
  • Zheng, W., N. Du, Q. Zhang, and X. Xang. 2022. “Using Geodetector to Explore the Factors Affecting Evolution of the Spatial Structure of Information Flow in the Middle Reaches of the Yangtze River Urban Agglomeration.” GeoJournal 87 (6): 4511–4529. https://doi.org/10.1007/s10708-021-10509-z.
  • Zhu, Z., T. Zhou, C. Jia, W. Liu, B. Liu, and J. Cao. 2020. “Community Detection Across Multiple Social Networks Based on Overlapping Users.” Transactions on Emerging Telecommunications Technologies 33 (6): e3928. https://doi.org/10.1002/ett.3928.