Scale effects-aware bottom-up population estimation using weakly supervised learning


ABSTRACT

Fine-scale population estimation (FPE) is crucial for urban management. After training, bottom-up FPE models can be applied independently of census data. However, given the lack of real fine-scale population data, the existing bottom-up methods typically apply models trained on coarse-grained census data to FPE directly, causing estimation bias induced by scale effects. Traffic analysis zones (TAZs) balance geo-semantics and fine granularity, but their potential as population analysis units has not been fully exploited. Hence, we developed a weakly-supervised TAZ-scale bottom-up population estimation method (WSTP). Specifically, to mitigate scale effects, a weakly-supervised training procedure comprising fine-scale feature input, model prediction, spatial aggregation, and coarse-scale supervision is designed to ensure that WSTP consistently focuses on FPE throughout the training and prediction phases. To enable weakly-supervised training using census data, we treated TAZs as graph nodes and designed a Spatial Aggregation Layer to aggregate TAZ-scale population predictions into communities. Given the diverse distribution patterns across age groups, we decomposed the FPE task by age groups. The experiments showed that WSTP significantly outperformed the baselines with R² values of 0.821 and 0.785 at the community and TAZ scales, respectively, indicating that WSTP can mitigate scale effects and produce high-resolution, accurate population data.


1. Introduction

Fine-scale population data play crucial roles in various aspects of urban management, including disaster emergency response, resource allocation, and public health assurance. However, the population data for each region typically originate from the national census, which has a coarse spatial scale and a long update cycle, and thus these data are insufficient to meet the demands of fine-grained urban management (Leasure et al. 2020; Neal et al. 2022; Metzger et al. 2022; Swanwick et al. 2022). In recent years, to address the untimely updating of census data, some scholars have attempted to forecast future population distribution based on historical census data (Chi, Zhou, and Voss 2011; Chi and Voss 2011; Chi and Wang 2017; Hauer 2019; Wilson et al. 2022; Baker, Swanson, and Tayman 2023). Meanwhile, to acquire spatially finer population distributions, numerous studies have investigated fine-scale population estimation using multi-source geographic data as auxiliary data. These studies employ machine learning algorithms to model the implicit relationships between auxiliary data features and census data, thereby achieving fine-scale population spatialization or population estimation.

Grids and buildings have been widely used as fine-scale population analysis units (PAUs) in previous research. Grids, such as 100 and 1000 m grids, are preferred because their regular shapes and fixed sizes facilitate scale conversion, as well as providing notable advantages during fusion with raster data, such as nighttime light data (Jacobs et al. 2018; Weber et al. 2018; Neal et al. 2022; Metzger et al. 2022; Mei et al. 2022). Grids are extensively utilized in population spatialization for broad geographic regions, such as the global data sets WorldPop (Bondarenko et al. 2020), LandScan (Bright, Coleman, and Dobson 2000), GHS-POP (Freire et al. 2016) and GPW (Warszawski et al. 2017), and China-wide data sets such as CnPop (Bai et al. 2018), PoiPop (Ye et al. 2019) and FinePop (Tu et al. 2022). However, grid cells have incomplete geo-semantics and cannot express the real geographic environment (Shang et al. 2021), and they are unsuitable for regions with limited spatial extents. Buildings are realistic spatial units where individuals reside, and they possess intact geo-semantics and a highly fine-grained spatial scale (Yao et al. 2017; Shang et al. 2021; Chen et al. 2021). However, using buildings as PAUs requires accurate and extensive building data, which makes broad generalization challenging.

Traffic analysis zones (TAZs) derived from road networks and administrative boundaries are the basic spatial units used for studying the commuting patterns of residents (Miller and Shaw 2001). TAZs usually contain buildings and their surrounding environment to provide a finer spatial scale than administrative districts, thereby achieving a balance between intact geo-semantics and fine spatial scale. Population distribution and road networks are two pivotal interrelated factors that affect urban form and structure in the long term (Zhao, Zhao, and Sun 2019). Therefore, taking TAZs as PAUs is more consistent with urban morphology (Hu et al. 2022), and this approach can alleviate the reliance on building data for fine-scale population estimation to some extent. However, previous population modeling studies that used TAZs as PAUs mainly focused on distributing population flows among TAZs rather than estimating the distribution of the static residential population (Jain and Tiwari 2017; Weerasinghe and Bandara 2023).

The existing methods for population estimation can be broadly assigned to two categories: top-down methods and bottom-up methods. Top-down methods typically employ dasymetric mapping techniques and auxiliary data features to allocate census data to finer spatial units, such as 100 m grids (Stevens et al. 2015; Zong et al. 2019; Grippa et al. 2019; Mei et al. 2022; Boo et al. 2022) and buildings (Yao et al. 2017; Shang et al. 2021; Chen et al. 2021). These ancillary data features, such as the regional area, built-up area, and points of interest (POI) kernel density, are used to represent natural and human environments. Consequently, top-down methods assume the availability of census data as a prerequisite for application, which limits their applicability to regions with recent census data.

Bottom-up methods typically apply machine learning algorithms to establish regression models that link ancillary data features with census data. Some bottom-up methods use fine-scale microcensus data for model training, before subsequently applying these trained models to a broader geographical scope (Weber et al. 2018; Leasure et al. 2020; Georganos et al. 2022; Neal et al. 2022). These approaches have obtained promising results, but their generalizability is limited because microcensus data are scarce and difficult to acquire. Other bottom-up methods utilize administrative district-level census data as training data by employing the ‘fine-scale feature aggregation, model prediction, and supervision’ training approach (Stevens et al. 2015; Shang et al. 2021; Chen et al. 2021; Wang et al. 2021). During training, these methods aggregate the features of all PAUs within an administrative district as model inputs, and then align the model's output with this administrative district’s population count. However, during the prediction phase, these models take the fine-scale features of the PAUs to be predicted as model inputs, which results in a discrepancy in the scale of the model’s input between the training and prediction stages, thereby making the models susceptible to scale effects or data distribution shift issues (Metzger et al. 2022). Some studies have adopted convolutional neural networks and weakly supervised learning techniques to guarantee that the model always takes features of the same scale as inputs (Jacobs et al. 2018; Metzger et al. 2022). However, these methods can only be used on grid units and cannot be extended to spatially discrete and irregularly shaped PAUs, e.g. buildings and TAZs.

Different regions exhibit distinct population age structures (Cheng et al. 2019) and this divergence, known as spatial heterogeneity, is particularly sensitive to changes in spatial scale. Consequently, different age groups typically exhibit dissimilar spatial distribution patterns. For instance, a higher degree of population aging is often found in rural areas compared with urban areas. Nevertheless, the current population estimation methods primarily focus on estimating the distribution of the total population without specifically modeling the populations in different age groups, thereby neglecting the potential for introducing more supervision signals into model training by utilizing age-specific population data as fine-grained labels. Furthermore, at finer spatial scales, the disparities in the population distributions among distinct age groups tend to become more pronounced. For instance, the disparities in the proportion of individuals aged 15–24 years between university dormitory clusters and regular residential neighborhoods are usually more noticeable than the differences observed among administrative districts. Hence, it is crucial to consider the influence of scale effects when conducting fine-grained population modeling by age groups based on census data.

To address the issues discussed above, we propose a weakly supervised bottom-up population estimation method (WSTP). Specifically, WSTP utilizes TAZs as the PAUs. To mitigate scale effects, the weakly-supervised training steps comprising fine-scale feature input, model prediction, spatial aggregation, and coarse-scale supervision are applied to ensure that the model consistently focuses on TAZ-scale population estimation during the training and prediction phases. Since TAZs are spatially discrete and irregularly shaped, we treat them as graph nodes and design a Spatial Aggregation Layer to spatially aggregate the TAZ-scale population predictions to the community scale, thereby enabling weakly-supervised training based on coarse-grained census data. Furthermore, WSTP decomposes census data into fine-grained age group labels to improve the specificity of modeling for each age group and introduce more supervision signals for weakly supervised training. WSTP further leverages multi-task learning to concurrently model the TAZ-scale population for each age group.

The remainder of this paper is organized as follows. In Section 2, we introduce our WSTP method. In Section 3, we present the study area and multi-source geographic data. In Section 4, we present the TAZ-scale population data produced by WSTP and assess its accuracy. In Section 5, we evaluate and discuss the scale effects that affect traditional models, as well as explaining the necessity of modeling the population by age groups, the changes in performance with WSTP when building data is unavailable, and the desirability of incorporating spatial autocorrelation (SA) in TAZ-scale population modeling. We give our conclusions in the final section.

2. Methodology

The proposed TAZ-scale population estimation method called WSTP mainly comprises two modules: the model prediction layer (MPL) and spatial aggregation layer (SAL). WSTP is organized into three stages: data preprocessing, model training and evaluation, and model prediction, as illustrated in Figure 1. In the training and prediction phases, MPL consistently takes TAZ-scale features as inputs and predicts TAZ-scale populations, thereby mitigating scale effects. SAL is exclusively employed in the training and validation phases to aggregate the MPL's outputs for all TAZs within the same community to yield community-scale population predictions.

Figure 1. Proposed WSTP method.

Our WSTP method allows the TAZ-scale distribution to be modeled for either the total population or the population of each age group by adjusting the MPL's output dimension and configuring the training labels. As suggested in a previous study (Cáceres et al. 2013), we categorize the population into six age groups: 0–4, 5–14, 15–24, 25–44, 45–64, and 65+ years.

2.1. Model prediction layer

The feature extraction and prediction processes within the Model Prediction Layer (MPL) consistently operate at the TAZ scale to mitigate potential issues regarding scale effects and distribution shifts. To handle the limited training data and prevent model overfitting, a three-layer fully connected neural network with a limited number of parameters is employed as the MPL and denoted as $f(X; θ)$, where $X$ denotes the model input, which is the standardized TAZ feature vector, and $θ$ represents the parameters that need to be optimized during training, which comprise the weights $W$ and the bias $b$ in each fully connected layer $LL$:

(1) $LL_{i}(X) = \begin{cases} \mathrm{ReLU}(W_{i} X + b_{i}), & i = 0, 1 \\ W_{i} X + b_{i}, & i = 2 \end{cases}$

(2) $\mathrm{BatchNorm}(X) = γ ⊙ \frac{X - \hat{μ}_{B}}{\hat{σ}_{B}} + β$

(3) $\hat{Y} = f(X; θ) = \mathrm{SoftPlus}(LL_{2}(LL_{1}(\mathrm{BatchNorm}(LL_{0}(X)))))$

In Equation (1), ReLU is the activation function for the first two fully connected layers. In Equation (3), $f(X; θ)$ employs BatchNorm as a regularization measure after the first fully connected layer. BatchNorm (Ioffe and Szegedy 2015) normalizes each feature independently within a single batch, which helps to speed up the convergence of model training. In Equation (2), $\hat{μ}_{B}$ and $\hat{σ}_{B}$ are the sample mean and sample standard deviation of the current input batch $X$, respectively, and $γ$ and $β$ are the scaling and bias parameters that need to be optimized during training. $f(X; θ)$ uses SoftPlus (Zheng et al. 2015) as the last activation function, which is a smoothed approximation of the ReLU function and constrains the output of $f(X; θ)$ to be always positive. The prediction target of WSTP, i.e. the TAZ-scale population, is always positive. Therefore, adopting SoftPlus as the activation function for the prediction layer of WSTP is equivalent to imposing an a priori constraint of ‘the output is always positive’ on the model, which improves model convergence and accuracy.
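The forward pass in Equations (1) to (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' PyTorch implementation: the hidden width, the random weights, the row-vector convention (`X @ W` instead of $W X$), and the small `eps` term in the normalization are all hypothetical choices made for the example.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def batch_norm(h, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then scale and shift (Equation 2).
    # eps is an assumption added here to avoid division by zero.
    mu = h.mean(axis=0)
    sigma = h.std(axis=0)
    return gamma * (h - mu) / (sigma + eps) + beta

def mpl_forward(X, params):
    # X: (n_taz, q) standardized TAZ features; returns (n_taz, p) predictions.
    W0, b0, W1, b1, W2, b2, gamma, beta = params
    h = np.maximum(X @ W0 + b0, 0.0)   # LL_0 with ReLU
    h = batch_norm(h, gamma, beta)     # BatchNorm after the first layer
    h = np.maximum(h @ W1 + b1, 0.0)   # LL_1 with ReLU
    return softplus(h @ W2 + b2)       # LL_2 followed by SoftPlus

rng = np.random.default_rng(0)
n_taz, q, hidden, p = 8, 5, 16, 6      # hypothetical sizes; p = 6 age groups
params = (
    rng.normal(scale=0.1, size=(q, hidden)), np.zeros(hidden),       # W0, b0
    rng.normal(scale=0.1, size=(hidden, hidden)), np.zeros(hidden),  # W1, b1
    rng.normal(scale=0.1, size=(hidden, p)), np.zeros(p),            # W2, b2
    np.ones(hidden), np.zeros(hidden),                               # gamma, beta
)
Y_hat = mpl_forward(rng.normal(size=(n_taz, q)), params)
print(Y_hat.shape, bool((Y_hat > 0).all()))
```

Because SoftPlus is the final operation, every entry of `Y_hat` is positive regardless of the weights, which is the prior constraint described above.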

2.2. Spatial aggregation layer

Authentic TAZ-scale population data are difficult to obtain. To enable weakly supervised training of our TAZ-scale population estimation model using community-scale census data, we introduce the Spatial Aggregation Layer (SAL) to spatially aggregate the TAZ-scale population predictions output by the MPL to the community scale. The model's loss is then computed using community-scale census populations as the ground-truth labels.

The SAL is similar to the Regional Aggregation Layer (RAL) proposed by Jacobs et al. (2018), but the key difference is that the RAL is tailored for aggregating spatially continuous, regular geographic units (such as grids), whereas our SAL is specifically designed for the aggregation of spatially discrete, irregular geographic units (i.e. TAZs). The RAL can be readily implemented using conventional convolution techniques. By contrast, both TAZs and communities are irregular spatial units, and TAZs are distinctly separated by road networks. Therefore, we conceptualize TAZs as graph nodes and represent the spatial containment relationships between TAZs and communities as adjacency edges. Subsequently, we employ graph convolution techniques to implement the SAL.

We assume that there are $m$ communities $C = \{C_{0}, C_{1}, \dots, C_{m - 1}\}$ in the training data set, comprising a total of $n$ TAZs: $T = \{T_{0}, T_{1}, \dots, T_{n - 1}\}$. A spatial analysis method is used to identify the containment relationships between communities and TAZs. As a result, each community can be represented as an index list of the TAZs that it contains, i.e. $C_{i} = \{i_{0}, i_{1}, \dots, i_{k}\} \subseteq \{0, 1, \dots, n - 1\}$. SAL then leverages the index lists to aggregate TAZ-scale predictions to the community scale.

During training, MPL takes the feature matrix $X \in \mathbb{R}^{n \times q}$ of $n$ TAZs as the input and outputs the population prediction $\hat{Y} \in \mathbb{R}^{n \times p}$ for these TAZs, where $q$ is the number of features and $p$ is the MPL’s output dimension. When the prediction target is the total population, we set $p$ to 1. However, when predicting the population of each age group, we define $p$ as the number of age groups, i.e. six. The predicted populations $\hat{Y}$ are then aggregated to the community scale using SAL. The i-th community’s predicted value $\hat{y}_{C_{i}} \in \mathbb{R}^{p}$ is calculated using Equation (4):

(4) $\hat{y}_{C_{i}} = \mathrm{Sum}(\hat{Y}[C_{i}]) = \mathrm{Sum}(\hat{Y}[[i_{0}, i_{1}, \dots, i_{k}]])$

To enable the direct use of the MPL’s output as the predicted TAZ-scale population during prediction, we employ the summation operation (Sum) for aggregation during training to maintain the actual relationship between the TAZ-scale population and community-scale population.
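The aggregation in Equation (4) amounts to summing the rows of $\hat{Y}$ selected by each community's index list. The following minimal sketch uses plain Python lists and made-up numbers; the function name `spatial_aggregate` is hypothetical and the paper's own implementation relies on graph convolution in PyTorch.

```python
def spatial_aggregate(y_hat, communities):
    # y_hat: per-TAZ predictions, one list of p values per TAZ.
    # communities: one list of TAZ indices per community (C_i in the text).
    p = len(y_hat[0])
    aggregated = []
    for taz_indices in communities:
        totals = [0.0] * p
        for t in taz_indices:
            for g in range(p):
                totals[g] += y_hat[t][g]
        aggregated.append(totals)
    return aggregated

# Four TAZs, two age groups (p = 2), two communities.
y_hat = [[10.0, 5.0], [20.0, 1.0], [30.0, 4.0], [40.0, 2.0]]
communities = [[0, 2], [1, 3]]
community_pred = spatial_aggregate(y_hat, communities)
print(community_pred)  # [[40.0, 9.0], [60.0, 3.0]]
```

Because the aggregation is a plain sum, each TAZ-level value keeps its meaning as a head count and can be used directly at prediction time.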

2.3. Model training

The weakly supervised training approach adopted in WSTP is a form of inexact supervision learning (Zhou 2018), where only geospatially coarse-grained labeled data, i.e. community-scale population data, are used for model training. This weakly supervised training method is based on an assumption that TAZs with similar features also have similar populations, and a practical constraint that the sum of populations in all TAZs within a community equals the population of that community. Furthermore, in our method, population estimation is decomposed into multiple tasks according to age groups, thereby generating fine-grained labeled data in the population attribute space. This strategy compensates for the geospatially coarse-grained nature of the census data and provides richer supervision signals for the weakly supervised training of WSTP.

To obtain a more uniform data set partition, Jenks Natural Breaks are employed to stratify communities into three groups based on their total population. Subsequently, communities from each stratum are randomly distributed into the training, validation, and test sets at a ratio of 3:1:1. During model training, the SAL is used to aggregate the model's TAZ-scale population predictions into community-scale predictions $\hat{y}_{C_{i}} \in \mathbb{R}^{p}$ according to Equation (4). The community-scale census data $y_{C_{i}} \in \mathbb{R}^{p}$ are then used as the ground truth to compute the model's losses. As depicted in Equation (5), we utilize the mean absolute error (MAE) as the loss function:

(5) $L = \sum_{i = 0}^{m - 1} \| y_{C_{i}} - \hat{y}_{C_{i}} \|$

where $m$ represents the number of communities in the training set. The model is implemented using the PyTorch framework (Paszke et al. 2019) with the SGD optimizer. A ReduceLROnPlateau learning rate scheduler is employed, where the initial and final learning rates are set to 1e-4 and 1e-8, respectively. The batch size is set to 32 communities. In addition, an EarlyStopping mechanism (Goodfellow, Bengio, and Courville 2016) is configured with a patience threshold of 50 epochs, and thus if the model's validation loss remains stagnant for 50 consecutive epochs, the training process will be terminated to prevent model overfitting.
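The stratified 3:1:1 partition described at the start of this subsection might look roughly like the sketch below. One substitution should be noted plainly: the strata here are equal-count bins of the sorted populations, used as a simple stand-in for the Jenks Natural Breaks classification in the paper. The function name, seed, and synthetic populations are hypothetical.

```python
import random

def stratified_split(populations, ratios=(3, 1, 1), n_strata=3, seed=0):
    # populations: dict mapping community id -> total census population.
    # Strata are equal-count bins (a stand-in for Jenks Natural Breaks).
    rng = random.Random(seed)
    ordered = sorted(populations, key=populations.get)
    size = -(-len(ordered) // n_strata)  # ceiling division
    strata = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    splits = {"train": [], "val": [], "test": []}
    total = sum(ratios)
    for stratum in strata:
        rng.shuffle(stratum)
        n_train = round(len(stratum) * ratios[0] / total)
        n_val = round(len(stratum) * ratios[1] / total)
        splits["train"] += stratum[:n_train]
        splits["val"] += stratum[n_train:n_train + n_val]
        splits["test"] += stratum[n_train + n_val:]
    return splits

# Thirty synthetic communities with made-up populations.
populations = {f"c{i}": 1000 + 137 * i for i in range(30)}
splits = stratified_split(populations)
print({name: len(ids) for name, ids in splits.items()})
```

Sampling within each stratum keeps small, medium, and large communities represented in all three sets, which is the stated purpose of the stratification.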

2.4. TAZ-Scale population estimation

As described above, by adjusting the output dimension of the MPL and the training labels, WSTP can be applied to modeling the TAZ-scale distribution of either the total population or the population of each age group. Clearly, the TAZ-scale population prediction can also be obtained by summing up the population predictions for each age group. A detailed comparison of the performance of direct population modeling and age group-based population modeling is presented in the discussion section.

The proposed SAL is exclusively applied in the model training and evaluation phases. In the prediction stage, the output $\hat{Y}$ of the MPL, i.e. the predicted TAZ-scale population, is the exact output of our WSTP.

WSTP is considered a bottom-up approach, but when applied to regions with up-to-date and accurate population census data, its predictions can be utilized as allocation weights. These weights can then be employed in dasymetric mapping to redistribute the population from communities to TAZs, as shown in Equation (6), and WSTP transforms into a top-down approach in this context:

(6) $\hat{f}_{dasy}(t) = y_{C_{i}} \times \frac{\hat{y}_{t}}{\hat{y}_{C_{i}}}$

where $\hat{y}_{t} = \hat{Y}[t]$ represents the population of the t-th TAZ predicted by MPL, and $\hat{Y}$ is calculated according to Equation (3); $\hat{y}_{C_{i}}$ is the predicted population of the community to which the t-th TAZ belongs, as output by the SAL; and $y_{C_{i}}$ is the census population of that community.
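As a worked illustration of Equation (6), the sketch below rescales TAZ-level predictions so that they sum to the census count of their community. It assumes the total-population case ($p = 1$) and uses invented numbers; `dasymetric_redistribute` is a hypothetical helper name.

```python
def dasymetric_redistribute(taz_pred, communities, census):
    # taz_pred: predicted population per TAZ (total population, p = 1).
    # communities: one list of TAZ indices per community.
    # census: census population per community, in the same order.
    adjusted = list(taz_pred)
    for taz_indices, y_c in zip(communities, census):
        pred_c = sum(taz_pred[t] for t in taz_indices)  # community-scale prediction
        for t in taz_indices:
            adjusted[t] = y_c * taz_pred[t] / pred_c    # Equation (6)
    return adjusted

taz_pred = [100.0, 300.0, 50.0, 150.0]
communities = [[0, 1], [2, 3]]
census = [800.0, 100.0]
adjusted = dasymetric_redistribute(taz_pred, communities, census)
print(adjusted)  # [200.0, 600.0, 25.0, 75.0]
```

The relative shares within each community are preserved, while the community totals match the census exactly. The division is safe because SoftPlus keeps every prediction positive.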

2.5. Accuracy evaluation

Real TAZ-scale population data are difficult to obtain, so the TAZ-scale population predicted by WSTP is aggregated to the community scale. The accuracy is then evaluated by comparing these community-scale predictions with the community-scale census data. All accuracy evaluations are performed within the test set, as partitioned according to Section 2.3. The evaluation metrics employed comprise R², MAE, symmetric mean absolute percentage error (sMAPE), and root mean squared error (RMSE), which are calculated as follows:

(7) $R^{2}(\hat{y}_{C}, y_{C}) = 1 - \frac{\sum_{i = 0}^{m - 1} (y_{C_{i}} - \hat{y}_{C_{i}})^{2}}{\sum_{i = 0}^{m - 1} (y_{C_{i}} - \bar{y}_{C})^{2}}$

(8) $\mathrm{MAE}(\hat{y}_{C}, y_{C}) = \frac{1}{m} \sum_{i = 0}^{m - 1} | y_{C_{i}} - \hat{y}_{C_{i}} |$

(9) $\mathrm{sMAPE}(\hat{y}_{C}, y_{C}) = 100 \cdot \frac{2}{m} \sum_{i = 0}^{m - 1} \frac{| y_{C_{i}} - \hat{y}_{C_{i}} |}{| y_{C_{i}} | + | \hat{y}_{C_{i}} |}$

(10) $\mathrm{RMSE}(\hat{y}_{C}, y_{C}) = \sqrt{\frac{1}{m} \sum_{i = 0}^{m - 1} (y_{C_{i}} - \hat{y}_{C_{i}})^{2}}$

where $m$ is the number of communities in the test set, $\bar{y}_{C}$ is the average population of these communities, and $y_{C_{i}}$ and $\hat{y}_{C_{i}}$ are the true population and predicted population for the i-th community, respectively. Higher R² values indicate better model fitting, whereas higher MAE, sMAPE, and RMSE values indicate larger model errors.
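Equations (7) to (10) translate directly into code. The sketch below is a plain-Python rendering for a single population target, evaluated on four made-up communities; the function names are chosen here for illustration.

```python
import math

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    # Symmetric MAPE in percent, as in Equation (9).
    terms = (abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred))
    return 100.0 * 2.0 / len(y_true) * sum(terms)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [100.0, 200.0, 300.0, 400.0]   # made-up census populations
y_pred = [110.0, 190.0, 310.0, 390.0]   # made-up aggregated predictions
print(r2(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
print(round(smape(y_true, y_pred), 2))
```

In this toy case every prediction is off by 10 people, so MAE and RMSE coincide, while sMAPE weights the same absolute error more heavily for the smaller communities.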

3. Study area and data

The central districts of Wuhan in Hubei Province, China, were selected as our study area. The multi-source data set included population census, building, land cover, POIs, and mobile signaling data. TAZs with both intact geo-semantics and fine spatial scale were treated as PAUs.

3.1. Study area

Wuhan, abbreviated as ‘Han’ and also known as Jiangcheng, is the capital of Hubei Province and one of the megacities and national central cities of the People's Republic of China. It is located in the central hinterland of China, in the eastern part of Hubei Province, at the confluence of the Yangtze River and the Han River. As shown in Figure 2(a), Wuhan has 13 districts, seven of which form the central urban area. The division between the central urban area and the suburban area in Wuhan is a result of historical factors. The central urban area, also known as the central districts, generally refers to seven well-developed districts, namely Jiang'an, Jianghan, Qiaokou, Hanyang, Wuchang, Qingshan, and Hongshan, which are characterized by concentrated construction, comprehensive municipal facilities, and well-established public services.

Figure 2. (a) Wuhan and the study area; (b) Communities in the study area; (c) TAZs in the study area.

The central districts of Wuhan cover a total area of approximately 493.76 km². According to the Seventh National Population Census of China in 2020, the resident population of the central districts is approximately 6.43 million, which accounts for roughly 52.21% of the population of Wuhan. In 2020, the resident population of Wuhan reached 12.3 million, an increase of 2.5 million from 2010. In terms of population mobility, the city's total inflow population has increased from 2.6 million in 2010 to 3.9 million. The increasing residential and mobile populations have put great pressure on urban planning, public transport, and medical resources in the central districts of Wuhan, which highlights the importance of fine-scale population distribution data for the central districts of Wuhan. The central districts of Wuhan have a complex urban form, including residential areas, commercial areas, medical service areas, leisure areas, industrial areas, and educational areas. In addition, the population distribution within the central districts exhibits significant spatial heterogeneity, thereby providing a representative case for fine-scale population estimation.

3.2. Data and features

Table 1 lists the features and their data sources used in our experiments. The seventh population census of Wuhan, denoted as the Census, was employed for both model training and accuracy assessment. The WorldPop dataset was used for comparative evaluation with our model. These data sets were collected from years close to the 2020 Census to minimize potential modeling biases due to significant disparities in the data collection times.

Table 1. Multi-source geospatial data and features.


Urban population distribution is influenced by a variety of natural, social, economic, and political factors. We aim to characterize these factors as much as possible through multi-source geospatial data features. For the building data, we keep only urban and rural residential buildings. The quantity of residential buildings is strongly correlated with the population size (Metzger et al. 2022). Therefore, we calculate the total area of buildings, the number of buildings, and the total number of floors to characterize the residential carrying capacity of the TAZ.

For land use and land cover (LULC) data, we calculate the areas covered by green space, water, and roads in the TAZ to represent the natural and human environment of the TAZ. For POI data, we calculate the number of each type of POI within the TAZ and the shortest distance between the TAZ and schools, parks, and transport stops to express the socio-economic level and accessibility of residents within the TAZ.

For mobile positioning data, we obtained the residential and working population data at a 50 m grid scale for the period from January to October 2020 from Baidu (https://huiyan.baidu.com). These data are calculated based on the time of appearance, frequency of appearance, and the type of WiFi connection of mobile phone users at specific locations. We calculated the mean and maximum values of the monthly residential populations for these 10 months, which are used to represent the residential population of Wuhan in 2020 at a 50 m grid scale. To assess the validity of these data, we aggregated the grid-scale residential population to the community scale using zonal statistics and compared it with the community residential population from the census. As shown in Figure 3, the R² between the maximum monthly residential population from Baidu and the residential population in the census is 0.734, and the maximum monthly residential population fits the census residential population better than the average monthly residential population. Therefore, we allocated the maximum monthly residential population at the 50 m grid scale (referred to as the Baidu residential population) to TAZ units, obtaining TAZ-scale population ground truth for accuracy assessment. Meanwhile, considering that the overall population distribution is also correlated with the working population distribution, we calculated the minimum, mean, maximum, and total values of the working population for these 10 months as features for the models.

Figure 3. Scatter plots of Baidu's monthly resident population and the resident population in the census.

3.3. Population analysis units

In this study, TAZs with both intact geo-semantics and fine spatial scale were used as population analysis units (PAUs). TAZs within the central districts of Wuhan were constructed using the morphological map segmentation method introduced by Yuan, Zheng, and Xie (2012), and the segmentation results are shown in Figure 2(c). The spatial scale of these TAZs was finer than that of the communities. The study area contained approximately 16 times as many TAZs as communities, and the average area of the communities was about 20 times that of the TAZs, as shown in Table 2. During the data preprocessing stage, TAZs that lacked residential buildings were filtered out.

Table 2. Statistics for TAZs and communities in the study area.


4. Experimental results

4.1. TAZ-scale population estimation

The TAZ-scale population estimated for each age group is shown in Figure 4. To reflect the differences in spatial distributions of the population by age groups, the population ratio is shown in the visualization instead of the population density. The population ratio for a given age group is the ratio of the number of people in that age group to the population of all age groups.

Figure 4. TAZ-scale population ratio and LISA cluster map of each age group in the study area.

In Figure 4, the left column shows the distribution of the population ratio of each age group in the study area, and the right column shows the LISA (local indicators of spatial association) cluster map of the population ratio of the corresponding age group. The right column is also annotated with the global Moran's I index, P-value, and Z-score of the population ratio distribution of each age group. The population ratios in all age groups show some degree of positive spatial autocorrelation (Moran's I > 0) with high confidence (P-value < 0.01, Z-score > 2.58).

Specifically, the spatial distributions of the population in the 0–4 and 5–14 age groups are quite similar, and the TAZs with a higher ratio of the population in these two age groups are more clustered in the Hongshan and Qingshan districts (the red-circled area in Figure 4). The clustering of high values of the 15–24 age group population ratio is more significant in the southern part of Hongshan District (the red-circled area in Figure 4), which is attributed to the presence of several universities, such as Wuhan University, Huazhong University of Science and Technology, Central China Normal University, and China University of Geosciences, since the 15–24 age group is the dominant component of the college-residential population. According to the census of Wuhan, the population ratios of the 25–44 and 45–64 age groups are relatively high at 0.339 and 0.269, respectively. As shown in Figure 4, the spatial distributions of the population ratio of the 25–44 and 45–64 age groups share a certain degree of similarity, and their clusters of high values are fairly consistent. In contrast, the TAZs with a high population ratio in the 65+ age group are more evenly distributed across the districts. These results demonstrate that WSTP could accurately identify distinct distribution patterns for different age groups by modeling the population by age groups.
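For readers unfamiliar with the global Moran's I reported in Figure 4, the statistic can be computed from its standard definition, $I = \frac{n}{S_{0}} \frac{\sum_{i} \sum_{j} w_{ij} z_{i} z_{j}}{\sum_{i} z_{i}^{2}}$, where $z_{i}$ is the deviation of unit $i$ from the mean and $S_{0}$ is the sum of all weights. The sketch below is illustrative only: the paper does not state how its spatial weights were defined, so the binary adjacency of four units in a row and the ratio values are hypothetical, and the significance test (P-value, Z-score) and LISA clusters are not covered.

```python
def morans_i(values, weights):
    # values: one observation per spatial unit (e.g. a population ratio).
    # weights: dict mapping (i, j) -> spatial weight w_ij for i != j.
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(weights.values())
    cross = sum(w * z[i] * z[j] for (i, j), w in weights.items())
    return (n / s0) * cross / sum(d * d for d in z)

# Four units in a row; neighbors share a border (hypothetical layout).
weights = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
smooth = morans_i([0.1, 0.2, 0.3, 0.4], weights)       # gradual trend
alternating = morans_i([1.0, 0.0, 1.0, 0.0], weights)  # checkerboard-like
print(round(smooth, 4), round(alternating, 4))  # 0.3333 -1.0
```

A positive value, as in the first case, indicates that neighboring units tend to have similar ratios, which is the pattern reported for all six age groups.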

Figure 5(a) depicts the estimated TAZ-scale population distribution calculated by summing the predicted TAZ-scale populations for all age groups. Figure 5(b) shows a part of Baiyushan Street in Qingshan District, encompassing several primary and secondary schools as well as residential neighborhoods. In this region, TAZs corresponding to residential neighborhoods generally had higher population densities, whereas TAZs corresponding to primary and secondary schools tended to have lower population densities. Figure 5(c) shows Nanhu Street in Wuchang District. This region primarily comprised residential neighborhoods with correspondingly higher TAZ-scale population densities. Figure 5(d) illustrates the area adjacent to the Yangtze River in Jianghan District and Qiaokou District. This region encompassed diverse land use types, including commercial areas, residential areas, healthcare service areas, and educational and cultural areas. The building density was notably high in this region, where TAZs corresponding to residential areas had higher population densities, whereas those corresponding to other land use types tended to have lower population densities. These findings indicate that the TAZ-scale population data generated by WSTP aligned with the urban morphology and provided valuable insights for urban management.

Figure 5. (a) TAZ-scale population distribution in the central districts of Wuhan. (b), (c), and (d) Population distributions in local regions and recent satellite images.

4.2. Accuracy evaluation

In this subsection, we evaluate the accuracy of the population estimates of our WSTP at both the community scale and the TAZ scale.

4.2.1. Accuracy evaluation at the community scale

In the community-scale accuracy evaluation, we aggregated the TAZ-scale population predictions to the community scale and conducted accuracy assessments by comparing these predictions with census data at the community level. The predicted TAZ-scale population was derived by summing the population predictions for all age groups. Table 3 shows the accuracy of WSTP at estimating the total population (all) and the population for each age group. The ‘Proportion’ column shows the proportion of the population within each age group relative to the total population in the study area, and the ‘Mean’ column shows the mean population of each age group at the community level.

Table 3. Statistics and evaluation results for the total population (all) and the population within each age group at the community-scale. The CV column provides the community-scale coefficient of variation for each age group’s population.

Significant positive correlations were found between the MAE values for each age group's population and their respective proportions, thereby indicating that as the actual population increased, the absolute error of WSTP also tended to increase. However, the R² and sMAPE values did not have positive correlations with the ‘Proportion’ column, which suggests that the relative error of WSTP remained consistently low and stable. It should be noted that despite accounting for only 0.144 of the total population, the 15–24 years age group has significantly higher sMAPE and RMSE values than the other age groups, and this discrepancy can be attributed to the pronounced spatial heterogeneity of the population distribution for this age group. As shown in the ‘CV’ column, the coefficient of variation for the 15–24 years age group was 2.995, which was considerably greater than those for the other age groups.
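The error metrics discussed above can be written out as a short sketch. The exact definitions used in this study are not restated in this section, so the sMAPE form (mean of 2|y − ŷ| / (|y| + |ŷ|)) and the use of the population standard deviation in the CV are assumptions, and the sample values are hypothetical.

```python
import math


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def smape(actual, pred):
    # Assumed definition: mean of 2|a - p| / (|a| + |p|); 0 when both are 0.
    terms = []
    for a, p in zip(actual, pred):
        denom = abs(a) + abs(p)
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - p) / denom)
    return sum(terms) / len(terms)


def r2(actual, pred):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def coefficient_of_variation(values):
    # Assumed definition: population standard deviation divided by the mean.
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


# Hypothetical community populations and predictions.
actual = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 330.0]
scores = {
    "MAE": mae(actual, pred),
    "RMSE": rmse(actual, pred),
    "sMAPE": smape(actual, pred),
    "R2": r2(actual, pred),
    "CV": coefficient_of_variation(actual),
}
```

MAE and RMSE are in population units and therefore grow with the size of the group being estimated, whereas sMAPE and R² are scale-free, which is consistent with the contrast drawn in the paragraph above.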

Figure 6 presents scatter plots of the community-scale predicted populations for the various age groups against the actual populations in the test data set. ‘OE’ indicates the proportion of cases for which WSTP overestimated the population. The best-fit line for each age group agreed closely with the identity line, and the OE value for each age group was close to 50%, thereby confirming the capacity of WSTP to accurately model the population for different age groups.
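The OE statistic described above can be sketched in a few lines. Treating exact ties as not overestimated is an assumption, and the sample values are hypothetical.

```python
def overestimation_rate(actual, pred):
    """Share of units whose predicted population exceeds the actual one."""
    over = sum(1 for a, p in zip(actual, pred) if p > a)
    return over / len(actual)


# Hypothetical values: two of the four units are overestimated.
oe = overestimation_rate([10, 20, 30, 40], [12, 18, 33, 40])
```

An OE near 0.5 means that over- and underestimates are roughly balanced, whereas values near 0 or 1 point to a systematic bias in one direction.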

Figure 6. Scatter plots of the predicted population for each age group against the actual population in the test dataset.

4.2.2. Accuracy evaluation at the TAZ scale

In the TAZ-scale accuracy evaluation, we used the TAZ-scale Baidu residential population as the ground truth. Since the Baidu residential population data does not contain information on the age groups of the population, we did not perform accuracy assessment by age groups.

Figure 7 displays scatter plots of the predicted population versus the actual population at the community level and TAZ level for the training, validation, and test sets. ‘#Samples’ indicates the number of community units or TAZ units. In the community-scale accuracy evaluation, WSTP performed better on the test set than on the validation set in terms of the sMAPE and R² values, which indicates that WSTP did not overfit the data. The OE on the test set was 40.4%, which indicates that WSTP had a slight tendency to underestimate the population. In the TAZ-scale accuracy evaluation, the R² value of WSTP on the test set was 0.787, which is higher than its R² values on both the training and validation sets, indicating that WSTP has high accuracy and good generalization ability for TAZ-scale population estimation.

Figure 7. Scatter plots of predicted population versus actual population at the community level and TAZ level.

4.3. Ablation experiments

We then evaluated the need for the BatchNorm regularization module and the SoftPlus activation function in WSTP. In Table 4, WSTP (wo BN) and WSTP (wo SP) denote WSTP without BatchNorm and without SoftPlus, respectively. The overall performance of WSTP decreased in both of these scenarios, thereby demonstrating the need to use BatchNorm to normalize feature extraction and to apply SoftPlus to impose prior constraints on the model's output.
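To illustrate the kind of prior constraint SoftPlus can impose, the sketch below shows that it maps any real-valued raw output to a strictly positive number, which suits a population count. This is a generic, numerically stable formulation of the function; how it is wired into WSTP is not shown here, and the raw output values are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical raw network outputs, including negative values.
raw_outputs = [-50.0, -3.0, 0.0, 4.0, 50.0]
constrained = [softplus(v) for v in raw_outputs]
```

For large positive inputs SoftPlus is close to the identity, so it constrains the sign of the output without compressing large predicted populations.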

Table 4. Results obtained in ablation experiments.

5. Discussion

5.1. Comparative experiments

5.1.1. Baselines

We used two machine learning algorithms that are widely used for population estimation, Random Forest (RF) and Linear Regression (LR), together with a neural network model (NN), as baselines for comparison with the proposed WSTP. In the experiments, all models used the same training and testing data sets. WSTP adopted the weakly-supervised training method shown in Figure 8(a), whereas RF, LR, and NN applied the traditional ‘fine-scale feature aggregation, model prediction, and supervision’ training approach depicted in Figure 8(b).

Figure 8. (a) Proposed weakly-supervised training method. (b) Traditional ‘fine-scale feature aggregation, model prediction, and supervision’ training approach.

Based on hyperparameter tuning, RF was configured with n_estimators set to 2000 and max_depth set to 10, and the mean squared error was used as the loss function. For LR, L1 regularization was applied and the intercept parameter was set to 0. We observed that the performance of LR deteriorated significantly when the intercept was not set to 0, and that LR performed better when the input features were not standardized. In our experiments, NN is equivalent to the model prediction layer (MPL) module of WSTP, i.e. NN and MPL have the same neural network structure, activation function, and number of neurons, and both use BatchNorm. WSTP and NN therefore differ mainly in their training methods.
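The difference between the two training approaches can be made concrete with a toy sketch. The square-root `toy_model`, the feature values, the census totals, and the use of mean squared error as the coarse-scale loss are all hypothetical stand-ins chosen for illustration; they are not the WSTP architecture or loss.

```python
import math


def aggregate(values, taz_to_comm, n_comm):
    """Sum TAZ-level values into their parent communities."""
    totals = [0.0] * n_comm
    for value, comm in zip(values, taz_to_comm):
        totals[comm] += value
    return totals


def toy_model(feature):
    # Hypothetical nonlinear stand-in for a trained prediction model.
    return math.sqrt(feature)


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


taz_features = [4.0, 9.0, 16.0]  # one hypothetical feature per TAZ
taz_to_comm = [0, 0, 1]          # TAZs 0 and 1 belong to community 0
census = [5.0, 4.0]              # hypothetical community totals

# Weakly supervised path: predict per TAZ, aggregate, then supervise.
weak_pred = aggregate([toy_model(x) for x in taz_features], taz_to_comm, 2)
weak_loss = mse(weak_pred, census)

# Traditional path: aggregate features first, then predict per community.
trad_pred = [toy_model(x) for x in aggregate(taz_features, taz_to_comm, 2)]
```

For community 0 the two paths disagree (2 + 3 = 5 versus √13 ≈ 3.6) because a nonlinear model applied to summed features is not the sum of the model applied to each TAZ. In the weakly supervised path the model only ever sees TAZ-scale inputs, which is the property the comparison with NN isolates.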

WorldPop (Bondarenko et al. Citation2020), a publicly available 100 m grid population data set, was also used for comparison. Previous studies (Bai et al. Citation2018; Leyk et al. Citation2019; Yin et al. Citation2021) have compared various grid population data sets, including WorldPop, LandScan (Bright, Coleman, and Dobson Citation2000), GHS-POP (Freire et al. Citation2016), and GPW (Warszawski et al. Citation2017), and the consensus indicates that WorldPop has the highest accuracy. Because the lower spatial resolution of the other data sets makes them unsuitable for community-scale assessment, we used only the WorldPop data set for comparison. Population counts for each community in the WorldPop data set were calculated through zonal statistics.

5.1.2. Evaluation methods

We evaluated the accuracy of the baselines and WSTP for population estimates at both the community scale and the TAZ scale. Moreover, the training objectives also affect the performance of the models. Many previous studies have utilized the population density as the target for model training (Stevens et al. Citation2015; Shang et al. Citation2021). The population density tends to exhibit smaller variations across spatial scales than the population count. For example, if street A contains several communities, including community B, the population density of community B may closely match or even surpass that of street A, whereas the population count for community B will always be less than that for street A. Therefore, it is reasonable to assume that traditional methods may respond differently to scale effects depending on whether the population count or the population density is used as the training target. To quantify this disparity, the following two evaluation methods were applied to traditional models.

Count: The model was trained using community population counts as the target. After training, the features of TAZs were used as inputs to obtain population count predictions for TAZs, which were then aggregated to derive community-level population estimates. These estimates were compared with the actual community population counts for assessment.
Density: The model was trained using the community population density as the target. After training, the features of the TAZs were used as inputs to obtain population density predictions for TAZs. These density predictions were transformed into population counts for TAZs, aggregated to obtain community-level population estimates, and compared with the actual community population counts for assessment.

The population density of a community cannot be determined by simply summing the population densities of all the TAZs within it. Therefore, WSTP is not suitable for weakly supervised training with population density as the target, and thus the population count was always used as the training target for WSTP.
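A small worked example shows why counts aggregate by summation while densities do not. The densities and areas below are hypothetical.

```python
def density_to_count(densities, areas):
    """Convert per-TAZ densities (persons per km^2) to counts using TAZ areas."""
    return [d * a for d, a in zip(densities, areas)]


# Hypothetical community made of two TAZs.
taz_density = [100.0, 300.0]  # persons per km^2
taz_area = [1.0, 3.0]         # km^2

taz_count = density_to_count(taz_density, taz_area)
community_count = sum(taz_count)                      # counts simply add up
community_density = community_count / sum(taz_area)   # area-weighted mean
naive_density_sum = sum(taz_density)                  # not a valid aggregate
```

Here the community density is 250 persons per km², not the 400 obtained by summing TAZ densities, so a summation-based aggregation step is only consistent with the population count as the training target.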

5.1.3. Comparative evaluation results

Table 5 shows that the evaluation results were relatively poor for WorldPop, which can be attributed to the spatial misalignment between the community boundaries and the 100 m grid boundaries, which inevitably increases the evaluation errors when performing zonal statistics.

Table 5. Performance of models at the two scales: Community-level and TAZ-level.

At both the community and TAZ scales, RF-Count, which uses the population count as the training target, obtained significantly lower accuracy than RF-Density, which uses the population density as the training target. This discrepancy occurred because RF-Count is a nonlinear model that consistently output community-level populations during the training phase. As a result, during the prediction phase, RF-Count used the TAZ-scale features as inputs but tended to produce predictions within the same order of magnitude as the community population, thereby leading to severe overestimation of the TAZ-level population. As demonstrated in Figure 9, RF-Count had OE rates of up to 99.5% and 96.8% for community and TAZ populations, respectively, thereby confirming that RF-Count was greatly influenced by the scale effect. Moreover, RF-Density outperformed RF-Count, indicating that using population density as the training target could alleviate the scale effect in nonlinear models such as RF. However, the accuracy of RF-Density was still significantly lower than that of WSTP. For example, at the TAZ scale, RF-Density achieved an R² value of only 0.398, whereas WSTP achieved a high R² value of 0.821. This suggests that population estimation models for TAZs, which have irregular shapes, may not be suitable for training with population density as the target.

Figure 9. Scatter plots of predicted population versus actual population. A negative R² indicates that the chosen model fits worse than a horizontal line. The ‘identity’ line serves to illustrate the scenario where the predicted values precisely match the target values.

NN is also a nonlinear model. NN-Count outperformed NN-Density in terms of the R² and MAE metrics but was significantly worse than NN-Density in terms of sMAPE, and the overall performance of both NN-Count and NN-Density was significantly worse than that of WSTP. This suggests that neural network models are not suitable for fine-scale population estimation when using traditional training methods.

Unlike the nonlinear RF model, LR-Count significantly surpassed LR-Density in terms of accuracy. LR is a linear model, which implies that LR-Count (or LR-Density) assumes a linear relationship between the input features and the population count (or density). However, although a community’s population is indeed the sum of the populations of its constituent TAZs, there is no direct linear relationship between the population density of a community and the population densities of its constituent TAZs, and thus the assumption of LR-Density does not align well with reality. Moreover, in these models, community features are obtained by aggregating the features of TAZs. Consequently, the feature values of TAZs are typically much smaller than those of communities, thereby resulting in significant underestimation of the population densities of TAZs by LR-Density. As shown in Figure 9, LR-Density obtained an OE of only 5.8% for community populations, which confirmed its high sensitivity to the scale effect. LR-Count was less susceptible to the scale effect, but its accuracy remained lower than that of WSTP due to the limited expressive capacity of a linear model. In particular, at the community scale, LR-Count had an R² value of 0.711, much lower than that of WSTP at 0.821.
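One way to read the robustness of LR-Count is through additivity. The derivation below is a sketch under two assumptions that are not stated explicitly in this section: community features are sums of TAZ features, and the symbols (weight vector w, intercept b, TAZ features x_i, number of TAZs n_c in community c) are introduced here for illustration only.

```latex
% Zero intercept: the community prediction equals the sum of TAZ predictions.
\hat{P}_c = \mathbf{w}^\top \mathbf{x}_c
          = \mathbf{w}^\top \sum_{i \in c} \mathbf{x}_i
          = \sum_{i \in c} \mathbf{w}^\top \mathbf{x}_i
          = \sum_{i \in c} \hat{P}_i

% Nonzero intercept b: summing TAZ predictions adds n_c copies of b.
\sum_{i \in c} \left( \mathbf{w}^\top \mathbf{x}_i + b \right)
          = \mathbf{w}^\top \mathbf{x}_c + n_c\, b
          \neq \mathbf{w}^\top \mathbf{x}_c + b \quad (n_c > 1,\ b \neq 0)
```

Under these assumptions, a zero-intercept linear model trained on community counts transfers to TAZ inputs without a scale mismatch, which is consistent with the observation that LR performed markedly worse when the intercept was not set to 0.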

Moreover, we calculated the relative errors of WSTP and LR-Count for different population intervals on the test set, where the population intervals were determined by the Jenks Natural Breaks method. As shown in Figure 10, the relative error of LR-Count is higher than that of WSTP in all intervals, and this difference is most prominent in the population interval [0, 4000), where the relative error of LR-Count is generally higher than 80%. This suggests that the performance of LR-Count on communities with populations smaller than 4000 is most affected by scale effects.
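The interval-wise comparison can be sketched as follows. The break points are passed in directly rather than computed with Jenks Natural Breaks; apart from the 4000 boundary mentioned above, the breaks and all sample values are hypothetical, and relative error is assumed to be |prediction − actual| / actual for positive actual populations.

```python
from bisect import bisect_right


def relative_errors_by_interval(actual, pred, breaks):
    """Mean relative error per interval; interval k is [breaks[k-1], breaks[k])."""
    groups = {}
    for a, p in zip(actual, pred):
        index = bisect_right(breaks, a)  # a value equal to a break goes to the upper interval
        groups.setdefault(index, []).append(abs(p - a) / a)
    return {index: sum(errs) / len(errs) for index, errs in sorted(groups.items())}


# Hypothetical communities and predictions; 8000 is an illustrative second break.
actual = [2000.0, 3000.0, 5000.0, 10000.0]
pred = [3000.0, 3300.0, 5500.0, 9000.0]
errors = relative_errors_by_interval(actual, pred, breaks=[4000.0, 8000.0])
```

Grouping by interval exposes whether errors concentrate in small communities, which an overall average would hide.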

Figure 10. Relative errors of WSTP and LR-Count in different population intervals.

In summary, when utilizing coarse-grained population data to train a fine-scale population estimation model, it is crucial to ensure that the model's assumptions align with reality. Nonlinear models, such as RF, are inappropriate when the training target is the population count, whereas linear models, such as LR, are not suitable for modeling the population density. Neural network models are not suitable for fine-scale population estimation when using traditional training methods. Although the proposed WSTP is a nonlinear neural network model trained with the population count, it exhibited significantly higher accuracy compared with baselines, demonstrating the robust expressive capabilities of WSTP, and its ability to mitigate scale effects through weakly supervised training.

5.2. Significance of age group-based population modeling

Populations within different age groups typically exhibit different spatial distribution patterns. For instance, rural areas often have a higher degree of population aging compared with urban areas. Moreover, modeling the population by age groups has the potential to introduce more supervision signals into the model’s training by utilizing age-specific population data as fine-grained labels. However, the existing population estimation methods primarily focus on estimating the distribution of the total population but without specifically estimating the population within various age groups. The proposed WSTP concurrently models the populations of all age groups through multi-task learning.

To assess the significance of age group-based population modeling, we compared the performance of WSTP in two scenarios: direct modeling of the total population and modeling by age groups. Table 6 demonstrates that for WSTP, age group-based modeling outperformed direct modeling of the total population according to various metrics. For instance, the former achieved an R² value of 0.821, which exceeded the R² value of 0.719 obtained with the latter. Therefore, by using age group-based modeling, WSTP can introduce richer supervision signals for weakly-supervised training to enhance the overall accuracy of population estimation.

Table 6. Performance of WSTP in two scenarios: modeling the total population directly and modeling the population by age groups.

Moreover, different age groupings could impact the overall accuracy of population estimation. Therefore, when partitioning age groups, we recommend considering the specific requirements of population analysis applications as well as the natural and social characteristics of the population’s age structure.

5.3. Analysis of the importance of features

Neural networks are often referred to as ‘black boxes’ but various statistical tools allow us to explore their behavior and indirectly interpret the relationships between features and the outputs of neural networks. In particular, the permutation feature importance (PFI) quantifies how a model's prediction error changes when the values of a specific feature are randomly permuted. PFI provides insights into the importance of each feature for the model (Breiman Citation2001). We use MAE as the error indicator for PFI.
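The PFI procedure described above can be sketched in plain Python with MAE as the error indicator. The `toy_model`, the two-feature data set, the number of repeats, and the random seed are hypothetical choices for illustration; the study's own model and feature set are not reproduced here.

```python
import random


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def permutation_importance(model, rows, actual, n_repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mae(actual, [model(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mae(actual, [model(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical model that depends on feature 0 and ignores feature 1.
    return 2.0 * row[0]


rows = [[float(i), float(10 - i)] for i in range(8)]
actual = [toy_model(r) for r in rows]
pfi = permutation_importance(toy_model, rows, actual)
```

Shuffling a feature that the model ignores leaves the error unchanged, so its importance is zero, whereas shuffling a feature that the model relies on increases the MAE.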

Figure 11 presents the PFI values and rankings of features when estimating the population of each age group. The meanings of these features are given in the feature definitions presented earlier in this article. We found that ‘Scenic spots’ type POIs consistently ranked high across all age groups. With the exception of the 0–4 years age group, the ‘Positioning’ feature set also had a pivotal role in all other age groups. However, each age group had distinct important features. For example, ‘MD.College’ ranked higher in the 15–24 and 25–44 years age groups but lower in the 45–64 and 65+ years age groups. ‘MD.College’ represents the proximity between the current TAZ and nearby higher education institutions, and thus it indirectly indicated whether the TAZ belonged to a college community, where the population primarily comprised individuals in the 15–24 and 25–44 years age groups.

Figure 11. Importance of features for estimating the population of each age group.

Furthermore, the top eight important features for the 45–64 and 65+ years age groups were consistent, with ‘Rural residential’ consistently ranking second. This indicates that the population distribution in these two age groups shared a similar association pattern with the model features, where both were highly correlated with rural residential buildings.

These observations also confirm that the age group-based modeling in WSTP is an effective strategy for accommodating the diverse relationships between age-specific populations and model features.

5.4. Addressing the inaccessibility of building data

The accuracy of population estimation depends significantly on the comprehensiveness and quality of the data employed. Dealing with the inaccessibility of selected data is an inherent challenge in this research field, and it is imperative to develop methods that are capable of effectively handling the lack of data. Therefore, we investigated the performance of models without building data. In this study, without building data (WoB) means that all eight features associated with ‘Buildings’ in the feature set were excluded from the model training and prediction stages.

As shown in Table 7, after removing the building-related features, the performance degradation of WSTP and LR was not significant in terms of the MAE, sMAPE, and RMSE metrics, which suggests that both WSTP and LR exhibited relatively low sensitivity to the absence of building data. One possible explanation for this finding is the adoption of TAZs as the units for population analysis in our study. The construction of TAZs does not rely on building data, but TAZs contain the surrounding environments of buildings. Therefore, computing auxiliary data features at the TAZ scale can effectively capture the living environments of residents and compensate for the absence of building features. In addition, WSTP-WoB still outperformed LR with building data. For instance, the R² value of WSTP-WoB was 0.758, which was higher than the value of 0.711 for LR. This difference occurred because WSTP is a nonlinear model with greater expressive power than the linear model LR, and WSTP was more effective at detecting association patterns between other features (e.g. POIs) and the population distribution.

Table 7. Performance of models with or without building data (WoB).

5.5. Considering the nature of spatial autocorrelation

Spatial autocorrelation (SA) refers to the degree of interdependence between the attributes of a geographical unit and its neighboring geographical units. Yang et al. (Citation2011) found significant local and global SA in the county-level population density in Jiangsu Province, China. Cheng, Wang, and Ge (Citation2022) incorporated SA into 1000 m grid population estimation by using the area-to-point kriging method and obtained promising results. However, SA indices are highly sensitive to the spatial scale used for analysis (Zhang et al. Citation2019).

To evaluate the influence of SA on TAZ-scale population estimation, we employed two graph-based models: the Graph Attention Network (GAT) and the Graph Convolutional Network (GCN). These models share the parameter configuration of WSTP, with each comprising three fully connected layers. The key difference is that GAT and GCN incorporate SA by integrating attention mechanisms and convolutional layers, respectively, to aggregate features from neighboring TAZs after the first fully connected layer. In our experiments, the following neighbor determination ranges were explored.

Self only: SA was not considered in this scenario. In this case, both GAT and GCN are equivalent to the proposed WSTP.
100 m, 500 m, and 800 m: TAZs within 100, 500, and 800 meters, respectively, were treated as neighbors for information propagation.
All: all other TAZs within the same community were treated as neighbors for information propagation.
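The neighbor determination ranges listed above can be sketched as a function that builds a neighbor list per TAZ. Measuring distance between TAZ centroids, allowing distance-based neighbors to cross community boundaries, and the coordinates themselves are assumptions made for illustration; the study's actual distance measure is not specified in this section.

```python
import math


def build_neighbors(centroids, community_ids, mode):
    """Neighbor lists per TAZ; `mode` is 'self', 'all', or a distance in meters."""
    neighbors = []
    for i in range(len(centroids)):
        current = [i]  # every TAZ always keeps itself
        for j in range(len(centroids)):
            if j == i or mode == "self":
                continue
            if mode == "all":
                linked = community_ids[j] == community_ids[i]
            else:
                # Assumption: centroid-to-centroid Euclidean distance.
                linked = math.dist(centroids[i], centroids[j]) <= mode
            if linked:
                current.append(j)
        neighbors.append(current)
    return neighbors


# Hypothetical TAZ centroids (meters) and community membership.
centroids = [(0.0, 0.0), (90.0, 0.0), (400.0, 0.0), (2000.0, 0.0)]
community_ids = [0, 0, 1, 1]

self_only = build_neighbors(centroids, community_ids, "self")
within_100 = build_neighbors(centroids, community_ids, 100.0)
within_500 = build_neighbors(centroids, community_ids, 500.0)
same_community = build_neighbors(centroids, community_ids, "all")
```

In the ‘Self only’ case each list contains just the TAZ itself, so any neighbor aggregation step reduces to the identity and no spatial information is propagated.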

Table 8 shows that both GAT and GCN performed best in the ‘Self only’ mode. As the neighbor determination range expanded, the performance of both GAT and GCN tended to decrease, before improving and then decreasing again, as illustrated in Figure 12. However, the change in performance was significantly smaller for GCN than for GAT. Thus, in addition to the neighbor determination distance, the method used for information propagation between neighbors had a significant impact on modeling SA.

Figure 12. R² values for model when considering spatial autocorrelation at different neighborhood determination distances.

Table 8. Performance of model when considering spatial autocorrelation at different neighborhood determination distances.

Therefore, incorporating SA into TAZ-scale population estimation did not yield a substantial performance improvement, possibly because, unlike fine-scale continuous grid units, TAZs are separated by roads and their environments exhibit a relatively low level of SA. However, it is worth exploring the potential impacts of different neighbor determination methods when considering SA in TAZ-scale population estimation, which could be investigated in future research.

6. Conclusion

In this study, we developed a weakly supervised TAZ-scale bottom-up population estimation method, WSTP. It takes TAZs as the PAUs and utilizes weakly supervised learning to ensure that the model consistently focuses on TAZ-scale population estimation during the training and prediction phases, thereby avoiding scale effects. Moreover, WSTP decomposes census data into fine-grained age group labels to introduce more supervision signals for weakly supervised training. WSTP further leverages multi-task learning to concurrently model the TAZ-scale population for each age group. Our evaluation results showed that the R² values for WSTP at the community and TAZ scales were 0.821 and 0.785, respectively, and that WSTP significantly outperformed the baselines, demonstrating its capacity to mitigate scale effects and yield high-resolution, precise population data. Furthermore, the age group-based modeling in WSTP was shown to be an effective strategy for accommodating the diverse relationships between age-specific populations and model features. The nonlinear nature of WSTP explains its robust feature expression capabilities, and the challenge of missing building data is effectively addressed by utilizing TAZs as PAUs. Experiments using graph neural networks to model the spatial autocorrelation (SA) properties of the TAZ-scale population demonstrated that, due to the relatively independent environments of TAZs, it was not necessary to consider SA during TAZ-scale population estimation in our study area.

This study presented a novel solution to the problem of scale effects in the estimation of fine-scale population distribution, and generated TAZ-scale population data for the central districts of Wuhan, which can provide a basis for decision-making in refined urban management, such as traffic management, disaster response, and epidemiological control in the central districts of Wuhan. In terms of spatial scope, although this study only focuses on the central districts, our WSTP can be easily extended to the whole Wuhan city and even other cities when multi-source geospatial data are available in the target area. In terms of temporal scope, the proposed WSTP was trained and validated only on 2020 Census data, but our bottom-up WSTP is census data-independent for population estimation, so it can also be used for TAZ-scale population estimation in the target year (2023 or even closer) when multi-source geospatial data features of the target year are used as inputs.

The proposed WSTP also has certain limitations. TAZs are generally smaller in spatial scale compared with communities but spatial scale heterogeneity still exists among TAZs. For instance, suburban areas usually have larger TAZ units due to their sparse road networks, which may introduce disturbances during TAZ-scale population estimation. To mitigate this scale heterogeneity, future studies could consider separately modeling areas with sparse and dense road networks. In addition, we treated population modeling for different age groups as independent tasks within the framework of multi-task learning but underlying correlation patterns may exist between population distributions of different age groups in reality. Future research could explore integrating these correlation patterns into the modeling process.

Data availability statement

Data not available due to legal restrictions.

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant no U20A2091, 41930107].

References

Bai, Zhongqiang, Juanle Wang, Mingming Wang, Mengxu Gao, and Jiulin Sun. 2018. “Accuracy Assessment of Multi-Source Gridded Population Distribution Datasets in China.” Sustainability 10 (5): 1363, Multidisciplinary Digital Publishing Institute. https://doi.org/10.3390/su10051363.
Baker, Jack, David Swanson, and Jeff Tayman. 2023. “Boosted Regression Trees for Small-Area Population Forecasting.” Population Research and Policy Review 42 (4): 51. https://doi.org/10.1007/s11113-023-09795-x.
Bondarenko, Maksym, David Kerr, Alessandro Sorichetta, and Andrew Tatem. 2020. “Census/Projection-Disaggregated Gridded Population Datasets, Adjusted to Match the Corresponding UNPD 2020 Estimates, for 183 Countries in 2020 Using Built-Settlement Growth Model (BSGM) Outputs.” University of Southampton. https://eprints.soton.ac.uk/444005/.
Boo, Gianluca, Edith Darin, Douglas R. Leasure, Claire A. Dooley, Heather R. Chamberlain, Attila N. Lázár, Kevin Tschirhart, et al. 2022. “High-Resolution Population Estimation Using Household Survey Data and Building Footprints.” Nature Communications 13 (1): 1330, Nature Publishing Group. https://doi.org/10.1038/s41467-022-29094-x.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Bright, Eddie A., Phil R. Coleman, and Jerome E. Dobson. 2000. “LandScan: A Global Population Database for Estimating Populations at Risk.” Photogrammetric Engineering and Remote Sensing 66: 849–858.
Cáceres, Aimy, Paulo Santos, Feliciano Tchalo, Michael Mills, and Martim Melo. 2013. “Human Use of Natural Resources and the Conservation of the Afromontane Forest in Mount Moco, Angola.” Journal of Sustainable Development in Africa 15 (January): 91–101.
Chen, Hongxing, Bin Wu, Bailang Yu, Zuoqi Chen, Qiusheng Wu, Ting Lian, Congxiao Wang, Qiaoxuan Li, and Jianping Wu. 2021. “A New Method for Building-Level Population Estimation by Integrating LiDAR, Nighttime Light, and POI Data.” Journal of Remote Sensing 2021 (May), Science Partner Journal. https://doi.org/10.34133/2021/9803796.
Cheng, Yang, Siyao Gao, Shuai Li, Yuchao Zhang, and Mark Rosenberg. 2019. “Understanding the Spatial Disparities and Vulnerability of Population Aging in China.” Asia & the Pacific Policy Studies 6 (1): 73–89. https://doi.org/10.1002/app5.267.
Cheng, Zhifeng, Jianghao Wang, and Yong Ge. 2022. “Mapping Monthly Population Distribution and Variation at 1-Km Resolution across China.” International Journal of Geographical Information Science 36 (6): 1166–1184. Taylor & Francis. https://doi.org/10.1080/13658816.2020.1854767.
Chi, Guangqing, and Paul R. Voss. 2011. “Small-Area Population Forecasting: Borrowing Strength across Space and Time.” Population, Space and Place 17 (5): 505–520. https://doi.org/10.1002/psp.617.
Chi, Guangqing, and Donghui Wang. 2017. “Small-Area Population Forecasting: A Geographically Weighted Regression Approach.” In The Frontiers of Applied Demography, edited by David A. Swanson, 449–471. Applied Demography Series. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-43329-5_21.
Chi, Guangqing, Xuan Zhou, and Paul R. Voss. 2011. “Small-Area Population Forecasting in an Urban Setting: A Spatial Regression Approach.” Journal of Population Research 28 (2): 185–201. https://doi.org/10.1007/s12546-011-9053-6.
Freire, Sergio, Kytt MacManus, Martino Pesaresi, Erin Doxsey-Whitfield, and Jane Mills. 2016. “Development of New Open and Free Multi-Temporal Global Population Grids at 250 m Resolution.” Population 250.
Georganos, Stefanos, Sebastian Hafner, Monika Kuffer, Catherine Linard, and Yifang Ban. 2022. “A Census from Heaven: Unraveling the Potential of Deep Learning and Earth Observation for Intra-Urban Population Mapping in Data Scarce Environments.” International Journal of Applied Earth Observation and Geoinformation 114 (November): 103013. https://doi.org/10.1016/j.jag.2022.103013.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Adaptive Computation and Machine Learning Series. London, England: MIT Press.
Grippa, Taïs, Catherine Linard, Moritz Lennert, Stefanos Georganos, Nicholus Mboga, Sabine Vanhuysse, Assane Gadiaga, and Eléonore Wolff. 2019. “Improving Urban Population Distribution Models with Very-High Resolution Satellite Information.” Data 4 (1): 13. Multidisciplinary Digital Publishing Institute. https://doi.org/10.3390/data4010013.
Hauer, Mathew E. 2019. “Population Projections for U.S. Counties by Age, Sex, and Race Controlled to Shared Socioeconomic Pathway.” Scientific Data 6 (1): 190005. Nature Publishing Group. https://doi.org/10.1038/sdata.2019.5.
Hu, Qiushi, Rui Li, Huayi Wu, and Zhaohui Liu. 2022. “Construction of a Refined Population Analysis Unit Based on Urban Forms and Population Aggregation Patterns.” International Journal of Digital Earth 15 (1): 79–107. Taylor & Francis. https://doi.org/10.1080/17538947.2021.2013963.
Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” arXiv. https://doi.org/10.48550/arXiv.1502.03167.
Jacobs, Nathan, Adam Kraft, Muhammad Usman Rafique, and Ranti Dev Sharma. 2018. “A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery.” In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 33–42. SIGSPATIAL ’18. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3274895.3274934.
Jain, Deepty, and Geetam Tiwari. 2017. “Population Disaggregation to Capture Short Trips – Vishakhapatnam, India.” Computers, Environment and Urban Systems 62 (March): 7–18. https://doi.org/10.1016/j.compenvurbsys.2016.10.003.
Leasure, Douglas R., Warren C. Jochem, Eric M. Weber, Vincent Seaman, and Andrew J. Tatem. 2020. “National Population Mapping from Sparse Survey Data: A Hierarchical Bayesian Modeling Framework to Account for Uncertainty.” Proceedings of the National Academy of Sciences 117 (39): 24173–24179. https://doi.org/10.1073/pnas.1913050117.
Leyk, Stefan, Andrea E. Gaughan, Susana B. Adamo, Alex De Sherbinin, Deborah Balk, Sergio Freire, Amy Rose, et al. 2019. “The Spatial Allocation of Population: A Review of Large-Scale Gridded Population Data Products and Their Fitness for Use.” Earth System Science Data 11 (3): 1385–1409. https://doi.org/10.5194/essd-11-1385-2019.
Mei, Yuao, Zhipeng Gui, Jinghang Wu, Dehua Peng, Rui Li, Huayi Wu, and Zhengyang Wei. 2022. “Population Spatialization with Pixel-Level Attribute Grading by Considering Scale Mismatch Issue in Regression Modeling.” Geo-Spatial Information Science 25 (3): 365–382. Taylor & Francis. https://doi.org/10.1080/10095020.2021.2021785.
Metzger, Nando, John E. Vargas-Muñoz, Rodrigo C. Daudt, Benjamin Kellenberger, Thao Ton-That Whelan, Ferda Ofli, Muhammad Imran, Konrad Schindler, and Devis Tuia. 2022. “Fine-Grained Population Mapping from Coarse Census Counts and Open Geodata.” Scientific Reports 12 (1): 20085. Nature Publishing Group. https://doi.org/10.1038/s41598-022-24495-w.
Miller, Harvey, and Shih-Lung Shaw. 2001. “Geographic Information Systems for Transportation (GIS-T): Principles and Applications,” January.
Neal, Isaac, Sohan Seth, Gary Watmough, and Mamadou S. Diallo. 2022. “Census-Independent Population Estimation Using Representation Learning.” Scientific Reports 12 (1): 5185. Nature Publishing Group. https://doi.org/10.1038/s41598-022-08935-1.
Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, edited by Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, 8024–35. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf.
Shang, Shuoshuo, Shihong Du, Shouji Du, and Shoujie Zhu. 2021. “Estimating Building-Scale Population Using Multi-Source Spatial Data.” Cities 111 (April): 103002. https://doi.org/10.1016/j.cities.2020.103002.
Stevens, Forrest R., Andrea E. Gaughan, Catherine Linard, and Andrew J. Tatem. 2015. “Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data.” Edited by Luís A. Nunes Amaral. PLoS One 10 (2): e0107042. https://doi.org/10.1371/journal.pone.0107042.
Swanwick, Rachel H., Quentin D. Read, Steven M. Guinn, Matthew A. Williamson, Kelly L. Hondula, and Andrew J. Elmore. 2022. “Dasymetric Population Mapping Based on US Census Data and 30-m Gridded Estimates of Impervious Surface.” Scientific Data 9 (1): 523. https://doi.org/10.1038/s41597-022-01603-z.
Tu, Wenna, Zhang Liu, Yunyan Du, Jiawei Yi, Fuyuan Liang, Nan Wang, Jiale Qian, Sheng Huang, and Huimeng Wang. 2022. “An Ensemble Method to Generate High-Resolution Gridded Population Data for China from Digital Footprint and Ancillary Geospatial Data.” International Journal of Applied Earth Observation and Geoinformation 107 (March): 102709. https://doi.org/10.1016/j.jag.2022.102709.
Wang, Shunli, Rui Li, Jie Jiang, and Yao Meng. 2021. “Fine-Scale Population Estimation Based on Building Classifications: A Case Study in Wuhan.” Future Internet 13 (10): 251. Multidisciplinary Digital Publishing Institute. https://doi.org/10.3390/fi13100251.
Warszawski, L., K. Frieler, V. Huber, F. Piontek, O. Serdeczny, X. Zhang, Q. Tang, et al. 2017. “Center for International Earth Science Information Network—CIESIN—Columbia University.(2016). Gridded Population of the World, Version 4 (GPWv4): Population Density. Palisades. NY: NASA Socioeconomic Data and Applications Center (SEDAC).” Atlas of Environmental Risks Facing China Under Climate Change 228.
Weber, Eric M., Vincent Y. Seaman, Robert N. Stewart, Tomas J. Bird, Andrew J. Tatem, Jacob J. McKee, Budhendra L. Bhaduri, Jessica J. Moehl, and Andrew E. Reith. 2018. “Census-Independent Population Mapping in Northern Nigeria.” Remote Sensing of Environment 204 (January): 786–798. https://doi.org/10.1016/j.rse.2017.09.024.
Weerasinghe, Oshadhi, and Saman Bandara. 2023. “Modified Traffic Analysis Zones Approach for the Estimation of Passenger Flow Distribution in Urban Areas.” Journal of Urban Planning and Development 149 (1): 04022045. https://doi.org/10.1061/(ASCE)UP.1943-5444.0000881.
Wilson, Tom, Irina Grossman, Monica Alexander, Phil Rees, and Jeromey Temple. 2022. “Methods for Small Area Population Forecasts: State-of-the-Art and Research Needs.” Population Research and Policy Review 41 (3): 865–898. https://doi.org/10.1007/s11113-021-09671-6.
Yang, Mengmeng, Jinsong Ma, Peihong Jia, Yingxia Pu, and Gang Chen. 2011. “The Use of Spatial Autocorrelation to Analyze Changes in Spatial Distribution Patterns of Population Density in Jiangsu Province, China.” 2011 19th International Conference on Geoinformatics, 1–6. https://doi.org/10.1109/GeoInformatics.2011.5980909.
Yao, Yao, Xiaoping Liu, Xia Li, Jinbao Zhang, Zhaotang Liang, Ke Mai, and Yatao Zhang. 2017. “Mapping Fine-Scale Population Distributions at the Building Level by Integrating Multisource Geospatial Big Data.” International Journal of Geographical Information Science, February, 1–25. https://doi.org/10.1080/13658816.2017.1290252.
Ye, Tingting, Naizhuo Zhao, Xuchao Yang, Zutao Ouyang, Xiaoping Liu, Qian Chen, Kejia Hu, et al. 2019. “Improved Population Mapping for China Using Remotely Sensed and Points-of-Interest Data within a Random Forests Model.” Science of The Total Environment 658 (March): 936–946. https://doi.org/10.1016/j.scitotenv.2018.12.276.
Yin, Xu, Peng Li, Zhiming Feng, Yanzhao Yang, Zhen You, and Chiwei Xiao. 2021. “Which Gridded Population Data Product Is Better? Evidences from Mainland Southeast Asia (MSEA).” ISPRS International Journal of Geo-Information 10 (10): 681. Multidisciplinary Digital Publishing Institute. https://doi.org/10.3390/ijgi10100681.
Yuan, Nicholas Jing, Yu Zheng, and Xing Xie. 2012. “Segmentation of Urban Areas Using Road Networks.” Microsoft, Albuquerque, NM, USA, Tech. Rep. MSR-TR-2012-65. https://www.semanticscholar.org/paper/Segmentation-of-Urban-Areas-Using-Road-Networks-Yuan-Zheng/c512a4b6e80613abb88276810ed26e4b5b407c63.
Zhang, Boen, Gang Xu, Limin Jiao, Jiafeng Liu, Ting Dong, Zehui Li, Xiaoping Liu, and Yaolin Liu. 2019. “The Scale Effects of the Spatial Autocorrelation Measurement: Aggregation Level and Spatial Resolution.” International Journal of Geographical Information Science 33 (5): 945–966. Taylor & Francis. https://doi.org/10.1080/13658816.2018.1564316.
Zhao, F. X., F. W. Zhao, and H. Sun. 2019. “A Coevolution Model of Population Distribution and Road Networks.” Physica A: Statistical Mechanics and Its Applications 536 (December): 120860. https://doi.org/10.1016/j.physa.2019.04.096.
Zheng, Hao, Zhanlei Yang, Wenju Liu, Jizhong Liang, and Yanpeng Li. 2015. “Improving Deep Neural Networks Using Softplus Units.” 2015 International Joint Conference on Neural Networks (IJCNN), 1–4. https://doi.org/10.1109/IJCNN.2015.7280459.
Zhou, Zhi-Hua. 2018. “A Brief Introduction to Weakly Supervised Learning.” National Science Review 5 (1): 44–53. https://doi.org/10.1093/nsr/nwx106.
Zong, Zefang, Jie Feng, Kechun Liu, Hongzhi Shi, and Yong Li. 2019. “DeepDPM: Dynamic Population Mapping via Deep Neural Network.” In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 1294–1301. AAAI’19/IAAI’19/EAAI’19. Honolulu, Hawaii, USA: AAAI Press. https://doi.org/10.1609/aaai.v33i01.33011294.

