Research Article

Exploring multi-relational spatial interaction imputation with distance-decay effects

Article: 2300316 | Received 09 Aug 2023, Accepted 17 Dec 2023, Published online: 09 Jan 2024

ABSTRACT

Spatial interaction imputation aims to compensate for missing stable connections in geographical space, improving the integrity and accuracy of interaction networks. Graph neural networks excel at modeling graph-structured interaction data. However, existing research often focuses on homogeneous networks and neglects how heterogeneous interaction relationships, shaped by distance decay, affect interaction imputation. Ignoring this edge heterogeneity limits the ability to model the network structure effectively and leads to suboptimal imputation performance. This study introduces a graph convolutional network model for interaction imputation. It constructs a heterogeneous interaction network with multiple distance relationships that reflect distance decay, and performs graph embedding based on the interaction relationships between nodes. By jointly incorporating multiple interaction modes, topological structures, and node attributes, the model enhances spatial interaction imputation accuracy. In an empirical validation using Beijing taxi travel data, our model outperforms existing models, improving imputation accuracy by approximately 8.70%. It consistently maintains superior accuracy on interaction networks of various sizes, demonstrating its stable superiority. We also demonstrate that the number of interaction relationships affects imputation accuracy: a reasonable number of relationships and a larger feature dimension of geographical units yield better interaction imputation results.

1. Introduction

Spatial interaction refers to the varying intensities of connections between different entities in geographical space, involving the movement and exchange of energy, information, and other flows. Interactions between geographical entities form a spatial interaction network, in which nodes represent geographical entities such as locations and grid cells. Edges represent the interactions between entities; unlike individual origin–destination (OD) trips, they are typically represented by aggregated OD flows. Spatial interaction networks model complex spatial interactions as a network, effectively capturing their topological features. Spatial interactions reflect correlation patterns between non-adjacent locations and urban spatial organization at various scales. The study of spatial interactions reveals mobility patterns, traffic flow characteristics (Wang, Mo, and Wang 2014; Zhang et al. 2020), and regional functions (Tao et al. 2019), and aids tourism planning (Shao, Zhang, and Li 2017), among other insights.

The rise of big data has enabled the recording of human activity trajectories (Gao et al. 2013) and geotagged check-in data (Zhen et al. 2017), facilitating the study of spatial interactions across different scales. However, spatial interaction networks often suffer from missing data due to sensor failures, data distortion, and other factors (Jain, Murty, and Flynn 1999; Zhao et al. 2016), making it challenging to construct a complete interaction network. To address this, researchers employ spatial interaction imputation, leveraging historical data to uncover potential or missing interactions between locations. This process is vital for filling gaps in geographical big data, enhancing data completeness and usability, and ultimately revealing spatial structures (Chen et al. 2014; Roth et al. 2011), flow patterns (Hu et al. 2023), and correlation rules within geographical spaces. Spatial interaction imputation is particularly useful for transportation planning, population forecasting, and resource allocation in urban development.

Initially, researchers relied on physical models such as gravity models (Erlander and Stewart 1990) and radiation models (Simini et al. 2012) for interaction imputation. With advances in computational methods, researchers (Fischer and Gopal 1994; Simini et al. 2021) have integrated gravity models with neural networks, which help learn more complex features from interaction data. Ensemble models such as XGBoost and Random Forest have also been applied to interaction imputation (Morton, Piburn, and Nagle 2018; Pourebrahim et al. 2019). Markov chain and Bayesian probability models (Hazelton 2008; Li 2009) compute transition probability matrices to predict traffic flow between roads. However, these models do not consider the topological structure of the interaction network.

The powerful feature-extraction ability of deep learning has led to its wide application in many fields. For instance, multimodal deep learning (Hong, Gao, Yokoya, et al. 2020) has been used in remote sensing for ground object classification in complex scenes. The strong fitting capability of deep learning (Li et al. 2023; Simini et al. 2021) has also helped optimize the parameters of base models, thereby improving their performance. A Graph Convolutional Network (GCN) can extract features and perform non-linear fitting on arbitrary graph structures. Graph structures can encode additional relationships, and Hong, Gao, Yao, et al. (2020) improved classification performance by building batch GCN models to train on large-scale remote sensing data. GCNs are well suited to spatial interactions (Zhou et al. 2022), which involve irregular data with arbitrary connections between nodes. Researchers initially applied GCNs to traffic prediction: Hou et al. (2021) and Li et al. (2019) combined LSTM and GCN to predict short-term traffic flow in road networks, focusing on modeling neighbor relationships and spatiotemporal dependencies between nodes. Whereas traffic prediction concerns short-term future road traffic flows, spatial interaction addresses long-term, stable interactions between locations (Zhang et al. 2020; Zhao et al. 2019). Interaction imputation emphasizes imputing past interactions and exploring potential interactions between entities. The SI-GCN model (Yao et al. 2020) incorporates the topological features of interactions into node embedding vectors, demonstrating the feasibility and effectiveness of GCNs for interaction imputation. Furthermore, ConvGCN-RF (Yin et al. 2023) shows that integrating the geographic proximity and interaction semantics of nodes benefits interaction imputation. Li et al. (2021) combined social and spatial interaction to predict dynamic human interaction intensity with a graph convolutional network model.

Tobler's First Law of Geography (Tobler 1970) states that 'everything is related to everything else, but near things are more related than distant things.' Owing to interaction costs and intervening opportunities (Stouffer 1940), spatial interaction exhibits evident distance decay effects: in spatial interaction networks, as the distance between nodes increases, the interaction intensity between them weakens and the probability of interaction decreases. Distance decay is a crucial spatial effect for understanding spatial interactions, and numerous studies have modeled it (Haggett 1965; Wang and Fahui 2012). Data field theory (Yi et al. 2020) has been used to categorize interactions into long-distance and short-distance interactions, which helped identify travel hotspots. Yao et al. (2020) and Yin et al. (2023) tested GCN-based interaction imputation at different distances and found that imputation performance improved gradually as distance increased. This observation shows that distance decay causes imputation performance to vary with distance. However, these studies did not incorporate the effect of distance decay into the imputation model itself.

To capture the diversity and complexity of the interaction relationships induced by distance decay, this study introduces heterogeneous graphs to model various types of interactions. Heterogeneous graphs contain nodes connected by multiple edge types, such as family relationships, friendships, and rivalries in social networks. To handle heterogeneous graphs, Schlichtkrull et al. (2017) proposed Relational Graph Convolutional Networks (R-GCNs), which embed multiple edge types and represent them effectively. The primary idea is to partition a heterogeneous graph into multiple homogeneous graphs, one per edge type. This enables more comprehensive and detailed modeling of different edge relationships, and thus a more robust representation of the diverse interactions between nodes. R-GCNs have been widely used in social networks (Salamat, Luo, and Jafari 2021; Wu et al. 2019), drug–target interactions (Li et al. 2022; Yang et al. 2022), knowledge graphs (Zheng et al. 2021), and other applications. Heterogeneous graphs provide a wealth of information for analysis and modeling, enabling a deeper understanding of the underlying dynamics and patterns within a network. However, interaction imputation research has paid limited attention to the diversity of edge relationships: interactions between geographic units are treated as homogeneous relationships (Yue et al. 2018). The distance decay effect causes interactions with similar intensities to cluster within specific distance intervals (John, Lyhagen, and Reggiani 2016). Multi-relational graph convolutional networks can capture the differences between interaction relationships at different distances, thereby improving the accuracy of interaction imputation.

Existing spatial interaction imputation research has primarily focused on the role of node properties and interaction topology, overlooking the heterogeneity that interactions exhibit as distance increases. To address this limitation, this study proposes a spatial interaction imputation model based on a multi-distance heterogeneous graph convolutional network (MultDis-SIHN), comprising three components: an Interactivity Scale Divider, a Heterogeneous Graph Feature Extractor, and a DistMult Decoding Inference Engine. MultDis-SIHN considers the multiple interaction patterns caused by distance decay and incorporates edge types, node attributes, and interaction topology to obtain finer-grained node embeddings. Compared with existing models, our approach demonstrates superior performance in interaction imputation tasks. The main contributions of this study are (1) accounting for the interaction heterogeneity caused by the distance decay effect in interaction imputation, and (2) introducing a multi-relational heterogeneous interaction network into interaction imputation tasks.

2. Problem statement

The premise of spatial interaction imputation is that the interaction intensity between geographic units with unknown interactions can be predicted from similar interaction features observed among similar geographical units. The focus of spatial interaction imputation research is therefore the measurement of similarity between geographical units. As illustrated in Figure 1, it is common to build directed homogeneous spatial interaction network models that capture the attribute proximity and topological proximity between geographical units, thereby achieving more precise interaction imputation. For example, if universities A, B, and C have similar attributes and network connectivity structures, as depicted in Figure 1, their interactions with the station should all be similar. It then becomes possible to infer the interaction intensity of the other two with the station from the known interaction intensity of any one of them. However, when evaluating topological proximity, such models assume that all node links are the same, neglecting the heterogeneity of interaction relationships caused by varying interaction distances. As illustrated in Figure 2, when the edge relationships between nodes are taken into account, universities B and C are more similar to each other than to University A, even though all three have similar attributes. Consequently, using the interaction intensity between University B and the station as a reference yields a more accurate estimate of the interaction between University C and the station than using that between University A and the station. Modeling the interaction relationships at different distances therefore improves both the measurement of node similarity and the accuracy of spatial interaction imputation.

Figure 1. Homogeneous interaction network (the edges represent the connectivity).

Figure 2. Heterogeneous interaction network (the colored edges represent different interaction distance relationships).

In a spatial interaction network G, the interaction intensity and the likelihood of interaction between two locations tend to decrease as the distance between them increases. These interactions therefore give rise to multiple interaction relationships, forming a heterogeneous spatial interaction network. The formal definitions are as follows:

Geographic unit set $P=\{p_1,p_2,p_3,\ldots,p_n\}$. In real geographic space, spatial interactions occur within geographic units P where people live, such as grid cells, places, and actual areas of interest (AOIs). Each unit is characterized by a vector that includes coordinates, pull, push, land-use type, and other attributes that distinguish it from other units: $p_i=\{\mathrm{Loc},\mathrm{Lat},\mathrm{pull},\mathrm{push},\mathrm{type}\}$. Pull is the number of taxi drop-off points within a unit and represents its geographical attraction. Conversely, push is the number of taxi pick-up points within a unit and represents its capacity to generate trips (a geographical pushing force).

Heterogeneous spatial interaction network $G=(P,F,R)$. The network is a weighted directed graph with the geographic unit set P as nodes and the spatial interaction set F as edges. F is defined as $F=\{(p_i,p_j,f_{ij},R_{ij})\}$, where each element consists of the origin unit $p_i$, the destination unit $p_j$, the interaction intensity $f_{ij}$ between them, and the interaction relationship $R_{ij}$ between the two units. Unlike other spatial interaction networks, this network includes the additional attribute $R_{ij}$, which indicates that the interactions between nodes are not all of the same type. In this study, $R_{ij}$ represents different interaction distance relationships, such as long-, medium-, and short-distance interactions.
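The two definitions above map naturally onto a small data structure. The following Python sketch is illustrative only: the class names (`GeographicUnit`, `Interaction`, `HeteroSINetwork`) and the integer encoding of $R_{ij}$ are hypothetical and are not taken from the authors' code.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class GeographicUnit:
    """Hypothetical container for p_i = {Loc, Lat, pull, push, type}."""
    uid: int
    loc: float        # first coordinate ("Loc" in the text)
    lat: float        # latitude
    pull: int         # number of taxi drop-offs inside the unit
    push: int         # number of taxi pick-ups inside the unit
    zone_type: str    # functional-zone (land-use) type

@dataclass(frozen=True)
class Interaction:
    """One element (p_i, p_j, f_ij, R_ij) of the interaction set F."""
    origin: int       # uid of p_i
    dest: int         # uid of p_j
    intensity: float  # f_ij
    relation: int     # R_ij, assumed encoded as 0 = shortest distance interval, 1, 2, ...

@dataclass
class HeteroSINetwork:
    """G = (P, F, R): a weighted, directed graph with typed edges."""
    units: dict[int, GeographicUnit] = field(default_factory=dict)
    interactions: list[Interaction] = field(default_factory=list)

    def relations(self) -> list[int]:
        """The relation set R actually present in F."""
        return sorted({e.relation for e in self.interactions})

    def edges_of(self, r: int) -> list[tuple[int, int]]:
        """Directed (origin, dest) pairs whose interaction belongs to relation r."""
        return [(e.origin, e.dest) for e in self.interactions if e.relation == r]
```

Grouping edges by relation with `edges_of` mirrors how a relational encoder consumes the graph: one homogeneous edge set per distance relationship.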

3. Methodology

In this section, we propose MultDis-SIHN, a spatial heterogeneous interaction imputation network model that accounts for distance decay. As shown in Figure 3, the framework comprises three main components. The first, the Interactivity Scale Divider, partitions spatial interactions into multiple distance relationships based on the distance decay effect. The second, the Heterogeneous Graph Feature Extractor, applies relational graph convolutional networks to the spatial interaction network with multiple distance relationships to obtain node embeddings that account for those relationships. The third, the DistMult Decoding Inference Engine, uses the bilinear decoder DistMult to predict the interaction intensity between two units from the learned node embeddings.

Figure 3. Research framework.

3.1. Interactivity scale divider

We visualize the interactions in the spatial interaction network as a scatter plot of interaction distance (x-axis) against interaction intensity (y-axis), where each point represents the interaction between two locations. This visualization clearly displays how interaction intensities are distributed across distances. By observing how the density and intensity of interactions vary with distance, we identify ranges of distance thresholds within which interactions exhibit similar characteristics.

In this study, we use the K-means clustering algorithm to cluster the points in the scatter plot (Hamerly and Elkan 2003). K-means minimizes the distance between each sample and the centroid of its cluster, maximizing similarity within clusters while keeping clusters clearly distinct from one another. Because the number of clusters K can be set directly to the desired number of relationships, K-means suits the needs of this study. Using K-means, the points in the scatter plot were partitioned into clusters according to their compactness and distribution along the distance axis, dividing the interaction distances into several intervals: interactions within each interval share similar features, whereas interactions in different intervals differ. Each distance interval represents a different type of interaction link, transforming the spatial interaction network shaped by distance decay into a geographically heterogeneous network with multiple distance relationships. This preliminary stage lays the foundation for the subsequent analysis.
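A minimal sketch of the Interactivity Scale Divider, assuming NumPy. It runs plain Lloyd's K-means on z-scored (distance, intensity) points, using farthest-first initialization so the toy example is deterministic. The paper does not state how cluster boundaries are converted into distance thresholds; the rule in the hypothetical helper `distance_thresholds` (cut halfway between consecutive cluster mean distances) is an assumption.

```python
import numpy as np

def _farthest_first(X, k, seed=0):
    """Initial centroids: one random point, then repeatedly the point farthest from those chosen."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centroid.
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].copy()

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids)."""
    centroids = _farthest_first(X, k, seed)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: cluster means (keep the old centroid if a cluster empties).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def distance_thresholds(dist, intensity, k, seed=0):
    """Hypothetical helper: cluster (distance, intensity) points and return distance cut points."""
    X = np.column_stack([dist, intensity]).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score so both axes carry comparable weight
    labels, _ = kmeans(X, k, seed=seed)
    # Mean interaction distance of each non-empty cluster, sorted from short to long.
    means = np.sort([dist[labels == c].mean() for c in np.unique(labels)])
    # Assumed rule: each threshold lies halfway between consecutive cluster means.
    return (means[:-1] + means[1:]) / 2

def relation_of(dist, cuts):
    """Map distances to relation indices 0 .. len(cuts), with 0 = shortest interval."""
    return np.searchsorted(cuts, dist, side="right")
```

Setting `k` from 2 to 6 would produce the family of distance relationships compared later; `relation_of` then assigns each interaction its edge type $R_{ij}$.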

3.2. Heterogeneous graph feature extractor

In a spatial interaction network, each node carries numerous attributes that characterize the similarity between nodes. One-hot encoding was employed to construct the node feature vectors and facilitate the computation of node features. This allowed us to incorporate attributes such as the functional type and positional coordinates of geographical units into the spatial interaction network. By combining these node features with the multiple edge types identified in the previous step, we established a heterogeneous spatial interaction network, which serves as the foundation for the subsequent analysis and modeling.

As shown in Figure 4, based on the distance decay effect, the interaction relationships between nodes in a heterogeneous spatial interaction network with multiple distance relationships can be divided into short-, medium-, and long-distance interactions (and possibly more types). A given node may be connected to its neighbors through multiple edge types, so the first challenge is to aggregate neighbor information separately for each relationship. The selected encoder, the Relational Graph Convolutional Network (R-GCN) (Schlichtkrull et al. 2017), learns node representations for each individual relationship. As shown in Figure 5, the heterogeneous graph feature extractor is designed specifically for multi-relational heterogeneous graphs. Its key characteristic is that it gathers the features of neighboring nodes according to the corresponding edge types and uses them to update each node's own state. It consists of two nested loops, as expressed in Eq. (1). The inner loop aggregates the features of all neighboring nodes connected to the central node under relation type r and updates the representation of the central node. The outer loop iterates over each relation type $r\in R$. This process is repeated until the features of all connected nodes, grouped by their interaction types, have been incorporated into the embedding of the central node. In addition, the central node's own features are carried into its next-layer representation through a self-loop transformation. Through this procedure, the features of the central node are updated from its neighbors in each propagation layer, gradually refining the node's features. A ReLU activation function is applied to capture complex non-linear features, yielding the final embedding vector $h_i^{(l+1)}$ of the central node.
Similarly, the corresponding relation matrices $W_r^{(l)}$ are learned during this process.

$$h_i^{(l+1)}=\sigma\left(\sum_{r\in R}\sum_{j\in N_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)}\right) \quad (1)$$

Here, $N_i^r$ denotes the neighbors of node i under relationship r, and $c_{i,r}=|N_i^r|$ is a normalization constant. $W_r^{(l)}$ is the relation-specific linear transformation applied to all neighbors connected through the same distance relationship. $h_i^{(l+1)}$ is the output vector of node i at layer l + 1, where l = 0, 1, 2, … indexes the propagation layers, and $h_i^{(0)}$ is the initial input vector of node i. $W_0^{(l)}$ is the self-loop weight matrix, σ is the ReLU activation function, and R is the set of all distance relationships. Increasing the number of GCN layers can yield diminishing returns and may even harm generalization; MultDis-SIHN therefore uses a single layer to avoid oversmoothing. The computational complexity of MultDis-SIHN arises primarily from the graph feature extractor and depends on the numbers of relationships, nodes, layers, and edges. Compared with a single-relation interaction network, a larger number of relations increases the number of parameters and can lead to overfitting. We therefore apply basis decomposition to regularize the relation matrices, as shown in Eq. (2). MultDis-SIHN thus embeds the complex topological information among nodes under multiple distance relations into low-dimensional node vectors, facilitating interaction calculations.

$$W_r^{(l)}=\sum_{b=1}^{B}a_{rb}^{(l)}V_b^{(l)} \quad (2)$$

where $V_b^{(l)}\in\mathbb{R}^{d^{(l+1)}\times d^{(l)}}$ are the B basis matrices and $a_{rb}^{(l)}$ are relation-specific coefficients. Each $W_r^{(l)}$ is thus a linear combination of the basis matrices; because the $V_b^{(l)}$ are shared among all relations in the same layer, the number of parameters is reduced.
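Eqs. (1) and (2) can be sketched in NumPy as the forward pass of a single R-GCN layer with basis-decomposed relation weights. This is an illustration, not the authors' implementation: the function name `rgcn_layer`, the array layouts, and the convention that messages flow along each directed edge from j to i are assumptions (the original R-GCN formulation also adds inverse relations, which are omitted here).

```python
import numpy as np

def rgcn_layer(H, edges_by_rel, V, A, W0):
    """One R-GCN layer following Eq. (1), with W_r built from bases as in Eq. (2).

    H            : (n, d_in) node features h^(l)
    edges_by_rel : list indexed by relation r of (src, dst) integer sequences;
                   an edge src -> dst means src is in N_dst^r (assumed direction)
    V            : (B, d_out, d_in) basis matrices V_b, shared by all relations
    A            : (num_rel, B) coefficients a_rb
    W0           : (d_out, d_in) self-loop weight W_0
    """
    n = H.shape[0]
    W = np.einsum("rb,boi->roi", A, V)       # Eq. (2): W_r = sum_b a_rb V_b
    out = H @ W0.T                           # self-loop term W_0 h_i
    for r, (src, dst) in enumerate(edges_by_rel):
        src, dst = np.asarray(src, dtype=int), np.asarray(dst, dtype=int)
        if src.size == 0:
            continue
        msgs = H[src] @ W[r].T               # W_r h_j for every edge j -> i
        agg = np.zeros((n, W.shape[1]))
        np.add.at(agg, dst, msgs)            # sum over j in N_i^r
        deg = np.bincount(dst, minlength=n)  # c_{i,r} = |N_i^r|
        has_nb = deg > 0
        agg[has_nb] /= deg[has_nb][:, None]  # apply 1 / c_{i,r}
        out += agg
    return np.maximum(out, 0.0)              # sigma = ReLU
```

With the single layer used in this study, the output rows would serve directly as the embeddings $h_i^{L}$ passed to the decoder. `np.add.at` is used instead of fancy-index assignment so that several edges pointing at the same node accumulate correctly.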

Figure 4. Heterogeneous network with multi-distance relationship spatial interaction.

Figure 5. Heterogeneous graph feature extractor illustration.

3.3. DistMult decoding inference engine

DistMult is a commonly used scoring function for link prediction (Huang, Li, and Yin 2018; Yang et al. 2014). Based on tensor factorization, it models the relationship between two entities as a bilinear product of their vector representations with a diagonal relation matrix (Yang et al. 2022). The interaction intensity between nodes is calculated using Eq. (3): applying this product to the input triple ($h_i^{L}$, $R_r$, $h_j^{L}$) yields a score representing the interaction intensity between nodes i and j. Here, $h_i^{L}$ and $h_j^{L}$ are the embedding vectors produced by the heterogeneous graph feature extractor, ⊤ denotes the transpose, and $R_r$ is the diagonal matrix derived from the relation matrix $W_r$ for the node pair under relation r. Because $R_r$ is diagonal, the predicted interaction intensity $\hat{f}_{ij}$ equals the sum of the element-wise product of $h_i^{L}$, the diagonal of $R_r$, and $h_j^{L}$.

$$\hat{f}_{ij}=(h_i^{L})^{\top}R_r\,h_j^{L} \quad (3)$$

To improve computational efficiency and generalization, random negative sampling was performed on the spatial interaction network (Kotnis and Nastase 2017). The interaction intensity of a random subset of spatial interactions was set to 0 to create negative samples $F_n$, and the remaining interactions were treated as positive samples $F_p$. The mean squared error (MSE) between the intensity predicted by DistMult and the observed intensity was used as the loss function, as given in Eq. (4), where $\hat{f}_{ij}$ is the predicted interaction intensity, $f_{ij}$ is the true interaction intensity, and N is the total number of samples in $F_p$ and $F_n$.

$$\mathrm{Loss}=\frac{1}{N}\sum_{f_{ij}\in F_p\cup F_n}\left(\hat{f}_{ij}-f_{ij}\right)^2 \quad (4)$$

We employed a full-batch gradient optimization strategy, iterating over the entire training dataset to optimize the parameters.
The training objective is to minimize the loss function, so that the predicted intensities of positive samples approach their observed values and those of negative samples approach zero.
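The decoder and loss can be sketched as follows, assuming NumPy. `distmult_score` implements Eq. (3) through the diagonal of $R_r$, and `mse_loss` implements Eq. (4). The negative-sampling scheme in the hypothetical helper `sample_negatives` (distinct random directed node pairs not present in $F_p$, labeled with intensity 0) is an assumption about what "random negative sampling" means here; in practice each negative pair would also need a relation index, for example from its distance interval.

```python
import numpy as np

def distmult_score(h_i, r_diag, h_j):
    """Eq. (3): h_i^T diag(r) h_j = sum_k h_i[k] * r[k] * h_j[k] (row-wise on batches)."""
    return np.sum(h_i * r_diag * h_j, axis=-1)

def sample_negatives(n_nodes, positives, rate, rng):
    """Assumed scheme: rate * |F_p| distinct random directed pairs (i != j) absent from F_p."""
    pos = set(positives)
    target = rate * len(pos)
    if target > n_nodes * (n_nodes - 1) - len(pos):
        raise ValueError("not enough unobserved pairs for the requested rate")
    negs = set()
    while len(negs) < target:
        i, j = (int(v) for v in rng.integers(n_nodes, size=2))
        if i != j and (i, j) not in pos:
            negs.add((i, j))
    return sorted(negs)  # each negative pair gets target intensity 0

def mse_loss(pred, true):
    """Eq. (4): mean squared error over F_p and F_n together."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))
```

The loss would then be `mse_loss` applied to DistMult scores for positive pairs (targets $f_{ij}$) concatenated with scores for negative pairs (targets 0); a `rate` of 5 matches the negative sampling rate reported in the experiments.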

3.4. Performance evaluation

3.4.1. Evaluation metrics

We evaluated model accuracy using four metrics commonly used in interaction imputation: root mean square error (RMSE), mean absolute percentage error (MAPE), Spearman correlation coefficient (SCC), and common part of commuters (CPC). RMSE and MAPE measure the error between actual observations and predictions; smaller values are better. CPC measures the similarity between predicted and true interaction intensities (Simini et al. 2012); it ranges from 0 to 1, and larger values are better. SCC measures the rank consistency between predicted and true interaction intensities; larger values are better. The equations for RMSE, MAPE, and CPC are as follows, where $\hat{f}_{ij}$ and $f_{ij}$ are the predicted and true interaction intensities and n is the number of evaluated interactions:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i,j}\left(\hat{f}_{ij}-f_{ij}\right)^2} \quad (5)$$

$$\mathrm{MAPE}=\frac{1}{n}\sum_{i,j}\left|\frac{\hat{f}_{ij}-f_{ij}}{f_{ij}}\right| \quad (6)$$

$$\mathrm{CPC}=\frac{2\sum_{i,j}\min\left(\hat{f}_{ij},f_{ij}\right)}{\sum_{i,j}\hat{f}_{ij}+\sum_{i,j}f_{ij}} \quad (7)$$
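The four metrics can be computed with a few lines of NumPy. This sketch implements Eqs. (5) to (7) directly and computes SCC as the Pearson correlation of average ranks (the standard definition of Spearman's coefficient, which the paper does not write out). It assumes float arrays and, for MAPE, that every true intensity is positive, as holds for the retained interactions.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))            # Eq. (5)

def mape(pred, true):
    return float(np.mean(np.abs((pred - true) / true)))           # Eq. (6), as a fraction

def cpc(pred, true):
    return float(2 * np.minimum(pred, true).sum() / (pred.sum() + true.sum()))  # Eq. (7)

def _average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def scc(pred, true):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return float(np.corrcoef(_average_ranks(pred), _average_ranks(true))[0, 1])
```

MAPE is returned as a fraction; multiply by 100 to report it as a percentage, as in the results.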

3.4.2. Baselines

We compared MultDis-SIHN with two baseline models: the classic physics-based gravity model and SI-GCN, a graph convolutional interaction imputation model. The baselines are described as follows:

Gravity model (GM): the flow intensity $f_{ij}$ between two regions $p_i$ and $p_j$ is positively associated with $p_i$'s pull ($\mathrm{pull}_i$) and $p_j$'s push ($\mathrm{push}_j$), whereas the distance cost $d_{ij}$ between them has an inverse effect (Erlander and Stewart 1990):

$$f_{ij}=K\frac{\mathrm{pull}_i\,\mathrm{push}_j}{d_{ij}^{\beta}} \quad (8)$$

where K is a scaling constant and β is the distance-decay exponent.

SI-GCN: this model integrates graph embedding into interaction imputation (Yao et al. 2020). It uses spatial representation and GCN encoders to embed geographic attributes and interaction topologies into node embeddings, and then applies a bilinear decoder to estimate the interaction intensity between two locations. By leveraging a GCN to capture spatial dependencies, it achieved strong performance in spatial interaction prediction.
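The gravity baseline in Eq. (8) has two free parameters. The paper does not say how K and β were estimated; the sketch below assumes the common approach of ordinary least squares in log space, $\log f_{ij}=\log K+\log(\mathrm{pull}_i\,\mathrm{push}_j)-\beta\log d_{ij}$, and follows Eq. (8) as written. The function names are hypothetical, and all inputs must be positive.

```python
import numpy as np

def fit_gravity(pull_i, push_j, d_ij, f_ij):
    """Assumed calibration of Eq. (8): OLS on log f - log(pull * push) = log K - beta * log d."""
    y = np.log(f_ij) - np.log(pull_i * push_j)
    A = np.column_stack([np.ones(len(d_ij)), -np.log(d_ij)])
    (log_k, beta), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.exp(log_k)), float(beta)

def gravity_predict(pull_i, push_j, d_ij, K, beta):
    """Eq. (8): f_ij = K * pull_i * push_j / d_ij ** beta."""
    return K * pull_i * push_j / d_ij ** beta
```

Because the model is linear in log space, this calibration has a closed-form solution and needs no iterative optimization.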

4. Case study

4.1. Data and processing

4.1.1. Datasets

In Figure 6, the left panel shows the daytime OD density of taxis within each ring road in Beijing. The OD density within the Fifth Ring Road exceeds 558 points/km², whereas the OD point density across all of Beijing is only 63 points/km². The area within the Fifth Ring Road is also a major residential area with intensive spatial interactions. We therefore selected it as our study area.

Figure 6. Study area (the left panel shows the OD density map of taxis on each ring road in Beijing in points/km2, while the right panel shows the functional zones within the Fifth Ring Road).

Functional zones are places where human activities take place. Compared with grid-based research units, functional zones provide a more accurate and realistic representation of geographic units. We used the 2018 functional zone data provided by Professor Gong Peng's team at Tsinghua University as the research units (Gong et al. 2020). Each research unit includes information such as land-use type, coordinates, and area. There are 11 unit types, described in Table 1. We used taxi trip OD data for Beijing from November 7 to November 18, 2016, which contain the coordinates of the pick-up and drop-off locations of each trip. To avoid the influence of weekends and holidays, we used only the trips from the 10 working days in this period to represent stable interactions between locations.

Table 1. The types of functional zones.

4.1.2. Preprocessing

Because taxi pick-up and drop-off locations lie on the road network, the first challenge of this study was to match taxi points to functional zones. We assumed that passengers choose pick-up and drop-off points closest to their actual origins and destinations. We therefore gradually enlarged a buffer zone around each pick-up and drop-off point until it intersected exactly one surrounding unit, as shown in Figure 7, and treated that uniquely intersected unit as the actual origin or destination unit of the spatial interaction.
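The growing-buffer matching can be sketched in plain Python. To keep the sketch dependency-free, it approximates each unit as an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`; real functional zones are polygons, for which a GIS library would test buffer–polygon intersection instead. The helper names and the `step` and `max_radius` values are hypothetical, not taken from the paper.

```python
import math

def point_rect_distance(x, y, rect):
    """Shortest distance from (x, y) to an axis-aligned rectangle (0 if the point is inside)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)

def match_point_to_unit(x, y, units, step=5.0, max_radius=500.0):
    """Grow a circular buffer of radius r around (x, y) until it intersects exactly one unit.

    A circle of radius r intersects a rectangle iff the point-to-rectangle distance is <= r.
    Returns the unit id, or None if no unique match is found within max_radius.
    """
    r = 0.0
    while r <= max_radius:
        hits = [uid for uid, rect in units.items() if point_rect_distance(x, y, rect) <= r]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            return None  # several units entered the buffer at the same step: ambiguous
        r += step
    return None
```

A smaller `step` reduces ambiguous ties at the cost of more iterations; in the limit, the procedure returns the unit nearest to the taxi point.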

Figure 7. Schematic diagram of matching between taxi spots and units.

Spatial interactions were represented as pairs of origin and destination units. Each unit had attributes such as coordinates, pull, push, and land-use type. The interaction intensity between two units was defined as the total number of taxi OD trips between them. Because spatial interactions represent steady interactions between two units, we retained only unit pairs with 10 or more interactions. The Euclidean distance between units was used as the interaction distance. The resulting spatial interaction network contains 1,876 units as nodes. As shown in Figure 8, we classified interaction intensity into five levels using natural breaks to visualize the spatial interactions. Figure 9 shows a scatter plot of all spatial interactions, visualizing the relationship between the distance and intensity of each interaction in the network; the horizontal axis represents interaction distance and the vertical axis represents interaction intensity. Using the K-means clustering algorithm, we obtained clusters of points with similar intensities and neighboring interaction distances. This clustering divides the distance range into several threshold intervals, each representing a distinct interaction relationship. The spatial interaction dataset was randomly divided into training, testing, and validation sets in a 6:2:2 ratio, with the proportion of each interaction type kept the same across all sets.
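The aggregation and splitting steps above can be sketched in plain Python. Two points are assumptions not stated in the paper: the interaction distance is taken between unit centroids in a projected coordinate system, and the type-preserving 6:2:2 split is implemented by shuffling and cutting within each relation type. The function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def build_interactions(trips, centroids, min_trips=10):
    """Aggregate matched trips into unit-pair interactions.

    trips     : iterable of (origin_uid, dest_uid), one entry per taxi trip
    centroids : {uid: (x, y)} in a projected coordinate system (assumed)
    Returns {(o, d): (intensity, distance)} for pairs with at least min_trips trips.
    """
    counts = Counter(trips)  # intensity f_ij = number of trips from o to d
    out = {}
    for (o, d), n in counts.items():
        if n >= min_trips:
            (x1, y1), (x2, y2) = centroids[o], centroids[d]
            out[(o, d)] = (n, math.hypot(x2 - x1, y2 - y1))  # Euclidean distance
    return out

def stratified_split(pairs_with_rel, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle within each relation type and cut by ratios, keeping type proportions per split."""
    by_rel = defaultdict(list)
    for pair, rel in pairs_with_rel:
        by_rel[rel].append(pair)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for rel in sorted(by_rel):
        items = by_rel[rel]
        rng.shuffle(items)
        n_train = round(ratios[0] * len(items))
        n_val = round(ratios[1] * len(items))
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```

Splitting within each relation type, rather than over the whole dataset, is what guarantees that short-, medium-, and long-distance interactions appear in the same proportions in every set.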

Figure 8. Spatial interactive network (the interaction intensity was categorized into five classes using the natural break method for visualization purposes).

Figure 9. Interaction distance-intensity scatter plot (each point represents the interaction between two units).

4.2. Results

As illustrated in Figure 9, as distance increases, high-intensity interactions gradually become rarer and interactions become less clustered. We applied K-means clustering to the scatter plot with K ranging from 2 to 6, as shown in Figure 10, and analyzed the clustering results using the distribution of points in each distance interval. Taking Figure 10(b) as an example, the red points in the first interval are concentrated at low and medium interaction intensities and include the highest intensity values. The green points in the second interval cluster only at low intensities, and their intensity values decrease with distance. The orange points in the third interval are relatively sparse and exhibit low interaction intensity. Each cluster thus reflects a set of interactions with similar characteristics within its distance interval, and the interaction features differ considerably between adjacent clusters. On this basis, we obtained the distance threshold intervals for two to six relationships, as depicted in Figure 11. Each distance interval corresponds to a separate interaction relationship. Consequently, we built five heterogeneous spatial interaction networks with different numbers of interaction relationships.

Figure 10. Distance interval division based on K-means (from K = 2 to K = 6, each color represents a distance interval).

Figure 11. Different distance threshold intervals (in m).

In this study, we conducted 10 random experiments on each of the five heterogeneous interaction networks and averaged the results for each of the accuracy evaluation measures described earlier. We used a negative sampling rate of 5 (five negative samples per positive sample), a batch size of 25,000, and a single-layer neural network. Experiments were run on an AMD Ryzen 7 5800H processor with an NVIDIA GeForce RTX 3050 Ti GPU and 16 GB of RAM.
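Negative sampling at a rate of 5 can be sketched as follows, assuming that negatives are ordered unit pairs with no observed interaction, drawn uniformly without replacement. How the paper labels or weights negatives is not described in this section, and `sample_negatives` is a hypothetical helper.

```python
import random

def sample_negatives(positive_pairs, nodes, rate=5, seed=0):
    """Draw `rate` unobserved (origin, destination) pairs per observed interaction."""
    rng = random.Random(seed)
    observed = set(positive_pairs)
    n_target = rate * len(observed)
    # ordered pairs without self-loops that are not already observed
    max_possible = len(nodes) * (len(nodes) - 1) - len(observed)
    if n_target > max_possible:
        raise ValueError("not enough unobserved pairs for the requested rate")
    node_list = list(nodes)
    negatives = set()  # a set gives sampling without replacement
    while len(negatives) < n_target:
        o, d = rng.choice(node_list), rng.choice(node_list)
        if o != d and (o, d) not in observed:
            negatives.add((o, d))
    return sorted(negatives)
```

With 3 observed interactions, for example, the function returns 15 distinct unobserved pairs.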

As shown in Figure 12, as the number of relationships increases, the RMSE first decreases and then steadily increases. Table 2 further shows that the model performs best with four relationships, achieving an RMSE of 8.262, a CPC of 0.852, and a MAPE of 26.2%, together with a high SCC value. With two relationships, the RMSE reaches its maximum and the SCC and CPC are comparatively low, indicating the weakest interaction imputation performance.

Figure 12. The imputation results for different numbers of relationships.

Table 2. Results of different numbers of relationships.

In this study, we specifically analyzed the impact of node attributes on interaction imputation. We considered three types of attributes: Location (2D), Attraction (2D), and Type (1D), where Attraction refers collectively to the pull and push attributes. Combining these three attribute types yields seven attribute combinations, ranging from one to five dimensions, as detailed in Table 3. We applied the MultDis-SIHN model to each of the seven attribute combinations on the same dataset. As shown in Table 3, using all node attributes yields the best RMSE, 8.642.
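The seven combinations are all non-empty subsets of the three attribute groups (2^3 − 1 = 7). A short enumeration confirms their dimensions; the group names and sizes follow the text, while the helper name is illustrative.

```python
from itertools import combinations

# Attribute groups and their dimensions as described in the text
ATTRIBUTE_GROUPS = {"Location": 2, "Attraction": 2, "Type": 1}

def attribute_combinations(groups=ATTRIBUTE_GROUPS):
    """Return every non-empty combination of attribute groups with its total dimension."""
    names = list(groups)
    combos = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            combos.append((subset, sum(groups[n] for n in subset)))
    return combos

for subset, dim in attribute_combinations():
    print(f"{' + '.join(subset):<32}{dim}D")
```

The dimensions come out as 2, 2, 1, 4, 3, 3, and 5, so dimensions 2 and 3 each have two combinations, which is why their RMSE values are averaged in the analysis that follows.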

Table 3. Attribute dimension combinations and RMSE of imputation.

To analyze the relationship between the dimension of geographic unit attributes and RMSE, we plotted a waterfall chart, averaging the RMSE values of combinations that share the same dimension. As shown in Figure 13, the RMSE gradually decreases as the dimension of geographic unit attributes increases.
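The averaging behind the waterfall chart can be sketched as below. The input would be the seven (dimension, RMSE) pairs from Table 3; reading each waterfall bar as the change in mean RMSE between consecutive dimensions is an assumption about how the chart was built, and both helper names are hypothetical.

```python
from collections import defaultdict

def mean_rmse_by_dimension(results):
    """results: iterable of (dimension, rmse) pairs, one per attribute combination.
    Returns {dimension: mean RMSE}, ordered by dimension."""
    grouped = defaultdict(list)
    for dim, value in results:
        grouped[dim].append(value)
    return {dim: sum(vals) / len(vals) for dim, vals in sorted(grouped.items())}

def waterfall_steps(means):
    """Change in mean RMSE from each dimension to the next (one bar per step)."""
    dims = sorted(means)
    return [(a, b, means[b] - means[a]) for a, b in zip(dims, dims[1:])]
```

Negative steps correspond to the decreasing RMSE trend described above.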

Figure 13. The effect of node attributes on interaction imputation.

We compared the interaction imputation performance of the MultDis-SIHN model with that of the existing gravity model (GM) and of SI-GCN, another graph convolutional spatial interaction imputation model. For consistency and comparability, identical training, testing, and validation sets were used for each model. The SI-GCN parameters were kept as in the original study: a negative sampling rate of 0.25, a batch size of −1, and a single-layer neural network. For SI-GCN, all other features of the data were retained and only the multi-relational elements were removed; the gravity model retained only the pull and push attributes. We selected the MultDis-SIHN model with the best accuracy (four relationships) and the one with the worst accuracy (two relationships). Each model was run in 10 repeated experiments, and the average values are shown in Table 4. Compared with the GM, both SI-GCN and MultDis-SIHN considerably improve imputation accuracy. The proposed MultDis-SIHN model also outperformed SI-GCN, which is likewise a graph convolutional network model: RMSE decreased from 9.049 to 8.262 and MAPE decreased from 31.6% to 26.2%. Even the lower-performing MultDis-SIHN model (K = 2) improved on SI-GCN, reducing RMSE from 9.049 to 8.518, an improvement of approximately 5.8%.
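The improvement percentages appear to be relative reductions of an error metric with respect to SI-GCN. The short calculation below reproduces that arithmetic using only the RMSE and MAPE values reported in the text; `relative_reduction` is an illustrative helper.

```python
def relative_reduction(baseline, model):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100 * (baseline - model) / baseline

# SI-GCN as the baseline, values as reported in the text
print(f"{relative_reduction(9.049, 8.262):.2f}%")  # RMSE, K = 4 -> 8.70%
print(f"{relative_reduction(9.049, 8.518):.2f}%")  # RMSE, K = 2 -> 5.87%
print(f"{relative_reduction(31.6, 26.2):.2f}%")    # MAPE, K = 4 -> 17.09% (5.4 percentage points)
```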

Table 4. Imputation results of different models.

Differences in the number of nodes and edges across interaction networks can be regarded as distinct scenarios, analogous to the diversity of images caused by noise and degradation in remote sensing (Hong et al. Citation2019). The model's performance across scenarios reflects its generalization ability. We therefore conducted tests in two additional study areas: the areas inside Beijing's Third Ring Road and Fourth Ring Road. Table 5 lists the differences among the three study areas in geographical size and in the size of their interaction networks.

Table 5. Differences in different study areas.

We compared the MultDis-SIHN model with the SI-GCN model in each study area; the results are presented in Table 6. The model outperforms the homogeneous-network interaction imputation model, SI-GCN, in all scenarios. Inside the Third Ring Road, it showed the largest improvement in accuracy, approximately 7.3%. Inside the Fourth Ring Road, it maintained a slight lead in RMSE, accompanied by improvements in the other metrics.

Table 6. Imputation results of different models in multiple scenarios.

Finally, we investigated the model's performance at different interaction distances by partitioning the test set into subgroups according to their distance relationship. Here, we used K-means to define the distance intervals, resulting in four categories: long distance (> 10004 m), moderate distance (5530–10004 m), medium distance (2450–5530 m), and short distance (< 2450 m).

We compared the imputation accuracy of the trained MultDis-SIHN and SI-GCN models separately on the test subsets for each distance relationship. As shown in Figure 14, both models have poorer imputation accuracy for short-distance relationships. Imputation accuracy gradually increases with interaction distance, and the best performance is achieved for long-distance relationships, which is consistent with the findings of other researchers. Furthermore, the MultDis-SIHN model consistently outperformed the SI-GCN model in RMSE across all distance intervals, with the largest improvement observed for moderate-distance relationships.
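The distance-based evaluation can be sketched as follows. The thresholds come from the text; treating each interval as half-open, so that a distance lying exactly on a threshold falls into the longer interval, is an assumption, as are the helper names.

```python
import bisect
import math
from collections import defaultdict

CUTS = [2450, 5530, 10004]  # distance thresholds in metres, from the text
NAMES = ["short", "medium", "moderate", "long"]

def distance_group(distance_m):
    """Map an interaction distance to its distance relationship."""
    return NAMES[bisect.bisect_right(CUTS, distance_m)]

def rmse_by_group(samples):
    """samples: iterable of (distance_m, observed, predicted). Returns {group: RMSE}."""
    squared = defaultdict(list)
    for dist, obs, pred in samples:
        squared[distance_group(dist)].append((obs - pred) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in squared.items()}
```

Computing `rmse_by_group` once with MultDis-SIHN predictions and once with SI-GCN predictions on the same test set gives the per-interval comparison plotted in Figure 14.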

Figure 14. Comparison of RMSE across different distance relationships.

5. Discussion

In this section, we discuss the experimental results from the previous section in detail. We further analyze the influence of node attributes and of the number of edge types in the interaction network on interaction imputation. In addition, we discuss the advantages of the MultDis-SIHN model over existing models and its applications.

5.1. The impact of the number of interaction relationships

Unlike homogeneous interaction networks, heterogeneous networks partition edges into different relationships and account for them in interaction imputation. Our results in Section 4.2 show that as the number of partitioned relationships increases, imputation accuracy first improves and then deteriorates. We infer that more partitioned relationships make it possible to capture edge types in finer detail; however, the data volume for each specific relationship decreases, leaving insufficient training data for each relationship matrix. This highlights the importance of balancing the number of relationships against data sufficiency when designing heterogeneous networks: selecting an appropriate number of relationships is essential for constructing a multi-distance spatial interaction heterogeneous network that achieves superior imputation results.
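To make this trade-off concrete, assume the relationship matrices follow the relational GCN layer of Schlichtkrull et al. (2017), listed in the references; this section does not restate the MultDis-SIHN layer, so the formulation is an assumption. Here R is the number of relationships, \mathcal{N}_i^r the neighbours of node i under relationship r, W_r the relationship-specific matrix, W_0 the self-connection matrix, \sigma the activation, and \bar{n}_r the average number of training edges available to each relationship:

```latex
\begin{aligned}
h_i^{(l+1)} &= \sigma\Big( W_0^{(l)} h_i^{(l)}
  + \sum_{r=1}^{R} \sum_{j \in \mathcal{N}_i^{r}} \tfrac{1}{c_{i,r}}\, W_r^{(l)} h_j^{(l)} \Big),
  \qquad c_{i,r} = \lvert \mathcal{N}_i^{r} \rvert, \\
N_{\text{params}} &= (R + 1)\, d_{\text{in}}\, d_{\text{out}},
  \qquad \bar{n}_r \approx \lvert E_{\text{train}} \rvert / R .
\end{aligned}
```

Under this reading, each additional relationship adds a full d_in × d_out matrix to a single-layer model while the training edges per matrix shrink roughly in proportion to 1/R, which is one plausible explanation of the rise-then-fall pattern.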

5.2. The impact of node attributes

The inherent attributes of geographic units are crucial for interaction imputation, especially in traditional interaction imputation models. We conducted ablation experiments to study the impact of node attribute dimension on interaction imputation. The experimental results in Section 4.2 indicate that imputation performance improves as the node attribute dimension increases, which suggests that higher-dimensional attributes tend to yield more accurate imputation. Therefore, incorporating socioeconomic and other relevant indicators into node attributes can help assess the similarity between nodes more effectively, thereby improving imputation accuracy.

5.3. The advantages of multi-relationship interaction networks

The essence of interaction imputation is comparing the similarity of nodes in a spatial interaction network. The SI-GCN model uses graph convolution to obtain node vectors that account for the neighborhood structure, yielding better imputation performance than traditional models. Because distance decay makes interactions heterogeneous, spatial interaction networks are heterogeneous networks with multiple distance relationships. The relationships between nodes are therefore not homogeneous, and this heterogeneity is a crucial factor in modeling neighborhoods. The MultDis-SIHN model aggregates neighboring nodes according to their relationships with each node and accounts for differences in edge types, achieving a finer measurement of node similarity and thus superior imputation performance. The results in Section 4.2 indicate that our model consistently outperforms SI-GCN across various scenarios.
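Relationship-aware aggregation of this kind can be sketched in plain Python. The layer below follows the relational GCN formulation of Schlichtkrull et al. (2017), listed in the references, and is an assumption about how MultDis-SIHN aggregates neighbours, not its published implementation: each relationship (distance interval) has its own weight matrix, neighbours are averaged per relationship, and only incoming edges are aggregated. The `distmult_score` decoder, in the style of Yang et al. (2014), is likewise a hypothetical way to turn two node vectors into a similarity score.

```python
from collections import defaultdict

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rgcn_layer(features, edges, W_self, W_rel):
    """One relational graph-convolution layer (R-GCN style; an assumption).

    features: {node: vector}; edges: iterable of (src, dst, relation);
    W_self: self-connection matrix; W_rel: {relation: matrix}.
    """
    incoming = defaultdict(list)  # (dst, relation) -> list of source nodes
    for src, dst, rel in edges:
        incoming[(dst, rel)].append(src)
    out = {}
    for node, h in features.items():
        total = matvec(W_self, h)  # self-connection term W_0 h_i
        for rel, W in W_rel.items():
            srcs = incoming.get((node, rel), [])
            if not srcs:
                continue
            # average neighbour features for this relation (1 / c_{i,r}), then apply W_r
            mean = [sum(features[s][k] for s in srcs) / len(srcs) for k in range(len(h))]
            total = [a + b for a, b in zip(total, matvec(W, mean))]
        out[node] = [max(0.0, v) for v in total]  # ReLU as the activation
    return out

def distmult_score(h_o, r, h_d):
    """Hypothetical DistMult-style decoder: sum_k h_o[k] * r[k] * h_d[k]."""
    return sum(a * b * c for a, b, c in zip(h_o, r, h_d))
```

Because each relationship has its own matrix, two neighbours with identical features contribute differently depending on which distance interval links them, which is the finer similarity measurement described above.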

This study confirms the importance of multiple relationships in interaction imputation. Spatial interactions are diverse, and there are consequently various ways to construct multi-relational interaction networks. For instance, interactions related to human activities can be categorized by the type of travel activity, whereas those arising from regional heterogeneity can be categorized according to that heterogeneity. To further improve imputation accuracy, multiple interaction types could be integrated to construct a heterogeneous interaction network.

5.4. Applications of the MultDis-SIHN model

The MultDis-SIHN model has several practical applications, primarily in the following areas. First, it can enhance the completeness of interaction networks. Issues such as sensor damage and missing data often leave interaction networks incomplete, and the model can be used to produce a more comprehensive dataset for further spatial analysis. Second, this study has implications for government planning. For both intra-city and inter-city interactions, historical migration and interaction data, together with the interactions inferred by this model, can be used to estimate the level of interaction between two locations. This information helps determine whether new transportation routes between those locations should be planned. Finally, the model can be used for anomaly detection: by analyzing the differences between observed spatial interactions and the interaction quantities predicted by the model, it is possible to identify anomalies and investigate their underlying causes, which can provide valuable guidance for government planning and construction.
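One simple way to operationalize the anomaly-detection idea is to standardize the residuals between observed and predicted intensities. The z-score rule and the threshold of 3 below are illustrative assumptions, not part of the paper's method, and `flag_anomalies` is a hypothetical helper.

```python
import statistics

def flag_anomalies(observed, predicted, z_threshold=3.0):
    """Return OD pairs whose residual (observed - predicted) deviates from the
    mean residual by more than z_threshold population standard deviations.

    observed, predicted: {(origin, destination): intensity} with identical keys.
    """
    keys = sorted(observed)
    residuals = [observed[k] - predicted[k] for k in keys]
    mu = statistics.mean(residuals)
    sigma = statistics.pstdev(residuals)
    if sigma == 0:
        return []  # all residuals identical: nothing stands out
    return [k for k, r in zip(keys, residuals) if abs(r - mu) / sigma > z_threshold]
```

Pairs flagged this way would be candidates for closer inspection rather than confirmed anomalies; their causes still need to be investigated.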

6. Conclusion and future work

Spatial interaction imputation is a research approach aimed at discovering potential or missing interaction intensities between two locations within an existing stable geographic interaction network. Studies on spatial interaction imputation enable the construction of comprehensive spatial interaction networks, facilitating the exploration of spatial interaction patterns and supporting applications such as urban planning and transportation design. Spatial interactions exhibit distance decay effects, which motivate categorizing interactions into different types. In this study, we proposed the MultDis-SIHN model, which accounts for interaction types in node embedding. This study aimed to achieve higher precision in spatial interaction imputation by incorporating the multiple interaction modes formed by distance decay.

The primary contribution of this paper is that the MultDis-SIHN model is the first to incorporate the heterogeneity of interactions caused by distance decay effects into interaction imputation. The model performs graph embedding based on the different distance relationships between nodes, allowing more fine-grained modeling. First, we use a clustering algorithm to construct a heterogeneous spatial interaction network with multiple distance relationships based on distance decay. Second, a heterogeneous graph feature extractor learns the features of connected nodes according to the different interaction types in the network. This yields more precise embedding vectors that account for node features, edge features, and network topology, thereby improving imputation accuracy. Experiments using Beijing taxi trip data confirmed that the MultDis-SIHN model outperforms the existing model in interaction imputation across different scenarios, showing that considering interaction relationships in graph embedding can improve imputation accuracy. We also explored imputation accuracy under various numbers of distance relationships: as the number of relationships increased, imputation performance first improved and then declined. Choosing an appropriate number of relationships and using higher-dimensional attributes helps characterize unit similarities and yields better imputation results.

Future research should focus on three key aspects. First, we will continue to explore the interaction heterogeneity driven by distance decay; this may include considering the interaction network topology and subdividing geographical units into multiple distance-based relationships. Second, real spatial interactions may involve multiple forms of edge-type relationships. Interaction imputation could be enhanced by incorporating additional data sources, such as extracting activity types from social media data to represent interaction edges, which would enable predicting the intensity of each activity type and broaden the application scenarios of spatial interaction imputation. Finally, in terms of model development, integrating multi-layer graph convolutional networks holds significant promise. Because the community structure within geographic units is an important factor influencing spatial interactions, a GCN could be used to model the community structure within each geographical unit, and the resulting node vectors could then be fed into the MultDis-SIHN model for interaction imputation.

Acknowledgement

The authors are grateful to all the reviewers and editors for their constructive suggestions, which improved the quality of this article.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The dataset that supports the findings of this study is available at https://doi.org/10.6084/m9.figshare.23884284.v1.

Additional information

Funding

This research was supported by the National Natural Science Foundation of China (grant number 42071376) and by the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (grant number 01122220010028).

References

  • Chen, Z., Stefan Mueller Arisona, X. Huang, M. Batty, and G. Schmitt. 2014. “Detecting the Dynamics of Urban Structure Through Spatial Network Analysis.” International Journal of Geographical Information Science 28 (11–12): 2178–2199.
  • Erlander, Sven, and Neil F. Stewart. 1990. The Gravity Model in Transportation Analysis: Theory and Extensions. Vol. 3. VSP, the Netherlands.
  • Fischer, Manfred M., and Sucharita Gopal. 1994. “Artificial Neural Networks: A new Approach to Modeling Interregional Telecommunication Flows.” Journal of Regional Science 34 (4): 503–527. https://doi.org/10.1111/j.1467-9787.1994.tb00880.x
  • Gao, Song, Yu Liu, Yaoli Wang, and Xiujun Ma. 2013. “Discovering Spatial Interaction Communities from Mobile Phone Data.” Transactions in GIS 17 (3): 463–481. https://doi.org/10.1111/tgis.12042
  • Gong, Peng, Bin Chen, Xuecao Li, Han Liu, Jie Wang, Yuqi Bai, Jingming Chen, Xi Chen, Lei Fang, and Shuailong Feng. 2020. “Mapping Essential Urban Land use Categories in China (EULUC-China): Preliminary Results for 2018.” Science Bulletin 65 (3): 182–187. https://doi.org/10.1016/j.scib.2019.12.007
  • Haggett, P. 1965. Locational Analysis in Human Geography. London: Edward Arnold.
  • Haggett, P. 1972. Geography: A Modern Synthesis. New York: Harper and Row.
  • Hamerly, Greg, and Charles Elkan. 2003. “Learning the k in k-means.” Advances in Neural Information Processing Systems 16: 281–288.
  • Hazelton, M. L. 2008. “Inference for Origin–destination Matrices: Estimation, Prediction and Reconstruction.” Transportation Research Part B 35 (7): 667–676. https://doi.org/10.1016/S0191-2615(00)00009-6
  • Hong, Danfeng, Lianru Gao, Jing Yao, Bing Zhang, Antonio J. Plaza, and Jocelyn Chanussot. 2020. “Graph Convolutional Networks for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 59: 5966–5978. https://doi.org/10.1109/TGRS.2020.3015157
  • Hong, Danfeng, Lianru Gao, Naoto Yokoya, Jing Yao, Jocelyn Chanussot, Qian Du, and Bing Zhang. 2020. “More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification.” IEEE Transactions on Geoscience and Remote Sensing 59: 4340–4354. https://doi.org/10.1109/TGRS.2020.3016820
  • Hong, D., N. Yokoya, J. Chanussot, and X. X. Zhu. 2019. “An Augmented Linear Mixing Model to Address Spectral Variability for Hyperspectral Unmixing.” IEEE Transactions on Image Processing 28 (4): 1923–1938. https://doi.org/10.1109/TIP.2018.2878958.
  • Hou, Fan, Yue Zhang, Xinli Fu, Lele Jiao, and Wen Zheng. 2021. “The Prediction of Multistep Traffic Flow Based on AST-GCN-LSTM.” Journal of Advanced Transportation 2021: 1–10.
  • Hu, Junjie, Yong Gao, Xuechen Wang, and Yu Liu. 2023. “Recognizing Mixed Urban Functions from Human Activities Using Representation Learning Methods.” International Journal of Digital Earth 16 (1): 289–307. https://doi.org/10.1080/17538947.2023.2170482.
  • Huang, Zichao, Bo Li, and Jian Yin. 2018. Knowledge Graph Embedding via Multiplicative Interaction.
  • Jain, A. K., M. N. Murty, and P. J. Flynn. 1999. “Data Clustering: A Review.” ACM Computing Surveys 31 (3): 264–323. https://doi.org/10.1145/331499.331504
  • Östh, John, Johan Lyhagen, and Aura Reggiani. 2016. “A New Way of Determining Distance Decay Parameters in Spatial Interaction Models with Application to Job Accessibility Analysis in Sweden.” European Journal of Transport and Infrastructure Research 16 (2): 344–362.
  • Kotnis, Bhushan, and Vivi Nastase. 2017. “Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs.” arXiv preprint arXiv:1708.06816.
  • Li, B. 2009. “Markov Models for Bayesian Analysis About Transit Route Origin-Destination Matrices.” Transportation Research Part B 43 (3): 301–310. https://doi.org/10.1016/j.trb.2008.07.001
  • Li, Mingxiao, Song Gao, Feng Lu, Kang Liu, Hengcai Zhang, and Wei Tu. 2021. “Prediction of Human Activity Intensity Using the Interactions in Physical and Social Spaces Through Graph Convolutional Networks.” International Journal of Geographical Information Science 35 (12): 2489–2516. https://doi.org/10.1080/13658816.2021.1912347.
  • Li, Zheng-wei, Jia-shu Li, Zhu-hong You, Ru Nie, Huan Zhao, and Tang-bo Zhong. 2022. “Associations Prediction Algorithm of MiRNAs and Diseases Based on Heterogeneous Graph Attention Network.” Acta Electronica Sinica 50 (6): 1428.
  • Li, Zhishuai, Gang Xiong, Yuanyuan Chen, Yisheng Lv, Bin Hu, Fenghua Zhu, and Fei-Yue Wang. 2019. A Hybrid Deep Learning Approach with GCN and LSTM for Traffic Flow Prediction.
  • Li, C., B. Zhang, D. Hong, J. Yao, and J. Chanussot. 2023. “LRR-Net: An Interpretable Deep Unfolding Network for Hyperspectral Anomaly Detection.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–12. https://doi.org/10.1109/TGRS.2023.3279834.
  • Morton, April, Jesse Piburn, and Nicholas Nagle. 2018. Need a Boost? A Comparison of Traditional Commuting Models with the XGBoost Model for Predicting Commuting Flows (Short Paper).
  • Pourebrahim, Nastaran, Selima Sultana, Amirreza Niakanlahiji, and Jean-Claude Thill. 2019. “Trip Distribution Modeling with Twitter Data.” Computers, Environment and Urban Systems 77: 101354. https://doi.org/10.1016/j.compenvurbsys.2019.101354
  • Roth, Camille, Soong Moon Kang, Michael Batty, and Marc Barthélemy. 2011. “Structure of Urban Movements: Polycentric Activity and Entangled Hierarchical Flows.” PLoS One 6 (1): e15923. https://doi.org/10.1371/journal.pone.0015923
  • Salamat, Amirreza, Xiao Luo, and Ali Jafari. 2021. “HeteroGraphRec: A Heterogeneous Graph-Based Neural Networks for Social Recommendations.” Knowledge-Based Systems 217: 106817. https://doi.org/10.1016/j.knosys.2021.106817
  • Schlichtkrull, M., T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. 2017. “Modeling Relational Data with Graph Convolutional Networks.”
  • Shao, Hu, Yi Zhang, and Wenwen Li. 2017. “Extraction and Analysis of City's Tourism Districts Based on Social Media Data.” Computers, Environment and Urban Systems 65: 66–78. https://doi.org/10.1016/j.compenvurbsys.2017.04.010
  • Simini, Filippo, Gianni Barlacchi, Massimiliano Luca, and Luca Pappalardo. 2021. “A Deep Gravity Model for Mobility Flows Generation.” Nature Communications 12 (1): 6576.
  • Simini, Filippo, Marta C. González, Amos Maritan, and Albert-László Barabási. 2012. “A Universal Model for Mobility and Migration Patterns.” Nature 484 (7392): 96–100. https://doi.org/10.1038/nature10856
  • Stouffer, Samuel A. 1940. “Intervening Opportunities: A Theory Relating Mobility and Distance.” American Sociological Review 5 (6): 845–867.
  • Tao, Haiyan, Keli Wang, Li Zhuo, and Xuliang Li. 2019. “Re-examining Urban Region and Inferring Regional Function Based on Spatial–Temporal Interaction.” International Journal of Digital Earth 12 (3): 293–310. https://doi.org/10.1080/17538947.2018.1425490
  • Tobler, Waldo R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (sup1): 234–240. https://doi.org/10.2307/143141
  • Wang, Fahui. 2012. “Measurement, Optimization, and Impact of Health Care Accessibility: A Methodological Review.” Annals of the Association of American Geographers 102 (5): 1104–1112. https://doi.org/10.1080/00045608.2012.657146.
  • Wang, Jiaoe, Huihui Mo, and Fahui Wang. 2014. “Evolution of air Transport Network of China 1930–2012.” Journal of Transport Geography 40: 145–158. https://doi.org/10.1016/j.jtrangeo.2014.02.002
  • Wu, Yongji, Defu Lian, Shuowei Jin, and Enhong Chen. 2019. Graph Convolutional Networks on User Mobility Heterogeneous Graphs for Social Relationship Inference.
  • Yang, Xi, Wei Wang, Jing-Lun Ma, Yan-Long Qiu, Kai Lu, Dong-Sheng Cao, and Cheng-Kun Wu. 2022. “BioNet: A Large-Scale and Heterogeneous Biological Network Model for Interaction Prediction with Graph Convolution.” Briefings in Bioinformatics 23 (1): bbab491. https://doi.org/10.1093/bib/bbab491
  • Yang, Bishan, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. “Embedding Entities and Relations for Learning and Inference in Knowledge Bases.”
  • Yao, X., Y. Gao, D. Zhu, E. Manley, and Y. Liu. 2020. “Spatial Origin-Destination Flow Imputation Using Graph Convolutional Networks.” IEEE Transactions on Intelligent Transportation Systems 99: 1–11.
  • Yi, Disheng, Yusi Liu, Jiahui Qin, and Jing Zhang. 2020. “Identifying Urban Traveling Hotspots Using an Interaction-Based Spatio-Temporal Data Field and Trajectory Data: A Case Study within the Sixth Ring Road of Beijing.” Sustainability 12 (22): 9662. https://doi.org/10.3390/su12229662
  • Yin, Ganmin, Zhou Huang, Yi Bao, Han Wang, Linna Li, Xiaolei Ma, and Yi Zhang. 2023. “ConvGCN-RF: A Hybrid Learning Model for Commuting Flow Prediction Considering Geographical Semantics and Neighborhood Effects.” GeoInformatica 27 (2): 137–157.
  • Yue, Mengxue, Chaogui Kang, Clio Andris, Kun Qin, Yu Liu, and Qingxiang Meng. 2018. “Understanding the Interplay Between bus, Metro, and cab Ridership Dynamics in Shenzhen, China.” Transactions in GIS 22 (3): 855–871. https://doi.org/10.1111/tgis.12340
  • Zhang, Yang, Tao Cheng, Yibin Ren, and Kun Xie. 2020. “A Novel Residual Graph Convolution Deep Learning Model for Short-Term Network-Based Traffic Forecasting.” International Journal of Geographical Information Science 34 (5): 969–995. https://doi.org/10.1080/13658816.2019.1697879
  • Zhao, Ziliang, Shih-Lung Shaw, Yang Xu, Feng Lu, Jie Chen, and Ling Yin. 2016. “Understanding the Bias of Call Detail Records in Human Mobility Research.” International Journal of Geographical Information Science 30 (9): 1738–1762. https://doi.org/10.1080/13658816.2015.1137298
  • Zhao, L., Y. Song, C. Zhang, Y. Liu, and H. Li. 2019. “T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction.” IEEE Transactions on Intelligent Transportation Systems 99: 1–11.
  • Zhen, Feng, Yang Cao, Xiao Qin, and Bo Wang. 2017. “Delineation of an Urban Agglomeration Boundary Based on Sina Weibo Microblog ‘Check-in’ Data: A Case Study of the Yangtze River Delta.” Cities 60: 180–191. https://doi.org/10.1016/j.cities.2016.08.014
  • Zheng, Shuangjia, Jiahua Rao, Ying Song, Jixian Zhang, Xianglu Xiao, Evandro Fei Fang, Yuedong Yang, and Zhangming Niu. 2021. “PharmKG: A Dedicated Knowledge Graph Benchmark for Biomedical Data Mining.” Briefings in Bioinformatics 22 (4): bbaa344. https://doi.org/10.1093/bib/bbaa344
  • Zhou, Tao, Bo Huang, Rongrong Li, Xiaoqian Liu, and Zhihui Huang. 2022. “An Attention-Based Deep Learning Model for Citywide Traffic Flow Forecasting.” International Journal of Digital Earth 15 (1): 323–344. https://doi.org/10.1080/17538947.2022.2028912