Research Article

TR2RM: an urban road network generation model based on multisource big data

Article: 2344596 | Received 04 Dec 2023, Accepted 14 Apr 2024, Published online: 24 Apr 2024

ABSTRACT

Road networks are an important part of transportation infrastructure through which people experience a city. The existing methods of vector map data generation mainly depend on a single data source, e.g. images, trajectories, or existing raster maps, which are limited by information fragmentation due to incomplete data. This study proposes an urban road network extraction framework named trajectory and remote-sensing image to RoadMap (TR2RM) based on deep learning technology by combining high-resolution remote sensing images with big trajectory data; this framework is composed of three components. The first component focuses on feature map generation by fusing remote sensing images with trajectories. The second component is composed of a novel neural network architecture denoted as AD-LinkNet, which is used to identify roads from the fused dataset of the first component. The last component is a postprocessing step that aims to generate the vector map accurately. Taking Rome, Beijing, and Wuhan as examples, we conduct extensive experiments to verify the effectiveness of the TR2RM. The results showed that the correctness of both the topology and geometry of the generated road network based on the TR2RM in Rome, Beijing, and Wuhan was 83.86% and 88.27%, 74.72% and 80.36%, and 73.83% and 77.7%, respectively.

1. Introduction

Road network data play a crucial role in many fields, such as urban planning, intelligent transportation, and navigation (Li et al. Citation2021a). Approaches to generating road network data from a single data source have been widely discussed in recent years. One of the most common approaches for obtaining large-scale road network data is to use remote sensing images as the data source. The technology for identifying roads from remote sensing images has evolved through two stages: obtaining road information using classic machine learning methods, and identifying roads based on artificial intelligence (AI) technology. In the first stage, road elements were recognized at the pixel level from images based on classic extraction algorithms, such as digital morphology (Lin et al. Citation2008), tensor voting (Medioni et al. Citation2008), conditional random fields (CRFs) (Wegner, Montoya-Zegarra, and Schindler Citation2013), or templates (Miao et al. Citation2014). These methods suffer from long processing cycles and poor generalization capability, which motivated the development of the second stage. In the second stage, AI techniques are used to improve road identification performance. Approaches in this stage can be classified into two types: constructing road network data based on generative AI techniques (Juhász et al. Citation2023) and extracting road elements using deep neural networks (Sun et al. Citation2019a). The former focuses on generating nonexistent but realistic images based on models such as CycleGAN (Zhu et al. Citation2017), StyleGAN (Karras, Laine, and Aila Citation2019), or ChatGPT (Zhang et al. Citation2023). The latter focuses on improving road information extraction from the original images; examples include the deep convolutional encoder-decoder network (DCED) (Panboonyuen et al. Citation2017), D-LinkNet (Zhou, Zhang, and Wu Citation2018), HsgNet (Xie et al. Citation2019), and D-CrossLinkNet (Huang et al. Citation2020). Although these efforts have made considerable progress in recent years, degraded image quality, caused by overexposure or sheltering by trees or buildings, reduces the accuracy of information extraction and leads to information fragmentation (Liu et al. Citation2018). As shown in Figure 1(a), roads highlighted by yellow lines can easily be missed during identification because of sheltering.

Figure 1. A comparison between remote sensing images and trajectories: (a) road information captured by images and (b) road information recorded by trajectories.


Trajectories collected by thousands of volunteers worldwide (e.g. tracking data shared on the OpenStreetMap (OSM) platform) or by taxi and car-hailing companies (e.g. DiDi) are another option for constructing road network data (Karimi and Kasemsuppakorn Citation2013; Ruan et al. Citation2018; Yang et al. Citation2022a; Zhang et al. Citation2020). Compared with remote sensing images, trajectories have better topological connectivity, which is beneficial for generating vector road network data at different levels, such as the road centerline level (Deng et al. Citation2019; Gao et al. Citation2021) and the traffic lane level (Mahmoud et al. Citation2021). Three kinds of methods have emerged for generating road network data from trajectories: incremental insertion, clustering, and machine vision-based methods (Ahmed et al. Citation2015; Yang et al. Citation2020). The first obtains road networks by incrementally inserting tracking points into an initially empty map based on map-matching ideas (Gao et al. Citation2021). The second takes a set of tracking points as input and clusters them in different ways to acquire road segments or intersections and construct road network data, e.g. Mahmoud et al. (Citation2021), Deng et al. (Citation2019), and Yang et al. (Citation2018a). The last category extracts road networks by transforming trajectories into discretized images and recognizing road segments via computer vision algorithms, such as road information generation based on DeepMG (Ruan et al. Citation2020) and road intersection identification via the Mask-RCNN model (Yang et al. Citation2020). Regardless of the specific technique used for road network generation, trajectories, whether collected by taxis, online car-hailing vehicles, or other people, still face the issues of unbalanced distributions and noise (Abdollahi et al. Citation2020; Sun et al. Citation2019a), which can lead to missed or misidentified roads.

Studies of road network data generation using remote sensing images or Global Navigation Satellite System (GNSS) tracking data show that issues caused by the data themselves always exist, and relying on extraction techniques alone is not enough to eliminate these shortcomings entirely. Therefore, how to obtain road information by exploiting the strengths of multisource data while circumventing their weaknesses has become a research focus. For instance, Sun et al. (Citation2019b) proposed extracting road information from aerial images first and then applying crowdsourced trajectories to repaint the identified road segments. Zhang et al. (Citation2020) rasterized the GPS trajectories of floating cars into a raster map and used the processed raster map to label satellite images, obtaining a road extraction sample set at reduced manual labeling cost. Li et al. (Citation2021b) demonstrated a multisource fusion network (MTMSAF) that simultaneously extracts roads and detects intersections by combining remote sensing images with GPS tracking data. Qin et al. (Citation2022) proposed an incremental road network updating method that combines GPS trajectory data and UAV remote sensing images: a hidden Markov model (HMM) identifies problematic road sections, and deep learning automatically extracts road sections from remote sensing images to update them. However, previous works failed to address the features that represent road information in trajectories; instead, they simply superimposed trajectories on remote sensing images for road recognition. This may increase the probability of misidentification due to data inconsistencies and leaves the problems of road object fracture and poor connectivity unresolved.

This study proposes extracting road networks by fusing remote sensing images with big trajectory data based on the novel framework TR2RM, which consists of three components. The first component merges remote sensing images and trajectories into feature maps. In the second component, we design a novel model, AD-LinkNet, to identify road segments from the feature maps. The last component is a postprocessing algorithm that corrects errors in the road networks generated by the second component. We verify the proposed TR2RM on three datasets collected in Rome, Beijing, and Wuhan covering various scenarios. The main contributions of this study are as follows.

  1. We construct a new framework, TR2RM, to automatically extract road networks by combining remote sensing images with trajectories.

  2. A novel model, AD-LinkNet, is designed based on the original version of LinkNet to increase the suitability of road network generation from the fused dataset.

  3. We present a postprocessing method to solve the problems of small, isolated areas and incorrect topological relations in the generated results.

2. Methodology

The overall structure of TR2RM is shown in Figure 2. The first component of TR2RM fuses tracking data with the corresponding remote sensing images. The second component trains the newly designed AD-LinkNet model on the fused data to identify roads. The third component recognizes isolated small areas and road breakages and corrects them to obtain a complete vector road network.

Figure 2. The structure of TR2RM.


2.1. Feature extraction and data fusion

In this study, we converted the coordinate systems of the remote sensing images, trajectories, and ground truths to WGS84 to avoid position deviation. The ground truths were obtained from the OSM platform for the areas covered by the remote sensing images and were modified manually against high-resolution remote sensing images from Google Earth to maintain reliability. The crowdsourced big trajectory data are composed of a set of tracking points, each of which records spatiotemporal information about the moving object, including longitude, latitude, elevation, time stamp, and speed. To facilitate trajectory computation by a deep learning model and eliminate errors between the remote sensing images and the trajectory data, we rasterized the trajectories based on the pixel size of the corresponding remote sensing image. Specifically, we obtained the corresponding remote sensing images from Google Earth based on the area covered by the trajectories and the collection time. In Figure 3, four vertices record the coverage scope of the trajectories in the experimental area; the parameters $Lat_{\max}$, $Lat_{\min}$, $Lon_{\max}$, and $Lon_{\min}$ indicate the maximum and minimum values of latitude and longitude, respectively. The total numbers of pixels in the horizontal and vertical directions of the corresponding remote sensing image are denoted $I$ and $J$, respectively, and each pixel can be identified by its column and row numbers in the grid image. Based on the pixel size of the corresponding remote sensing image, we generate a new blank image whose per-pixel longitude and latitude intervals (denoted $Interval_{lon}$ and $Interval_{lat}$) are computed by Equation (1). For a tracking point $p_i(lat_i, lon_i)$, the pixel grid of the blank image into which it falls is computed by Equation (2), where $x$ and $y$ represent the column and row numbers of the pixel in the grid image and the round function rounds a numerical value to the nearest integer.

Figure 3. The conversion of trajectories to raster based on the pixel size of the remote sensing images.

(1) $Interval_{lon} = \dfrac{Lon_{\max} - Lon_{\min}}{I}, \qquad Interval_{lat} = \dfrac{Lat_{\max} - Lat_{\min}}{J}$

(2) $x = \mathrm{round}\!\left(\dfrac{p_{lon} - Lon_{\min}}{Interval_{lon}}\right), \qquad y = \mathrm{round}\!\left(\dfrac{p_{lat} - Lat_{\min}}{Interval_{lat}}\right)$
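As a minimal Python sketch of this cell assignment (illustrative only; the function name and argument order are ours, not the authors' code):

```python
def point_to_pixel(p_lon, p_lat, lon_min, lon_max, lat_min, lat_max, I, J):
    """Assign a tracking point to a cell of an I x J raster aligned with
    the corresponding remote sensing image (Equations (1)-(2))."""
    interval_lon = (lon_max - lon_min) / I   # Equation (1)
    interval_lat = (lat_max - lat_min) / J
    x = round((p_lon - lon_min) / interval_lon)  # Equation (2): column index
    y = round((p_lat - lat_min) / interval_lat)  # row index
    return x, y

# e.g. a point in central Rome on a 1024 x 1024 grid covering a 0.2-degree tile
print(point_to_pixel(12.4964, 41.9028, 12.4, 12.6, 41.8, 42.0, 1024, 1024))
```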

The gray value of each pixel in the blank image is determined by the features of the tracking points falling into it. According to existing work, the higher the tracking point density is, the more likely the pixel is to be a road (Yang et al. Citation2020). Moreover, Ruan et al. (Citation2020) showed that the speeds of moving objects on roads and nonroads (e.g. parking lots) differ. Thus, we obtain the gray value of each pixel based on the density and speed of the tracking points falling into it. In addition, since each tracking point represents the geographic location of the moving object at time stamp $t_i$, the trajectory composed of these tracking points also approximates the geometric structure of the road; therefore, distance is the third feature participating in the gray value computation. The specific steps for obtaining the gray value of each converted pixel are as follows.

  1. Point density. Let the number of tracking points falling in a grid unit (also called a grid cell or pixel cell) T be n, and use the threshold N to confirm the gray value. That is, for N = 1, the gray value of a pixel cell is set to 0 when n is smaller than N, to 128 when n is equal to N, and to 255 otherwise (see the sketch after this list).

  2. Speed. To reduce the influence of outliers, we calculate the standard deviation of the speeds of trajectories in a pixel to confirm its gray value. Let the velocity of tracking point $t$ be denoted $v_t$. Equation (3) gives the average speed $\bar{v}_T$ of the tracking points, where $n$ is the total number of tracking points in $T$; the standard deviation $\sigma_T$ of the current grid cell is then computed by Equation (4).

(3) $\bar{v}_T = \dfrac{\sum_{i=1}^{n} v_i}{n}$

(4) $\sigma_T = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum_{i=1}^{N}\left(v_i - \bar{v}_T\right)^2}{N}}$

where $N$ is the number of grid cells surrounding the current grid cell $T$ whose speed is not equal to 0 and $v_i$ is the velocity of grid cell $i$. The current grid cell is considered isolated if the speeds of its eight neighbors are all equal to 0, and the gray value of an isolated grid cell is set to 0. The gray values of the other grid cells are confirmed based on a trisection strategy: the gray value of a grid cell is set to 0, 128, or 255 when $\sigma_T > 0.8$, $0.5 < \sigma_T < 0.8$, or $0 < \sigma_T < 0.5$, respectively.

  3. Distance. We use the distance between two adjacent tracking points to estimate the gray value of the current grid cell, regardless of whether the points are in the same grid cell, to reduce the influence of sparse sampling. The gray value of a grid cell is set to 255 if both adjacent tracking points fall in it, and to 128 if one tracking point falls inside the grid cell and its adjacent point falls within the neighbors of the grid cell. In other cases, it is set to 0.
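The following sketch illustrates the point-density and speed rules under the stated thresholds; the function names and the treatment of boundary values (e.g. $\sigma_T$ exactly 0.5 or 0.8, which the text leaves unassigned) are our assumptions:

```python
import numpy as np

def density_gray_value(n: int, N: int = 1) -> int:
    """Point-density rule: gray value from the number n of tracking
    points falling inside a grid cell (threshold N = 1 as in the text)."""
    if n < N:
        return 0    # nonroad pixel
    if n == N:
        return 128  # potential road pixel
    return 255      # road pixel

def speed_gray_value(neighbor_speeds, mean_speed: float) -> int:
    """Speed rule, a sketch of Equations (3)-(4): trisect the standard
    deviation of the nonzero speeds of the eight neighboring cells
    around the current cell's mean speed."""
    speeds = np.asarray([v for v in neighbor_speeds if v != 0], dtype=float)
    if speeds.size == 0:
        return 0  # isolated cell: all eight neighbors have zero speed
    sigma = np.sqrt(np.mean((speeds - mean_speed) ** 2))
    if sigma > 0.8:
        return 0
    return 128 if sigma > 0.5 else 255
```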

The gray value of each pixel in the raster converted from trajectories is therefore 0, 128, or 255, representing nonroad pixels, potential road pixels, and road pixels, respectively. We converted the point density, speed, and distance of trajectories into three levels because two levels would increase the probability of false recognition given the uncertainty of trajectories, whereas four levels would exponentially increase the computational complexity of the deep learning model. Additionally, to reduce the uncertainty caused by noise and outliers, we smoothed the raster map converted from trajectories by counting pixel types in an 8-neighborhood window. Specifically, if a pixel's original type is 'road pixel' or 'potential road pixel', it is relabeled a 'nonroad pixel' when the number of pixels without road information in its 8-neighborhood is greater than H; conversely, if a pixel's original type is 'nonroad pixel', it is relabeled a 'potential road pixel' when the number of pixels with road information in its 8-neighborhood is greater than H. Based on this smoothing strategy, we obtain a three-dimensional vector for each grid cell, expressed as $X_{traj} \in \mathbb{R}^{I \times J \times 3}$; the remote sensing images are expressed as $X_{rs} \in \mathbb{R}^{I \times J \times 3}$. As shown in Figure A1 in the Appendix, we concatenate the three feature maps generated from trajectories with the corresponding remote sensing images along the channel dimension so that the spatial size of the input dataset of the subsequent deep learning model is unchanged (see Equation (5)).

(5) $X_{merge} = \mathrm{concatenate}\left(X_{traj},\, X_{rs}\right) \in \mathbb{R}^{I \times J \times 6}$

where $X_{merge} \in \mathbb{R}^{I \times J \times 6}$ is the new feature map obtained by integrating the raster grids converted from trajectories with the remote sensing images.
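As a sketch of the fusion step of Equation (5) in NumPy (the three trajectory feature rasters are illustrative stand-ins here):

```python
import numpy as np

I, J = 1024, 1024  # grid size matching the clipped remote sensing image

# Stand-ins for the density, speed, and distance rasters of Section 2.1,
# each holding gray values in {0, 128, 255}
rng = np.random.default_rng(0)
density_map = rng.choice([0, 128, 255], size=(I, J)).astype(np.uint8)
speed_map = rng.choice([0, 128, 255], size=(I, J)).astype(np.uint8)
distance_map = rng.choice([0, 128, 255], size=(I, J)).astype(np.uint8)
rs_image = np.zeros((I, J, 3), dtype=np.uint8)  # placeholder RGB image

x_traj = np.stack([density_map, speed_map, distance_map], axis=-1)  # (I, J, 3)
x_merge = np.concatenate([x_traj, rs_image], axis=-1)  # (I, J, 6), Equation (5)
```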

2.2. Road detection based on a newly designed AD-LinkNet model

Figure 4 shows the newly designed AD-LinkNet model, obtained by introducing a new AD-Block to replace the middle layer cascaded by the multilayer dilated convolutions of D-LinkNet. At the beginning of AD-LinkNet, highlighted by the red dotted box in Figure 4, we use a 1 × 1 convolution to reduce the feature maps from 6 channels to 3 channels to ensure that the features of the two different datasets are fully integrated.
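A sketch of this channel-reduction step in PyTorch (a plain 1 × 1 convolution; not the authors' code):

```python
import torch
import torch.nn as nn

# 1 x 1 convolution compressing the 6-channel fused input
# (3 trajectory feature channels + 3 image channels) to 3 channels
reduce_channels = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=1)

fused = torch.randn(1, 6, 1024, 1024)  # dummy fused feature map (Eq. 5 layout)
x = reduce_channels(fused)             # -> shape (1, 3, 1024, 1024)
```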

Figure 4. The structure of the AD-LinkNet.


The newly designed AD-Block in the AD-LinkNet model overcomes two limitations of D-LinkNet. First, the dilated convolution structure of D-LinkNet causes a serious gridding phenomenon during training and prevents many pixels from participating in the convolution operation (Wang et al. Citation2018). Second, D-LinkNet captures only local features and ignores the rich context information of images, even though it increases the receptive field of the convolution kernel. We embed the newly designed AD-Block in the four Res-blocks to diversify the receptive fields at each stage and further increase the semantic accuracy at the pixel level. Moreover, the AD-Block fuses the features of the decoder component with those of the following component to avoid the loss of low-level and high-level features, and it preserves global information so that detailed road features are retained. Figure 5 shows that the AD-Block is composed of three criss-cross attention (CCA) modules and a hybrid dilated convolution (HDC) module with dilation rates of 1, 2, and 5.

Figure 5. The structure of the AD-Block.


The CCA module (Huang et al. Citation2019) embedded in the AD-Block enhances the global information learning of AD-LinkNet without increasing computation and memory usage, owing to its ability to collect context information in both the horizontal and vertical directions. The steps of the CCA module for collecting pixel-level context information along the horizontal and vertical directions of the feature map are as follows. Taking a downsampled feature map $X \in \mathbb{R}^{C \times H \times W}$ as input, two 1 × 1 convolution kernels are applied, yielding two new feature maps $\{Q, K\} \in \mathbb{R}^{C' \times H \times W}$, where $C'$ indicates the number of channels after reduction. Since $Q$ and $K$ are essentially the same feature map, we can obtain a set $\Omega_u \in \mathbb{R}^{(H+W-1) \times C'}$ by gathering, for a given location $u$ in $K$, its feature points in the same row and column; $\Omega_{i,u} \in \mathbb{R}^{C'}$ denotes the $i$-th element of $\Omega_u$. The correlation between the points in $\Omega_u$ is computed as in Equation (6).

(6) $\mathrm{Score}_{i,u} = Q_u \Omega_{i,u}^{\mathsf T}$

where $\mathrm{Score}_{i,u}$ is the correlation score between point $u$ and its feature points in the same row and column. A channel-wise softmax over the scores is then applied to obtain the required attention map $A \in \mathbb{R}^{(H+W-1) \times H \times W}$. Next, we convolve $X$ with a 1 × 1 convolution kernel to obtain a new feature map $V \in \mathbb{R}^{C \times H \times W}$. Based on location $u$ in $V$ and its feature points in the same row and column, we obtain a set $\Phi_u \in \mathbb{R}^{(H+W-1) \times C}$. An aggregation operation then produces the feature map weighted by positional relationships, as shown in Equation (7).

(7) $X'_u = \sum_{i} A_{i,u}\, \Phi_{i,u} + X_u$

where $X'_u$ is the feature vector of the output feature map $X' \in \mathbb{R}^{C \times H \times W}$ at position $u$ and $A_{i,u}$ is the scalar value of the attention map $A$ at channel $i$ and position $u$. To further collect contextual information from all pixels, several CCA modules are connected in series and embedded in the AD-Block; the number of CCA modules embedded in the AD-Block is denoted $n$.
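The following is a minimal pure-PyTorch sketch of one criss-cross attention layer in the spirit of Huang et al. (Citation2019). The channel-reduction factor of 8 and the learnable residual weight gamma follow the CCNet paper rather than details reported in this article, and the duplicate counting of the self position (hence H + W rather than H + W − 1 attention weights) is left unmasked for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Sketch of a criss-cross attention layer (after Huang et al. 2019)."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Correlation scores along each column (vertical direction), Eq. (6)
        q_h = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)   # (b*w, h, c')
        k_h = k.permute(0, 3, 1, 2).reshape(b * w, -1, h)   # (b*w, c', h)
        e_h = torch.bmm(q_h, k_h).view(b, w, h, h).permute(0, 2, 1, 3)

        # Correlation scores along each row (horizontal direction)
        q_w = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)   # (b*h, w, c')
        k_w = k.permute(0, 2, 1, 3).reshape(b * h, -1, w)   # (b*h, c', w)
        e_w = torch.bmm(q_w, k_w).view(b, h, w, w)

        # Softmax over the H + W criss-cross positions of every pixel
        attn = F.softmax(torch.cat([e_h, e_w], dim=3), dim=3)
        a_h = attn[..., :h].permute(0, 2, 1, 3).reshape(b * w, h, h)
        a_w = attn[..., h:].reshape(b * h, w, w)

        # Aggregate V along the column and the row with the weights, Eq. (7)
        v_h = v.permute(0, 3, 1, 2).reshape(b * w, -1, h)
        out_h = torch.bmm(v_h, a_h.transpose(1, 2)).view(b, w, c, h).permute(0, 2, 3, 1)
        v_w = v.permute(0, 2, 1, 3).reshape(b * h, -1, w)
        out_w = torch.bmm(v_w, a_w.transpose(1, 2)).view(b, h, c, w).permute(0, 2, 1, 3)

        return self.gamma * (out_h + out_w) + x  # residual connection
```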

We add the HDC module to the AD-Block to enlarge the receptive field and avoid the gridding phenomenon (Wang et al. Citation2018). The dilation rates of the HDC module in the AD-Block are set to 1, 2, and 5 based on the standardized dilated convolution block principle of HDC proposed by Wang et al. (Citation2018) (see Figure 6).

Figure 6. The effect diagram of HDC.


In the HDC module, the equivalent size of an ordinary convolution kernel after the expansion operation can be computed by Equation (8).

(8) $k' = d \times (k - 1) + 1$

where $k$ is the size of the input convolution kernel, $d$ represents the expansion coefficient (a hyperparameter that can be set artificially), and $k'$ is the equivalent size of the ordinary convolution kernel after the expansion operation. According to Equation (8), the receptive fields of the HDC module with dilation rates 1, 2, and 5 are equivalent to those of standard convolution kernels of sizes 3, 5, and 11. In Figure 6, the receptive field of the HDC module covers the whole area, eliminating the gridding effect and capturing more complete contextual information. After the expansion operation, the size of the feature map output by the HDC module can be calculated according to Equation (9).

(9) $o = \left\lfloor \dfrac{i + 2p - k'}{s} \right\rfloor + 1$

where $i$ is the size of the input feature map, $o$ is the size of the output feature map, $p$ is the padding size, $s$ represents the stride, and $\lfloor \cdot \rfloor$ indicates a rounding-down operation. The convolution kernel size of the dilated convolutions in this study is set to 3 × 3, which keeps the sizes of the input and output feature maps equal when $s = 1$ and $p = d$. In addition, by analyzing the similarity between ResNet with skip connections and the dilated convolution block, we add a skip connection to the HDC module to enhance and supplement the original features. That is, we connect the output feature map of the HDC module with the original input feature map to strengthen it, and then apply a 1 × 1 convolution to fuse their features and reduce the computational overhead of subsequent operations.
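A minimal PyTorch sketch of such an HDC block, assuming sequential 3 × 3 dilated convolutions with rates 1, 2, and 5, a concatenation-style skip connection, and a 1 × 1 fusion convolution (the batch normalization and ReLU placement are our assumptions):

```python
import torch
import torch.nn as nn

class HDCBlock(nn.Module):
    """Sketch of an HDC module: stacked 3x3 dilated convolutions with
    rates 1, 2, 5, plus a skip connection fused by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.convs = nn.Sequential(*[
            nn.Sequential(
                # padding = d keeps the output size equal to the input size
                # (Equation (9) with k = 3, s = 1, p = d)
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in (1, 2, 5)
        ])
        # 1x1 convolution fuses the skip connection with the HDC output
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([x, self.convs(x)], dim=1))
```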

2.3. Postprocessing

At present, CNN-related methods mainly focus on object identification and cannot solve the issues of road breakage and small isolated areas caused by road feature loss or poor-quality images during road information extraction (Li et al. Citation2021c). In this study, we propose a postprocessing method to address the problems of fracture of detected road objects, poor connectivity, and small isolated areas generated after semantic segmentation based on AD-LinkNet. The proposed postprocessing procedure includes four steps: (1) isolated small area removal, (2) breakpoint detection, (3) breakpoint reconnection, and (4) road network refinement. The execution details for each step are described below.

  1. Isolated area removal. We remove isolated areas from the output images of AD-LinkNet based on the principle of road connectivity: a road segment should connect with one or more other road segments in the real world. Isolated small areas are usually incorrect detections and need to be removed. We therefore extract the contour information of each independent road segment in the image into a set V, where each element $v_i$ is a sequence of contiguous edge points forming the outline of an independent connected domain (Equation (10)). We calculate the area of each connected domain from the obtained contour information (Equation (11)); a connected domain is regarded as an isolated area and removed if its area is smaller than the threshold δ. A minimal sketch using OpenCV follows.

(10) $V = \{v_1, v_2, \ldots, v_n\}, \qquad v_i = \{p_1, p_2, \ldots, p_m\}$

(11) $area_i = \mathrm{contourArea}(v_i)$
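This sketch assumes OpenCV 4 and an 8-bit 0/255 road mask; the function name and the threshold variable are ours:

```python
import cv2
import numpy as np

def remove_isolated_areas(binary_road: np.ndarray, delta: float) -> np.ndarray:
    """Drop connected domains whose contour area is below the threshold
    delta (Equations (10)-(11)); binary_road is an 8-bit 0/255 mask."""
    contours, _ = cv2.findContours(binary_road, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    cleaned = binary_road.copy()
    for contour in contours:
        if cv2.contourArea(contour) < delta:  # Equation (11)
            # fill the small connected domain with the background value
            cv2.drawContours(cleaned, [contour], -1, color=0, thickness=-1)
    return cleaned
```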

  2. Breakpoint detection. The output of AD-LinkNet consists of binary images containing only two kinds of objects, namely, road points and nonroad points. We observe that the gray value of a point extends around itself if it is not a breakpoint; otherwise, its gray value changes substantially compared with its surrounding points. Based on this observation, we detect breakpoints from the output of AD-LinkNet as follows. First, we use the Zhang-Suen skeleton extraction algorithm (Zhang and Suen Citation1984) to extract skeleton lines from road areas for the subsequent topology reconstruction and breakpoint connection operations. As shown in Figure 7, each road area is regarded as an independent domain, denoted A, B, C, D, and E (see Figure 7a). Applying the Zhang-Suen skeleton algorithm yields the refined road skeleton lines shown in Figure 7b. Then, the Shi-Tomasi corner detector (Shi Citation1994) is applied to identify potential breakpoints, also called corner points.

Figure 7. Breakpoint detection: (a) output results of AD-LinkNet, (b) road skeleton line, (c) corner point, and (d) breakpoint.


Given a point $(x, y)$ with pixel value $I(x, y)$ and window function $w(x, y)$ (set to 1), let the window move along the X- and Y-axes with tiny increments $u$ and $v$, respectively. The change in pixel gray value obtained by moving the window in all directions is calculated by Equation (12).

(12) $E(u, v) = \sum_{(x, y)} w(x, y) \left[ I(x + u, y + v) - I(x, y) \right]^2$

The point $(x, y)$ is regarded as a corner point if the value of $E(u, v)$ is very large. A Taylor expansion simplifies the computation of Equation (12); see Equations (13) and (14).

(13) $E(u, v) \approx [u, v]\, M \begin{bmatrix} u \\ v \end{bmatrix}$

(14) $M = \sum_{(x, y)} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$

where $M$ is a real symmetric matrix whose eigenvalues $\lambda_1$ and $\lambda_2$ can be extracted after diagonalization. Based on these computations, the corner detection function $RF$ of Equation (15) is used to find corner points.

(15) $RF = \min(\lambda_1, \lambda_2)$

As shown in Figure 7c, the corner points of each skeleton line can be detected by the above computation. Since not all corner points are breakpoints, we use an eight-neighborhood test to confirm breakpoints. A corner point $p$ is considered an isolated point and abandoned when there is no road point in its neighborhood $\delta(p)$; it is confirmed as an ordinary road point if there are exactly two road points in $\delta(p)$, and as a breakpoint if only one road point falls in $\delta(p)$ (see Figure 7d); in other situations, $p$ is considered a bifurcation point. A minimal sketch of this step follows.
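This sketch substitutes scikit-image's skeletonize for the Zhang-Suen thinning and OpenCV's goodFeaturesToTrack for the Shi-Tomasi detector; the corner-detector parameters are illustrative values, not parameters reported in the article:

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def detect_breakpoints(road_mask: np.ndarray):
    """Thin road areas to skeleton lines, take Shi-Tomasi corners on the
    skeleton, and keep the corners whose 8-neighborhood contains exactly
    one road pixel (skeleton endpoints) as breakpoints."""
    skeleton = skeletonize(road_mask > 0).astype(np.uint8) * 255
    corners = cv2.goodFeaturesToTrack(skeleton, maxCorners=500,
                                      qualityLevel=0.1, minDistance=5)
    breakpoints = []
    for x, y in ([] if corners is None else corners.reshape(-1, 2).astype(int)):
        # 8-neighborhood test: endpoints have 1 road neighbor; ordinary road
        # points have 2; bifurcations have 3 or more
        window = skeleton[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        neighbors = np.count_nonzero(window) - int(skeleton[y, x] > 0)
        if neighbors == 1:
            breakpoints.append((int(x), int(y)))
    return skeleton, breakpoints
```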

  3. Breakpoint reconnection. To ensure the topological completeness of the road network, the detected breakpoints need to be connected. Two rules govern reconnection in this study. First, the closer two breakpoints are, the more likely they are to belong to the same road. Second, the direction trend between the current road point and its adjacent road points changes little if they are on the same road. Based on these two rules, we design Algorithm 1 for breakpoint reconnection.

Algorithm 1: ConnectBreakPoints

In Algorithm 1, the direction D of the road from the adjacent road point p′ to the breakpoint p is first calculated. The search then advances one grid unit at a time along direction D, looking for breakpoints within a buffer of radius r; breakpoint q is connected to breakpoint p if its Euclidean distance to p is the lowest among the breakpoints in the buffer. If no other breakpoint appears in the buffer from the start to the end of the search, breakpoint p is connected to the road point with the lowest Euclidean distance. If there are only nonroad points in the buffer throughout the search, breakpoint p is considered a termination point with no connection to other road points. The parameter r is a hyperparameter and is discussed in the experiment.
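Since the article presents Algorithm 1 as a figure, the following is only a hedged reconstruction of its search loop; how the direction vector D is estimated from the incident skeleton segment is assumed here, not specified by the article:

```python
import numpy as np

def connect_breakpoints(breakpoints, directions, r):
    """Sketch of Algorithm 1: march from each breakpoint along its road
    direction D and link it to the nearest breakpoint found within an
    r-cell buffer. `directions` maps each breakpoint to a unit vector
    estimated from its incident skeleton segment (an assumption)."""
    connections = []
    unmatched = set(breakpoints)
    for p in list(unmatched):
        if p not in unmatched:
            continue  # already consumed as an earlier point's partner
        d = directions[p]
        best = None
        for step in range(1, r + 1):
            probe = (p[0] + d[0] * step, p[1] + d[1] * step)
            # candidate breakpoints within the buffer around the probe cell
            near = [q for q in unmatched if q != p and
                    np.hypot(q[0] - probe[0], q[1] - probe[1]) <= r]
            if near:
                best = min(near, key=lambda q: np.hypot(q[0] - p[0], q[1] - p[1]))
                break
        if best is not None:
            connections.append((p, best))
            unmatched.discard(p)
            unmatched.discard(best)
    return connections
```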

  4. Road segment restoration. We use the region growing algorithm (Adams and Bischof Citation1994) to restore the width of the detected roads. The size of the growth area is calculated from the average width of the two road sections being connected, as shown in Equation (16).

(16) $m_{i,j} = \dfrac{1}{2}\left(\dfrac{V_i}{T_i} + \dfrac{V_j}{T_j}\right)$

where $m_{i,j}$ indicates the size of the growth area, $i$ and $j$ represent the nodes of the two road sections to be connected, $V_i$ and $V_j$ are the numbers of pixels in the connected domains containing $i$ and $j$ before the thinning operation, and $T_i$ and $T_j$ are the numbers of pixels in those connected domains after the thinning operation. The road network map is then constructed by superimposing the image generated by the region growing algorithm on the result of isolated area removal.

3. Experiment

3.1. Datasets and preprocessing

We used three datasets collected in Rome, Beijing, and Wuhan to verify the proposed method. In Rome, trajectories were collected by more than 300 taxis over one month in 2014 and were provided by Ahmed et al. (Citation2015). In Wuhan, trajectories were gathered by 83 pedestrians within a month in 2016 and were provided by Yang et al. (Citation2020). Based on the collection times and coverage areas of the trajectories in these two areas, we obtained the corresponding remote-sensing images from Google Earth. The trajectories and images of Beijing were obtained from Sun et al. (Citation2019b); these data were collected from 28,000 taxis within one week and obtained from the Gaode Map in 2018. Table 1 shows the details of the datasets used in this study.

Table 1. Dataset details.

To increase the certainty of the trajectories, we first removed outliers from the original datasets using the method proposed by Yang et al. (Citation2018b). Then, we filtered the tracking points again based on their speed: tracking point $p_i$ is removed if its velocity from $p_{i-1}$ to $p_i$ is greater than the maximum velocity threshold $v_{\max}$. Here, the maximum speeds $v_{\max}$ of driving and walking are set to 20 m/s and 2.83 m/s, respectively, according to previous work (Yang et al. Citation2020); a sketch of this filter is given after this paragraph. The trajectories covering Beijing, Rome, and Wuhan were converted into grids based on the resolution of the corresponding images and used to construct feature maps with the method illustrated in Section 2.1. Following existing work (Sun et al. Citation2019b), the images of Beijing were clipped into a series of 1024 × 1024 sub-images. For consistency, the corresponding feature maps in Beijing and the images and feature maps in Rome were clipped to the same size. In Wuhan, since the experimental area was far smaller than those in Rome and Beijing, we set the input size of the images and feature maps in TR2RM to 512 × 512 according to the conventions of image clipping (Sun et al. Citation2019a). The ground truth was used to quantitatively evaluate the generated road networks.
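The sketch below illustrates the speed filter; the coarse degrees-to-metres conversion is an illustrative simplification (a haversine distance would be more accurate for raw longitude/latitude coordinates):

```python
import math

def filter_by_speed(points, v_max):
    """Drop a tracking point when the speed implied by its predecessor
    exceeds v_max (e.g. 20 m/s for driving, 2.83 m/s for walking).
    Each point is a (lon, lat, t_seconds) tuple."""
    METRES_PER_DEGREE = 111_000  # rough mid-latitude approximation
    kept = [points[0]]
    for lon, lat, t in points[1:]:
        p_lon, p_lat, p_t = kept[-1]
        dt = t - p_t
        dist = math.hypot(lon - p_lon, lat - p_lat) * METRES_PER_DEGREE
        if dt > 0 and dist / dt <= v_max:
            kept.append((lon, lat, t))
    return kept
```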

3.2. Experimental setting and evaluation indicators

The code of the proposed TR2RM was written in Python 3.6, with PyCharm 2022.3.7 as the integrated development environment. The newly designed AD-LinkNet in TR2RM was developed with PyTorch 1.4.0 and trained on a server equipped with two NVIDIA GeForce RTX 2080Ti graphics cards. The Rome dataset contains 2160 images, the Beijing dataset contains 348 images, and the Wuhan dataset contains 132 images; each dataset was divided into training, validation, and test sets at a 6:2:2 ratio (Huang et al. Citation2020), with the specific samples in each split assigned automatically from the dataset based on these percentages. The initial learning rate, momentum, and weight attenuation coefficient were set to 0.001, 0.95, and 0.0004, respectively. The loss function of the proposed AD-LinkNet is a weighted combination of binary cross entropy (BCE) and Dice loss (see Equation (17)).

(17) $L = \lambda L_{BCE} + (1 - \lambda) L_{Dice}$

where $\lambda$ is a hyperparameter controlling the proportion of the two loss functions; it was set to 0.5 in the experiment. The batch size for model training and testing was 4, and each dataset was trained for 200 epochs on the two GPUs with synchronized batch normalization (SyncBN) to achieve the best model performance. Additionally, we applied random scaling and flipping during model training and testing. The specific steps include horizontal flipping, vertical flipping, rotation, random operation selection, and random noise. Specifically, images in the original dataset were first horizontally flipped with a probability of 50%; images from the previous step and from the original dataset were then vertically flipped with a probability of 50% and rotated clockwise by 180 degrees with a probability of 50%; the next step translated and scaled the data with a probability of 20%; and the last step added a certain amount of noise to all images with a probability of 20%. These steps enhance the learning ability of the neural network model, reduce the risk of overfitting, and make the model more robust and generalizable.
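As a sketch of the combined loss of Equation (17), assuming a standard soft-Dice term over sigmoid probabilities (not necessarily the authors' exact implementation):

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Weighted BCE + Dice loss of Equation (17) with lambda = 0.5."""
    def __init__(self, lam: float = 0.5, eps: float = 1e-6):
        super().__init__()
        self.lam = lam
        self.eps = eps
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, target):
        # target is a float tensor of 0/1 road labels, same shape as logits
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        dice = 1 - (2 * inter + self.eps) / (prob.sum() + target.sum() + self.eps)
        return self.lam * self.bce(logits, target) + (1 - self.lam) * dice
```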

The value of the parameter r used during postprocessing was computed from the width of the road, as shown in Equation (18).

(18) $r = \dfrac{R_{width}}{I}$

where $R_{width}$ indicates the width of the road and $I$ denotes the ground size covered by a single grid cell. According to road construction standards in China, the widths of urban trunk roads and of roads or streets in residential areas range from 30 to 40 m and from 7 to 9 m, respectively. Most of the roads in the experimental areas in Rome and Beijing were urban trunk roads, whereas the roads in Wuhan included roads and streets in residential areas. Therefore, the values of r in Rome, Beijing, and Wuhan were set to 80, 80, and 20, respectively.

The performance of the proposed AD-LinkNet in TR2RM was estimated based on the following indicators: accuracy, precision, recall, mIoU, and F1-score, computed in the same way as in existing studies (Yang et al. Citation2022a). Moreover, the correctness of the geometry and topology is used to estimate the performance of the TR2RM; the calculation steps for geometric and topological correctness are given in the Appendix, following existing studies (Kasemsuppakorn and Karimi Citation2013). According to the urban road construction standard of China, the buffer distance used to match a generated road with the benchmark is set to 10 m.

3.3. Result evaluation and comparison

3.3.1. Estimation of road generation using a single dataset and fused datasets

To overcome the limitations of extracting road networks from a single dataset, we propose generating road networks by fusing images with big trajectory data based on the designed AD-LinkNet model. To verify the performance of road extraction with fused datasets, we evaluated road network generation using three kinds of datasets from the cities of Rome, Beijing, and Wuhan: a trajectory dataset, remote sensing images, and a combined dataset of trajectories and images, denoted Traj, RS, and RS + Traj, respectively. Table 2 shows the estimation results of road generation with these three datasets; the best result for each evaluation indicator is displayed in bold.

Table 2. Road network generation results from different data sources.

Based on the statistics shown in Table 2, the performance of road identification in Rome, Beijing, and Wuhan using RS + Traj is better than that using a single dataset such as Traj or RS. Specifically, the average values of accuracy, precision, recall, mIoU, and F1 score for road network generation with the RS + Traj dataset in Rome, Beijing, and Wuhan were approximately 93.71%, 69.44%, 86.74%, 62.84%, and 77.13%, respectively. In Rome, the mIoU and F1 score of the road networks generated from the fused dataset increased by 2.65% and 1.93%, respectively, compared with those from the single RS dataset. Similarly, the mIoU and F1 score of road identification in Beijing increased by nearly 3% and 2%, respectively, when using the fused dataset rather than the RS dataset alone. The estimation results in Table 2 also indicate that road extraction based on the fused dataset is clearly better than that based only on the Traj dataset in Rome and Beijing. In Wuhan, the mIoU and F1 score of road identification based on the RS + Traj dataset increased by 5.51% and 4.55%, respectively, compared with those based on the Traj dataset. Interestingly, we also found that the performance of the RS dataset was superior in all cases except Wuhan, where its accuracy, precision, recall, mIoU, and F1 score were significantly lower. Moreover, the statistics show that road extraction in Wuhan using the Traj dataset outperformed road extraction using the RS dataset in both Beijing and Rome. This results from differences in data quality and completeness across the three regions. For the RS dataset, manual inspection revealed that road sheltering by trees in the Wuhan images was more serious than in the other two regions (see Figure 8a), which lowers all evaluation indicators of road identification using RS in Wuhan, e.g. accuracy and precision, relative to Beijing and Rome. In addition, the imbalance and incomplete coverage of trajectories are more serious in Beijing and Rome than in Wuhan, which results in roads with few or even no tracking points being misclassified or missed.

Figure 8. Visualization of road network generation results using different datasets from the cities of Rome, Beijing, and Wuhan: (a) RS dataset, (b) Traj dataset, (c) ground truth, (d) road network generation using the RS dataset, (e) road network generation using the Traj dataset, and (f) road network generation using the RS + Traj dataset.


Figure 8 shows several typical identification examples from the different datasets in Beijing, Rome, and Wuhan. As shown in Figure 8d, some road segments in Rome are misidentified when using RS because of sheltering; this situation is also obvious in Beijing and Wuhan. In Wuhan, most of the roads in the experimental area are sheltered by trees and buildings and cannot be identified from remote sensing images alone, as shown in the last two rows of images in Figure 8. Compared with road generation using RS, road extraction using the Traj dataset avoids losing some road segments but still suffers from incomplete information, especially in Beijing and Wuhan. In Figure 8e, some road segments are missed because there is only a small amount of tracking data or even none. With the fused datasets, the errors shown in Figure 8d and 8e are corrected; see Figure 8f.

3.3.2. Performance estimation for AD-LinkNet

The proposed AD-LinkNet is compared with seven other typical models, namely, UNet (Ronneberger, Fischer, and Brox Citation2015), SegNet (Badrinarayanan, Kendall, and Cipolla Citation2017), DenseNet (Huang et al. Citation2017), Deeplabv3+ (Chen et al. Citation2017a, Citation2017b), LinkNet (Chaurasia and Culurciello Citation2017), D-LinkNet (Zhou, Zhang, and Wu Citation2018), and NL-LinkNet (Wang, Seo, and Jeon Citation2021), to evaluate its road segmentation performance on the Traj + RS datasets of Rome, Beijing, and Wuhan. It should be noted that we did not run a separate ablation test because the only difference between AD-LinkNet and D-LinkNet is the new AD-Block; the comparison between AD-LinkNet and D-LinkNet is therefore equivalent to an ablation test. Table 3 lists the quantitative results of each model on the dataset collected in Rome. In Table 3, the proposed AD-LinkNet achieves the best performance on the Rome dataset, with mIoU and F1 scores reaching 66.89% and 80.16%, respectively; these values are nearly 2% and 1% higher than those obtained with NL-LinkNet.

Table 3. Comparison of segmentation performance of commonly used models on the Rome dataset.

We also conducted the same comparison experiment on the Beijing dataset. According to the results shown in Table 4, the mIoU and F1 score for road network generation based on AD-LinkNet increased by 1.62% and 1.22%, respectively, compared with those of the best-performing baseline.

Table 4. Comparison of segmentation performance of commonly used models on the Beijing dataset.

In contrast to the first two datasets, the trajectories in the Wuhan dataset were collected by pedestrians and exhibit random and irregular characteristics (Yang et al. Citation2022b). The statistics in Table 5 indicate that both the mIoU and F1 score of road network generation based on AD-LinkNet increase by nearly 2% compared with NL-LinkNet, and by 5.5% and 4.53%, respectively, compared with D-LinkNet. As shown in Table 5, there is an approximately 8% difference between our model and UNet. By analyzing the identification results, we found that UNet recognized more nonroad pixels as road points, which increases the false positives (FPs) and true positives (TPs) while decreasing the true negatives (TNs) and false negatives (FNs). In comparison, AD-LinkNet focuses on improving the precision, mIoU, and F1 score, even if some road pixels are missed. Moreover, the results also indicate that the performance of AD-LinkNet on the Wuhan dataset is not as good as that on the first two regions. This may be because vehicle trajectories are far less random and irregular than pedestrian trajectories, owing to traffic rules and relatively straight motor roads. Future work should therefore focus on detecting multiple types of urban roads from fused datasets.

Table 5. Comparison of segmentation performance of commonly used models on the Wuhan dataset.

3.3.3. Estimation for the postprocessing of TR2RM

Postprocessing is the third component of TR2RM and is used mainly to correct the errors caused by road segmentation with the AD-LinkNet model, such as broken road connection points, blurred road edges, and misidentified areas. To estimate its effectiveness, comparative performance tests were conducted on the fused datasets from Rome, Beijing, and Wuhan. Table 6 shows the results of road generation with and without the postprocessing method.

Table 6. Performance of postprocessing algorithms on various datasets.

In Table 6, the mIoU and F1 score of road network generation after postprocessing in Rome, Beijing, and Wuhan increased by (0.33%, 0.24%), (0.11%, 0.1%), and (0.28%, 0.23%), respectively. Moreover, the geometric and topological correctness of road extraction after postprocessing in Rome, Beijing, and Wuhan increased by (4.09%, 2.11%), (3.16%, 2.56%), and (4.46%, 2.8%), respectively. By comparison, the improvement in the correctness of both the geometry and topology after postprocessing is more significant than that in the mIoU and F1 score. This is because postprocessing improves the result by removing the relatively few pixels identified as isolated small areas or adding the few pixels needed to repair breakpoints and blurred road edges; see Figure 9.

Figure 9. Road network extraction results after postprocessing: (a), (b), (c), and (d) show cases of breakpoint reconnection; (e) shows isolated small area removal.


In Figure 9, the first four columns show comparisons of breakpoint reconnection, and the last column shows isolated small area removal using the postprocessing method. The comparison of the results in Figure 9a, 9c, and 9d indicates that the postprocessing technique connects breakpoints very well, whether they lie near two parallel roads (see Figure 9a and 9d) or on intersecting roads (see Figure 9c). The postprocessing method can also connect breakpoints on zigzag roads, as shown in Figure 9b. The effectiveness of road segment restoration and isolated area removal is also verified in Figure 9c and 9e, respectively. These modifications are highly important for accurately estimating the geometry and topology of the generated road vector maps. However, the number of pixels modified during postprocessing is limited, so the improvements in the mIoU and F1 score are not obvious.

3.3.4. Visualization of the whole experimental area

This paper provides a panoramic view of road network generation in Rome, Beijing, and Wuhan (Figure 10). The pink mask shows the road network generated by the framework proposed in this paper, and the white mask overlaid by the pink mask is the ground truth. From the three panoramic views, the generated road network differs from the ground truth only in minor road details.

Figure 10. Panoramic view of road network generation in Rome, Beijing, and Wuhan: (a) road network in Rome, (b) road network in Beijing, and (c) road network in Wuhan.


4. Conclusion

We propose a novel framework, TR2RM, to automatically extract road networks by integrating high-resolution remote sensing images with big trajectory data. There are three components in TR2RM, namely, feature extraction and fusion, proposed AD-LinkNet model training, and road network postprocessing. Taking Rome, Beijing, and Wuhan as examples, we carried out experiments to estimate the performance of the TR2RM. The results show that the average values of accuracy, precision, recall, mIoU, and F1 score of road network generation using the RS + Traj dataset for Rome, Beijing, and Wuhan were approximately 93.71%, 69.44%, 86.74%, 62.84%, and 77.13%, respectively, which are generally higher than those obtained using the RS and Traj datasets. We compared the proposed AD-LinkNet with seven other typical models. According to the statistics, compared with those of the best-performing baselines, the mIoU and F1 score of AD-LinkNet on the basis of the fused datasets from Beijing, Rome, and Wuhan increased by nearly (2% and 1%), (2% and 1%), and (2% and 2%), respectively. Additionally, we evaluated the correctness of the generated road networks in the three experimental regions in terms of geometry and topology before and after postprocessing. The results showed that the correctness of both the geometry and topology in Rome, Beijing, and Wuhan increased by (4.09%, 2.11%), (3.16%, 2.56%), and (4.46%, 2.8%), respectively.

This study has several limitations. First, the experimental area used for validation was still small because of the limited access to public trajectories and high-resolution remote sensing images. Second, the proposed method extracts road networks only at the centerline level. Therefore, one direction of future work is detailed road information detection from fused images and trajectories, such as traffic lanes, turnings, and road infrastructure, together with extending the experimental area. Moreover, the implications of merging two datasets with differing temporal information should be explored in future work, such as road change detection, despite our efforts to maintain temporal consistency during data collection. In addition, the postprocessing of TR2RM should be improved in terms of threshold confirmation and optimization strategies that incorporate additional road features.


Acknowledgments

The authors would like to sincerely thank the anonymous reviewers for their constructive comments and valuable suggestions to improve the quality of this article. The authors sincerely thank all the organizations and scholars who provided the data and technical support for this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data and code that support the findings of this study are openly available at https://figshare.com/s/1779f7508e2c5974f792.

Additional information

Funding

This work was jointly supported by the National Natural Science Foundation of China [grant number: 42271449].

References

  • Abdollahi, A., B. Pradhan, N. Shukla, S. Chakraborty, and A. Alamri. 2020. “Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-the-Art Review.” Remote Sensing 12 (9): 1444. https://doi.org/10.3390/rs12091444.
  • Adams, R., and L. Bischof. 1994. “Seeded Region Growing.” IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (06): 641–647. https://doi.org/10.1109/34.295913.
  • Ahmed, M., S. Karagiorgou, D. Pfoser, and C. Wenk. 2015. “A Comparison and Evaluation of Map Construction Algorithms Using Vehicle Tracking Data.” GeoInformatica 19: 601–632. https://doi.org/10.1007/s10707-014-0222-6.
  • Badrinarayanan, V., A. Kendall, and R. Cipolla. 2017. “Segnet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12): 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.
  • Chaurasia, A., and E. Culurciello. 2017. “Linknet: Exploiting Encoder Representations for Efficient Semantic Segmentation.” 2017 IEEE Visual Communications and Image Processing (VCIP), 1–4. https://doi.org/10.1109/VCIP.2017.8305148.
  • Chen, L.-C., G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. 2017a. “Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFS.” IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4): 834–848. https://doi.org/10.1109/TPAMI.2017.2699184.
  • Chen, L.-C., G. Papandreou, F. Schroff, and H. Adam. 2017b. “Rethinking Atrous Convolution for Semantic Image Segmentation.” ArXiv Preprint ArXiv:1706.05587.
  • Deng, M., J. Huang, Y. Zhang, H. Liu, L. Tang, J. Tang, and X. Yang. 2019. “Generating Urban Road Intersection Models from Low-Frequency GPS Trajectory Data.” International Journal of Geographical Information Science 32 (12): 2337–2361. https://doi.org/10.1080/13658816.2018.1510124.
  • Gao, L., J. Wang, Q. Wang, W. Shi, J. Zheng, H. Gan, S. Lv, H. Qiao. 2021. “Road Extraction Using a Dual Attention Dilated-Linknet Based on Satellite Images and Floating Vehicle Trajectory Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 10428–10438. https://doi.org/10.1109/JSTARS.2021.3116281.
  • Huang, G., Z. Liu, L. van der Maaten, and K. Q. Weinberger. 2017. “Densely Connected Convolutional Networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708. arXiv:1608.06993.
  • Huang, K., J. Shi, G. Zhang, B. Xu, and L. Zheng. 2020. “D-CrossLinkNet for Automatic Road Extraction from Aerial Imagery.” Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, People’s Republic of China, October 16–18, 2020, Proceedings, Part I, 315–327. https://doi.org/10.1007/978-3-030-60633-6_26.
  • Huang, Z., X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu. 2019. “Ccnet: Criss-Cross Attention for Semantic Segmentation.” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 603–612.
  • Juhász, L., P. Mooney, H. H. Hochmair, and B. Guan. 2023. “ChatGPT as a Mapping Assistant: A Novel Method to Enrich Maps with Generative AI and Content Derived from Street-Level Photographs.” arXiv preprint arXiv:2306.03204.
  • Karimi, H. A., and P. Kasemsuppakorn. 2013. “Pedestrian Network Map Generation Approaches and Recommendation.” International Journal of Geographical Information Science 27 (5): 947–962. https://doi.org/10.1080/13658816.2012.730148.
  • Karras, T., S. Laine, and T. Aila. 2019. “A Style-Based Generator Architecture for Generative Adversarial Networks.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4401–4410.
  • Kasemsuppakorn, P., and H. A. Karimi. 2013. “A Pedestrian Network Construction Algorithm Based on Multiple GPS Traces.” Transportation Research Part C: Emerging Technologies 26: 285–300. https://doi.org/10.1016/j.trc.2012.09.007.
  • Li, J., Y. Meng, D. Dorjee, X. Wei, Z. Zhang, and W. Zhang. 2021a. “Automatic Road Extraction from Remote Sensing Imagery Using Ensemble Learning and Postprocessing.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 10535–10547. https://doi.org/10.1109/JSTARS.2021.3094673.
  • Li, P., Z. Tian, X. He, M. Qiao, X. Cheng, D. Song, M. Chen, J. Li, T. Zhou, and X. Guo. 2021b. “LR-RoadNet: A Long-Range Context-Aware Neural Network for Road Extraction via High-Resolution Remote Sensing Images.” IET Image Processing 15 (13): 3239–3253. https://doi.org/10.1049/ipr2.12320.
  • Li, Y., L. Xiang, C. Zhang, F. Jiao, and C. Wu. 2021c. “A Guided Deep Learning Approach for Joint Road Extraction and Intersection Detection from RS Images and Taxi Trajectories.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 8008–8018. https://doi.org/10.1109/JSTARS.2021.3102320.
  • Lin, X., J. Zhang, Z. Liu, and J. Shen. 2008. “Semi-Automatic Extraction of Ribbon Roads from High Resolution Remotely Sensed Imagery by T-Shaped Template Matching.” Geoinformatics 2008 and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images 7147: 168–175. doi: https://doi.org/10.1117/12.813220.
  • Liu, Y., J. Yao, X. Lu, M. Xia, X. Wang, and Y. Liu. 2018. “RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-Resolution Remotely Sensed Images.” IEEE Transactions on Geoscience and Remote Sensing 57 (4): 2043–2056. https://doi.org/10.1109/TGRS.2018.2870871.
  • Mahmoud, N., M. Abdel-Aty, Q. Cai, and J. Yuan. 2021. “Predicting Cycle-Level Traffic Movements at Signalized Intersections Using Machine Learning Models.” Transportation Research Part C: Emerging Technologies 124: 102930. https://doi.org/10.1016/j.trc.2020.102930.
  • Medioni, G., C. K. Tang, and M. S. Lee. 2000. “Tensor Voting: Theory and Applications.” In Proceedings of RFIA 2000.
  • Miao, Z., B. Wang, W. Shi, and H. Zhang. 2014. “A Semi-Automatic Method for Road Centerline Extraction from VHR Images.” IEEE Geoscience and Remote Sensing Letters 11 (11): 1856–1860. https://doi.org/10.1109/LGRS.2014.2312000.
  • Panboonyuen, T., P. Vateekul, K. Jitkajornwanich, and S. Lawawirojwong. 2017. “An Enhanced Deep Convolutional Encoder-Decoder Network for Road Segmentation on Aerial Imagery.” International Conference on Computing and Information Technology, 191–201. https://doi.org/10.1007/978-3-319-60663-7_18.
  • Qin, J., W. Yang, T. Wu, B. He, and L. Xiang. 2022. “Incremental Road Network Update Method with Trajectory Data and UAV Remote Sensing Imagery.” ISPRS International Journal of Geo-Information 11 (10): 502. https://doi.org/10.3390/ijgi11100502.
  • Ronneberger, O., P. Fischer, and T. Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28.
  • Ruan, S., R. Li, J. Bao, T. He, and Y. Zheng. 2018. “Cloudtp: A cloud-based flexible trajectory preprocessing framework.” 2018 IEEE 34th International Conference on Data Engineering (ICDE), 1601–1604. https://doi.org/10.1109/ICDE.2018.00186.
  • Ruan, S., C. Long, J. Bao, C. Li, Z. Yu, R. Li, Y. Liang, T. He, and Y. Zheng. 2020. “Learning to Generate Maps from Trajectories.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (01): 890–897. https://doi.org/10.1609/aaai.v34i01.5435.
  • Shi, J. 1994, June. “Good Features to Track.” In 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 593–600. IEEE. https://doi.org/10.1109/CVPR.1994.323794.
  • Sun, T., Z. Di, P. Che, C. Liu, and Y. Wang. 2019a. “Leveraging Crowdsourced GPS Data for Road Extraction from Aerial Imagery.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7509–7518.
  • Sun, K., Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, and J. Wang. 2019b. High-resolution representations for labeling pixels and regions. ArXiv Preprint ArXiv:1904.04514.
  • Wang, P., P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. 2018. “Understanding convolution for semantic segmentation.” In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 1451–1460. https://doi.org/10.1109/WACV.2018.00163.
  • Wang, Y., J. Seo, and T. Jeon. 2021. “NL-LinkNet: Toward Lighter but More Accurate Road Extraction with Nonlocal Operations.” IEEE Geoscience and Remote Sensing Letters 19: 1–5. https://doi.org/10.1109/LGRS.2021.3050477.
  • Wegner, J. D., J. A. Montoya-Zegarra, and K. Schindler. 2013. “A higher-order CRF model for road network extraction.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1698–1705.
  • Xie, Y., F. Miao, K. Zhou, and J. Peng. 2019. “HsgNet: A Road Extraction Network Based on Global Perception of High-Order Spatial Information.” ISPRS International Journal of Geo-Information 8 (12): 571. https://doi.org/10.3390/ijgi8120571.
  • Yang, X., X. Fan, M. Peng, Q. Guan, and L. Tang. 2022a. “Semantic Segmentation for Remote Sensing Images Based on an AD-HRNet Model.” International Journal of Digital Earth 15 (1): 2376–2399. https://doi.org/10.1080/17538947.2022.2159080.
  • Yang, X., K. Stewart, M. Fang, and L. Tang. 2022b. “Attributing Pedestrian Networks with Semantic Information Based on Multi-Source Spatial Data.” International Journal of Geographical Information Science 36 (1): 31–54. https://doi.org/10.1080/13658816.2021.1902530.
  • Yang, X., L. Tang, C. Ren, Y. Chen, Z. Xie, and Q. Li. 2020. “Pedestrian Network Generation Based on Crowdsourced Tracking Data.” International Journal of Geographical Information Science 34 (5): 1051–1074. https://doi.org/10.1080/13658816.2019.1702197.
  • Yang, X., L. Tang, K. Stewart, Z. Dong, X. Zhang, and Q. Li. 2018a. “Automatic Change Detection in Lane-Level Road Networks Using GPS Trajectories.” International Journal of Geographical Information Science 32 (3): 601–621. https://doi.org/10.1080/13658816.2017.1402913.
  • Zhang, J., Q. Hu, J. Li, and M. Ai. 2020. “Learning from GPS Trajectories of Floating Car for CNN-Based Urban Road Extraction with High-Resolution Satellite Imagery.” IEEE Transactions on Geoscience and Remote Sensing 59 (3): 1836–1847. https://doi.org/10.1109/TGRS.2020.3003425.
  • Zhang, T. Y., and C. Y. Suen. 1984. “A Fast Parallel Algorithm for Thinning Digital Patterns.” Communications of the ACM 27 (3): 236–239. https://doi.org/10.1145/357994.358023.
  • Zhang, C., C. Zhang, C. Li, Y. Qiao, S. Zheng, S. K. Dam, M. Zhang, et al. 2023. “One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on Chatgpt in AIGC Era.” arXiv preprint arXiv:2304.06488.
  • Zhou, L., C. Zhang, and M. Wu. 2018. “D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 182–186.
  • Zhu, J. Y., T. Park, P. Isola, and A. A. Efros. 2017. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.” In Proceedings of the IEEE International Conference on Computer Vision, 2223–2232.

Appendix

The correctness of the geometry and topology is calculated as follows. The geometric correctness (denoted $\delta_1$) is defined as the percentage of the generated road network that matches the ground truth, calculated by Equation (A1). The topological correctness (denoted $\delta_2$) indicates the percentage of correctly connected intersections in the generated road network relative to the ground truth, computed by Equation (A2).

(A1) $\delta_1 = n_m / n_g$

(A2) $\delta_2 = x_m / x_g$

where $n_m$ is the length of the road segments that can be matched with the ground truth and $n_g$ is the length of all generated road segments. The parameters $x_m$ and $x_g$ represent the number of correct intersections and the total number of detected intersections, respectively.

The fusion of trajectories and remote sensing images is illustrated below:

Figure A1. Feature extraction and fusion.
