Research Article

Large dam candidate region identification from multi-source remote sensing images via a random forest and spatial analysis approach

Pages 4212-4228 | Received 26 Apr 2023, Accepted 24 Sep 2023, Published online: 11 Oct 2023

ABSTRACT

The extraction of large dam candidate regions is critical for broad-scale efforts to rapidly detect dams over large areas. The framework proposed in this paper combines random forest classification with spatial analysis to extract large dam candidate areas over broad regions. First, we studied the combination of optical, microwave, texture, and topographic features of dams and constructed a multisource remote sensing and topographic feature vector. Second, we constructed random forest classifiers in different study areas and evaluated their performance. Third, we explored the geographic characteristics of dams and their relationships with other features. Finally, we introduced spatial analysis to constrain the large dam candidate area. The proposed framework was tested over a total area of 968,533 km² in five countries and achieved promising results, constraining the candidate area to less than 1.06% of the total area. We calculated the completeness rate of large dams using multisource dam datasets; the framework achieved a completeness rate of at least 97.62%. Our results show that the entire framework is reliable for automated and fast acquisition of large dam candidate areas based on data from open remote sensing products.

1. Introduction

Given the important role that dams, especially large dams, play in water supply, irrigation, ecological protection and economic development, dam detection over broad areas has received much attention in the field of deep learning (Buchanan et al. 2022; Fang et al. 2019; Jing et al. 2021; Jing et al. 2022; Lee, Hong, and Kim 2021; Suhara et al. 2022). Although the importance of dam detection has been recognized, it is still difficult to detect dams over large areas, and the acquisition of large dam candidate areas is one of the main influencing factors. Some scholars have accurately obtained possible dam locations by using high-precision DEM data (Buchanan et al. 2022). This approach can effectively extract large dam candidate regions, but it cannot be practically applied to a large area because the data are difficult to acquire and require a large investment of time and money. Other scholars use an existing waterbody raster dataset and perform dam detection directly along the water's edge (Jing et al. 2021; Jing et al. 2022). This approach can effectively generate large dam candidate regions, but the candidate regions are not well constrained, which affects dam detection accuracy. Therefore, it is necessary to develop an automated and faster method to extract large dam candidate regions, obtaining as few dam candidates as possible from a broad area and thus providing a basis for subsequent dam detection.

The development of traditional machine learning and deep learning methods has made it possible to classify dams in remote sensing images pixel by pixel. Some scholars selected features by multisource data fusion and used machine learning algorithms such as random forests, support vector machines, and artificial neural networks for remote sensing image classification (Chen et al. 2020; Izquierdo-Verdiguier and Zurita-Milla 2020; Phiri and Morgenroth 2017; Wei et al. 2020; Yang, Song, and Li 2019). Other scholars used deep learning methods that automate the construction of advanced features for classification (Bidkar, Kumar, and Ghosh 2022; Fan et al. 2020; Guo and Dou 2021; Kang and Mattyus 2015; Le Dréan et al. 2021). Although deep learning methods have unique advantages in remote sensing image processing, their need for a large number of samples and higher-performance hardware can make them perform more poorly than traditional machine learning when these resources are limited (Vali, Comai, and Matteucci 2020). The application scenarios of remote sensing image pixel classification mainly include (1) feature construction and land cover classification using remote sensing data (Benhammou et al. 2022; Gong et al. 2019), (2) identification of buildings, residential areas, and commercial areas (Chen et al. 2023; Chen et al. 2023; Gao et al. 2022), (3) biomass and ecosystem monitoring (Belgiu and Drăguţ 2016; Liu et al. 2021; Navarro et al. 2020), and (4) airport candidate region extraction, in which optical and SAR images are used to construct image, texture and structural features of airport runways for pixel segmentation, yielding airport runway candidate areas (Tu et al. 2021).
As man-made surfaces, dams also have unique optical, microwave, and topographic features that can be used for large dam candidate region extraction based on machine learning methods.

To further constrain the large dam candidate region, geographic analysis methods can be combined with classification. Numerous scholars have produced instructive work on the spatial analysis of dams, airports and other objects. Although these studies did not focus on large dam candidate area extraction, they are still important references for the selection of dam topographic features and dam spatial analysis methods. Jing et al. (2021) used products such as water body data and coastline data to constrain the candidate areas of dams, which can effectively reduce dam detection time within small areas but does not consider the optical and microwave characteristics of dams. Buchanan et al. (2022) detected the spatial distribution of dams using high-precision DEM data and other open geographic products, an approach that requires highly accurate open data. Li et al. (2021) and Zeng et al. (2019) applied spatial analysis to airport candidate areas. They used impervious surface data and road network data to constrain the candidate areas of airports, which effectively reduced the airport candidate areas and improved the accuracy of airport identification from 10% to 67%. In addition, some scholars have studied dam siting and suitability analysis, considering factors such as elevation, slope, slope aspect and river width in the process of site selection (Buchanan et al. 2022; Ettazarini 2021; Gholami and Khaleghi 2021; Li et al. 2023; Medeiros et al. 2023; Rose, Mugi, and Saleh 2023; Weit et al. 2023). As hydraulic engineering hubs, dams have a significant impact on topography, navigation and transportation, and water volume and level. We therefore consider relevant elements such as elevation, water bodies and roads in the spatial analysis.

Currently, with the ongoing development of dam detection technology for unmapped areas, dam detection over large regions has become possible. However, the extraction of large dam candidate regions over broad areas remains a research gap, which limits the practical application of dam detection. Accordingly, we propose a large dam candidate region extraction method based on a random forest model and spatial analysis. We apply this method to extract broad-area large dam candidate regions in countries around the South China Sea, and we validate the results using dam locations from the dam validation dataset. The main contributions of this study can be summarized as follows:

  1. A new framework for dam candidate extraction based on a random forest model and geographic analysis with a high completeness rate is proposed and applied. It can effectively constrain the candidate regions for dam detection, thus providing a basis for subsequent large-scale dam detection.

  2. This work explores remote sensing features and geographic features for dam extraction and integrates random forest and spatial analysis methods into large dam candidate region extraction. The framework is simple and fast, and it can be extended to other regions.

2. Materials

The study area included Vietnam, the Philippines, Malaysia, Brunei and Singapore, with areas of 331,212 km², 300,000 km², 330,803 km², 5,765 km², and 753 km², respectively (Figure 1). The study areas were selected mainly from countries surrounding the South China Sea. The South China Sea is an important energy route for China, and monitoring key water hubs is of great significance to national security. We labeled 965 dams, whose locations are shown in Figure 1. These dams were derived from public datasets, including OpenStreetMap (OSM dams), the Global Geographical Names Database (GeoNames) and Global Dam Watch (GRanD, GOODD, and FAO dams) (Lehner et al. 2011; Mulligan, van Soesbergen, and Sáenz 2020). The dam locations shown in the figure are large dams that have undergone data fusion and visual validation. In the five study areas, the global waterbody raster dataset used was the global land cover dataset released by Google (Dynamic World). This dataset has a 10 m resolution and contains 9 land cover types, namely water, trees, grass, flooded vegetation, crops, shrub and scrub, built area, bare ground, and snow and ice (Brown et al. 2022). The other raster datasets for the five study areas were the impervious surface raster dataset FROM-GLC10 from Tsinghua University (Gong et al. 2019) and SRTM1 from the U.S. National Aeronautics and Space Administration (NASA). In addition, vector datasets, specifically coastline data and dam polygons, were derived from OSM (Table 1).

Figure 1. Experimental study areas in Vietnam, Philippines, Malaysia, Brunei and Singapore.


Table 1. Data sources. ‘Count’ refers to the amount of data contained in a dataset in five study areas.

3. Methods

In this study, we propose a large dam candidate region extraction method based on a random forest classifier and spatial analysis. The research framework is shown in Figure 2. There were three main steps. First, a 46-dimensional remote sensing feature vector was constructed by fusing Sentinel-1 SAR image features, Sentinel-2 MSI image features, Landsat 8 OLI image features, and SRTM-1 topographic features. Second, multiple random forest classifiers were constructed for the different study areas, and their accuracy was evaluated. Third, considering the unique spatial location characteristics of dams, we applied a spatial analysis method that uses water body constraints, marks key areas, and filters small patches through connectivity analysis. By comparison with the dam validation dataset, we measured the completeness of the large dam candidate area and the effectiveness of the constraints. Finally, the advantages of fusing the random forest classifier with spatial analysis are discussed.

Figure 2. Framework for dam candidate extraction.


3.1. Dam sample labeling

3.1.1. OSM dam length statistics

Previous studies have classified dams mainly by height, water storage capacity, material, function, use and location (N. Leroy Poff et al. 2002; Martínez-Gomariz et al. 2023). Because remote sensing images are two-dimensional, we focused on the length of the dam. Mao et al. (2022) counted the lengths in a dam database using one hundred meters as the segmentation point, but they did not explicitly categorize dams by length. Considering the coarsest spatial resolution of the remote sensing data used, we set thirty meters as the segmentation point. We focused on large dams, defined here as dams longer than sixty meters measured between the top ends of the structure. We assessed the large dam candidate area extraction and present the results.

We used the OSM vector polygon dataset to label the dam regions. Using 30 meters as the segmentation point (the coarsest spatial resolution among the remote sensing images used in the experiment), we counted the lengths of the dam polygons in OSM; the statistics are shown in Figure 3. We also performed visual interpretation of Google images and found that dams shorter than 30 meters were mostly water measuring stations and dam appurtenances. Dams between 30 and 60 meters long were mostly located in areas with less developed water flow or near agricultural fields, as shown in Figure 4. In our experiments, to maximize the accuracy of the automatically generated samples, we selected only dams longer than 60 meters.

Figure 3. Histogram of dam length frequencies (axis ticks run from 60 to 2670 in steps of 90).

Figure 4. Dams smaller than 60 meters long.


3.1.2. OSM-assisted dam sample labeling

The main purpose of this paper was to extract the candidate regions of dams, so the samples were divided into dam samples and non-dam samples in the form of vector point files. For the five countries, due to the different levels of economic development and water use, we labeled different sample files for each country. Considering the cost and time issues, the OSM open database was used for auxiliary labeling, which mainly included positive and negative samples.

The positive dam samples were derived from OSM dam polygon files. Each dam polygon was converted into multiple points by random point sampling, with a minimum spacing of 10 meters between points so that no two samples occupy the same location.

Non-dam sample points included water, buildings, farmland, forests, mountains and other labels, and the source geometries included points, lines and polygons. For individual point data such as buildings, the points were used directly. For lines such as water, points were generated at 10-meter intervals along the line, and a number of them were then randomly selected. For polygon data such as farmland, the sampling method was the same as for positive samples. The detailed sample labels, numbers, and attribute information are shown in Table 2. The number of positive dam samples within each country was 10,000 points, and to ensure sample balance, the negative samples also totaled 10,000 points. In the experiment, the ratio of training samples to test samples was 8:2. Brunei and Singapore are small in area, so to avoid under-sampling, Brunei, Singapore, and Malaysia were combined into a single study area.
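The sampling and splitting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rectangle standing in for a dam polygon, the function names, and the use of rejection sampling are all hypothetical, and coordinates are assumed to be in a projected system measured in meters.

```python
import math
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in meters."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(polygon, n_points, min_spacing=10.0, seed=0, max_tries=100000):
    """Randomly sample points inside a polygon, at least min_spacing apart."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if not point_in_polygon(x, y, polygon):
            continue
        # Reject candidates closer than min_spacing to an accepted point.
        if all(math.hypot(x - ax, y - ay) >= min_spacing for ax, ay in accepted):
            accepted.append((x, y))
    return accepted


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle and split samples; 0.8 corresponds to the 8:2 ratio in the text."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical 300 m x 40 m rectangle standing in for an OSM dam polygon.
dam_polygon = [(0.0, 0.0), (300.0, 0.0), (300.0, 40.0), (0.0, 40.0)]
points = sample_points(dam_polygon, n_points=20)
train, test = split_train_test(points)
print(len(points), len(train), len(test))
```

Rejection sampling is chosen here only for brevity; a GIS toolbox would normally provide an equivalent random-point tool with a minimum-distance option.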

Table 2. OSM sample classification, attributes and number.

3.2. Large dam candidate area extraction based on a random forest classifier

3.2.1. Analysis of remote sensing features

As artificial structures, dams can be classified as earth dams or concrete dams according to their materials. The visible light signals recorded in optical remote sensing images can effectively distinguish artificial ground surfaces. Dams occupy distinctive geographical locations, and the interfering features around them are usually water bodies, forests and other natural features. Given these spatial location characteristics, dams can be effectively distinguished from other features by combining visible and non-visible wavelengths.

Based on existing research, we coupled the optical remote sensing features of different sensors in different wavelength bands. The optical satellite images used in this paper include the multispectral Level-1C image product (Sentinel-2 L1C) from the Sentinel-2 satellite of the Copernicus programme and images from the Landsat 8 Operational Land Imager (Landsat 8 OLI), a joint program of NASA and USGS. The B1, B2, B3, B4, and B5 bands of Sentinel-2 were extracted directly as the base features. The names of the basic optical remote sensing features and their central wavelengths are described in Appendix A. In the Japan-based study (Jing et al. 2021), we found more misidentified vegetation and less misidentified bare land. Therefore, more vegetation indices and soil indices were added as optical remote sensing features, including the LSWI, NDVI, GCVI, BI and SAVI indices (Chen et al. 2004; Gitelson et al. 2003; Huete 1988; Tian et al. 2019; Tucker 1979; Xiao et al. 2005), which together constitute the Sentinel-2 L1C optical remote sensing features, as shown in Appendix B. For Landsat 8 OLI, Collection 1 T2-level data were used. The 25th, 50th, 75th, and 95th percentiles of the LSWI, NDVI, GCVI, and BI indices were calculated as optical remote sensing features (Appendix C).
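As an illustration of how such index features can be computed, the sketch below uses the commonly cited definitions of NDVI, LSWI, GCVI and SAVI, together with a simple percentile function. It is an assumption that these match the exact formulas in Appendix B, which is not reproduced here; BI is omitted because several definitions exist. The reflectance values and function names are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def lswi(nir, swir):
    """Land surface water index."""
    return (nir - swir) / (nir + swir)


def gcvi(nir, green):
    """Green chlorophyll vegetation index."""
    return nir / green - 1.0


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; 0.5 is the usual soil adjustment factor."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


# Hypothetical (nir, red) reflectance pairs for one pixel over several dates.
observations = [(0.42, 0.10), (0.38, 0.12), (0.45, 0.08), (0.30, 0.15), (0.40, 0.11)]
ndvi_series = [ndvi(nir, red) for nir, red in observations]

# Percentile features in the spirit of the Landsat 8 features described above.
ndvi_features = {q: percentile(ndvi_series, q) for q in (25, 50, 75, 95)}
print(ndvi_features)
```

Percentile features summarize a time series of index values per pixel, which makes them less sensitive to individual cloudy or anomalous acquisitions than a single-date value.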

3.2.2. Analysis of SAR image features and terrain characteristics

SAR imaging is sensitive to the material properties of the target, which appear as brighter or darker spots. Linear targets such as airport runways, rivers and coastlines respond strongly to the polarization mode of the radar system, which can be horizontal or vertical according to the vibration direction of the electromagnetic waves transmitted by the radar. When the radar beam is perpendicular to a linear object, a very bright spot appears; when the beam is parallel to the object, the echo signal is weaker and a dark spot appears. A linear object may therefore appear as either a bright or a dark line in the radar image. Dams mostly appear as linear features in remote sensing images, as in Figure 5: the Sokobaru Dam in Japan shows dark spots in the VH polarization and bright spots in the VV polarization. Dam features can therefore also be effectively extracted from SAR image features and texture features.

Figure 5. Microwave remote sensing features of Sokobaru Dam in Japan. (a) Sentinel-1 VH polarization image; (b) Sentinel-1VV polarization image.


As basic national geographic information data, DEMs are widely used in urban planning and dam siting. Dams are generally sited with reference to topographic and geological factors and to cost, and they are typically located at the outlets of river valleys with dense contours while avoiding seismic areas, karst topography and fault zones. Elevation and its derived characteristics from the DEM were used as image features of dams (Figure 6). Appendix D shows the relevant SAR image features and topographic features.

Figure 6. DEM and slope characteristics of the Three Gorges Dam in China.


3.2.3. Metrics used to evaluate the results

The metrics used to assess the accuracy of the dam random forest classifier were the overall accuracy (OA) and the kappa coefficient. OA is the fraction of correctly classified dam and non-dam pixels in the test sample set, and is calculated as follows:

$$\mathrm{OA} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

where TP and TN represent the number of dam pixels and non-dam pixels that are correctly classified, respectively. FP and FN denote the number of non-dam pixels misclassified as dam pixels and the number of dam pixels misclassified as non-dam pixels, respectively. The kappa coefficient is calculated as follows:

$$\mathrm{kappa} = \frac{\mathrm{OA} - p_e}{1 - p_e} \tag{2}$$

$$p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + FP + TN + FN)^2} \tag{3}$$
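Equations (1) to (3) translate directly into code. The sketch below evaluates them on a hypothetical confusion matrix; the counts are illustrative only and are not taken from the paper's results.

```python
def overall_accuracy(tp, fp, tn, fn):
    """Equation (1): fraction of correctly classified test samples."""
    return (tp + tn) / (tp + fp + tn + fn)


def expected_agreement(tp, fp, tn, fn):
    """Equation (3): chance agreement p_e from the marginal totals."""
    total = tp + fp + tn + fn
    return ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2


def kappa(tp, fp, tn, fn):
    """Equation (2): agreement corrected for chance."""
    oa = overall_accuracy(tp, fp, tn, fn)
    pe = expected_agreement(tp, fp, tn, fn)
    return (oa - pe) / (1.0 - pe)


# Hypothetical counts for a balanced test set of 4,000 samples
# (2,000 dam and 2,000 non-dam).
tp, fp, tn, fn = 1960, 40, 1960, 40
print(overall_accuracy(tp, fp, tn, fn))
print(expected_agreement(tp, fp, tn, fn))
print(kappa(tp, fp, tn, fn))
```

When the test set contains equal numbers of dam and non-dam samples, Equation (3) gives p_e = 0.5 regardless of the predictions, so kappa reduces to 2·OA − 1.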

3.3. Spatial analysis methods

The initial large dam candidate areas were obtained by classifying each experimental area as a whole with the random forest classifier. Integrating knowledge of the geographic environment into the candidate area acquisition process helped to reduce the large dam candidate areas further. This section describes how the large dam candidate area was optimized using a priori knowledge in three ways: constraining by water, marking key areas, and removing small patches through connectivity analysis.

3.3.1. Constrained by water

The main functions of a dam as a hydraulic engineering structure are flood control, power generation, irrigation, and shipping. By intercepting the flow, a dam regulates the water level and discharge. Dams are therefore inseparable from water bodies, and the large dam candidate areas can be constrained by water. Some of the candidate blocks obtained by the random forest classifier were clearly not near water, so a 500-meter buffer was established around water bodies and only the large dam candidate areas within this buffer were retained. The specific implementation is shown in Figure 7.
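The water constraint can be illustrated on a small raster. This is a minimal sketch, assuming binary candidate and water masks on the same grid and measuring distance between pixel centers; the function name, the toy grid and the 250 m pixel size are hypothetical (with 10 m pixels, the 500-meter buffer would correspond to a radius of 50 pixels). The brute-force search is only suitable for small examples.

```python
import math


def within_water_buffer(candidate, water, pixel_size_m=10.0, buffer_m=500.0):
    """Keep candidate pixels whose center lies within buffer_m of a water pixel."""
    water_cells = [
        (r, c)
        for r, row in enumerate(water)
        for c, value in enumerate(row)
        if value
    ]
    radius_px = buffer_m / pixel_size_m
    result = []
    for r, row in enumerate(candidate):
        out_row = []
        for c, value in enumerate(row):
            near_water = any(
                math.hypot(r - wr, c - wc) <= radius_px for wr, wc in water_cells
            )
            out_row.append(1 if value and near_water else 0)
        result.append(out_row)
    return result


# Toy grid with 250 m pixels, so the 500 m buffer is 2 pixels wide.
water = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
candidate = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
constrained = within_water_buffer(candidate, water, pixel_size_m=250.0)
print(constrained)
```

For full-size rasters, a distance transform or a vector buffer in a GIS would serve the same purpose far more efficiently.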

Figure 7. Constrained by water.


3.3.2. Key area markers

Dams also play an important supporting role in transportation, so we used the road data in the OSM dataset to mark dams that intersect roads. Dams are preferably sited in river valleys or pocket basins to save engineering costs and make full use of the local natural environment, which means that a dam will generally cross the centerline of the water body. Finally, some of the large dam candidate areas were also marked using open-source impervious surface datasets. The specific implementation is shown in Figure 8.

Figure 8. Key area markers.


3.3.3. Connectivity by removing small patches

The classified images produced by the random forest classifier inevitably contain small patches, which need to be removed and smoothed in practical applications. In this study, small patches were removed by connected-domain analysis, which identifies each connected region by analyzing the image pixels, records the number of pixels in each connected domain, and removes patches smaller than a threshold. The neighborhood relationship used for pixels was the 8-neighborhood (Figure 9). After connectivity analysis of the large dam candidate images, patches with fewer pixels than the threshold were counted and filtered out, further refining the large dam candidate area. The specific implementation is shown in Figure 10.
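The connected-domain filtering described above can be sketched with a breadth-first search over the 8-neighborhood. This is an illustrative implementation on a toy mask; the function name is hypothetical and the threshold of 2 pixels is an assumed value, since the threshold used in the study is not stated here.

```python
from collections import deque

# Offsets of the eight neighbors of a pixel (8-neighborhood).
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def remove_small_patches(mask, min_pixels):
    """Drop 8-connected patches that contain fewer than min_pixels pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Collect one connected patch with a breadth-first search.
            queue = deque([(r, c)])
            seen[r][c] = True
            patch = []
            while queue:
                cr, cc = queue.popleft()
                patch.append((cr, cc))
                for dr, dc in NEIGHBORS_8:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Keep the patch only if it reaches the size threshold.
            if len(patch) >= min_pixels:
                for pr, pc in patch:
                    cleaned[pr][pc] = 1
    return cleaned


# The diagonal chain is one patch under 8-connectivity; the pixel at
# row 1, column 4 is isolated and is removed.
mask = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = remove_small_patches(mask, min_pixels=2)
print(cleaned)
```

Under a 4-neighborhood the diagonal chain would split into three single-pixel patches, so the 8-neighborhood is the more permissive choice for thin, obliquely oriented structures.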

Figure 9. Eight-neighborhood and connectivity analysis.


Figure 10. Connectivity to remove small patches.


4. Results

4.1. Large dam candidate region extraction results and validation

In the methods section of this paper, a random forest classifier and the spatial analysis method were described. The output results included the large dam candidate region and non-large dam candidate region. This section evaluates the results from qualitative and quantitative perspectives. The integrity rate of the large dam candidate region was referenced to the dam validation dataset, which was derived from the multisource open-access dam dataset, after cleansing, consistency processing and fusing.

4.1.1. Qualitative evaluation

Figure 11 shows the extraction results for dams in eight countries, including Ayanan Dam in Miyazaki Prefecture, Japan, Huangjiang Dam in North Korea, A Dam in the city of On Ate, Philippines, Batang Ai Dam in Sarawak, Malaysia, Changheung Dam in South Korea, Krongbuk Dam in Vietnam, Serita Dam in northeastern Singapore, and Upper Dudong River Dam in Dudong District, Brunei.

Figure 11. Large dam candidate areas: (a) Ayanan Dam in Miyazaki Prefecture, Japan, (b) Huangjiang Dam in North Korea, (c) A Dam in the city of On Ate, Philippines, (d) Batang Ai Dam in Sarawak, Malaysia, (e) Changheung Dam in South Korea, (f) Krongbuk Dam in Vietnam, (g) Serita Dam, northeast of Singapore, and (h) Upper Dudong River Dam, Dudong District, Brunei.


The results show that the large dam candidate regions were successfully extracted and that the dam regions were retained intact. Furthermore, the large dam candidate regions and non-candidate regions were effectively distinguished.

4.1.2. Quantitative extraction perspectives

For dam classification with the random forest classifier, we set the threshold to 0.5. That is, pixels for which the classifier's output probability is greater than 0.5 are assigned to the large dam candidate region, and pixels with a probability below 0.5 are assigned to the non-candidate region. We evaluated the pixel-level accuracy of the large dam candidate regions extracted by the random forest classifier for the five countries. Malaysia, Brunei, and Singapore are counted as one study area (Table 3).

Table 3. Quantitative evaluation of the extraction results of the large dam candidate region.

From the overall results, the extraction accuracy of the dam candidates in the study area is high, and the highest accuracy reaches 98.14% with a kappa coefficient of 0.9627. Among them, the best performer is the Philippines with an accuracy of 98.14%, and the worst performer is Korea with an accuracy of 94.93%.

4.2. Large dam candidate region extraction integrity and constraint

The integrity rate of the large dam candidate region measures whether the candidate region recalls the dam points contained in the dam validation dataset. The constraint effectiveness of the candidate region reflects the requirement that the candidate region be as small as possible while still serving the purpose of candidate region extraction. To analyze the integrity of the large dam candidate region, the dam validation dataset was used as a benchmark and combined with visual interpretation of Google Maps high-resolution images to determine which dams were missed. The dam validation dataset used here was the dam dataset after data fusion, data cleansing and data consistency processing.

4.2.1. Large dam candidate region extraction integrity

To verify the integrity of the dam point locations in the extracted large dam candidate regions, we used open-source dam datasets derived from the global dam database GDW (Global Dam Watch: GRanD, GOODD, and FAO AQUASTAT), the global geographical names database GeoNames, and OSM (OpenStreetMap). The dam dataset was visually verified. The number of dams in the dam validation dataset and the number of dams missing from the large dam candidate regions are shown in Table 4.

Table 4. Statistics of coverage integrity of large dam candidate region.

From the table, it can be seen that the extracted dam candidates had an integrity rate of more than 95%, reaching as high as 99.61% in Vietnam and as low as 97.62% in the Philippines. The overall integrity rate was high. The number of missing dams in the extracted large dam candidate regions did not exceed five. In summary, the proposed dam candidate extraction method can achieve effective extraction of dam candidates with a smaller number of missing dams.
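The integrity rate can be computed as the share of validation dams that fall inside the candidate region. The sketch below assumes that dam locations have already been converted to raster cell indices on the same grid as the candidate mask; the function name and toy data are hypothetical and do not reproduce the counts in Table 4.

```python
def integrity_rate(candidate_mask, dam_cells):
    """Return (rate, missing): share of dams inside the candidate mask and the miss count."""
    if not dam_cells:
        raise ValueError("dam_cells must contain at least one dam location")
    recalled = sum(1 for r, c in dam_cells if candidate_mask[r][c])
    missing = len(dam_cells) - recalled
    return recalled / len(dam_cells), missing


# Toy candidate mask and four hypothetical validation dams (row, column).
candidate_mask = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
dam_cells = [(0, 0), (1, 1), (2, 2), (0, 2)]
rate, missing = integrity_rate(candidate_mask, dam_cells)
print(rate, missing)
```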

4.2.2. Large dam candidate region extraction constraint

To quantitatively evaluate the effectiveness of constraining the large dam candidate area, we calculated the study area, the area of the dam candidates, the proportion of the study area occupied by the dam candidates, and the number of dam candidate blocks. The statistical results are shown in Table 5, where the study area values were obtained from Wikipedia.

Table 5. Statistics of the extraction results of the large dam candidate region.

As shown in the table, the extracted large dam candidate area was between 0.12% and 2.42% of the total study area, which effectively constrains the large dam candidate regions. The Malaysia-Brunei-Singapore study area had the best extraction results and the strongest constraint: the large dam candidate area was 404.63 km², accounting for 0.12% of the total study area, and the number of dam candidate blocks obtained was 107,027. The constraint effectiveness in Korea was the weakest, with 429,421 candidate blocks and a large dam candidate area accounting for 2.42% of the total area of Korea. In summary, the large dam candidate area extraction method proposed in this paper can effectively constrain the dam identification range and facilitate further dam detection.
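The reported share for the Malaysia-Brunei-Singapore study area can be checked with simple arithmetic. The sketch below assumes that the study area in Table 5 equals the sum of the three national areas given in Section 2; the mean block size is a derived illustration, not a figure reported in the paper.

```python
# Areas from Section 2 (km²): Malaysia, Brunei, Singapore.
study_area_km2 = 330803 + 5765 + 753

# Candidate area and block count reported for this study area.
candidate_area_km2 = 404.63
candidate_blocks = 107027

# Share of the study area occupied by the candidate region, in percent.
share_percent = candidate_area_km2 / study_area_km2 * 100.0

# Mean candidate block size in square meters (1 km² = 1,000,000 m²).
mean_block_m2 = candidate_area_km2 * 1_000_000 / candidate_blocks

print(study_area_km2)
print(round(share_percent, 2))
print(round(mean_block_m2))
```

Under this assumption the share rounds to the 0.12% stated above, and an average block covers a few thousand square meters, that is, a few dozen 10 m pixels.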

5. Discussion

5.1. Advantages of integrating random forest classification and spatial analysis methods

To verify the advantages of combining random forest classification with spatial analysis in large dam candidate region extraction, we quantitatively compared the dam candidates extracted by the random forest classifier alone with those extracted by the random forest classifier fused with spatial analysis. The quantitative evaluation focused on the integrity and constraint of the dam candidates. The evaluation results are summarized in Table 6.

Table 6. Quantitative evaluation of integrating random forest classifier and spatial analysis methods.

As shown in the table, after integrating the random forest and spatial analysis methods, the integrity rate did not change significantly except in Vietnam and the Philippines; the lowest resulting rate, 97.62% in the Philippines, is still acceptable. In terms of constraint effectiveness, the integrated method yields a smaller dam candidate area than the random forest classifier alone, and the candidate region can be reduced by more than half. The effect is especially clear in Vietnam, where the large dam candidate region is reduced by 4,104.74 km². These effective constraints on the candidate areas over large regions greatly reduce identification errors, making high-precision detection of dams at large scale feasible in practice.

5.2. Validity of the large dam candidate regions

After generating the large dam candidate regions, we used a target detection method to find new dams in the candidate regions that were not recorded in the database. In the Philippines, we detected 42 previously unrecorded dams in the large dam candidate area after applying target detection and comprehensive discrimination. Figure 12 shows the spatial distribution of these large dams.

Figure 12. Newly discovered dams in the Philippines.


In obtaining candidate areas for large dams, we took full account of the characteristics of similar land features such as farmland and bare ground. We eliminated 62% of the invalid region in the large dam candidate region extraction stage. The reduction in the number of candidate blocks/points greatly benefits dam detection. As shown in Table 7, in a study area of 300,000 square kilometers, we extracted a total of 163 dam candidate boxes, with a recall of 98.4% and a precision of 76.07%. The misidentified objects are mainly man-made structures such as roads, impoundments and embankments (Figure 13). These results indicate that the large dam candidate area extraction method proposed in this paper is effective.
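The reported precision and recall can be related to box counts as follows. The counts used here (124 correct boxes, 39 false boxes and 2 missed dams) are not reported in the text; they are an inferred integer combination that reproduces the stated percentages for 163 boxes, under the assumption that each correct box corresponds to exactly one dam.

```python
def precision(tp, fp):
    """Share of detected boxes that are true dams."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of true dams that were detected."""
    return tp / (tp + fn)


total_boxes = 163          # reported number of dam candidate boxes
tp = 124                   # inferred, not reported: correct boxes
fp = total_boxes - tp      # inferred, not reported: false boxes
fn = 2                     # inferred, not reported: missed dams

print(round(precision(tp, fp) * 100, 2))
print(round(recall(tp, fn) * 100, 1))
```

With these assumed counts, precision is about 76.07% and recall about 98.4%, consistent with the values above.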

Figure 13. Some false detection boxes in the Philippines.

Table 7. Distribution of dams in the Philippines.

Although the results in the Philippines are promising, small dams are overlooked, which is a limitation of our work. On the one hand, small dams are far more varied, including water measurement stations and dykes. On the other hand, we believe that detecting them requires higher-precision multispectral remote sensing data. In future work, we will explore other methods to address this issue.

5.3. Comparing the performance of different large dam candidate region algorithms

To demonstrate the effectiveness of the proposed method, we compared it with the large dam candidate area extraction algorithm of Jing et al. (Citation2021) in the study areas. We quantitatively evaluated the integrity and the constraint of the dam candidates. The evaluation results of the two algorithms are shown in Table 8.

Table 8. The performance of different large dam candidate region algorithms.

As shown in Table 8, the proposed algorithm achieves a higher integrity rate, 99.61%, in Vietnam and the same integrity rate in the Malaysia, Singapore, Brunei and Philippines study areas. In terms of dam candidate constraint, the candidate blocks/points are the smallest units for dam detection. Jing et al. (Citation2021) generated dam candidate points to serve dam detection, which are 2.64 to 4.65 times larger than our dam candidate blocks. Both methods accurately identify candidate dam locations. However, the proposed algorithm achieves a higher integrity rate and imposes stronger constraints on the extraction of dam candidates because it considers both the optical and the geospatial characteristics of large dams.

6. Conclusion

This study was primarily motivated by the ineffectiveness of previous dam candidate extraction efforts and the lack of consideration of remote sensing features, spatial location features, and topographic features in large-area dam candidate extraction studies. Previous studies focused more on dam site selection, airport candidate area extraction and dam target detection. This paper presented a large dam candidate region extraction approach based on optical and radar images and terrain features combined with spatial analysis methods.

The proposed approach includes three general steps: (1) A 46-dimensional remote sensing feature vector was constructed by fusing Sentinel-1 SAR image features, Sentinel-2 MSI image features, Landsat 8 OLI image features and SRTM-1 topographic features. (2) For different countries, considering the differences in geography and economic development, we constructed several random forest classifiers and evaluated their accuracy. The average accuracy was above 96%, and the kappa coefficient was above 0.94. (3) The large dam candidate regions were further constrained using water body constraints, markers in the key areas and connectivity screening of small patches. The candidate area was reduced to 0.12% of the entire area. The integrity of the large dam candidate region reached 99.61%.
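
As a reminder of how the two classifier metrics in step (2) relate, the sketch below computes overall accuracy and Cohen's kappa from a confusion matrix using the standard definitions. The three-class matrix is invented for illustration and does not reproduce the paper's classes or results; the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if predictions were independent of the true labels.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Rows = reference class, columns = predicted class (toy values, not from the paper).
toy_cm = [
    [49, 1, 0],
    [1, 48, 1],
    [0, 1, 49],
]
print(round(overall_accuracy(toy_cm), 4), round(cohen_kappa(toy_cm), 4))
```

Kappa is lower than accuracy because it discounts the agreement that a classifier would reach by chance given the class proportions.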

The proposed approach was evaluated in five study areas: Vietnam, the Philippines, Malaysia, Singapore, and Brunei. Based on the results of this case study, we concluded that the approach is effective at extracting large dam candidate regions; in particular, it can effectively reduce the large dam candidate area. We will use this candidate region approach in future studies to detect the location of dams.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The code used in this study is available by contacting the corresponding author.

Additional information

Funding

This work was supported by National Key Research and Development Program of China: [Grant Number 2022YFB3903600]; Automatic Detection of Unknown Dams in a Broad-area based on High-Resolution Remote Sensing Images: [Grant Number 202202B024]; Foundation of Science & Technology on Integrated Information System Laboratory: [Grant Number HLJGXQ20220916032].

References

  • Belgiu, Mariana, and Lucian Drăguţ. 2016. “Random Forest in Remote Sensing: A Review of Applications and Future Directions.” ISPRS Journal of Photogrammetry and Remote Sensing 114: 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011.
  • Benhammou, Y., D. Alcaraz-Segura, E. Guirado, R. Khaldi, B. Achchab, F. Herrera, and S. Tabik. 2022. “Sentinel2GlobalLULC: A Sentinel-2 RGB Image Tile Dataset for Global Land use/Cover Mapping with Deep Learning.” Scientific Data 9 (1): 681. https://doi.org/10.1038/s41597-022-01775-8.
  • Bidkar, Pravin Shivaji, Ram Kumar, and Abhijyoti Ghosh. 2022. “SegNet and Salp Water Optimization-Driven Deep Belief Network for Segmentation and Classification of Brain Tumor.” Gene Expression Patterns 45: 119248. https://doi.org/10.1016/j.gep.2022.119248.
  • Brown, Christopher F., Steven P. Brumby, Brookie Guzder-Williams, Tanya Birch, Samantha Brooks Hyde, Joseph Mazzariello, Wanda Czerwinski, et al. 2022. “Dynamic World, Near Real-Time Global 10 m Land use Land Cover Mapping.” Scientific Data 9: 1. https://doi.org/10.1038/s41597-022-01307-4.
  • Buchanan, B. P., S. A. Sethi, S. Cuppett, M. Lung, G. Jackman, L. Zarri, E. Duvall, et al. 2022. “A Machine Learning Approach to Identify Barriers in Stream Networks Demonstrates High Prevalence of Unmapped Riverine Dams.” Journal of Environmental Management 302 (Pt A): 113952. https://doi.org/10.1016/j.jenvman.2021.113952.
  • Chen, Hui, Sensen Chu, Qizhi Zhuang, Zhixin Duan, Jian Cheng, Jizhe Li, Li Ye, Jun Yu, and Liang Cheng. 2023. “FSPN: End-to-end Full-Space Pooling Weakly Supervised Network for Benthic Habitat Mapping Using Remote Sensing Images.” International Journal of Applied Earth Observation and Geoinformation 118. https://doi.org/10.1016/j.jag.2023.103264.
  • Chen, Gaoxiang, Qun Li, Fuqian Shi, Islem Rekik, and Zhifang Pan. 2020. “RFDCR: Automated Brain Lesion Segmentation Using Cascaded Random Forests with Dense Conditional Random Fields.” NeuroImage 211: 116620. https://doi.org/10.1016/j.neuroimage.2020.116620.
  • Chen, Wanhui, Liangyun Liu, Chao Zhang, Jihua Wang, Jindi Wang, and Yuchun Pan. 2004. “Monitoring the Seasonal Bare Soil Areas in Beijing Using Multi-Temporal TM Images.” IEEE International Geoscience and Remote Sensing Symposium 5: 3379–3382. https://doi.org/10.1109/IGARSS.2004.1370429.
  • Chen, Shenglong, Yoshiki Ogawa, Chenbo Zhao, and Yoshihide Sekimoto. 2023. “Large-scale Individual Building Extraction from Open-Source Satellite Imagery via Super-Resolution-Based Instance Segmentation Approach.” ISPRS Journal of Photogrammetry and Remote Sensing 195: 129–152. https://doi.org/10.1016/j.isprsjprs.2022.11.006.
  • Ettazarini, Said. 2021. “GIS-based Land Suitability Assessment for Check Dam Site Location, Using Topography and Drainage Information: A Case Study from Morocco.” Environmental Earth Sciences 80 (1): 17. https://doi.org/10.1007/s12665-021-09881-3.
  • Fan, Runyu, Ruyi Feng, Lizhe Wang, Jining Yan, and Xiaohan Zhang. 2020. “Semi-MCNN: A Semisupervised Multi-CNN Ensemble Learning Method for Urban Land Cover Classification Using Submeter HRRS Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 4973–4987. https://doi.org/10.1109/jstars.2020.3019410.
  • Fang, Weizhen, Cunguang Wang, Xi Chen, Wei Wan, Huan Li, Siyu Zhu, Yu Fang, Baojian Liu, and Yang Hong. 2019. “Recognizing Global Reservoirs from Landsat 8 Images: A Deep Learning Approach.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (9): 3168–3177. https://doi.org/10.1109/JSTARS.2019.2929601.
  • Gao, Fang, Yihui Li, Peng Zhang, Yuwei Zhai, Yan Zhang, Yongshuai Yang, and Yuan An. 2022. “A High-Resolution Panchromatic-Multispectral Satellite Image Fusion Method Assisted with Building Segmentation.” Computers & Geosciences 168: 105219. https://doi.org/10.1016/j.cageo.2022.105219.
  • Gholami, V., and M. R. Khaleghi. 2021. “Performance-Based Analysis of Earth Dams: A Case Study of Hajji Qushan Watershed, Iran.” Modeling Earth Systems and Environment 8 (2): 1585–1596. https://doi.org/10.1007/s40808-021-01168-7.
  • Gitelson, Anatoly A., Andrés Viña, Timothy J. Arkebauer, Donald C. Rundquist, Galina Keydan, and Bryan Leavitt. 2003. “Remote Estimation of Leaf Area Index and Green Leaf Biomass in Maize Canopies.” Geophysical Research Letters 30: 5. https://doi.org/10.1029/2002GL016450.
  • Gong, P., H. Liu, M. Zhang, C. Li, J. Wang, H. Huang, N. Clinton, et al. 2019. “Stable Classification with Limited Sample: Transferring a 30-m Resolution Sample set Collected in 2015 to Mapping 10-m Resolution Global Land Cover in 2017.” Sci Bull (Beijing) 64 (6): 370–373. https://doi.org/10.1016/j.scib.2019.03.002.
  • Guo, Qian, and Quansheng Dou. 2021. “Semantic Image Segmentation Based on SegNetWithCRFs.” Procedia Computer Science 187: 300–306. https://doi.org/10.1016/j.procs.2021.04.066.
  • Huete, A. R. 1988. “A Soil-Adjusted Vegetation Index (SAVI).” Remote Sensing of Environment 25 (3): 295–309. https://doi.org/10.1016/0034-4257(88)90106-X.
  • Izquierdo-Verdiguier, Emma, and Raúl Zurita-Milla. 2020. “An Evaluation of Guided Regularized Random Forest for Classification and Regression Tasks in Remote Sensing.” International Journal of Applied Earth Observation and Geoinformation 88. https://doi.org/10.1016/j.jag.2020.102051.
  • Jing, Min, Liang Cheng, Chen Ji, Junya Mao, Ning Li, ZhiXing Duan, ZeMing Li, and ManChun Li. 2021. “Detecting Unknown Dams from High-Resolution Remote Sensing Images: A Deep Learning and Spatial Analysis Approach.” International Journal of Applied Earth Observation and Geoinformation 104. https://doi.org/10.1016/j.jag.2021.102576.
  • Jing, Yafei, Yuhuan Ren, Yalan Liu, Dacheng Wang, and Linjun Yu. 2022. “Dam Extraction from High-Resolution Satellite Images Combined with Location Based on Deep Transfer Learning and Post-Segmentation with an Improved MBI.” Remote Sensing 14: 16. https://doi.org/10.3390/rs14164049.
  • Kang, Liu, and Gellert Mattyus. 2015. “Fast Multiclass Vehicle Detection on Aerial Images.” IEEE Geoscience and Remote Sensing Letters 12 (9): 1938–1942. https://doi.org/10.1109/lgrs.2015.2439517.
  • Le Dréan, Y., P. H. Conze, M. Hatt, D. Visvikis, and B. Badic. 2021. “Segmentation Automatique par DeepLearning en Contexte de Métastases Hépatiques de Cancer du Côlon.” Journal de Chirurgie Viscérale 158 (4): S64. https://doi.org/10.1016/j.jchirv.2021.06.056.
  • Lee, Yoon-Kyung, Sang-Hoon Hong, and Sang-Wan Kim. 2021. “Monitoring of Water Level Change in a Dam from High-Resolution SAR Data.” Remote Sensing 13: 18. https://doi.org/10.3390/rs13183641.
  • Lehner, B., C. Reidy Liermann, C. Revenga, C. Vorosmarty, B. Fekete, P. Crouzet, P. Doll, et al. 2011. “Global Reservoir and Dam Database, Version 1 (GRanDv1): Dams, Revision 01.” In Palisades. New York: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H4N877QK.
  • Li, Ning, Liang Cheng, Lingyong Huang, Chen Ji, Min Jing, Zhixin Duan, Jingjing Li, and Manchun Li. 2021. “Framework for Unknown Airport Detection in Broad Areas Supported by Deep Learning and Geographic Analysis.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 6328–6338. https://doi.org/10.1109/jstars.2021.3088911.
  • Li, M., W. Dai, M. Fan, W. Qian, X. Yang, Y. Tao, and C. Zhao. 2023. “Combining Deep Learning and Hydrological Analysis for Identifying Check Dam Systems from Remote Sensing Images and DEMs in the Yellow River Basin.” International Journal of Environmental Research and Public Health 20: 5. https://doi.org/10.3390/ijerph20054636.
  • Liu, Haoyang, Tao Liu, Yanzhen Gu, Peiliang Li, Fangguo Zhai, Hui Huang, and Shuangyan He. 2021. “A High-Density Fish School Segmentation Framework for Biomass Statistics in a Deep-Sea Cage.” Ecological Informatics 64: 101367. https://doi.org/10.1016/j.ecoinf.2021.101367.
  • Mao, J., L. Cheng, C. Ji, M. Jing, Z. Duan, N. Li, Z. Gesang, and M. Li. 2022. “Verification of Dam Spatial Location in Open Datasets Based on Geographic Knowledge and Deep Learning.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 7277–7287. https://doi.org/10.1109/JSTARS.2022.3199249.
  • Martínez-Gomariz, Eduardo, Carlos Barbero, Martí Sanchez-Juny, Edwar Forero-Ortiz, and Marcos Sanz-Ramos. 2023. “Dams or Ponds Classification Based on a New Criterion to Assess Potential Flood Damage to Roads in Case of Failure.” Natural Hazards 117 (1): 625–653. https://doi.org/10.1007/s11069-023-05875-5.
  • Medeiros, Marcelo B., Washington L. Oliveira, Flávio R. O. Rodrigues, Renata D. Silva, Íris J. K. Ferreira, Wellyngton E. Ayala, Suelma R. Silva, Rafaela T. Souza, and Marcelo F. Simon. 2023. “Monitoring the Impacts of a Mega-dam on Amazonian Understorey Herbs.” Forest Ecology and Management 536: 120909. https://doi.org/10.1016/j.foreco.2023.120909.
  • Mulligan, Mark, Arnout van Soesbergen, and Leonardo Sáenz. 2020. “GOODD, a Global Dataset of More Than 38,000 Georeferenced Dams.” Scientific Data 7 (1): 31. https://doi.org/10.1038/s41597-020-0362-5.
  • Navarro, Alejandro, Mary Young, Blake Allan, Paul Carnell, Peter Macreadie, and Daniel Ierodiaconou. 2020. “The Application of Unmanned Aerial Vehicles (UAVs) to Estimate Above-Ground Biomass of Mangrove Ecosystems.” Remote Sensing of Environment 242: 111747. https://doi.org/10.1016/j.rse.2020.111747.
  • Phiri, Darius, and Justin Morgenroth. 2017. “Developments in Landsat Land Cover Classification Methods: A Review.” Remote Sensing 9: 9. https://doi.org/10.3390/rs9090967.
  • Poff, N. Leroy, and David D. Hart. 2002. “How Dams Vary and Why It Matters for the Emerging Science of Dam Removal: An Ecological Classification of Dams Is Needed to Characterize How the Tremendous Variation in the Size, Operational Mode, Age, and Number of Dams in a River Basin Influences the Po.” BioScience 52 (8): 659–668. https://doi.org/10.1641/0006-3568(2002)052[0659:HDVAWI]2.0.CO;2.
  • Rose, Rodrigo L., Sohan R. Mugi, and Joseph Homer Saleh. 2023. “Accident Investigation and Lessons Not Learned: AcciMap Analysis of Successive Tailings Dam Collapses in Brazil.” Reliability Engineering & System Safety: 109308. https://doi.org/10.1016/j.ress.2023.109308.
  • Suhara, K. K. Shaheemath, V. Ravikumar, Balaji Kannan, and S. Panneerselvam. 2022. “Mapping the Potential Regions for the Construction of Cement Concrete Check Dams Using Remote Sensing and GIS.” Journal of the Indian Society of Remote Sensing 50 (11): 2193–2208. https://doi.org/10.1007/s12524-022-01591-y.
  • Tian, Fuyou, Bingfang Wu, Hongwei Zeng, Xin Zhang, and Jiaming Xu. 2019. “Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform.” Remote Sensing 11: 6. https://doi.org/10.3390/rs11060629.
  • Tu, Jun, Fei Gao, Jinping Sun, Amir Hussain, and Huiyu Zhou. 2021. “Airport Detection in SAR Images Via Salient Line Segment Detector and Edge-Oriented Region Growing.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 314–326. https://doi.org/10.1109/jstars.2020.3036052.
  • Tucker, Compton J. 1979. “Red and Photographic Infrared Linear Combinations for Monitoring Vegetation.” Remote Sensing of Environment 8 (2): 127–150. https://doi.org/10.1016/0034-4257(79)90013-0.
  • Vali, Ava, Sara Comai, and Matteo Matteucci. 2020. “Deep Learning for Land Use and Land Cover Classification Based on Hyperspectral and Multispectral Earth Observation Data: A Review.” Remote Sensing 12: 15. https://doi.org/10.3390/rs12152495.
  • Wei, Jing, Wei Huang, Zhanqing Li, Lin Sun, Xiaolin Zhu, Qiangqiang Yuan, Lei Liu, and Maureen Cribb. 2020. “Cloud Detection for Landsat Imagery by Combining the Random Forest and Superpixels Extracted via Energy-Driven Sampling Segmentation Approaches.” Remote Sensing of Environment 248: 112005. https://doi.org/10.1016/j.rse.2020.112005.
  • Weit, A., B. Mourier, T. Fretaud, and T. Winiarski. 2023. “Combined Usage of Geophysical Methods in Continental Water Bodies, Their Benefits and Challenging Issues: A Special Focus on Sediment Deposits in dam Reservoirs.” Journal of Applied Geophysics, 105036. https://doi.org/10.1016/j.jappgeo.2023.105036.
  • Xiao, Xiangming, Qingyuan Zhang, Scott Saleska, Lucy Hutyra, Plinio De Camargo, Steven Wofsy, Stephen Frolking, Stephen Boles, Michael Keller, and Berrien Moore. 2005. “Satellite-based Modeling of Gross Primary Production in a Seasonally Moist Tropical Evergreen Forest.” Remote Sensing of Environment 94 (1): 105–122. https://doi.org/10.1016/j.rse.2004.08.015.
  • Yang, Tiejun, Jikun Song, and Lei Li. 2019. “A Deep Learning Model Integrating SK-TPCNN and Random Forests for Brain Tumor Segmentation in MRI.” Biocybernetics and Biomedical Engineering 39 (3): 613–623. https://doi.org/10.1016/j.bbe.2019.06.003.
  • Zeng, Fanxuan, Liang Cheng, Ning Li, Nan Xia, Lei Ma, Xiao Zhou, and Manchun Li. 2019. “A Hierarchical Airport Detection Method Using Spatial Analysis and Deep Learning.” Remote Sensing 11: 19. https://doi.org/10.3390/rs11192204.