ABSTRACT
This study compares the performance of three ensemble machine learning methods (stacking, blending, and soft voting) for landslide susceptibility mapping (LSM) in Lombardy, a region of Northern Italy that is highly affected by landslides. We first created a spatial database based on open data, ensuring access to relevant information on landslide-influencing factors, historical landslide records, and areas with a very low probability of landslide occurrence, called the ‘No Landslide Zone’, an innovative concept introduced in this study. Open-source software was then used to develop five machine learning classifiers (Bagging, Random Forests, AdaBoost, Gradient Tree Boosting, and Neural Networks), which were tested at basin scale under different combinations of training and testing schemes across three use cases. The three classifiers with the highest generalization performance (Random Forests, AdaBoost, and Neural Networks) were selected and combined using the ensemble methods. Soft voting showed the highest performance among the ensemble methods. The best model for generating the LSM of the Lombardy region was a Neural Network model trained using data from three basins, achieving an accuracy of 0.93 in Lombardy. The LSM indicates that 37% of Lombardy falls within the highest landslide susceptibility categories. Our findings highlight the importance of openness in advancing LSM, not only by enhancing the reproducibility and transparency of our methodology but also by promoting knowledge-sharing within the scientific community.
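As an illustration of the soft voting idea mentioned above, the following minimal sketch averages the class probabilities predicted by several classifiers and assigns each sample the class with the highest mean probability. It is not the authors' implementation (which is in the linked repository); the function names `soft_vote` and `predict`, the equal default weights, and all probability values are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Weighted average of class probabilities across classifiers.

    probas[k][i][c] is the probability, according to classifier k,
    that sample i belongs to class c.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal weights by default
    total = sum(weights)
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    averaged = []
    for i in range(n_samples):
        row = [
            sum(w * probas[k][i][c] for k, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict(averaged: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Hypothetical [P(no landslide), P(landslide)] values for three map cells,
# standing in for the outputs of three fitted classifiers.
rf_proba = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
ada_proba = [[0.8, 0.2], [0.45, 0.55], [0.4, 0.6]]
nn_proba = [[0.7, 0.3], [0.2, 0.8], [0.35, 0.65]]

avg = soft_vote([rf_proba, ada_proba, nn_proba])
labels = predict(avg)
print(labels)
```

Because soft voting uses the predicted probabilities rather than only the predicted labels, a confident classifier can outweigh two uncertain ones, which is one reason it may behave differently from majority (hard) voting.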
Author Contributions
Conceptualization, M.A.B., V.Y., Q.X. and L.A.; methodology, M.A.B., V.Y. and Q.X.; software, V.Y. and Q.X.; validation, Q.X.; formal analysis, M.A.B., V.Y. and Q.X.; investigation, Q.X.; resources, V.Y. and Q.X.; data curation, V.Y. and Q.X.; writing – original draft preparation, Q.X. and L.A.; writing – review and editing, M.A.B. and V.Y.; visualization, Q.X.; supervision, M.A.B. and V.Y.; project administration, M.A.B.; funding acquisition, M.A.B. All authors have read and agreed to the published version of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Code availability
The code is available at GEOlab GitHub - Landslide Susceptibility Mapping.
Data availability statement
The data that support the findings of this study are openly available in Zenodo. The relevant input datasets are available at https://doi.org/10.5281/zenodo.8185734, the outputs from the ensemble machine learning models are available at https://doi.org/10.5281/zenodo.8185870, and the estimated No Landslide Zone for the Lombardy region is available at https://doi.org/10.5281/zenodo.8185887.