Enhancing flood-prone area mapping: fine-tuning the K-nearest neighbors (KNN) algorithm for spatial modelling

ABSTRACT

This study focuses on determining the optimal distance metric in the K-Nearest Neighbors (KNN) algorithm for spatial modelling of floods. Four KNN variants, each using a different distance metric (KNN-Manhattan, KNN-Minkowski, KNN-Euclidean, and KNN-Chebyshev), were utilized for flood susceptibility mapping (FSM) in Estahban, Iran. A spatial database comprising 509 flood occurrence points extracted from satellite images and 12 factors influencing floods was created for analysis. The particle swarm optimization (PSO) algorithm was employed for hyperparameter optimization and feature selection, considering eight influential factors as modelling inputs. The modelling results revealed that the KNN-Manhattan algorithm exhibited superior accuracy (root mean squared error (RMSE) = 0.169, mean absolute error (MAE) = 0.051, coefficient of determination (R²) = 0.884, and area under the curve (AUC) = 0.94) compared with the other algorithms for identifying flood-prone areas. The KNN-Minkowski algorithm followed closely, with an RMSE of 0.175, MAE of 0.056, R² of 0.876, and AUC of 0.939. The KNN-Euclidean algorithm achieved an RMSE of 0.183, MAE of 0.061, R² of 0.842, and AUC of 0.929, whereas the KNN-Chebyshev algorithm achieved an RMSE of 0.198, MAE of 0.075, R² of 0.842, and AUC of 0.924.

1. Introduction

Flooding is a natural disaster characterized by the overflow of water onto normally dry lands. It can result from various factors such as heavy rainfall, river overflow, or dam failure (Augustine and Akinlolu Citation2015; Şen Citation2018). Floods can significantly damage infrastructure, agriculture, and human lives, causing immense economic and environmental losses worldwide (Manzoor et al. Citation2022). In particular, the Middle East region, including Iran, has faced a fair share of flood-related incidents and has witnessed detrimental consequences. Statistics on flood damage in these areas show concern regarding the frequency and severity of these events (Modarres, Sarhadi, and Burn Citation2016). Effective flood management is of utmost importance in mitigating flood risks and minimizing adverse impacts (Manzoor et al. Citation2022). This has led to the integration of spatial sciences, such as geographic information systems (GIS) and remote sensing, into flood management practices (Eldho, Zope, and Kulkarni Citation2018; Isma'il and Saanyol Citation2013).

GIS technology allows the collection, analysis, and visualization of spatial data, enabling authorities and researchers to understand the geographical patterns and characteristics of flood-prone areas (Yang Citation2016). On the other hand, remote sensing techniques provide valuable data from satellite imagery and aerial surveys, aiding in detecting and monitoring flood events in real-time (Rosser, Leibovici, and Jackson Citation2017). By harnessing the power of the spatial sciences, flood management professionals can make informed decisions, develop comprehensive flood risk maps, and implement preventive measures to safeguard vulnerable areas and populations (Hemmati, Ellingwood, and Mahmoud Citation2020). A critical advantage of spatial flood modelling is its ability to provide a comprehensive understanding of flood hazards (Bates et al. Citation2021). Traditional flood studies often rely on historical flood records and simplistic assumptions about flood patterns (Razavi-Termeh et al. Citation2023). However, spatial flood modelling considers multiple factors such as topography, land cover, rainfall data, and river networks to accurately represent flood-prone areas (Wu, Chen, and Lu Citation2022). By considering these complex spatial relationships, spatial modelling of floods enables a more nuanced assessment of flood hazards, including identifying areas at different levels of susceptibility and understanding the underlying processes that contribute to flooding (Disse et al. Citation2020). Floods pose significant risks to human lives, infrastructure, and the environment, making accurate flood susceptibility mapping (FSM) crucial for effective disaster management and urban planning (Pourghasemi et al. Citation2020a). With the increasing frequency and intensity of extreme weather events associated with climate change, there is a growing need for advanced spatial modelling techniques to identify and evaluate areas prone to flooding (Kundzewicz et al. Citation2014). 
Flooding poses a significant threat to communities, infrastructure, and the environment. Accurate identification of flood-prone areas is crucial for effective disaster management and urban planning. Despite advancements in mapping technologies, there is a persistent need for more precise and efficient techniques for mitigating the risks associated with flooding.

Numerous methods have been employed in the spatial modelling of floods to map flood-prone areas. These methods encompass statistical approaches (frequency ratio (FR), weight of evidence (WOE), certainty factor (CF), and evidential belief function (EBF)) (Dutta et al. Citation2023; Khosravi et al. Citation2016; Tehrany et al. Citation2014; Youssef, Pradhan, and Sefry Citation2016), hydrological simulations (HEC-HMS, HEC-RAS, and MIKE) (AL-Hussein et al. Citation2022; Lin et al. Citation2022; Ramly and Tahir Citation2016), and machine/deep learning algorithms (Andaryani et al. Citation2021; Bui et al. Citation2020; Janizadeh et al. Citation2019; Mehravar et al. Citation2023). Statistical approaches utilize historical flood data to identify patterns and relationships between variables, aiding the prediction of future flood events (Youssef, Pradhan, and Sefry Citation2016). Hydrological simulations involve mathematical models to simulate water flow behaviour and predict the flood extent based on various input parameters (Merkuryeva et al. Citation2015). Machine learning algorithms have gained significant popularity in flood modelling because of their ability to analyze large datasets and uncover complex relationships. These algorithms learn from existing data to make predictions and classify new observations (Schmidt et al. Citation2020). Integrating machine learning algorithms in flood modelling has revolutionized the field by enabling the analysis of extensive datasets and extraction of valuable insights (Pham et al. Citation2021). These algorithms can identify hidden patterns and relationships that traditional methods can overlook (Razavi-Termeh et al. Citation2023). Numerous machine- and deep-learning algorithms have been employed to identify flood-prone regions. These algorithms include random forest (RF) (Razavi-Termeh et al. Citation2023), adaptive neuro-fuzzy inference system (ANFIS) (Pourghasemi et al. Citation2020b), artificial neural network (ANN) (Seleem et al. Citation2022), support vector regression (SVR) (Siam et al. Citation2021), group method of data handling (GMDH) (Rezaie et al. Citation2021), logistic regression (LR) (Chowdhuri, Pal, and Chakrabortty Citation2020), bagging (Shahabi et al. Citation2020), AdaBoost (Saravanan et al. Citation2023), decision tree (DT) (Tehrany, Pradhan, and Jebur Citation2013), stacking (Ilia et al. Citation2022), k-nearest neighbors (KNN) (Prasad et al. Citation2022), convolutional neural network (CNN) (Wang et al. Citation2020), and long short-term memory (LSTM) (Fang et al. Citation2021).

Although various methodologies, including machine learning algorithms such as KNN, have been applied to FSM, the domain remains underexplored. Existing literature highlights the need for innovative approaches to enhance the precision and efficiency of flood-prone area identification. The development of more precise and efficient FSM techniques has a substantial scientific and economic potential. Accurate identification of flood-prone areas can inform robust disaster preparedness strategies, optimize resource allocation, and contribute to resilient infrastructure planning. Among the machine-learning algorithms employed in flood modelling, the KNN algorithm is particularly significant (Costache et al. Citation2020). The KNN algorithm is a nonparametric method that classifies new instances based on their similarity with neighbouring data points (Taunk et al. Citation2019). In flood modelling, the KNN algorithm can use spatial proximity to classify areas as flood-prone (Al-Areeq et al. Citation2022). By considering the characteristics and attributes of neighbouring locations, the algorithm can make informed decisions and accurately predict flood occurrences (Gauhar, Das, and Moury Citation2021). The KNN algorithm utilizes the concept of proximity, in which the class of an unclassified sample is determined by its proximity to known samples in the feature space (Pandya, Upadhyay, and Harsha Citation2013). However, an essential factor that significantly affects the performance and accuracy of the KNN algorithm is the selection of an appropriate distance metric (Zhang Citation2016). The distance metric is crucial for quantifying the similarity or dissimilarity between feature vectors and directly influences classification results (Xing and Bei Citation2019). Therefore, determining the optimal distance metric is crucial for achieving reliable FSM outcomes. 
This study addresses an innovative aspect of flood modelling by determining the optimal distance metric for the KNN algorithm. By exploring different distance metrics, this study sought to evaluate their impact on the accuracy and reliability of flood predictions. This research aims to provide valuable insights into the spatial modelling of flood-prone areas, ultimately enhancing flood management strategies.

This study demonstrates innovation in multiple ways. First, it pioneers the integration of Landsat-8 and Sentinel-1 satellite imagery to create a comprehensive historical flood distribution map. This integration is coupled with other GIS-based criteria to enhance flood susceptibility modelling. Second, this study introduces a novel approach for selecting the modelling parameters: it employs a particle swarm optimization (PSO) algorithm to select features, ensuring that only the most relevant parameters are utilized. Furthermore, the PSO algorithm was used to determine the optimal hyperparameters of the KNN algorithm, thereby improving its performance. Finally, the study tackles the modelling and identification of flood-prone areas by comparing four distance metrics within the KNN algorithm: Euclidean, Manhattan, Minkowski, and Chebyshev distances. Comparing these distance measures is intended to yield more accurate assessments of flood-prone regions. The four resulting algorithms (KNN-Euclidean, KNN-Minkowski, KNN-Manhattan, and KNN-Chebyshev) were used to prepare the FSM.

2. Material and methods

2.1. Methodology

The methodology employed in this research involved several key steps in determining the optimal distance metric in the KNN algorithm for spatial modelling of flood-prone areas (Figure 1). Data collection involved gathering relevant spatial and remote sensing data, which were then pre-processed and prepared for analysis. This study employed a multicollinearity test and the FR method to assess the interrelationship between variables and ascertain the likelihood of flood occurrences based on specific criteria. Feature selection was performed to identify the most significant variables for flood occurrence. The PSO algorithm was used to select features and evaluate the significance of the parameters in this study. The dataset was divided into training and testing subsets. The training set was used to train the KNN algorithm, allowing it to learn patterns and relationships within the data. The testing set was used to evaluate the performance and generalization capabilities of the trained model. The PSO algorithm was also employed to identify the optimal hyperparameters for the KNN algorithm. The KNN algorithm was implemented using Python. The algorithm assigns classes (flood-prone or non-flood-prone) to new instances based on the similarity of their attributes to neighbouring cases in the training set. Different distance metrics, namely the Euclidean, Manhattan, Minkowski, and Chebyshev distances, were considered to determine their impact on the accuracy of flood predictions. Subsequently, flood-prone areas were delineated using four algorithms: KNN-Euclidean, KNN-Minkowski, KNN-Manhattan, and KNN-Chebyshev. In the final step, modelling performance was assessed using the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) metrics. Maps of flood-prone areas were also evaluated using the area under the curve (AUC) index of the receiver operating characteristic (ROC) curve. 
Through this systematic methodology, this study aimed to provide valuable insights into the optimal distance metric selection for the KNN algorithm in flood modelling.

Figure 1. Research methodology for the spatial modelling of flood-prone areas.

2.2. Study area

Estahban, a town in Iran, is the study area of this research. Positioned at approximately 29.12°N latitude and 54.05°E longitude, Estahban is situated in Fars province in the southwestern part of the country (Figure 2). With an elevation of 1,250 m above sea level, the town is characterized by a diverse range of geographical and climatic features. Surrounded by mountainous terrain, including the Tudj mountain range, Estahban benefits from its proximity to the Roadball rivers, which contribute to the hydrological dynamics of the area. The elevation within the study area ranges from 1298 to 3070 m. The climate of Estahban can be described as semi-arid, featuring moderate winters and hot summers. The warmest period of the year typically occurs from June to August, and the average annual temperature is approximately 17°C. The region experiences an average annual rainfall of approximately 282 mm, with most of the precipitation concentrated in the winter months of December, January, and February. Estahban and its surrounding regions are susceptible to natural hazards such as floods and landslides. The study area has encountered severe floods that have significantly damaged infrastructure and agriculture. Notably, on August 22, 2022, a flash flood caused by monsoon rains claimed the lives of 23 individuals in Estahban, with property damage totalling $23,809,524 (Figure 3). Considering the region's vulnerability to floods and their devastating consequences, this study focuses on flood-prone areas within Estahban. The selection of Estahban as the study area is driven by its susceptibility to flooding, availability of relevant data, and significance of developing accurate flood risk models for effective disaster management and urban planning.

Figure 2. Study area and distribution of flood points.

Figure 3. Images of areas flooded during monsoon floods.

2.3. Flood inventory map

Creating a flood inventory map is a crucial aspect of spatial flood modelling, as it serves as a comprehensive database that captures and characterizes past flood events within a specific area (Mojaddadi et al. Citation2017). In the study area, floods during the 2022 monsoon season were monitored using a combination of Synthetic Aperture Radar (SAR) and Landsat-8 satellite images (DeVries et al. Citation2020). This flood monitoring method was implemented on the Google Earth Engine (GEE) platform. The resulting flood occurrence polygons were converted into 509 flood occurrence points, which served as the basis for the flood occurrence model within the study area. The holdout method was used to divide these points randomly for model evaluation. The training set comprised 70% of the flood points (356 points), whereas the evaluation set contained 30% (153 points) (Figure 2). To ensure balanced training data for the machine learning algorithms, an equal number of non-flooding points (target with a value of 0) and flood occurrence points (target with a value of 1) were considered. This approach aimed to prevent bias towards any class and promote fair and accurate modelling.
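The 70/30 holdout split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `holdout_split`, the fixed seed, and the integer IDs standing in for flood points are all hypothetical.

```python
import random

def holdout_split(points, train_fraction=0.7, seed=42):
    """Randomly split a list of points into training and evaluation subsets."""
    rng = random.Random(seed)      # fixed seed (an assumption) for reproducibility
    shuffled = list(points)        # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor of the training share
    return shuffled[:n_train], shuffled[n_train:]

# 509 flood points, represented here by hypothetical integer IDs.
flood_ids = list(range(509))
train_ids, test_ids = holdout_split(flood_ids)
print(len(train_ids), len(test_ids))  # sizes of the two subsets
```

With 509 points, flooring 70% gives the 356 training and 153 evaluation points reported in the text. The same split would be applied to the equally sized set of non-flood points to keep the two classes balanced.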

2.4. Flood effective factors

Various factors play a significant role in determining the effectiveness and severity of floods in flood-prone areas (Haghizadeh et al. Citation2017). These factors included rainfall, altitude, slope, aspect, lithology, land cover, distance to the river, topographic wetness index (TWI), stream power index (SPI), normalized difference vegetation index (NDVI), plan curvature, and profile curvature (Figure 4 and Table 1). Understanding and analyzing these factors can provide valuable insights into flood dynamics and aid in the development of accurate flood models (Vojtek and Vojteková Citation2019). Various methods have been employed to gather the necessary data, including accessing existing datasets, conducting field surveys, and utilizing remote-sensing techniques. Topographic factors, including SPI, plan curvature, TWI, profile curvature, altitude, aspect, and slope, were derived at a 30 × 30 m pixel size from the digital elevation model (DEM) using ArcGIS 10.8 and SAGA GIS 8.2.1. The DEM of Estahban town was prepared and processed using images from the Shuttle Radar Topography Mission (SRTM) within the GEE system. A land cover map was generated by combining Sentinel-1 and Sentinel-2 images in the GEE platform (Ghorbanian et al. Citation2020). Vegetation cover was evaluated using the NDVI derived from 2022 Landsat-8 images in the GEE system. Hydrological conditions, such as the distance to the river, were determined using the river layer prepared from the DEM. Rainfall patterns were assessed using data from 26 synoptic stations in the Fars province over a decade (2012–2022), interpolated with the Kriging method in ArcGIS 10.8. The geology of the study area was determined using data from Iran's Geological Maps at a scale of 1:100,000.

Figure 4. Factors influencing flood occurrence: (a) rainfall, (b) land cover, (c) distance to river, (d) NDVI, (e) lithology, (f) slope, (g) profile curvature, (h) TWI, (i) SPI, (j) altitude, (k) aspect, and (l) Plan curvature.

Table 1. Factors affecting floods.

To clean and prepare the data for analysis using ArcGIS 10.8 software, a series of essential operations were undertaken to ensure the integrity and uniformity of the dataset. The initial step involved projection of the dataset into the Universal Transverse Mercator (UTM) coordinate system zone 40. Following the coordinate system transformation, the interpolation method of choice was kriging. This spatial interpolation technique enabled the estimation of values at unobserved locations, providing a continuous and nuanced representation of critical variables such as rainfall across the study area. Euclidean distance analysis was subsequently employed to quantify the proximity of various features within the dataset. A pivotal step in pre-processing involves converting vector data into a raster format. This transformation is essential for seamlessly integrating diverse datasets, including lithology, into a unified spatial database. The entire dataset was resampled to ensure consistency and comparability across all factors. This involved converting all the factors to a standardized pixel size of 30 × 30 m. This resampling method provided uniformity in spatial representation and ensured that each variable contributed equally to subsequent analyses and modelling.

2.5. Multicollinearity test

Multicollinearity is characterized by interdependence, the absence of independence, and strong or almost perfect linear correlations among the variables within the regression equation (Razavi-Termeh, Sadeghi-Niaraki, and Choi Citation2021a). To minimize model bias and optimize prediction accuracy, it is essential to assess the presence of multicollinearity and eliminate high multicollinearity factors (Zainodin, Noraini, and Yap Citation2011). Multicollinearity is commonly evaluated using the variance inflation factor (VIF). Generally, when the VIF value exceeds 10, there is significant multicollinearity among variables (Farahani, Razavi-Termeh, and Sadeghi-Niaraki Citation2022).
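As an illustration of the VIF check, the sketch below uses the standard identity that the VIF of each factor equals the corresponding diagonal entry of the inverse of the factors' correlation matrix, which is equivalent to 1 / (1 − R²) from regressing that factor on the others. It is a pure-Python sketch under stated assumptions: the function names and the toy factor values are hypothetical and are not taken from the study's data.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """Variance inflation factor of each column (diagonal of the inverse correlation matrix)."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical values of three factors sampled at six locations.
slope = [2.0, 5.0, 9.0, 14.0, 20.0, 31.0]
spi = [40.0, 90.0, 150.0, 260.0, 310.0, 520.0]
ndvi = [0.31, 0.12, 0.25, 0.08, 0.19, 0.15]

for name, value in zip(("slope", "SPI", "NDVI"), vif([slope, spi, ndvi])):
    flag = "high multicollinearity" if value > 10 else "acceptable"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

A factor that is almost a linear function of the others produces a nearly singular correlation matrix and therefore a very large VIF, which is what the threshold of 10 is meant to detect.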

2.6. Determining the probability of flooding with the frequency ratio (FR) method

FR is a modified version of probabilistic methods that relies on the observed relationships between the distribution of floods and the associated causal factors (Shafapour Tehrany et al. Citation2019). The frequency ratios of all the evaluation factors were obtained by computing the probability ratio between flood occurrence and non-occurrence across various classification intervals (Samanta et al. Citation2018). This index was calculated using Equation (1) (Masroor et al. Citation2023).

$FR = \frac{F_{ij} / F_{r}}{A_{ij} / A_{r}}$ (1)

Here, $F_{ij}$ denotes the number of flood occurrences within the j-th category of the i-th flood evaluation factor, and $F_{r}$ represents the cumulative count of flood occurrences within the entire study area. $A_{ij}$ refers to the area occupied by the j-th category of the i-th flood evaluation factor, while $A_{r}$ indicates the total area of the study region. Higher values of this index indicate a higher probability of flooding in that class of the criterion (Samanta et al. Citation2018).
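Equation (1) can be illustrated with a short worked example. The function name and the input numbers below are hypothetical; only the total of 509 flood points comes from the text.

```python
def frequency_ratio(floods_in_class, total_floods, class_area, total_area):
    """FR = (F_ij / F_r) / (A_ij / A_r), following Equation (1)."""
    if total_floods <= 0 or total_area <= 0 or class_area <= 0:
        raise ValueError("totals and class area must be positive")
    return (floods_in_class / total_floods) / (class_area / total_area)

# Hypothetical class: 120 of the 509 flood points fall in a class
# that covers 10% of the study area (areas in arbitrary units).
fr = frequency_ratio(floods_in_class=120, total_floods=509,
                     class_area=10.0, total_area=100.0)
print(round(fr, 2))
```

An FR of 1 means that a class contains floods exactly in proportion to its area; values above 1 indicate that floods are over-represented in the class, and values below 1 that they are under-represented.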

2.7. K-nearest neighbor (KNN) algorithm

Cover and Hart (Citation1967) initially introduced the KNN as a classification algorithm. However, its widespread use as a non-parametric regression technique in recent decades has proven its versatility and effectiveness (Sumayli Citation2023). The KNN regression algorithm is a straightforward technique that employs the K-nearest data points from a dataset to predict the value of a new observation (Tang, Chang, and Li Citation2023). The basic calculation steps of the KNN algorithm can be summarized as follows (Lin et al. Citation2022): (1) state vectors are built using past and present data to predict the target value; (2) the k-nearest neighbors are identified by finding the smallest distances between the current and initial state vectors; and (3) the values of the k neighbours in the following time step are averaged to derive a prediction for the desired future time. The distance metric is an essential parameter of the KNN algorithm because it determines how the distance between samples is measured (Hamed et al. Citation2020). This study used four metrics: Euclidean distance, Manhattan distance, Minkowski distance, and Chebyshev distance. These distance metrics are described in detail in (Figure 5).
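The KNN regression idea summarized above (find the k closest training samples, then average their target values) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name `knn_predict`, the two normalized factors, and the toy samples are assumptions, and the Euclidean distance is used as the default metric.

```python
import math

def knn_predict(train_X, train_y, query, k=3, distance=None):
    """Predict a value for `query` as the mean target of its k nearest training samples."""
    if distance is None:
        distance = math.dist  # Euclidean distance (Python 3.8+)
    # Rank the training samples by their distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical samples described by two normalized factors.
train_X = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.3), (0.9, 0.8), (0.8, 0.9), (0.7, 0.7)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = flood point, 0 = non-flood point

susceptibility = knn_predict(train_X, train_y, query=(0.15, 0.2), k=3)
print(susceptibility)
```

Because the targets are 0 and 1, the averaged prediction can be read as a susceptibility score between 0 and 1. Passing another function as `distance` changes the metric without touching the rest of the procedure.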

Figure 5. Integration of KNN algorithm with PSO algorithm for flood modelling.

Euclidean distance (Equation 2) measures the straight-line distance between two points in Euclidean space and is always non-negative (Du et al. Citation2016). In contrast, the Manhattan distance (Equation 3), also known as the city block distance or taxicab distance, calculates the distance between two points in a grid-like space by considering only horizontal and vertical movements. Similar to the Euclidean distance, the Manhattan distance is always non-negative (Alfeilat et al. Citation2019). A more generalized distance metric is the Minkowski distance (Equation 4), which encompasses the Euclidean and Manhattan distances as special cases. The Minkowski distance is a flexible metric that can handle various scenarios and offers a versatile approach for distance measurement (Lee and Torpelund-Bruin Citation2012). Another distance metric, the Chebyshev distance (Equation 5), also known as the chessboard distance or maximum metric, calculates the maximum absolute difference between the coordinates of two points in a multidimensional space. Notably, the Chebyshev distance satisfies the properties of a distance measure, including non-negativity, the identity of indiscernibles, symmetry, and the triangle inequality (Cunningham and Delany Citation2021).

$\sqrt{\sum_{i = 1}^{n} {| x_{i} - y_{i} |}^{2}}$ (2)

$\sum_{i = 1}^{n} | x_{i} - y_{i} |$ (3)

${\left(\sum_{i = 1}^{n} {| x_{i} - y_{i} |}^{C}\right)}^{1 / C}$ (4)

$\max_{i} | x_{i} - y_{i} |$ (5)

Here, x and y represent two points in an n-dimensional space, n is the number of dimensions, and C is a positive value of either 1 or 2 (C = 1 gives the Manhattan distance and C = 2 gives the Euclidean distance). The advantages and disadvantages of each distance are summarized in Table 2.
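The four distances in Equations (2) to (5) translate directly into code. The sketch below is illustrative only; the function names and the example points are assumptions.

```python
def euclidean(x, y):
    """Equation (2): square root of the summed squared coordinate differences."""
    return sum(abs(a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    """Equation (3): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, c):
    """Equation (4): generalized distance with exponent c."""
    return sum(abs(a - b) ** c for a, b in zip(x, y)) ** (1.0 / c)

def chebyshev(x, y):
    """Equation (5): largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
print(manhattan(x, y), euclidean(x, y), chebyshev(x, y))  # 7.0 5.0 4.0
print(minkowski(x, y, 1), minkowski(x, y, 2))             # 7.0 5.0
```

For the same pair of points the Manhattan distance is the largest and the Chebyshev distance the smallest, with the Euclidean distance in between. The Minkowski distance reproduces the Manhattan and Euclidean values for exponents 1 and 2, and it approaches the Chebyshev distance as the exponent grows.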

2.8. Feature selection and hyperparameter optimization

The PSO algorithm was employed to determine the optimal parameters for modelling and to identify the hyperparameters of the KNN algorithm. This approach enables the selection of the most suitable parameters and hyperparameters, thereby enhancing the performance and effectiveness of the KNN algorithm in various applications (Abualigah, Khader, and Hanandeh Citation2018). Eberhart and Kennedy (Citation1995) introduced the PSO algorithm in 1995 as an iterative, population-based optimization algorithm. The inspiration behind the PSO algorithm stems from the collective behaviour observed in natural swarms, such as the synchronized movements of fish schools or flocking patterns of birds (Marini and Walczak Citation2015). In the PSO algorithm, the search process involves a population, also called a swarm, consisting of multiple particles. These particles represent candidate solutions and collectively explore the search space to find the optimal solution (Esmin, Lambert-Torres, and De Souza Citation2005). The particles dynamically adjust their positions and velocities based on their own experience and the best-performing particle in the swarm, allowing them to efficiently navigate and converge toward the optimal solution (Jain et al. Citation2022). In a multidimensional search space of dimensionality ‘d’, the position and velocity of each particle at a given time ‘t’ are denoted as $X_{i}^{t}$ and $V_{i}^{t}$, respectively. These values represent the current location and movement direction of the particles within the search space. During the evolutionary process, the values of $X_{i}^{t}$ and $V_{i}^{t}$ for each particle are adjusted by considering the influence of both the particle's personal best experience (pbest) and the global best experience (gbest) (Yang, He, and Fu Citation2014). This adjustment was performed using the following equation (Jain et al. Citation2022):

$\begin{matrix} X_{i}^{t + 1} = X_{i}^{t} + V_{i}^{t + 1} \\ V_{i}^{t + 1} = w^{t} * V_{i}^{t} + c_{1} * r_{1} * ({pbest}_{i}^{t} - X_{i}^{t}) + c_{2} * r_{2} * ({gbest}_{i}^{t} - X_{i}^{t}) \end{matrix}$ (6)

The above equation includes the acceleration constants $c_{1}$ and $c_{2}$. Additionally, it incorporates the random variables $r_{1}$ and $r_{2}$, which are uniformly distributed within the range of 0 to 1. The inertia weight, represented as $w^{t}$, is a parameter that governs the velocity change during the optimization process (Jain et al. Citation2022).
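The velocity and position updates of Equation (6) can be sketched as a small, self-contained optimizer. The sketch rests on several assumptions that are not taken from the article: a constant inertia weight instead of a time-varying one, the values chosen for `w`, `c1`, and `c2`, clamping of positions to the search bounds, and a simple sphere function as the test objective.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic particle swarm (Equation 6)."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                    # personal best positions
    pbest_val = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                # Position update, clamped to the bounds (an added assumption).
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            value = objective(X[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = X[i][:], value
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

In the study's setting, the objective would be replaced by a model error computed from KNN predictions, and each particle position would encode either a feature subset or a set of hyperparameter values.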

Table 2. Advantages and disadvantages of distances used in KNN algorithm.

Feature selection involves selecting a subset of features from a given feature set. Because the problem is NP-hard, determining the optimal feature subset exactly is impractical. To address this challenge, this research employed the PSO metaheuristic algorithm (Harb and Desuky Citation2014). Equation (7) represents the objective function utilized in the PSO algorithm to select optimal features.

$\text{Objective function 1} = \min \left(\frac{\sum_{i = 1}^{N} {(y - y^{'})}^{2}}{N} + w * n\right)$ (7)

The variables in the equation have the following interpretations: y represents the actual value of an observation, y′ represents the predicted value, w is a weight ranging between 0 and 1, and n denotes the total number of data points.

In the initial step, the PSO algorithm selected a single feature, and the KNN algorithm predicted the target (the occurrence or non-occurrence of floods) from the values of this feature. The model's predictions were then compared with the actual values of the target, forming the basis for computing the objective function. The PSO algorithm replaced the selected feature across iterations to minimize the objective function. The process then continued with two features: the KNN algorithm made predictions using these features, and the predicted values were again compared with the actual values to evaluate the objective function. Across multiple iterations, the PSO algorithm systematically considered pairs of features and progressively minimized the objective function. This iterative process continued until all possible combinations of features had been explored. Ultimately, the combination of features yielding the minimum objective function value was identified as the final set for modelling.
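The wrapper-style evaluation described above, in which each candidate feature subset is scored by the error of a KNN model trained on it, can be sketched as follows. Several points are assumptions rather than the authors' method: the penalty term is taken to grow with the number of selected features (one common reading of the w * n term in Equation 7), exhaustive enumeration stands in for the PSO search because the toy problem has only two features, and all names and data are hypothetical.

```python
import itertools
import math

def knn_predict(train_X, train_y, query, k):
    """Mean target of the k training samples closest to `query` (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def select(row, cols):
    """Keep only the chosen feature columns of one sample."""
    return [row[j] for j in cols]

def feature_objective(mask, train_X, train_y, val_X, val_y, k=3, w=0.01):
    """Mean squared error of KNN on the masked features plus a size penalty."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty feature set cannot be evaluated
    sub_train = [select(row, cols) for row in train_X]
    squared_errors = [
        (y - knn_predict(sub_train, train_y, select(row, cols), k)) ** 2
        for row, y in zip(val_X, val_y)
    ]
    return sum(squared_errors) / len(squared_errors) + w * len(cols)

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
train_X = [(0.1, 0.9), (0.2, 0.1), (0.15, 0.5), (0.9, 0.2), (0.8, 0.8), (0.85, 0.4)]
train_y = [1, 1, 1, 0, 0, 0]
val_X = [(0.12, 0.3), (0.88, 0.7)]
val_y = [1, 0]

# Enumeration replaces PSO here because only four masks exist.
masks = list(itertools.product((0, 1), repeat=2))
best_mask = min(masks, key=lambda m: feature_objective(m, train_X, train_y, val_X, val_y))
print(best_mask)
```

With 12 candidate factors there are 4,095 non-empty subsets, and the count doubles with every added factor, which is why a metaheuristic such as PSO is typically used to search the space instead of enumerating it.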

In this study, the PSO algorithm was also used to fine-tune the hyperparameters of the KNN algorithm. These hyperparameters include the number of neighbours (n_neighbors), weight function (weights), and leaf size (leaf_size). The objective of the optimization is to find the hyperparameter values that minimize the discrepancy between the predicted and target values. To accomplish this, the study employed an objective function that minimizes the RMSE index (Equation 8) (Razavi-Termeh et al. Citation2021b).

$\text{Objective function 2} = RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(y - y^{'})}^{2}}{n}}$ (8)

Here, n represents the number of data points, y′ refers to the predicted values generated by the KNN algorithm, and y denotes the corresponding actual or target values.
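The RMSE objective of Equation (8) can be sketched as a function of the KNN hyperparameters. The sketch is illustrative and rests on assumptions: a small grid search stands in for PSO, only the number of neighbours and the weight function are varied, and the data and function names are hypothetical. If the authors used scikit-learn (the article only states that Python was used), `leaf_size` would configure the tree-based neighbour search and affect speed and memory rather than the predictions, so it is omitted here.

```python
import math

def knn_predict(train_X, train_y, query, k, weights="uniform"):
    """KNN regression with uniform or inverse-distance neighbour weights."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))[:k]
    if weights == "distance":
        exact = [y for d, y in ranked if d == 0.0]
        if exact:  # the query coincides with training samples
            return sum(exact) / len(exact)
        total = sum(1.0 / d for d, _ in ranked)
        return sum(y / d for d, y in ranked) / total
    return sum(y for _, y in ranked) / len(ranked)

def rmse_objective(params, train_X, train_y, val_X, val_y):
    """Equation (8): RMSE of the KNN predictions for one hyperparameter setting."""
    predictions = [knn_predict(train_X, train_y, q, params["k"], params["weights"])
                   for q in val_X]
    squared = [(y - p) ** 2 for y, p in zip(val_y, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical samples described by two normalized factors.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.7, 0.8), (0.8, 0.7), (0.9, 0.9)]
train_y = [1, 1, 1, 0, 0, 0]
val_X = [(0.25, 0.2), (0.75, 0.75)]
val_y = [1, 0]

# Grid search used in place of PSO for this small illustration.
grid = [{"k": k, "weights": wt} for k in (1, 3, 5) for wt in ("uniform", "distance")]
best = min(grid, key=lambda p: rmse_objective(p, train_X, train_y, val_X, val_y))
best_rmse = rmse_objective(best, train_X, train_y, val_X, val_y)
print(best, best_rmse)
```

In this toy example, choosing too many neighbours (k = 5) mixes the two classes and raises the RMSE, while inverse-distance weighting reduces that effect. This is the kind of trade-off the optimizer is meant to resolve.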

2.9. Validation techniques

Several validation techniques were employed to assess the performance and sensitivity of the developed algorithms based on the KNN approach for spatial modelling of flood-prone areas. These techniques aim to evaluate the accuracy and reliability of the models in predicting flood events and generating flood susceptibility maps. The accuracy of the developed algorithms was assessed using three commonly used evaluation indices: RMSE (Equation 8), R² (Equation 9), and MAE (Equation 10). These indices were calculated for both the training and test datasets to measure the effectiveness of the models in capturing the spatial patterns and variability of the flood-prone areas (Farhangi et al. Citation2021).

$R^{2} = 1 - \frac{\sum {(y - y^{'})}^{2}}{\sum {(y - \bar{y})}^{2}}$ (9)

$MAE = \frac{\sum_{i = 1}^{n} | y - y^{'} |}{n}$ (10)

In these equations, n represents the total number of data points, y′ denotes the predicted values generated by the KNN algorithm, y represents the corresponding actual or target values, and $\bar{y}$ is the mean of the actual values (Farhangi et al. Citation2021).
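The three indices can be computed directly from Equations (8) to (10). The sketch below is a minimal illustration; the function names and the example values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Equation (8): root of the mean squared difference."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Equation (10): mean absolute difference."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Equation (9): one minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # must be non-zero
    return 1.0 - ss_res / ss_tot

# Hypothetical targets (1 = flood, 0 = non-flood) and model outputs.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.7, 0.0]

print(round(rmse(y_true, y_pred), 3), round(mae(y_true, y_pred), 3),
      round(r2(y_true, y_pred), 3))  # 0.187 0.15 0.86
```

The example shows the different sensitivity of the two error measures: the single error of 0.3 contributes more than half of the squared-error sum behind the RMSE, whereas it contributes exactly half of the absolute-error sum behind the MAE.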

The MAE provides a straightforward measure of the average magnitude of errors between predicted and actual values. Calculated as the average of the absolute differences, a lower MAE indicates that, on average, the model's predictions closely align with the actual flood susceptibility values (Razavi-Termeh et al. Citation2023). RMSE extends the evaluation by squaring the errors before averaging them. By taking the square root of the average of squared differences, RMSE penalizes large errors more heavily than small ones. A lower RMSE signifies that the model effectively limits both small and large errors (Farhangi et al. Citation2021). The coefficient of determination, R², assesses the goodness of fit by representing the proportion of variance in flood susceptibility that is predictable from the chosen features. It typically ranges from 0 to 1, and a higher R² indicates a better fit, suggesting that the model explains a significant portion of the variability in flood susceptibility. Together, these metrics contribute to a thorough evaluation of our spatial modelling approach. MAE and RMSE gauge the size of the prediction errors, showing how closely the model's predictions match the observed flood susceptibility values. Meanwhile, R² assesses the explained variability, offering insights into how well the chosen features capture the underlying patterns in flood susceptibility (Razavi-Termeh et al. Citation2023). In addition, the AUC index in the ROC curve analysis was employed to evaluate the flood susceptibility maps generated by the four algorithms (Nachappa et al. Citation2020). The ROC curve is a graphical representation of the sensitivity and specificity of a binary classifier system, which, in our case, represents the FSM algorithm (Janizadeh et al. Citation2019). The AUC index quantifies the overall performance of the algorithms in distinguishing between flood-prone and non-flood-prone areas with values ranging from 0 to 1. 
A higher AUC value indicates higher accuracy in identifying flood-prone areas (Razavi-Termeh et al. Citation2023). The Wilcoxon signed-rank test is a non-parametric statistical test used to assess whether there is a difference between two groups. It is often applied when the data does not meet the assumptions required for a parametric test, such as the paired t-test (Razavi-Termeh et al. Citation2023).
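The AUC can be read as the probability that a randomly chosen flood point receives a higher susceptibility score than a randomly chosen non-flood point. The sketch below computes it directly from that pairwise definition. It is an illustration rather than the authors' implementation, and the function name `roc_auc` and the sample data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC from pairwise comparisons of positive and negative scores.

    labels: 1 for flood, 0 for non-flood; scores: predicted susceptibility.
    Tied scores count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation points: three of the four flood/non-flood pairs
# are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of points, which is acceptable for a few hundred validation points; a rank-based formulation gives the same value more efficiently.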

3. Results

3.1. Results of multicollinearity between effective factors

From the VIF values (Table 3), it can be observed that most of the independent variables have VIF values close to or below two, indicating relatively low levels of multicollinearity. Variables such as aspect, lithology, land cover, NDVI, plan curvature, profile curvature, rainfall, and distance to the river exhibited VIF values around or below 1.5, suggesting a low degree of intercorrelation. Two variables, slope and SPI, had VIF values above 3, indicating a stronger correlation with the other independent variables in the model. Although this is higher than for the remaining factors, it does not reach a level that would exclude them: according to the VIF index findings, there is no problematic multicollinearity among the variables influencing floods, allowing all factors to contribute to flood modelling.

Table 3. Multicollinearity analysis results for effective factors.


3.2. Determining the probability of flooding using the FR method

The results of the FR analysis indicated the classes with the highest weights in determining the flood-prone areas (Table 4). Among the factors examined, slope played a significant role, with the highest weight observed in the range of 0–6 (FR = 2.19). This indicates that areas with gentle slopes exhibit higher flood risk, which is consistent with runoff slowing and accumulating on flat terrain. For the distance to the river factor, the highest weight was observed in the range of 0–100 m (FR = 2.29), indicating that areas near rivers are more prone to flooding. For altitude, the highest weight was observed in the 1298–1674 m class (FR = 2.37), indicating that areas at lower altitudes face higher flood risks. This observation aligns with the notion that low-lying areas are more susceptible to flooding owing to their proximity to water bodies and their potential for water accumulation. For NDVI, the highest weight was observed in the >0.29 class (FR = 7.61), indicating that areas with dense vegetation cover exhibit higher flood risks. The SPI factor exhibited its highest weight within the 0–100 range (FR = 2.56), suggesting that regions with lower SPI values are at a higher risk of flooding. Areas with higher TWI values had the highest weight (FR = 3.18), indicating that these regions were more prone to flooding. For profile curvature, the highest likelihood of flooding was associated with the −0.003–0.0007 class (FR = 1.31). In the rainfall factor, the 263–280 mm class showed the highest probability of flooding (FR = 1.29). In the plan curvature criterion, the −0.001–0.001 class had the greatest influence on flood occurrence (FR = 1.02). Regarding slope aspect, land cover, and lithology, the SE class (FR = 1.56) in slope aspect, the water body class (FR = 4.25) in land cover, and the Qft2 class (FR = 1.7) in lithology demonstrated the highest probability of flooding.
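To make the FR weights concrete, the sketch below assumes the usual frequency ratio definition: the share of flood points falling in a class divided by the share of the study area occupied by that class, so that values above 1 mark classes where floods are over-represented. The function name `frequency_ratio` and all counts are hypothetical and are not taken from Table 4.

```python
def frequency_ratio(flood_counts, pixel_counts):
    """Frequency ratio per class of a single conditioning factor.

    flood_counts: class -> number of flood points in the class
    pixel_counts: class -> number of pixels (area) in the class
    """
    total_floods = sum(flood_counts.values())
    total_pixels = sum(pixel_counts.values())
    fr = {}
    for cls, pixels in pixel_counts.items():
        flood_share = flood_counts.get(cls, 0) / total_floods
        area_share = pixels / total_pixels
        fr[cls] = flood_share / area_share
    return fr


# Hypothetical slope classes: 60% of floods fall on 30% of the area.
floods = {"0-6": 60, "6-15": 30, ">15": 10}
pixels = {"0-6": 3000, "6-15": 4000, ">15": 3000}
fr = frequency_ratio(floods, pixels)
print({cls: round(value, 2) for cls, value in fr.items()})
# {'0-6': 2.0, '6-15': 0.75, '>15': 0.33}
```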

Table 4. Probability of flooding determination using the FR method.


3.3. Results of feature selection with PSO algorithm

The feature selection process aims to identify the combination of factors that yields the best (lowest) cost, indicating the effectiveness of the selected factors in flood modelling. Feature selection was conducted with the PSO algorithm to determine the optimal dataset for modelling. The PSO algorithm was implemented in Python and run in Google Colab, a hosted notebook environment provided by Google (https://colab.research.google.com/). The parameters used in the PSO algorithm are listed in Table 5.

Table 5. Parameters used in the PSO algorithm.


The PSO algorithm computes an objective function for each feature set by iterating through various feature combinations. The objective function value represents the quality of the feature selection for modelling, with a lower value indicating a better fit. The convergence of the PSO algorithm is illustrated in Figure 6, which shows the progression of the objective function values across iterations. As the number of factors increased, the best cost gradually decreased, indicating improved performance in flood risk assessment. As shown in Table 6, in the 78th iteration, using a set of eight factors, the objective function reached a minimum value of 0.056742, indicating that the feature set selected in this iteration achieved the best performance for modelling. The selected factors were rainfall, land cover, altitude, aspect, slope, TWI, lithology, and NDVI. These eight factors were used to model the flood susceptibility.
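The search described above can be sketched as a binary PSO in which each particle is a keep/drop mask over the candidate factors and the swarm minimizes a cost function. The version below is a generic sigmoid-transfer binary PSO, not the authors' implementation: the inertia and acceleration values (`w`, `c1`, `c2`), the swarm size, and the toy cost function are assumptions chosen for illustration, whereas in the study the cost would come from the model error on the flood data and the settings from Table 5.

```python
import math
import random


def binary_pso(cost, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise cost(mask) over boolean feature masks.

    Returns (best_mask, best_cost, history), where history holds the best
    cost found so far after initialisation and after each iteration.
    """
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # global best
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))   # clamp the velocity
                # Sigmoid transfer: velocity -> probability of keeping bit d.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = rng.random() < prob
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


# Toy cost (hypothetical): penalise missing "useful" features heavily and
# superfluous features lightly. A real cost would be a validation error.
USEFUL = {0, 2, 3}


def toy_cost(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(USEFUL - chosen) + 0.1 * len(chosen - USEFUL)


best_mask, best_cost, history = binary_pso(toy_cost, n_features=6)
print(best_mask, best_cost)
```

The `history` list plays the role of a convergence curve: it never increases, because the global best is replaced only when a strictly lower cost is found.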

Figure 6. Convergence diagram of the PSO algorithm for feature selection.

Table 6. Optimal factors in different iterations using PSO algorithm.


The importance of each factor influencing flooding was determined using the PSO method. Each factor's repetition percentage during the feature selection stage was considered an indicator of its importance (Table 7). Altitude emerged as the most influential factor (100% frequency), followed by rainfall (91.6%) and NDVI (83.33%). Land cover and lithology showed moderate importance, at 75% and 66.66%, respectively. SPI appeared at a frequency of 50%. Other factors, such as slope, TWI, aspect, plan curvature, and profile curvature, held relatively lower but considerable importance (33.33% or 25%). The distance to the river had a frequency of 16.6%, indicating less influence on flood modelling. These findings highlight the significance of altitude, rainfall, NDVI, land cover, and lithology for flood prediction.
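The repetition percentage can be illustrated with a short sketch that counts how often each factor appears across a list of recorded feature sets. The function name `selection_frequency` and the four example sets are hypothetical. The reported percentages are all close to multiples of 1/12, which would be consistent with 12 recorded feature sets, although this section does not state the number.

```python
from collections import Counter


def selection_frequency(selected_sets):
    """Percentage of recorded feature sets in which each factor appears."""
    counts = Counter(f for s in selected_sets for f in set(s))
    n = len(selected_sets)
    return {factor: 100.0 * c / n for factor, c in counts.items()}


# Hypothetical feature sets recorded at four PSO iterations.
recorded = [
    {"altitude", "rainfall", "NDVI"},
    {"altitude", "rainfall"},
    {"altitude", "NDVI"},
    {"altitude"},
]
print(selection_frequency(recorded))
```

In this toy example altitude appears in every set (100%), while rainfall and NDVI each appear in two of the four sets (50%).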

Table 7. Importance of factors affecting floods using the PSO algorithm.


3.4. Hyperparameter optimization of the KNN algorithm

In the hyperparameter optimization of the KNN algorithm using the PSO algorithm, the tuned values were as follows: n_neighbours = 2, Weights = Uniform, and leaf_size = 4 (Table 8). These values were obtained through the PSO algorithm, which searches for the combination of hyperparameters that gives the best performance in flood modelling. The optimized n_neighbours indicates that the algorithm considers the two nearest neighbours, whereas the uniform weights indicate that all the neighbours have equal influence. A leaf_size of 4 sets the number of points at which the tree-based neighbour search switches to a brute-force algorithm. By optimizing these hyperparameters, the KNN algorithm can be tailored to the specific requirements of flood modelling, leading to enhanced accuracy and performance.
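The effect of these settings on a prediction can be sketched with a brute-force KNN that averages the targets of the two nearest training points with equal weights. This is an illustration, not the authors' code: the function name `knn_predict` and the sample points are hypothetical. The hyperparameter names resemble those of scikit-learn's nearest-neighbour estimators (`n_neighbors`, `weights`, `leaf_size`), but this section does not name the library. In a brute-force search there is no tree, so leaf_size has no counterpart here; in tree-based searches it affects speed and memory rather than the predicted values.

```python
def knn_predict(train_X, train_y, query, k=2):
    """Uniform-weight KNN regression using the Manhattan distance."""
    def manhattan(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    # Sort training indices by distance to the query and keep the k nearest.
    order = sorted(range(len(train_X)),
                   key=lambda i: manhattan(train_X[i], query))
    nearest = order[:k]
    # Uniform weights: every neighbour contributes equally to the average.
    return sum(train_y[i] for i in nearest) / len(nearest)


# Hypothetical normalized factor values and flood labels (1 = flood).
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.7]]
y = [0.0, 0.0, 1.0, 1.0]
print(knn_predict(X, y, [0.5, 0.5]))  # 0.5
```

With k = 2 and binary targets, a prediction can only take the values 0, 0.5, or 1, which is worth keeping in mind when interpreting continuous susceptibility maps produced with so few neighbours.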

Table 8. Optimized hyperparameters in the KNN algorithm using PSO algorithm.


3.5. Flood modelling and susceptibility

The following steps were undertaken in flood modelling and susceptibility mapping. First, the absence of multicollinearity among factors was confirmed. The weights for each criterion class were then determined using the FR method, and weight maps were created accordingly. These weight maps were subsequently normalized between zero and one. The training and test data values were extracted from the normalized weight maps and were utilized as inputs for the feature selection process. The PSO algorithm was employed to determine the optimal number of effective criteria and hyperparameters for the models. Eight effective criteria were selected for the flood susceptibility modelling. To model the flood susceptibility, four distance metrics in the KNN algorithm were compared: KNN-Manhattan, KNN-Minkowski, KNN-Chebyshev, and KNN-Euclidean. Flood susceptibility modelling using these four algorithms was implemented in Python in the Google Colab environment. Flood modelling and susceptibility results were obtained for each algorithm, and their performances were evaluated using the MAE, RMSE, and R² metrics (Table 9).

Table 9. Evaluation indices for the KNN-based algorithm performance.


The KNN-Manhattan model exhibited the best performance, with an MAE of 0.051 and an RMSE of 0.169 on the training dataset, accompanied by an R² value of 0.884. On the test dataset, the model achieved an MAE of 0.106, an RMSE of 0.272, and an R² of 0.702. The KNN-Minkowski model closely followed with an MAE of 0.056, an RMSE of 0.175, and an R² of 0.876 on the training dataset, and an MAE of 0.117, an RMSE of 0.285, and an R² of 0.673 on the test dataset. The KNN-Euclidean model yielded an MAE of 0.061, an RMSE of 0.183, and an R² of 0.865 on the training dataset. Its performance decreased on the test dataset, resulting in an MAE of 0.133, an RMSE of 0.313, and an R² of 0.607. The KNN-Chebyshev model exhibited an MAE of 0.075, an RMSE of 0.198, and an R² of 0.842 on the training dataset, and an MAE of 0.118, an RMSE of 0.275, and an R² of 0.696 on the test dataset. Overall, the KNN-Manhattan model demonstrated the best MAE, RMSE, and R² on both the training and test datasets. The KNN-Minkowski model also showed competitive results. The KNN-Euclidean and KNN-Chebyshev models exhibited lower training performance, although the test-set RMSE and R² of KNN-Chebyshev were close to those of KNN-Manhattan. These findings highlight the significance of the different distance metrics within the KNN algorithm for flood modelling and susceptibility mapping. The KNN-Manhattan and KNN-Minkowski models showed promising outcomes and can be further explored for accurate flood risk assessment and effective flood management strategies.

After applying the four algorithms and obtaining flood susceptibility models, the next step was to use these models for the entire town of Estahban. This allowed the determination of the flood susceptibility for each pixel in the study area. The resulting continuous maps were divided into five susceptibility classes using the natural-break classification method (Figure 7). The five susceptibility classes, ranging from very low to very high, clearly indicate the flood risk level for each area within Estahban. The natural-break classification method ensures that the division of continuous maps into classes is based on the inherent characteristics of the data. By dividing the continuous maps into distinct susceptibility classes, it becomes easier to visualize and communicate flood risk levels across different areas of Estahban.

Figure 7. Flood-prone area mapping by (a) KNN-Euclidean, (b) KNN-Chebyshev, (c) KNN-Minkowski, and (d) KNN-Manhattan.

3.6. Validation of susceptibility maps

The susceptibility maps prepared using the four algorithms were validated using 30% of the flood occurrence data that were not utilized during the modelling phase. The ROC curve and AUC index were used to assess the quality of the susceptibility maps. The evaluation results are shown in Figure 8 and Table 10. The AUC values indicate the overall performance of each model in predicting the flood susceptibility. A higher AUC value suggests better discrimination and accuracy in differentiating between areas prone to flooding and non-flood-prone regions. According to the results, the KNN-Manhattan and KNN-Minkowski models achieved high AUC values of 0.940 and 0.939, respectively. These models exhibit strong predictive capabilities and can distinguish between flood-prone and non-flood-prone areas. The KNN-Euclidean model also performed reasonably well with an AUC value of 0.929. The KNN-Chebyshev model had a slightly lower AUC value of 0.924 but still demonstrated acceptable performance in flood susceptibility prediction. The standard errors associated with the AUC values indicate the precision of the estimates, and the 95% confidence intervals provide a range within which the actual AUC value is likely to fall.

Figure 8. ROC curve analysis for flood-prone area maps.

Table 10. AUC values for the flood-prone area maps generated using different algorithms.


The Wilcoxon signed-rank test was used to compare the models (Table 11). The test evaluated the difference between the AUC values of each pair of models and calculated the z-statistic. A significance level of α = 0.05 was used. There was no statistically significant difference between the KNN-Manhattan and KNN-Minkowski models (P = 0.8912), or between the KNN-Minkowski and KNN-Euclidean models (P = 0.1038). In contrast, the KNN-Manhattan model differed significantly from the KNN-Euclidean model (P = 0.0458) and from the KNN-Chebyshev model (P = 0.0090). The KNN-Minkowski model also differed significantly from the KNN-Chebyshev model (P = 0.0228), as did the KNN-Euclidean model (P = 0.0180).
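For readers unfamiliar with the test, the sketch below shows one common way to obtain the z-statistic and a two-sided P value for paired samples, using the normal approximation without tie or continuity corrections. It is a generic illustration: this section does not state which paired observations the authors compared, and the function name `wilcoxon_signed_rank` and the sample values are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w, z, p). Zero differences are dropped and tied absolute
    differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0   # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail probability
    return w, z, p


# Hypothetical paired susceptibility scores from two models.
model_a = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.81]
model_b = [0.89, 0.80, 0.79, 0.60, 0.90, 0.70, 0.83, 0.75]
w, z, p = wilcoxon_signed_rank(model_a, model_b)
print(w, round(z, 3), round(p, 4))
```

A P value below the chosen significance level (α = 0.05 in this study) would be read as a significant difference between the paired samples.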

Table 11. Results of Wilcoxon signed-rank test for model comparison.


These results provide insight into the performance and statistical significance of each model. KNN-Manhattan demonstrated the highest AUC, whereas KNN-Minkowski, KNN-Euclidean, and KNN-Chebyshev exhibited slightly lower but comparable performance. Significance tests highlight the differences between the models, aiding the selection of the most effective model for flood modelling and susceptibility analysis. Overall, the results demonstrate that KNN-Manhattan and KNN-Minkowski outperformed KNN-Euclidean and KNN-Chebyshev in flood modelling and susceptibility analysis. These models exhibited higher AUC values, lower MAE and RMSE values, and better overall performance. Therefore, KNN-Manhattan and KNN-Minkowski are preferred for accurately predicting flood-prone areas and assessing their susceptibility.

Sensitivity analysis examines the impact of variations in model inputs on the corresponding model outputs. Assessing the significance of each influential criterion helps ascertain whether their inclusion or exclusion is essential (Razavi-Termeh, Sadeghi-Niaraki, and Choi Citation2021) (Equation 11).

(11) $RD = \frac{AUC_{all} - AUC_{i}}{AUC_{all}} \times 100$

RD represents the relative decrease index, AUC_all signifies the overall AUC value considering all parameters, and AUC_i represents the AUC value when parameter i is excluded. For this purpose, the model with the best accuracy (KNN-Manhattan) was used for sensitivity analysis. The sensitivity analysis results using the RD index (Table 12) indicate that the rainfall and NDVI criteria hold the highest significance in the modelling process. In particular, including these two criteria increases the modelling accuracy by 3.72% and 3.08%, respectively. Conversely, the aspect criterion is identified as the least critical, with an RD of −0.21%, meaning that excluding it marginally increased the AUC. The results of the sensitivity analysis are consistent with the results of determining the importance of the criteria with the PSO algorithm.
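Equation 11 can be applied as in the following sketch. The function name `relative_decrease` is hypothetical, and the AUC values obtained after dropping a factor are invented for illustration rather than taken from Table 12.

```python
def relative_decrease(auc_all, auc_without):
    """Relative decrease (%) in AUC when one factor is excluded."""
    return (auc_all - auc_without) / auc_all * 100.0


auc_all = 0.940   # AUC reported for KNN-Manhattan with all selected factors
# Hypothetical AUC values after excluding a single factor.
hypothetical = {"factor_A": 0.905, "factor_B": 0.942}
for name, auc_i in hypothetical.items():
    print(name, round(relative_decrease(auc_all, auc_i), 2))
```

A positive RD means that removing the factor lowers the AUC, so the factor helps the model. A negative RD, as for factor_B here, means the AUC rises slightly when the factor is removed.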

Table 12. Results of sensitivity analysis.


4. Discussion

The results obtained from this study provide valuable insights into the spatial modelling of flood-prone areas and determination of the optimal distance metric in the KNN algorithm. First, the feature selection process using the PSO algorithm plays a crucial role in identifying the most influential factors for flood modelling. The results indicate that altitude, rainfall, NDVI, land cover, and lithology are consistently important in determining flood susceptibility. These findings align with the existing literature, highlighting the significance of terrain characteristics, rainfall patterns, and land cover in flood dynamics (Askar et al. Citation2022; Bui et al. Citation2020; Rahmati and Pourghasemi Citation2017). The importance of altitude can be attributed to its direct relationship with the topography of the study area. Altitude plays a critical role in determining water flow, as areas at lower elevations are more susceptible to flooding (Narimani et al. Citation2021). Similarly, rainfall is crucial, as it directly contributes to the volume and intensity of the water entering the system. Higher rainfall increases the likelihood of flooding, making it a significant predictor in flood susceptibility modelling (Mind’je et al. Citation2019). The influence of the NDVI can be attributed to its ability to reflect vegetation health and density (Stanton et al. Citation2017). Vegetation is vital for regulating water absorption, reducing runoff, and stabilizing soil. Areas with higher NDVI values indicate denser vegetation cover, which can help mitigate the risk of flooding by enhancing water retention and infiltration (Saha et al. Citation2021). Land cover and lithology are also important factors, as they provide information about the type and characteristics of surface and subsurface materials. Different land cover types, such as water bodies or agricultural fields, have varying capacities to absorb or repel water (Sugianto et al. Citation2022). Similarly, lithology affects the permeability and porosity of the soil, influencing water infiltration rates and groundwater flow (Allafta, Opp, and Patra Citation2020).

This study compared different distance metrics in the KNN algorithm to determine their impact on flood modelling and susceptibility mapping. Four distance metrics were evaluated: Manhattan, Minkowski, Euclidean, and Chebyshev. Among these metrics, KNN-Manhattan and KNN-Minkowski demonstrated superior performance compared with KNN-Euclidean and KNN-Chebyshev. The reasons for their superiority can be attributed to the specific characteristics of the distance metrics and their suitability for a given problem. The KNN-Manhattan distance calculates the distance between two points by summing the absolute differences between their coordinates. It is well suited for cases where the data exhibit high variability and different scales across features (Kumbure and Luukka Citation2022). In flood modelling, criteria such as altitude, rainfall, and land cover can have varying scales and ranges. The Manhattan distance effectively captures these differences and provides accurate measurements, thus improving the modelling performance (Zainal Citation2021). Similarly, the KNN-Minkowski distance is a generalized distance metric that includes both the Manhattan and Euclidean distances as special cases. This allows for flexibility in adjusting its order parameter, which controls the shape of the distance calculation (Sperti Citation2019). By adapting to the characteristics of the data, the KNN-Minkowski distance can handle different feature scales and accurately measure distances, resulting in effective flood modelling (Xie Citation2018). On the other hand, the KNN-Euclidean and KNN-Chebyshev distances showed relatively lower performance. The KNN-Euclidean distance calculates the straight-line distance between two points, assuming that all the features contribute equally (Khanna and Singh Citation2017). However, this assumption may not hold in flood susceptibility analysis, where some criteria may have a higher relevance than others. This can lead to suboptimal distance calculations and less accurate predictions. The KNN-Chebyshev distance calculates the maximum difference between coordinates, considering only the most significant feature discrepancy (Dissanayake et al. Citation2022). Although this metric may be suitable for specific applications, it may only partially capture the complexities of flood modelling, where multiple factors interact to determine flood susceptibility. Overall, the superiority of the KNN-Manhattan and KNN-Minkowski distances in flood modelling can be attributed to their ability to handle varying feature scales, adapt to the characteristics of the data, and consider the relative importance of different criteria. In flood susceptibility analysis, these distance metrics offer better accuracy and performance than the KNN-Euclidean and KNN-Chebyshev distances. This suggests that the Manhattan distance metric is more suitable for capturing spatial relationships between flood-prone areas and their influencing factors.
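The four distance metrics compared in this study can be written in a few lines, which makes their relationship explicit: the Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2, and it approaches the Chebyshev distance as p grows. The sketch below is illustrative. The order p = 3 is an assumption, since this section does not state the value used for KNN-Minkowski.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def chebyshev(a, b):
    """Largest single coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalized distance of order p (p >= 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


u, v = (0.0, 0.0), (3.0, 4.0)
print(manhattan(u, v))               # 7.0
print(euclidean(u, v))               # 5.0
print(chebyshev(u, v))               # 4.0
print(round(minkowski(u, v, 3), 3))  # 4.498
```

For the same pair of points the distances are ordered Chebyshev ≤ Minkowski (p = 3) ≤ Euclidean ≤ Manhattan, so the choice of metric changes which training points count as the nearest neighbours.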

The research findings indicate strong performance in flood spatial modelling, with the KNN-Manhattan algorithm reaching an AUC of 0.94 (94%). Compared with several pertinent studies in the field, this approach outperforms most of the reported KNN-based methodologies. The following comparison places this result in context. Shahabi et al. (Citation2020) explored flood susceptibility modelling using Coarse-KNN (79.5%), Weighted-KNN (71.9%), Cosine-KNN (69.2%), and Cubic-KNN (66.2%). In a study by Prasad et al. (Citation2022), an accuracy of 83.88% was achieved in flood susceptibility modelling using the KNN algorithm. Saleh et al. (Citation2022) employed a combination of statistical index (SI) with the KNN algorithm, achieving an accuracy of 92.9% for FSM, slightly below the value obtained with the KNN-Manhattan algorithm in this study. Meliho, Khattabi, and Asinyo (Citation2021) attained an accuracy of 98.6% in flood spatial modelling using the KNN algorithm; although this is higher than the value obtained here, the KNN-Manhattan algorithm remains competitive at 94%. Madhuri, Sistla, and Srinivasa Raju (Citation2021) achieved 77% accuracy for flood susceptibility modelling with the KNN algorithm. Costache et al. (Citation2020) used a combination of the KNN algorithm and the analytical hierarchy process (AHP) method to achieve 90.1% accuracy in preparing a flood susceptibility map. In a study by R. Al-Aizari et al. (Citation2022), flood spatial modelling with the KNN algorithm achieved an accuracy of 92.8%. This collective evidence supports the robustness and efficacy of the KNN-Manhattan algorithm in flood spatial modelling and positions it as a promising methodology.

The findings of this study contribute to FSM by enhancing the spatial modelling precision, promoting multidimensional feature selection, and showcasing the versatility of machine learning algorithms. The validation process and sensitivity analyses further strengthened the robustness of the results, providing a reference point for future studies. The findings have practical implications for decision-making, planning, and mitigation strategies in the context of flood susceptibility, with potential applications in informed land-use planning, proactive emergency response, resilient infrastructure development, climate change adaptation, and community awareness. The refined flood models offer a nuanced understanding of spatial dynamics, informing decision makers in land-use planning about resource allocation and resilient infrastructure development. Targeted mitigation strategies can be devised based on the identified influential factors to optimize the efficiency of interventions. These models also play a crucial role in emergency response planning, enabling timely and well-informed actions during extreme weather events. Resilient infrastructure development benefits from these insights by ensuring that structures are designed to withstand and mitigate flooding impacts.

Furthermore, these models contribute to climate change adaptation by anticipating how environmental shifts may influence flood dynamics. Flood susceptibility maps foster awareness and preparedness at the community level, empowering residents to make informed decisions and participate in proactive disaster preparedness.

The findings of this study have several implications for flood modelling and susceptibility assessments. By identifying the key factors influencing flood susceptibility, such as altitude, rainfall, NDVI, and land cover, it is possible to prioritize and allocate resources effectively for flood risk management and mitigation measures. The optimized KNN algorithm with the Manhattan or Minkowski distance metric can be utilized to accurately predict flood-prone areas, aiding decision-making processes related to land use planning, emergency response, and infrastructure development. Furthermore, the use of the PSO algorithm for feature selection and hyper parameter optimization demonstrated its effectiveness in improving the performance of the flood susceptibility model. This highlights the potential of metaheuristic algorithms to enhance the accuracy and efficiency of spatial modelling techniques. The present study acknowledges the limitations that could be addressed in future research to improve the scientific understanding of flood-prone area modelling. First, it is essential to note that a direct comparison between the results obtained from the spatial modelling approach employed in this research and more complex flood dynamic models, such as hydrodynamic or hydraulic models, was not conducted. Our spatial modelling approach offers notable advantages, particularly in terms of computational efficiency. Unlike hydrodynamic and hydraulic models, our approach is computationally efficient, making it well suited for large-scale FSM (Razavi-Termeh et al. Citation2023). Its ease of implementation is another advantage, as the simplicity of the spatial modelling approach renders it accessible, even in data-scarce environments where specialized expertise may be limited. 
Interpretability is a significant strength of the spatial modelling approach as it provides clear insights into the relative importance of features and spatial patterns influencing flood susceptibility (Kumar et al. Citation2023). This contrasts with the more complex outputs of hydrodynamic and hydraulic models, which can be challenging to interpret and may require specialized knowledge. However, our spatial modelling approach has certain limitations. It provides a simplified representation of flood susceptibility by focusing on static features and spatial relationships, potentially missing nuanced details of flood mechanisms. Additionally, its applicability may be constrained when addressing highly localized or site-specific scenarios (Reed et al. Citation2022).

Hydrodynamic and hydraulic models, on the other hand, offer a more detailed representation by explicitly simulating the physical processes of flooding, including flow dynamics, terrain characteristics, and hydraulic properties (Saksena, Merwade, and Singhofen Citation2019). These models capture fine-scale details, making them more suitable for site-specific flood risk assessment. The choice between a spatial and a dynamic modelling approach depends on specific objectives, available data, and computational resources. Our spatial modelling approach is valuable for large-scale FSM, for which simplicity, interpretability, and computational efficiency are paramount. However, dynamic models remain indispensable for detailed assessments, which require a comprehensive understanding of physical processes and accurate representation of local conditions (Liu, Merwade, and Jafarzadegan Citation2019).

While our spatial modelling approach holds promise for FSM, a notable limitation is the exclusion of explicit temporal dynamics from the current methodology. The model primarily focuses on static features and spatial relationships, overlooking crucial temporal aspects such as flood duration, intensity, and recurrence intervals. This absence of temporal considerations limits its ability to fully capture the evolving nature of flood events. Floods are inherently dynamic phenomena, and their characteristics, such as duration and intensity, play a significant role in shaping their impact on vulnerable areas. The model's reliance on static features might result in a simplified representation that does not fully encapsulate the temporal variations in flood susceptibility.

The exclusion of temporal dynamics also restricts the model's adaptability to changing conditions and may overlook critical patterns that unfold over specific timeframes. Flood events exhibit varying durations, intensities, and recurrence intervals, and a model that accounts for these temporal dimensions would likely provide a more accurate and contextually relevant representation of flood susceptibility. It is crucial to consider various environmental and socioeconomic factors to enhance our understanding of flood dynamics. Soil erosion, composition, urbanization, land use change, and climate change indicators significantly affect flood vulnerability. Socioeconomic variables influence vulnerability and resilience, including population density, economic activities, and infrastructure development. The characteristics of river and drainage networks contribute to the hydrological context, while climate change indicators add adaptability to flood models. Additionally, socio-demographic vulnerability indices, considering factors like age, income, and education, provide insights into community resilience, enriching the understanding of how flooding affects different demographics.

Regarding the employed algorithmic approach, the study primarily utilizes the KNN algorithm for flood-prone area modelling. However, the research does not compare the performance of the KNN algorithm with other machine-learning approaches. Conducting such a comparison would offer insights into the relative strengths and weaknesses of different algorithms in the context of flood-prone area modelling. Various machine learning algorithms can be used to improve the accuracy of flood susceptibility modelling. Ensemble methods, such as stacking or bagging, enable combining predictions from multiple models, enhancing overall accuracy and robustness (Razavi-Termeh et al. Citation2023). Tuning machine learning algorithms with different metaheuristic algorithms can also help improve their accuracy. Combining conventional machine learning algorithms with deep learning can further improve the accuracy of flood modelling by taking advantage of both approaches. This diversity in machine learning algorithms allows for an adaptable and nuanced approach to flood modelling, addressing various data characteristics and complexities.

The choice of distance metric in the KNN algorithm plays a pivotal role in shaping its performance when predicting flood-prone areas. Understanding the impact of these metrics is crucial for optimizing decision-making processes in various domains, including land use planning, emergency response, and infrastructure development. Future research can use Mahalanobis, Canberra, and Cosine similarity distances in the KNN algorithm. Mahalanobis distance adjusts for correlations between variables and accounts for the shape of the data distribution. This metric is valuable when variables are correlated, and considering the distribution shape is crucial for accurate predictions (Leys et al. Citation2018). Cosine similarity measures the cosine of the angle between two vectors, indicating the similarity in direction rather than magnitude. It is beneficial when the magnitude of variables is less important than their orientation, making it suitable for sparse datasets (Xia, Zhang, and Li Citation2015). Canberra distance, which adjusts for differences in magnitude and is less sensitive to outliers, becomes relevant in scenarios where variable magnitudes differ significantly. Its application can enhance robustness, especially in the presence of extreme values (Faisal and Zamzami Citation2020). By incorporating these varied distance metrics into the KNN algorithm, decision-makers gain a more nuanced understanding of spatial relationships, allowing for improved identification of flood-prone areas.
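Two of the metrics proposed for future work are simple enough to sketch without external libraries. The functions below follow the standard definitions of the Canberra distance and of the cosine distance (one minus the cosine similarity); the function names and sample vectors are illustrative. The Mahalanobis distance is omitted because it additionally requires the inverse covariance matrix of the factors.

```python
import math


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms where both values are 0 are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(canberra((1.0, 2.0), (3.0, 2.0)))         # 0.5
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0
```

Each Canberra term is scaled by the size of the values being compared, so the same absolute difference counts for more when the values are small. The cosine distance ignores magnitude entirely: vectors pointing in the same direction have a distance of approximately zero regardless of their length.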

5. Conclusion

This study focused on the spatial modelling of flood-prone areas by determining the optimal distance metric in the KNN algorithm. Feature selection with the PSO algorithm identified the most influential factors affecting floods, including altitude, rainfall, NDVI, land cover, and lithology. The KNN algorithm was applied with four distance metrics: Manhattan, Minkowski, Euclidean, and Chebyshev. The comparison revealed that the Manhattan and Minkowski distances outperformed the Euclidean and Chebyshev distances in flood modelling. The Manhattan and Minkowski distances demonstrated better accuracy and suitability for capturing the varying scales and complexities of the flood-related criteria, achieving the lowest MAE and RMSE values and the highest R². The ROC curve and AUC index were used to validate the susceptibility maps, and the models performed well overall, with AUC values ranging from 0.924 to 0.940. These results support the reliability and accuracy of the models in predicting flood susceptibility. Overall, this study contributes to flood risk assessment by providing insights into the spatial modelling of flood-prone areas. The findings highlight the importance of feature selection, the choice of distance metric in the KNN algorithm, and the influential factors themselves. These results can support decision-making, planning, and mitigation strategies that reduce the impact of floods and enhance resilience in vulnerable areas. Future research can explore other machine learning algorithms and additional factors to improve flood modelling accuracy and expand the understanding of flood dynamics.
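As a compact reminder of how the four compared distances relate and how the reported error measures are computed, the following Python sketch uses the standard definitions. It is a sketch under assumptions rather than the study's implementation: the function names are hypothetical, the Minkowski order `p` used in the study is not stated in this section, and R² is taken here as 1 − SS_res/SS_tot, which may differ from the formula given in the validation section.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Largest coordinate-wise difference (limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For the points (0, 0) and (3, 4), the Manhattan distance `minkowski(x, y, 1)` is 7, the Euclidean distance `minkowski(x, y, 2)` is 5, and `chebyshev(x, y)` is 4, which shows how the same pair of samples can rank differently as neighbours depending on the metric.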

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

All the derived data based on these original data sources are available from the corresponding author upon reasonable request.

Additional information

Funding

This work was supported in part by the ITRC Support Program [grant number IITP-2023-RS-2022-00156354] and in part by the Metaverse Support Program to Nurture the Best Talents [grant number IITP-2023-RS-2023-00254529] funded by the Ministry of Science and ICT of Korea and the Institute of Information and Communications Technology Planning and Evaluation (IITP) and in part by the Ministry of Trade, Industry and Energy and Korea Institute for Advancement of Technology [grant number P0016038].

References

Abualigah, Laith Mohammad, Ahamad Tajudin Khader, and Essam Said Hanandeh. 2018. “A New Feature Selection Method to Improve the Document Clustering Using Particle Swarm Optimization Algorithm.” Journal of Computational Science 25: 456–466. https://doi.org/10.1016/j.jocs.2017.07.018
Al-Aizari, Ali R., Yousef A. Al-Masnay, Ali Aydda, Jiquan Zhang, Kashif Ullah, Abu Reza Md Towfiqul Islam, Tayyiba Habib, et al. 2022. “Assessment Analysis of Flood Susceptibility in Tropical Desert Area: A Case Study of Yemen.” Remote Sensing 14 (16): 4050. https://doi.org/10.3390/rs14164050
Al-Areeq, Ahmed M., S. I. Abba, Mohamed A. Yassin, Mohammed Benaafi, Mustafa Ghaleb, and Isam H. Aljundi. 2022. “Computational Machine Learning Approach for Flood Susceptibility Assessment Integrated with Remote Sensing and GIS Techniques from Jeddah, Saudi Arabia.” Remote Sensing 14 (21): 5515. https://doi.org/10.3390/rs14215515
AL-Hussein, Asaad A. M., Shuhab Khan, Kaouther Ncibi, Noureddine Hamdi, and Younes Hamed. 2022. “Flood Analysis Using HEC-RAS and HEC-HMS: A Case Study of Khazir River (Middle East—Northern Iraq).” Water 14 (22): 3779. https://doi.org/10.3390/w14223779
Alfeilat, Abu, Haneen Arafat, Ahmad B. A. Hassanat, Omar Lasassmeh, Ahmad S. Tarawneh, Mahmoud Bashir Alhasanat, Hamzeh S. Eyal Salman, and V. B. Surya Prasath. 2019. “Effects of Distance Measure Choice on k-Nearest Neighbor Classifier Performance: A Review.” Big Data 7 (4): 221–248. https://doi.org/10.1089/big.2018.0175
Allafta, Hadi, Christian Opp, and Suman Patra. 2020. “Identification of Groundwater Potential Zones Using Remote Sensing and GIS Techniques: A Case Study of the Shatt Al-Arab Basin.” Remote Sensing 13 (1): 112. https://doi.org/10.3390/rs13010112
Andaryani, Soghra, Vahid Nourani, Ali Torabi Haghighi, and Saskia Keesstra. 2021. “Integration of Hard and Soft Supervised Machine Learning for Flood Susceptibility Mapping.” Journal of Environmental Management 291: 112731. https://doi.org/10.1016/j.jenvman.2021.112731
Askar, Shavan, Sajjad Zeraat Peyma, Mohanad Mohsen Yousef, Natalia Alekseevna Prodanova, Iskandar Muda, Mohamed Elsahabi, and Javad Hatamiafkoueieh. 2022. “Flood Susceptibility Mapping Using Remote Sensing and Integration of Decision Table Classifier and Metaheuristic Algorithms.” Water 14 (19): 3062. https://doi.org/10.3390/w14193062
Augustine, Ijigah Edoka, and Akinyemi Tobi Akinlolu. 2015. “Flood Disaster: An Empirical Survey of Causative Factors and Preventive Measures in Kaduna, Nigeria.” International Journal of Environment and Pollution Research 3 (3): 53–66.
Bates, Paul D, Niall Quinn, Christopher Sampson, Andrew Smith, Oliver Wing, Jeison Sosa, James Savage, Gaia Olcese, Jeff Neal, and Guy Schumann. 2021. “Combined Modeling of US Fluvial, Pluvial, and Coastal Flood Hazard Under Current and Future Climates.” Water Resources Research 57 (2): e2020WR028673. https://doi.org/10.1029/2020WR028673
Bui, Quang-Thanh, Quoc-Huy Nguyen, Xuan Linh Nguyen, Vu Dong Pham, Huu Duy Nguyen, and Van-Manh Pham. 2020. “Verification of Novel Integrations of Swarm Intelligence Algorithms Into Deep Learning Neural Network for Flood Susceptibility Mapping.” Journal of Hydrology 581: 124379. https://doi.org/10.1016/j.jhydrol.2019.124379
Chowdhuri, Indrajit, Subodh Chandra Pal, and Rabin Chakrabortty. 2020. “Flood Susceptibility Mapping by Ensemble Evidential Belief Function and Binomial Logistic Regression Model on River Basin of Eastern India.” Advances in Space Research 65 (5): 1466–1489. https://doi.org/10.1016/j.asr.2019.12.003
Costache, Romulus, Quoc Bao Pham, Ehsan Sharifi, Nguyen Thi Thuy Linh, Sani Isah Abba, Matej Vojtek, Jana Vojteková, Pham Thi Thao Nhi, and Dao Nguyen Khoi. 2020. “Flash-flood Susceptibility Assessment Using Multi-Criteria Decision Making and Machine Learning Supported by Remote Sensing and GIS Techniques.” Remote Sensing 12 (1): 106. https://doi.org/10.3390/rs12010106
Cover, Thomas, and Peter Hart. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13 (1): 21–27. https://doi.org/10.1109/TIT.1967.1053964
Cunningham, Padraig, and Sarah Jane Delany. 2021. “k-Nearest Neighbour Classifiers – A Tutorial.” ACM Computing Surveys 54 (6): 1–25. https://doi.org/10.1145/3459665
DeVries, Ben, Chengquan Huang, John Armston, Wenli Huang, John W. Jones, and Megan W. Lang. 2020. “Rapid and Robust Monitoring of Flood Events Using Sentinel-1 and Landsat Data on the Google Earth Engine.” Remote Sensing of Environment 240: 111664. https://doi.org/10.1016/j.rse.2020.111664
Dissanayake, Sayuru, Sankha Gunathunga, Dimalka Jayanetti, Kavindu Perera, Chethana Liyanapathirana, and Lakmal Rupasinghe. 2022. “An Analysis on Different Distance Measures in KNN with PCA for Android Malware Detection.” Paper presented at the 2022 22nd International Conference on Advances in ICT for Emerging Regions (ICTer).
Disse, M., T. G. Johnson, J. Leandro, and Th Hartmann. 2020. “Exploring the Relation Between Flood Risk Management and Flood Resilience.” Water Security 9: 100059. https://doi.org/10.1016/j.wasec.2020.100059
Du, Bo, Shaodong Wang, Nan Wang, Lefei Zhang, Dacheng Tao, and Lifu Zhang. 2016. “Hyperspectral Signal Unmixing Based on Constrained non-Negative Matrix Factorization Approach.” Neurocomputing 204: 153–161. https://doi.org/10.1016/j.neucom.2015.10.132
Dutta, Madhurima, Sunil Saha, Nur Islam Saikh, Debabrata Sarkar, and Prolay Mondal. 2023. “Application of Bivariate Approaches for Flood Susceptibility Mapping: A District Level Study in Eastern India.” HydroResearch 6: 108–121. https://doi.org/10.1016/j.hydres.2023.02.004
Eberhart, Russell, and James Kennedy. 1995. “Particle Swarm Optimization.” Paper presented at the Proceedings of the IEEE International Conference on Neural Networks.
Eldho, T. I., P. E. Zope, and A. T. Kulkarni. 2018. “Urban Flood Management in Coastal Regions Using Numerical Simulation and Geographic Information System.” In Integrating Disaster Science and Management, 205–219. Elsevier.
Esmin, Ahmed AA, Germano Lambert-Torres, and AC Zambroni De Souza. 2005. “A Hybrid Particle Swarm Optimization Applied to Loss Power Minimization.” IEEE Transactions on Power Systems 20 (2): 859–866. https://doi.org/10.1109/TPWRS.2005.846049
Faisal, M., and E. M. Zamzami. 2020. “Comparative Analysis of Inter-Centroid K-Means Performance Using Euclidean Distance, Canberra Distance and Manhattan Distance.” Journal of Physics: Conference Series 1566 (1): 012112. https://doi.org/10.1088/1742-6596/1566/1/012112
Fang, Zhice, Yi Wang, Ling Peng, and Haoyuan Hong. 2021. “Predicting Flood Susceptibility Using LSTM Neural Networks.” Journal of Hydrology 594: 125734. https://doi.org/10.1016/j.jhydrol.2020.125734
Farahani, Mahsa, Seyed Vahid Razavi-Termeh, and Abolghasem Sadeghi-Niaraki. 2022. “A Spatially Based Machine Learning Algorithm for Potential Mapping of the Hearing Senses in an Urban Environment.” Sustainable Cities and Society 80: 103675. https://doi.org/10.1016/j.scs.2022.103675
Farhangi, Farbod, Abolghasem Sadeghi-Niaraki, Seyed Vahid Razavi-Termeh, and Soo-Mi Choi. 2021. “Evaluation of Tree-Based Machine Learning Algorithms for Accident Risk Mapping Caused by Driver Lack of Alertness at a National Scale.” Sustainability 13 (18): 10239. https://doi.org/10.3390/su131810239
Gauhar, Noushin, Sunanda Das, and Khadiza Sarwar Moury. 2021. “Prediction of Flood in Bangladesh Using K-Nearest Neighbors Algorithm.” Paper presented at the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST).
Ghorbanian, Arsalan, Mohammad Kakooei, Meisam Amani, Sahel Mahdavi, Ali Mohammadzadeh, and Mahdi Hasanlou. 2020. “Improved Land Cover Map of Iran Using Sentinel Imagery Within Google Earth Engine and a Novel Automatic Workflow for Land Cover Classification Using Migrated Training Samples.” ISPRS Journal of Photogrammetry and Remote Sensing 167: 276–288. https://doi.org/10.1016/j.isprsjprs.2020.07.013
Haghizadeh, Ali, Safoura Siahkamari, Amir Hamzeh Haghiabi, and Omid Rahmati. 2017. “Shear Heating by Translational Brittle Reverse Faulting Along a Single, Sharp and Straight Fault Plane.” Journal of Earth System Science 126 (1): 1–11. https://doi.org/10.1007/s12040-016-0788-5
Hamed, Yaman, Ahmed Ibrahim Alzahrani, Zahiraniza Mustaffa, Mokhtar Che Ismail, and Kee Kok Eng. 2020. “Two Steps Hybrid Calibration Algorithm of Support Vector Regression and K-Nearest Neighbors.” Alexandria Engineering Journal 59 (3): 1181–1190. https://doi.org/10.1016/j.aej.2020.01.033
Harb, Hany M, and Abeer S Desuky. 2014. “Feature Selection on Classification of Medical Datasets Based on Particle Swarm Optimization.” International Journal of Computer Applications 104 (5): 14–17.
Hemmati, Mona, Bruce R Ellingwood, and Hussam N Mahmoud. 2020. “The Role of Urban Growth in Resilience of Communities Under Flood Risk.” Earth's Future 8 (3): e2019EF001382. https://doi.org/10.1029/2019EF001382
Ilia, Ioanna, Paraskevas Tsangaratos, Ploutarchos Tzampoglou, Wei Chen, and Haoyuan Hong. 2022. “Flash Flood Susceptibility Mapping Using Stacking Ensemble Machine Learning Models.” Geocarto International 37 (27): 15010–15036. https://doi.org/10.1080/10106049.2022.2093990
Isma'il, Muhammad, and Iyortim Opeluwa Saanyol. 2013. “Application of Remote Sensing (RS) and Geographic Information Systems (GIS) in Flood Vulnerability Mapping: Case Study of River Kaduna.” International Journal of Geomatics and Geosciences 3 (3): 618–627.
Jain, Meetu, Vibha Saihjpal, Narinder Singh, and Satya Bir Singh. 2022. “An Overview of Variants and Advancements of PSO Algorithm.” Applied Sciences 12 (17): 8392. https://doi.org/10.3390/app12178392
Janizadeh, Saeid, Mohammadtaghi Avand, Abolfazl Jaafari, Tran Van Phong, Mahmoud Bayat, Ebrahim Ahmadisharaf, Indra Prakash, Binh Thai Pham, and Saro Lee. 2019. “Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran.” Sustainability 11 (19): 5426. https://doi.org/10.3390/su11195426
Khanna, Upasna, and Prabhdeep Singh. 2017. “Hybrid Approch of KNN+ Euclidean Distance to Detect Intrusion Within Cloud Based Systems.” International Research Journal of Advanced Engineering and Science 2 (3): 7–11.
Khosravi, Khabat, Hamid Reza Pourghasemi, Kamran Chapi, and Masoumeh Bahri. 2016. “Landscape Ecological Security Response to Land Use Change in the Tidal Flat Reclamation Zone, China.” Environmental Monitoring and Assessment 188 (1): 1–21. https://doi.org/10.1007/s10661-015-4999-z
Kumar, Vijendra, Kul Vaibhav Sharma, Tommaso Caloiero, Darshan J. Mehta, and Karan Singh. 2023. “Comprehensive Overview of Flood Modeling Approaches: A Review of Recent Advances.” Hydrology 10 (7): 141. https://doi.org/10.3390/hydrology10070141
Kumbure, Mahinda Mailagaha, and Pasi Luukka. 2022. “A Generalized Fuzzy k-Nearest Neighbor Regression Model Based on Minkowski Distance.” Granular Computing 7 (3): 657–671. https://doi.org/10.1007/s41066-021-00288-w
Kundzewicz, Zbigniew W., Shinjiro Kanae, Sonia I. Seneviratne, John Handmer, Neville Nicholls, Pascal Peduzzi, Reinhard Mechler, Laurens M. Bouwer, Nigel Arnell, and Katharine Mach. 2014. “Flood Risk and Climate Change: Global and Regional Perspectives.” Hydrological Sciences Journal 59 (1): 1–28. https://doi.org/10.1080/02626667.2013.857411
Lee, Ickjai, and Christopher Torpelund-Bruin. 2012. “Geographic Knowledge Discovery from Web Map Segmentation Through Generalized Voronoi Diagrams.” Expert Systems with Applications 39 (10): 9376–9388. https://doi.org/10.1016/j.eswa.2012.02.129
Leys, Christophe, Olivier Klein, Yves Dominicy, and Christophe Ley. 2018. “Detecting Multivariate Outliers: Use a Robust Variant of the Mahalanobis Distance.” Journal of Experimental Social Psychology 74: 150–156. https://doi.org/10.1016/j.jesp.2017.09.011
Lin, Qiaoying, Bingqing Lin, Dejian Zhang, and Jiefeng Wu. 2022. “Web-based Prototype System for Flood Simulation and Forecasting Based on the HEC-HMS Model.” Environmental Modelling & Software 158: 105541. https://doi.org/10.1016/j.envsoft.2022.105541
Liu, Zhu, Venkatesh Merwade, and Keighobad Jafarzadegan. 2019. “Investigating the Role of Model Structure and Surface Roughness in Generating Flood Inundation Extents Using One- and Two-Dimensional Hydraulic Models.” Journal of Flood Risk Management 12 (1): e12347. https://doi.org/10.1111/jfr3.12347
Madhuri, R., S. Sistla, and K. Srinivasa Raju. 2021. “Application of Machine Learning Algorithms for Flood Susceptibility Assessment and Risk Management.” Journal of Water and Climate Change 12 (6): 2608–2623. https://doi.org/10.2166/wcc.2021.051
Manzoor, Zaira, Muhsan Ehsan, Muhammad Bashir Khan, Aqsa Manzoor, Malik Muhammad Akhter, Muhammad Tayyab Sohail, Asrar Hussain, Ahsan Shafi, Tamer Abu-Alam, and Mohamed Abioui. 2022. “Floods and Flood Management and its Socio-Economic Impact on Pakistan: A Review of the Empirical Literature.” Frontiers in Environmental Science 10: 2480.
Marini, Federico, and Beata Walczak. 2015. “Particle Swarm Optimization (PSO). A Tutorial.” Chemometrics and Intelligent Laboratory Systems 149: 153–165. https://doi.org/10.1016/j.chemolab.2015.08.020
Masroor, Md, Seyed Vahid Razavi-Termeh, Md Hibjur Rahaman, Pandurang Choudhari, Luc Cimusa Kulimushi, and Haroon Sajjad. 2023. “Adaptive Neuro Fuzzy Inference System (ANFIS) Machine Learning Algorithm for Assessing Environmental and Socio-Economic Vulnerability to Drought: A Study in Godavari Middle Sub-Basin, India.” Stochastic Environmental Research and Risk Assessment 37 (1): 233–259. https://doi.org/10.1007/s00477-022-02292-1
Mehravar, Soroosh, Seyed Vahid Razavi-Termeh, Armin Moghimi, Babak Ranjgar, Fatemeh Foroughnia, and Meisam Amani. 2023. “Flood Susceptibility Mapping Using Multi-Temporal SAR Imagery and Novel Integration of Nature-Inspired Algorithms Into Support Vector Regression.” Journal of Hydrology 617: 129100. https://doi.org/10.1016/j.jhydrol.2023.129100
Meliho, Modeste, Abdellatif Khattabi, and Joseph Asinyo. 2021. “Spatial Modeling of Flood Susceptibility Using Machine Learning Algorithms.” Arabian Journal of Geosciences 14 (21): 2243. https://doi.org/10.1007/s12517-021-08610-1
Merkuryeva, Galina, Yuri Merkuryev, Boris V Sokolov, Semyon Potryasaev, Viacheslav A Zelentsov, and Arnis Lektauers. 2015. “Advanced River Flood Monitoring, Modelling and Forecasting.” Journal of Computational Science 10: 77–85. https://doi.org/10.1016/j.jocs.2014.10.004
Mind’je, Richard, Lanhai Li, Amobichukwu Chukwudi Amanambu, Lamek Nahayo, Jean Baptiste Nsengiyumva, Aboubakar Gasirabo, and Mapendo Mindje. 2019. “Flood Susceptibility Modeling and Hazard Perception in Rwanda.” International Journal of Disaster Risk Reduction 38: 101211. https://doi.org/10.1016/j.ijdrr.2019.101211
Modarres, Reza, Ali Sarhadi, and Donald H Burn. 2016. “Changes of Extreme Drought and Flood Events in Iran.” Global and Planetary Change 144: 67–81. https://doi.org/10.1016/j.gloplacha.2016.07.008
Mojaddadi, Hossein, Biswajeet Pradhan, Haleh Nampak, Noordin Ahmad, and Abdul Halim bin Ghazali. 2017. “Ensemble Machine-Learning-Based Geospatial Approach for Flood Risk Assessment Using Multi-Sensor Remote-Sensing Data and GIS.” Geomatics, Natural Hazards and Risk 8 (2): 1080–1102. https://doi.org/10.1080/19475705.2017.1294113
Nachappa, Thimmaiah Gudiyangada, Sepideh Tavakkoli Piralilou, Khalil Gholamnia, Omid Ghorbanzadeh, Omid Rahmati, and Thomas Blaschke. 2020. “Flood Susceptibility Mapping with Machine Learning, Multi-Criteria Decision Analysis and Ensemble Using Dempster Shafer Theory.” Journal of Hydrology 590: 125275. https://doi.org/10.1016/j.jhydrol.2020.125275
Narimani, Roya, Changhyun Jun, Saqib Shahzad, Jeill Oh, and Kyoohong Park. 2021. “Application of a Novel Hybrid Method for Flood Susceptibility Mapping with Satellite Images: A Case Study of Seoul, Korea.” Remote Sensing 13 (14): 2786. https://doi.org/10.3390/rs13142786
Pandya, D. H., Sanjay H Upadhyay, and Suraj Prakash Harsha. 2013. “Fault Diagnosis of Rolling Element Bearing with Intrinsic Mode Function of Acoustic Emission Data Using APF-KNN.” Expert Systems with Applications 40 (10): 4137–4145. https://doi.org/10.1016/j.eswa.2013.01.033
Pham, Binh Thai, Chinh Luu, Tran Van Phong, Phan Trong Trinh, Ataollah Shirzadi, Somayeh Renoud, Shahrokh Asadi, Hiep Van Le, Jason von Meding, and John J. Clague. 2021. “Can Deep Learning Algorithms Outperform Benchmark Machine Learning Algorithms in Flood Susceptibility Modeling?.” Journal of Hydrology 592: 125615. https://doi.org/10.1016/j.jhydrol.2020.125615
Pourghasemi, Hamid Reza, Mahdis Amiri, Mohsen Edalat, Amir Hossein Ahrari, Mahdi Panahi, Nitheshnirmal Sadhasivam, and Saro Lee. 2020a. “Assessment of Urban Infrastructures Exposed to Flood Using Susceptibility Map and Google Earth Engine.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 1923–1937. https://doi.org/10.1109/JSTARS.2020.3045278
Pourghasemi, Hamid Reza, Seyed Vahid Razavi-Termeh, Narges Kariminejad, Haoyuan Hong, and Wei Chen. 2020b. “An Assessment of Metaheuristic Approaches for Flood Assessment.” Journal of Hydrology 582: 124536. https://doi.org/10.1016/j.jhydrol.2019.124536
Prasad, Pankaj, Victor Joseph Loveson, Bappa Das, and Mahender Kotha. 2022. “Novel Ensemble Machine Learning Models in Flood Susceptibility Mapping.” Geocarto International 37 (16): 4571–4593. https://doi.org/10.1080/10106049.2021.1892209
Rahmati, Omid, and Hamid Reza Pourghasemi. 2017. “Identification of Critical Flood Prone Areas in Data-Scarce and Ungauged Regions: A Comparison of Three Data Mining Models.” Water Resources Management 31 (5): 1473–1487. https://doi.org/10.1007/s11269-017-1589-6
Ramly, Salwa, and Wardah Tahir. 2016. “Application of HEC-GeoHMS and HEC-HMS as Rainfall–Runoff Model for Flood Simulation.” Paper presented at the ISFRAM 2015: Proceedings of the International Symposium on Flood Research and Management 2015.
Razavi-Termeh, Seyed Vahid, Abolghasem Sadeghi-Niaraki, and Soo-Mi Choi. 2021a. “Spatial Modeling of Asthma-Prone Areas Using Remote Sensing and Ensemble Machine Learning Algorithms.” Remote Sensing 13 (16): 3222. https://doi.org/10.3390/rs13163222
Razavi-Termeh, Seyed Vahid, Abolghasem Sadeghi-Niaraki, and Soo-Mi Choi. 2021. “Asthma-prone Areas Modeling Using a Machine Learning Model.” Scientific Reports 11 (1): 1912. https://doi.org/10.1038/s41598-021-81147-1
Razavi-Termeh, Seyed Vahid, Abolghasem Sadeghi-Niaraki, Farbod Farhangi, and Soo-Mi Choi. 2021b. “Covid-19 Risk Mapping with Considering Socio-Economic Criteria Using Machine Learning Algorithms.” International Journal of Environmental Research and Public Health 18 (18): 9657. https://doi.org/10.3390/ijerph18189657
Razavi-Termeh, Seyed Vahid, Abolghasem Sadeghi-Niaraki, MyoungBae Seo, and Soo-Mi Choi. 2023. “Application of Genetic Algorithm in Optimization Parallel Ensemble-Based Machine Learning Algorithms to Flood Susceptibility Mapping Using Radar Satellite Imagery.” Science of the Total Environment 873: 162285. https://doi.org/10.1016/j.scitotenv.2023.162285
Reed, Connor, Weston Anderson, Andrew Kruczkiewicz, Jennifer Nakamura, Dominy Gallo, Richard Seager, and Sonali Shukla McDermid. 2022. “The Impact of Flooding on Food Security Across Africa.” Proceedings of the National Academy of Sciences 119 (43): e2119399119. https://doi.org/10.1073/pnas.2119399119
Rezaie, Fatemeh, Sayed M. Bateni, Essam Heggy, and Saro Lee. 2021. “Utilizing the SAR, GIS, and Novel Hybrid Metaheuristic-GMDH Algorithm for Flood Susceptibility Mapping.” Paper presented at the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS.
Rosser, Julian F., Didier G. Leibovici, and Margaret J. Jackson. 2017. “Rapid Flood Inundation Mapping Using Social Media, Remote Sensing and Topographic Data.” Natural Hazards 87 (1): 103–120. https://doi.org/10.1007/s11069-017-2755-0
Saha, Asish, Subodh Chandra Pal, Alireza Arabameri, Thomas Blaschke, Somayeh Panahi, Indrajit Chowdhuri, Rabin Chakrabortty, Romulus Costache, and Aman Arora. 2021. “Flood Susceptibility Assessment Using Novel Ensemble of Hyperpipes and Support Vector Regression Algorithms.” Water 13 (2): 241. https://doi.org/10.3390/w13020241
Saksena, Siddharth, Venkatesh Merwade, and Peter J. Singhofen. 2019. “Flood Inundation Modeling and Mapping by Integrating Surface and Subsurface Hydrology with River Hydrodynamics.” Journal of Hydrology 575: 1155–1177. https://doi.org/10.1016/j.jhydrol.2019.06.024
Saleh, Azlan, Ali Yuzir, Nuridah Sabtu, Sohaib K. M. Abujayyab, Mudashiru Rofiat Bunmi, and Quoc Bao Pham. 2022. “Flash Flood Susceptibility Mapping in Urban Area Using Genetic Algorithm and Ensemble Method.” Geocarto International 37 (25): 10199–10228. https://doi.org/10.1080/10106049.2022.2032394
Samanta, Ratan Kumar, Gouri Sankar Bhunia, Pravat Kumar Shit, and Hamid Reza Pourghasemi. 2018. “Flood Susceptibility Mapping Using Geospatial Frequency Ratio Technique: A Case Study of Subarnarekha River Basin, India.” Modelling Earth Systems and Environment 4 (1): 395–408. https://doi.org/10.1007/s40808-018-0427-z
Saravanan, Subbarayan, Devanantham Abijith, Nagireddy Masthan Reddy, K. S. S. Parthasarathy, Niraimathi Janardhanam, Subbarayan Sathiyamurthi, and Vivek Sivakumar. 2023. “Flood Susceptibility Mapping Using Machine Learning Boosting Algorithms Techniques in Idukki District of Kerala India.” Urban Climate 49: 101503. https://doi.org/10.1016/j.uclim.2023.101503
Schmidt, Lennart, Falk Heße, Sabine Attinger, and Rohini Kumar. 2020. “Challenges in Applying Machine Learning Models for Hydrological Inference: A Case Study for Flooding Events Across Germany.” Water Resources Research 56 (5): e2019WR025924. https://doi.org/10.1029/2019WR025924
Seleem, Omar, Georgy Ayzel, Arthur Costa Tomaz de Souza, Axel Bronstert, and Maik Heistermann. 2022. “Towards Urban Flood Susceptibility Mapping Using Data-Driven Models in Berlin, Germany.” Geomatics, Natural Hazards and Risk 13 (1): 1640–1662. https://doi.org/10.1080/19475705.2022.2097131
Şen, Zekâi. 2018. Flood Modelling, Prediction and Mitigation. Cham, Switzerland: Springer.
Shafapour Tehrany, Mahyat, Lalit Kumar, Mustafa Neamah Jebur, and Farzin Shabani. 2019. “Evaluating the Application of the Statistical Index Method in Flood Susceptibility Mapping and its Comparison with Frequency Ratio and Logistic Regression Methods.” Geomatics, Natural Hazards and Risk 10 (1): 79–101. https://doi.org/10.1080/19475705.2018.1506509
Shahabi, Himan, Ataollah Shirzadi, Kayvan Ghaderi, Ebrahim Omidvar, Nadhir Al-Ansari, John J. Clague, Marten Geertsema, Khabat Khosravi, Ata Amini, and Sepideh Bahrami. 2020. “Flood Detection and Susceptibility Mapping Using Sentinel-1 Remote Sensing Data and a Machine Learning Approach: Hybrid Intelligence of Bagging Ensemble Based on k-Nearest Neighbor Classifier.” Remote Sensing 12 (2): 266. https://doi.org/10.3390/rs12020266
Siam, Zakaria Shams, Rubyat Tasnuva Hasan, Soumik Sarker Anik, Fahima Noor, Mohammed Sarfaraz Gani Adnan, and Rashedur M Rahman. 2021. “Study of Hybridized Support Vector Regression Based Flood Susceptibility Mapping for Bangladesh.” Paper presented at the Advances and Trends in Artificial Intelligence. From Theory to Practice: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, Kuala Lumpur, Malaysia, July 26–29, 2021, Proceedings, Part II 34.
Sperti, Michela. 2019. Cardiovascular Risk Prediction in Rheumatic Patients by Artificial Intelligence Paradigms. Turin, Italy: Politecnico di Torino.
Stanton, Carly, Michael J Starek, Norman Elliott, Michael Brewer, Murilo M Maeda, and Tianxing Chu. 2017. “Unmanned Aircraft System-Derived Crop Height and Normalized Difference Vegetation Index Metrics for Sorghum Yield and Aphid Stress Assessment.” Journal of Applied Remote Sensing 11 (2): 026035. https://doi.org/10.1117/1.JRS.11.026035
Sugianto, Sugianto, Anwar Deli, Edy Miswar, Muhammad Rusdi, and Muhammad Irham. 2022. “The Effect of Land Use and Land Cover Changes on Flood Occurrence in Teunom Watershed, Aceh Jaya.” Land 11 (8): 1271. https://doi.org/10.3390/land11081271
Sumayli, Abdulrahman. 2023. “Development of Advanced Machine Learning Models for Optimization of Methyl Ester Biofuel Production from Papaya Oil: Gaussian Process Regression (GPR), Multilayer Perceptron (MLP), and K-Nearest Neighbor (KNN) Regression Models.” Arabian Journal of Chemistry 16 (7): 104833. https://doi.org/10.1016/j.arabjc.2023.104833
Tang, Yaochi, Yunchi Chang, and Kuohao Li. 2023. “Applications of K-Nearest Neighbor Algorithm in Intelligent Diagnosis of Wind Turbine Blades Damage.” Renewable Energy 212: 855–864. https://doi.org/10.1016/j.renene.2023.05.087
Taunk, Kashvi, Sanjukta De, Srishti Verma, and Aleena Swetapadma. 2019. “A Brief Review of Nearest Neighbor Algorithm for Learning and Classification.” Paper presented at the 2019 International Conference on Intelligent Computing and Control Systems (ICCS).
Tehrany, Mahyat Shafapour, Moung-Jin Lee, Biswajeet Pradhan, Mustafa Neamah Jebur, and Saro Lee. 2014. “Flood Susceptibility Mapping Using Integrated Bivariate and Multivariate Statistical Models.” Environmental Earth Sciences 72 (10): 4001–4015. https://doi.org/10.1007/s12665-014-3289-3
Tehrany, Mahyat Shafapour, Biswajeet Pradhan, and Mustafa Neamah Jebur. 2013. “Spatial Prediction of Flood Susceptible Areas Using Rule Based Decision Tree (DT) and a Novel Ensemble Bivariate and Multivariate Statistical Models in GIS.” Journal of Hydrology 504: 69–79. https://doi.org/10.1016/j.jhydrol.2013.09.034
Vojtek, Matej, and Jana Vojteková. 2019. “Flood Susceptibility Mapping on a National Scale in Slovakia Using the Analytical Hierarchy Process.” Water 11 (2): 364. https://doi.org/10.3390/w11020364
Wang, Yi, Zhice Fang, Haoyuan Hong, and Ling Peng. 2020. “Flood Susceptibility Mapping Using Convolutional Neural Network Frameworks.” Journal of Hydrology 582: 124482. https://doi.org/10.1016/j.jhydrol.2019.124482
Wu, Jinru, Xiaoling Chen, and Jianzhong Lu. 2022. “Assessment of Long and Short-Term Flood Risk Using the Multi-Criteria Analysis Model with the AHP-Entropy Method in Poyang Lake Basin.” International Journal of Disaster Risk Reduction 75: 102968. https://doi.org/10.1016/j.ijdrr.2022.102968
Xia, Peipei, Li Zhang, and Fanzhang Li. 2015. “Learning Similarity with Cosine Similarity Ensemble.” Information Sciences 307: 39–52. https://doi.org/10.1016/j.ins.2015.02.024
Xie, Xiaozhen. 2018. “A k-Nearest Neighbor Technique for Brain Tumor Segmentation Using Minkowski Distance.” Journal of Medical Imaging and Health Informatics 8 (2): 180–185. https://doi.org/10.1166/jmihi.2018.2285
Xing, Wenchao, and Yilin Bei. 2019. “Medical Health Big Data Classification Based on KNN Classification Algorithm.” IEEE Access 8: 28808–28819. https://doi.org/10.1109/ACCESS.2019.2955754
Yang, Byungyun. 2016. “GIS Based 3-D Landscape Visualization for Promoting Citizen's Awareness of Coastal Hazard Scenarios in Flood Prone Tourism Towns.” Applied Geography 76: 85–97. https://doi.org/10.1016/j.apgeog.2016.09.006
Yang, Jun, Lifu He, and Siyao Fu. 2014. “An Improved PSO-Based Charging Strategy of Electric Vehicles in Electrical Distribution Grid.” Applied Energy 128: 82–92. https://doi.org/10.1016/j.apenergy.2014.04.047
Youssef, Ahmed M., Biswajeet Pradhan, and Saleh A. Sefry. 2016. “Flash Flood Susceptibility Assessment in Jeddah City (Kingdom of Saudi Arabia) Using Bivariate and Multivariate Statistical Models.” Environmental Earth Sciences 75 (1): 12. https://doi.org/10.1007/s12665-015-4830-8
Zainal, Anna Gustina. 2021. “Identification of Lampung Script Using K-Neighbor, Manhattan Distance and Population Matrix Algorithm.”
Zainodin, H. J., A. Noraini, and S. J. Yap. 2011. “An Alternative Multicollinearity Approach in Solving Multiple Regression Problem.” Trends in Applied Sciences Research 6 (11): 1241–1255. https://doi.org/10.3923/tasr.2011.1241.1255
Zhang, Zhongheng. 2016. “Introduction to Machine Learning: K-Nearest Neighbors.” Annals of Translational Medicine 4 (11): 1–17.
