Engineering Applications of Computational Fluid Mechanics Volume 18, 2024 - Issue 1

Open access

Research Article

Fluvial bedload transport modelling: advanced ensemble tree-based models or optimized deep learning algorithms?

Khabat Khosravi, School of Climate Change and Adaptation, University of Prince Edward Island, Charlottetown, Canada (corresponding author)

Aitazaz A. Farooque, School of Climate Change and Adaptation, University of Prince Edward Island, Charlottetown, Canada

Sayed M. Bateni, Department of Civil, Environmental and Construction Engineering and Water Resources Research Center, University of Hawaii at Manoa, Honolulu, HI, USA

Changhyun Jun, Department of Civil and Environmental Engineering, College of Engineering, Chung-Ang University, Seoul, Republic of Korea (corresponding author)

Dorsa Mohammadi, Earth & Environmental Sciences Department, Boston University, Boston, MA, USA

Zahra Kalantari, Department of Sustainable Development, Environmental Science and Engineering (SEED), KTH Royal Institute of Technology, Stockholm, Sweden

James R. Cooper, Department of Geography & Planning, School of Environmental Sciences, University of Liverpool, Liverpool, UK

Article: 2346221 | Received 18 Jan 2024, Accepted 15 Apr 2024, Published online: 10 May 2024

https://doi.org/10.1080/19942060.2024.2346221

Abstract

The potential of advanced tree-based models and optimized deep learning algorithms to predict fluvial bedload transport was explored, identifying the most flexible and accurate algorithm, and the optimum set of readily available and reliable inputs. Using 926 datasets for 20 rivers, the performance of three groups of models was tested: (1) standalone tree-based models Alternating Model Tree (AMT) and Dual Perturb and Combine Tree (DPCT); (2) ensemble tree-based models Iterative Absolute Error Regression (IAER), ensembled with AMT and DPCT; and (3) optimized deep learning models Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN) ensembled with Grey Wolf Optimizer. Comparison of the predictive performance of the models with that of commonly used empirical equations and sensitivity analysis of the driving variables revealed that: (i) the coarse grain-size percentile D₉₀ was the most effective variable in bedload transport prediction (where D_x is the xth percentile of the bed surface grain size distribution), followed by D₈₄, D₅₀, flow discharge, D₁₆, and channel slope and width; (ii) all tree-based models and optimized deep learning algorithms displayed ‘very good’ or ‘good’ performance, outperforming empirical equations; and (iii) all algorithms performed best when all input parameters were used. Thus, a range of different input variable combinations must be considered in the optimization of these models. Overall, ensemble algorithms provided more accurate predictions of bedload transport than their standalone counterparts. In particular, the ensemble tree-based model IAER-AMT performed most strongly, displaying great potential to produce robust predictions of bedload transport in coarse-grained rivers based on a few readily available flow and channel variables.

KEYWORDS:

Bedload sediment
machine learning
empirical equations
deep learning
IAER-AMT
Einstein (1950)

1. Introduction

Bedload transport is the key driver of morphological change in coarse-grained rivers, exacerbating flooding (e.g. Nones, 2019) and posing risks to infrastructure (e.g. Feeney et al., 2022; Li et al., 2021) and benthic habitats (e.g. Fisher et al., 1982). Predicting bedload transport rate accurately is a major challenge due to the vast number of flow and channel properties that control bedload transport, its non-linear relationship with these variables, its stochastic nature, and the high complexity of its spatio-temporal patterns. Influential variables include upstream source of sediment supply, storage, and delivery (Gao, 2011), river channel characteristics such as slope, width, riverbed structure, and roughness (e.g. Zhang et al., 2010), bed material size and its variation (e.g. Recking et al., 2023), and river flow properties such as discharge and bed shear stress (e.g. Gomez and Church, 1989).

Direct measurement of bedload is costly, time-consuming, and associated with high uncertainty, particularly during flooding (Graf, 1971). To overcome these difficulties, a vast array of laboratory flume experiments have been conducted under different flow and bed material conditions, from which many empirical equations have been developed, e.g. those reported by Meyer-Peter and Müller (1948), Einstein (1950), Bagnold (1966), Wilcock and Crowe (2003), and Recking (2013). For example, Poorhosein et al. (2014) developed two types of empirical/linear equations for bedload transport rate prediction, one based on hydraulic parameters and one based on geometric parameters, and found good predictive performance for both types. They also identified Froude number, Shields parameter, and shape factor as the three most effective hydraulic variables in bedload transport prediction, while grain size distribution and water channel slope were the most important and effective geometric variables (Poorhosein et al., 2014). Using 2600 datasets, Hinton et al. (2018) tested a number of empirical equations, including those developed by Barry et al. (2004), Parker (1990; both calibrated and uncalibrated), Meyer-Peter and Müller (1948), Wilcock (2001), Rosgen et al. (2006; ‘Pagosa good condition’), Elhakeem and Imran (2016), and Recking (2013). Their results showed that the ‘Pagosa good condition’ and Barry et al. equations outperformed the others, while the Meyer-Peter and Müller (1948) and uncalibrated Parker (1990) equations gave the lowest predictive power.

Alternatively, bedload transport can be predicted using numerical approaches, which attempt to mathematically represent the physics behind the processes of entrainment, transportation, and deposition. For example, Jilani and Hashemi (2013) developed a smoothed particle hydrodynamic (SPH) model and found it to be reliable and efficient, while Barzgaran et al. (2019) developed and implemented a second-order finite volume method and wave propagation algorithm and found it to be efficient. Both models have been successfully applied in later studies, but model implementation is difficult, they require vast amounts of data for calibration and validation, and calibration is time-consuming, limiting their wider application. Various approaches have been employed to simplify these models, including prediction of flow variables using a depth-averaged method, Manning’s (1891) equation with estimates of the Manning roughness coefficient, and using transport capacity equations under unlimited sediment supply conditions (Mustafa et al., 2017; Shahiri et al., 2016; Wainwright et al., 2015).

The use of machine learning (ML) models in hydrology and river science, and in many other fields of study, is now increasing. These models seek to find a robust relationship between readily available input and output parameters. The main advantages of ML models are that they are user-friendly, require only small amounts of data, are simple and fast to calibrate, are able to handle large amounts of data, and have a non-linear structure that is able to replicate complicated environmental behaviour (e.g. Asheghi & Hosseini, 2020; Hosseiny et al., 2023; Khosravi et al., 2020; Kisi & Yaseen, 2019; Latif et al., 2023; Roushangar & Koosheh, 2015).

Artificial Neural Network (ANN) is one of the oldest and most widely used ML models in hydrology and water science. Hosseiny et al. (2023) found an ANN model to be efficient in the prediction of bedload transport based on 8117 measurements from 134 rivers. However, ANN algorithms have slow convergence during the training procedure, high errors in the modelling phase, and low generalization power (Kisi et al., 2012). Thus, ANN algorithms have poor predictive power when the range of the testing dataset is outside the range of the training data (Kisi et al., 2016; Melesse et al., 2011), and they require a large dataset to achieve reasonable results. To overcome this weakness, ANN algorithms have been ensembled with fuzzy logic algorithms to create Adaptive Neural Fuzzy Inference System (ANFIS) models. Riahi-Madvar and Seifi (2018) developed an ANFIS model for bedload transport prediction and found that it outperformed an ANN model. However, in other environmental fields of study, ANFIS models have been found to be poor at finding the best weight parameters, which heavily influences the prediction accuracy (Tien Bui et al., 2016). Furthermore, ANFIS algorithms suffer from the need for a large number of model operators, each of which must be set accurately, especially the weights of the membership functions. Additionally, ANFIS algorithms lack a systematic approach in the design of fuzzy rules and in the choice of membership function variables (Tien Bui et al., 2016; Khosravi et al., 2018).

The ANFIS model is neuron-based and several other algorithms of this type, such as Support Vector Regression (SVR), have been widely used in river science. For example, Roushangar and Koosheh (2015) developed a hybridized model, SVR-GA, by combining SVR with the Genetic Algorithm (GA) approach, and found that it had better predictive power than empirical equations of bedload transport rate. However, SVR models have many hyper-parameters, making calibration time-consuming and model implementation difficult (Ahmad et al., 2018). Generally, the prediction power of neuron-based models is improved when they are combined with metaheuristic models such as GA, heap-based optimizer (HBO), political optimizer (PO), teaching-learning based optimization (TLBO), backtracking search algorithm (BSA), and jellyfish search optimization (JFSO) (Moayedi et al., 2024; Vakharia et al., 2023).

New types of neuron-based models, called deep learning (DL) algorithms, have been developed to overcome the weaknesses of conventional ML models. The two main advantages of DL models are their greater flexibility, and their ability to handle large and complex data, both structured and unstructured. Thus, DL models have higher predictive performance (Ghorbanzadeh et al., 2019). Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) networks are among the most popular and widely used DL approaches, owing to their superior performance. For example, Latif et al. (2023) found that an LSTM model achieved better performance in prediction of bedload transport rate than SVR and ANN, while Shakya et al. (2023) found that a different DL algorithm, Deep Neural Network (DNN), performed better in prediction of total sediment load in rivers than SVR, linear regression (LR), and extreme learning machine (ELM) models.

Another type of ML model that is widely used in hydrology and water resources, especially for spatial modelling of natural hazards, is the tree-based algorithm, such as random forest (RF), M5Prime (M5P), and Reduced Error Pruning Tree (REPT). Khosravi et al. (2018) applied several tree-based models, including Logistic Model Trees (LMT), REPT, Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), in flood susceptibility mapping in Iran and found that all models achieved very good performance, although ADT outperformed the other models. Rahmati et al. (2019) applied numerous tree-based models, including Rule-Based Decision Tree (RBDT), Boosted Regression Trees (BRT), Classification And Regression Tree (CART), and an RF model in land subsidence susceptibility mapping and found that the RF model achieved the best performance. Hussain and Khan (2020) developed an RF model for monthly river flow forecasting and found that it achieved around 18% and 34% higher performance (based on root mean square error, RMSE) than multilayer perceptron (MLP) and support vector machine (SVM) models, respectively. However, there is a significant knowledge gap regarding the potential of DL algorithms for bedload transport prediction. Thus the challenge lies in establishing the most flexible and accurate algorithm for this purpose, and identifying readily available, reliable, and optimum inputs.

The aim of this study was to address this challenge through comparing the performance of empirical models, standalone and ensemble tree-based models, and optimized DL models in prediction of bedload transport rate in coarse-grained rivers. Specific objectives were to establish, using 926 datasets for 20 rivers: (1) the potential of tree-based and DL algorithms to provide accurate predictions using a few readily available and measurable river properties, such as channel size (width and slope), flow discharge, and sediment size; (2) the most effective variable in bedload transport prediction; (3) the most effective input variable combination in optimizing predictive power; and (4) the effect of hybridization and ensemble-based approaches on prediction accuracy. This study is the first to apply a wide range of tree-based and DL models in prediction of bedload transport and offers new insights into the potential of these algorithms to provide simple, fast, accurate, and efficient predictions of bedload transport.

2. Methodology

2.1. Data

The data used in the analysis comprised 926 sets of bedload transport rate measurements for 20 rivers, compiled from BedloadWeb (http://en.bedloadweb.com; Recking, 2019) and Hosseiny et al. (2023; https://doi.org/10.5281/zenodo.7641313). In addition to measured bedload sediment transport rate per unit width (q_b; g/m/s), the data included river bed slope (S; m/m), river discharge (Q; m³/s), river width (w; m), and bed surface sediment sizes (D₁₆, D₅₀, D₈₄, and D₉₀, where D_x is the xth percentile of the bed surface grain size distribution in m). Summary statistics on the dataset are presented in Table 1.

Table 1. Summary statistics on the training/testing data.

The datasets were split in two in a ratio of approximately 70:30, with 633 datasets used for model development, calibration, and training (training data), and the remaining 293 datasets used for model validation and performance comparison (testing data). There is no consensus on how best to split data for training and testing, but a 70:30 split is the most widely used approach in spatial (e.g. Khosravi et al., 2018) and time series (e.g. Kouadio et al., 2018; Samadianfard et al., 2019) modelling by ML/DL. Although the training and testing datasets were selected randomly, a manual check was performed to ensure that both represented a range of q_b values.
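A random partition of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `split_indices` and the fixed seed are assumptions, and the manual check on the q_b range is not reproduced.

```python
import random

def split_indices(n_samples, n_train, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# Sizes taken from the text: 926 datasets, 633 for training, 293 for testing.
train_idx, test_idx = split_indices(n_samples=926, n_train=633)
print(len(train_idx), len(test_idx))  # 633 293
```

After such a split, the minimum and maximum q_b in each subset could be compared to confirm that both cover a similar range.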

Three main approaches were used to construct different input data scenarios: a manual approach and two feature selection ML-based models, CfsSubsetEval (CSE) and Principal Component Analysis (PCA). These are the most common approaches among feature ranking methods, such as Fisher score, ReliefF, Wilcoxon rank, Gain ratio, and Memetic feature (Vakharia et al., 2016).

2.2.1. Manual approach

Eight different data input scenarios were constructed and explored to find the most effective input combination (Table 2). First, the variable with the highest correlation coefficient was selected as the first input scenario, to explore whether the most correlated variable was efficient in predicting q_b on its own. Then the variables with the second, third, fourth, etc. highest correlation coefficients were added step-by-step to construct the remaining input combinations.
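The step-by-step construction can be sketched as follows. This is an illustrative sketch under stated assumptions: Pearson correlation is used as the correlation coefficient, the helper names are hypothetical, and the numbers are toy values, not data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def nested_scenarios(features, target):
    """Rank features by |r| with the target, then build cumulative input sets."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson_r(features[name], target)),
                    reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]

# Toy values only (not from the study's dataset).
toy_features = {"S": [1, 2, 3, 4, 6], "Q": [5, 3, 4, 1, 2], "w": [2, 1, 2, 1, 2]}
toy_target = [1, 2, 3, 4, 5]
scenarios = nested_scenarios(toy_features, toy_target)
print(scenarios)  # [['S'], ['S', 'Q'], ['S', 'Q', 'w']]
```

Each scenario would then be passed to a model and scored, for example by RMSE, to identify the most effective combination.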

Table 2. Input combination scenarios.

2.2.2. CfsSubsetEval approach

CfsSubsetEval is a correlation-based feature subset selection and multivariate filter evaluator approach that evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature and the degree of redundancy between features (Hall, 1999). Subsets of features that are highly correlated with the class, but have low intercorrelation, are preferred. CSE is calculated as (Qiao et al., 2022):

$CSE = \max_{S_{k}} \left[ \frac{r_{c f_{1}} + r_{c f_{2}} + \dots + r_{c f_{k}}}{\sqrt{k + 2 (r_{f_{1} f_{2}} + \dots + r_{f_{i} f_{j}} + \dots + r_{f_{k} f_{k - 1}})}} \right]$ (1)

where S_k is a feature subset consisting of k features, r_cfi is the correlation between input feature f_i and the output target, and r_fifj is the intercorrelation between input features f_i and f_j. This, along with the PCA approach, was implemented in Waikato Environment for Knowledge Analysis (WEKA) 3.9 software. The CSE approach produced input No. 3 in Table 2.
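To make the merit in Equation (1) concrete, the sketch below evaluates it for every subset of a small feature set and returns the maximiser. It is an illustration only: the exhaustive search, the function names, and the correlation values are assumptions, and WEKA's CfsSubsetEval is normally paired with a search heuristic rather than full enumeration.

```python
from itertools import combinations
from math import sqrt

def cfs_merit(subset, r_cf, r_ff):
    """Merit of a feature subset following Equation (1).

    r_cf maps each feature to its correlation with the target; r_ff maps a
    frozenset of two features to their intercorrelation.
    """
    k = len(subset)
    relevance = sum(r_cf[f] for f in subset)
    redundancy = sum(r_ff[frozenset(pair)] for pair in combinations(subset, 2))
    return relevance / sqrt(k + 2.0 * redundancy)

def best_subset(features, r_cf, r_ff):
    """Exhaustively search all non-empty subsets for the highest merit."""
    candidates = (c for k in range(1, len(features) + 1)
                  for c in combinations(features, k))
    return max(candidates, key=lambda s: cfs_merit(s, r_cf, r_ff))

# Hypothetical correlations: A and B are both relevant but highly redundant.
r_cf = {"A": 0.60, "B": 0.55, "C": 0.50}
r_ff = {frozenset(("A", "B")): 0.9,
        frozenset(("A", "C")): 0.1,
        frozenset(("B", "C")): 0.1}
chosen = best_subset(["A", "B", "C"], r_cf, r_ff)
print(chosen)  # ('A', 'C')
```

In this toy case the redundant pair (A, B) scores lower than (A, C), which is the behaviour the preference for low intercorrelation is meant to produce.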

2.2.3. Principal component analysis approach

Principal Component Analysis is a popular linear feature extractor used for unsupervised feature selection based on eigenvector analysis to identify critical original features for principal components. PCA is a statistical method applied to decrease the dimensionality of a dataset by linearly transforming the data into a new coordinate system in which (most of) the variation in the data can be described with fewer dimensions than the initial data. The PCA approach produced input No. 5 in Table 2. All eight input combinations were implemented, and the resulting RMSE was calculated to assess the most efficient input combination.

Metaheuristic algorithms were applied to determine the most effective and optimum values of DL model hyperparameters, using MATLAB programming software. In this approach, the Grey Wolf Optimizer (GWO) algorithm was combined with DL algorithms to identify optimum hyperparameter values automatically. For tree-based models, which were implemented in WEKA software, the most common and basic trial and error approach was used for tuning model hyperparameters. This approach involved calculating the RMSE for the default values, and then considering higher and lower values, to identify the most effective values (see Tables A and B in the supplementary material).

2.4. Model description

2.4.1. Dual perturb and combine tree (DPCT)

A DPCT model is a regression and classification tree-based model. Perturb and combine algorithms (PC algorithms) are used to develop and construct different subset models from the training dataset. All predicted values are then combined to generate the final target value (Breiman, 1998). Geurts and Wehenkel (2005) showed that the PC model is reliable and delivers high accuracy. The DPCT model is a more advanced kind of PC model that generates only one model and delays the generation of multiple predictions to the prediction stage. The multiple predictions are produced by perturbing the attribute vector corresponding to a test case.

2.4.2. Alternating model tree (AMT)

Introduced by Frank et al. (2015), AMT is a type of regression tree-based model that uses forward additive regression (AR) and a cross-validation approach to build the tree model. This type of ensemble model benefits from numerous advanced algorithms for development and growing. AMT models grow based on two types of node: splitter nodes (which divide the quantitative attributes at the median value) and predictor nodes (which forecast the system’s response through linear regression) (Gao et al., 2019).

2.4.3. Iterative absolute error regression (IAER)

IAER iteratively fits a regression model by attempting to minimize absolute error, using a base learner that minimizes weighted squared error. Weights are bounded from below by 1.0 / Utils.SMALL. The algorithm re-samples data based on weights if the base learner is not a WeightedInstancesHandler. More information can be found in Schlossmacher (1973).
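The underlying idea, fitting a weighted squared-error learner repeatedly with weights derived from the previous absolute residuals, can be sketched as below. This is not the WEKA implementation: the base learner here is a straight-line fit, the weight rule 1 / max(|residual|, eps) and the value of `eps` are assumptions made for illustration, and the data are toy values.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

def iterative_absolute_error_fit(x, y, n_iter=30, eps=1e-8):
    """Approximate a least-absolute-error line by iterative reweighting."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        # Residual magnitudes are floored at eps to avoid division by zero.
        w = [1.0 / max(abs(yi - (a + b * xi)), eps) for xi, yi in zip(x, y)]
    return weighted_line_fit(x, y, w)

# Toy data: y = 2x + 1 with a single large outlier at x = 3.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 30, 9, 11, 13]
ols_a, ols_b = weighted_line_fit(xs, ys, [1.0] * len(xs))
lad_a, lad_b = iterative_absolute_error_fit(xs, ys)
print(ols_a, ols_b)  # ordinary least squares: intercept pulled up by the outlier
print(lad_a, lad_b)  # reweighted fit: close to intercept 1, slope 2
```

The comparison shows why minimising absolute rather than squared error can be attractive for skewed data: the single outlier shifts the least-squares intercept markedly but has little influence on the reweighted fit.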

2.4.4. Recurrent neural network (RNN)

The RNN model is a popular and robust DL model for sequential data modelling and prediction, and is a form of advanced bi-directional ANN model (i.e. it feeds back the output from some nodes to affect subsequent input to the same nodes). This process has a significant impact on the learning ability of the model. In other words, for each new input, the output is identified and then fed back as the modified input to the modelling process. This operation is continued until a constant output has been attained. RNN uses the same weights for each element of the sequence, decreasing the number of parameters and allowing the model to generalize to sequences of varying lengths.

2.4.5. Long short-term memory (LSTM)

LSTM is a type of RNN model which is capable of learning long-term dependencies, especially in time series problems or in processing sequential data (Hochreiter & Schmidhuber, 1997). LSTM is composed of memory blocks. These blocks are memory cells that are capable of storing or remembering sequential information through units called gates (Azzouni & Pujolle, 2017). Input gates, forget gates, and output gates are the three main gates in the LSTM network, and they control the flow of incoming information, the amount of information retained from the previous memory, and the flow of outgoing information, respectively (Vu et al., 2021). When networks in an LSTM model forget a previous hidden state, they are capable of combining memory blocks to cause the networks to learn.
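The roles of the three gates can be seen in a single time step of a scalar LSTM cell. The sketch below follows a widely used textbook formulation, which is assumed here and may differ in detail from the MATLAB implementation used in the study; the parameter layout and names are hypothetical.

```python
from math import exp, tanh

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each of 'i', 'f', 'o', 'g' to a (w_x, w_h, bias) triple.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre_activation("i"))  # input gate: how much new information enters
    f = sigmoid(pre_activation("f"))  # forget gate: how much old memory is retained
    o = sigmoid(pre_activation("o"))  # output gate: how much memory is exposed
    g = tanh(pre_activation("g"))     # candidate memory content
    c = f * c_prev + i * g            # updated cell state
    h = o * tanh(c)                   # updated hidden state
    return h, c

# Arbitrary illustrative parameters, applied to a short input sequence.
params = {"i": (0.5, 0.1, 0.0), "f": (0.3, 0.2, 1.0),
          "o": (0.4, 0.1, 0.0), "g": (0.8, 0.3, 0.0)}
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

A forget gate close to 1 carries the previous cell state forward almost unchanged, which is how the cell retains information over long sequences.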

2.4.6. Grey wolf optimizer (GWO)

GWO is one of the most flexible, popular, strong, and efficient metaheuristic algorithms that can be applied for ML model optimization, mimicking the leadership hierarchy and hunting mechanism of grey wolves in nature (Mirjalili et al., 2014). The model structure is similar to a pyramid with four levels, of alpha (α), beta (β), delta (δ), and omega (ω) wolves. Alpha wolves are located at the top of the pyramid and are the optimal and efficient solutions that wolf leaders make. Beta and delta wolves at the second and third level are responsible for sub-optimal decisions or are subservient wolves in decision-making (Li et al., 2021). Omega wolves at the bottom of the pyramid play the role of scapegoat. GWO achieves an efficient solution by updating the positions of other wolves according to the positions of α, β, and δ wolves.
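The position update can be sketched as follows, using the commonly cited formulation of Mirjalili et al. (2014). This is a minimal illustration: the sphere function is a toy objective standing in for whatever error measure is minimised over hyperparameters, and the population size, iteration count, bounds, and seed are assumptions.

```python
import random

def sphere(position):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(x * x for x in position)

def grey_wolf_optimizer(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best = min(wolves, key=objective)
    best_pos, best_val = list(best), objective(best)
    for it in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta
        a = 2.0 * (1.0 - it / n_iter)            # decreases linearly from 2 towards 0
        updated = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * distance)
                x = sum(estimates) / 3.0         # average of the three leader-guided moves
                position.append(min(max(x, lo), hi))  # keep within bounds
            updated.append(position)
        wolves = updated
        candidate = min(wolves, key=objective)
        if objective(candidate) < best_val:      # remember the best solution seen so far
            best_pos, best_val = list(candidate), objective(candidate)
    return best_pos, best_val

best_position, best_value = grey_wolf_optimizer(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best_position, best_value)
```

For hyperparameter tuning, each wolf position would encode a set of hyperparameter values and the objective would train and score the model, which makes every evaluation far more expensive than in this toy case.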

2.4.7. Einstein (1950) equation

The Einstein (1950) equation considers bedload transport as a probabilistic phenomenon, relating the flow intensity to the bedload transport rate:

$q_{Bed} = 1 - \frac{1}{\sqrt{\pi}} \int_{- (0.413 / τ^{*}) - 2}^{(0.413 / τ^{*}) - 2} e^{- t^{2}} \, dt = \frac{43.5 q^{*}}{1 + 43.5 q^{*}}$ (2)

where τ* is Shields stress, t is the variable of integration, and q* is the Einstein bedload number. More information about the Einstein (1950) equation can be found in Hosseiny et al. (2023).

2.4.8. Recking (2013) bedload equation

Recking (2013) developed a bedload transport equation based on 6319 field observations and 1317 flume measurements:

$q_{Bed} = \frac{14 \, (τ_{84}^{*})^{2.5}}{1 + (τ_{m}^{*} / τ_{84}^{*})^{4}}$ (3)

where $τ_{m}^{*}$ is non-dimensional mobility Shields stress related to the transition from partial to full mobility, and $τ_{84}^{*}$ is non-dimensional Shields stress related to bed surface sediment size D₈₄.
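Equation (3) is simple enough to evaluate directly. The sketch below computes its right-hand side for given Shields stresses; the function name and the input values are illustrative assumptions, and the calculation of $τ_{84}^{*}$ and $τ_{m}^{*}$ from flow and channel properties is not shown.

```python
def recking_2013(tau_84, tau_m):
    """Right-hand side of Equation (3) for given non-dimensional Shields stresses."""
    return 14.0 * tau_84 ** 2.5 / (1.0 + (tau_m / tau_84) ** 4)

# Illustrative values only: below, at, and above the mobility threshold.
for tau_84 in (0.05, 0.10, 0.20):
    print(tau_84, recking_2013(tau_84, tau_m=0.10))
```

When $τ_{84}^{*}$ is well above $τ_{m}^{*}$ the denominator tends to 1 and the expression approaches a power law in $τ_{84}^{*}$; well below the threshold the fourth-power term suppresses the predicted transport sharply.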

2.5. Model evaluation

A number of quantitative and qualitative/visual approaches were used for model evaluation and comparison. The quantitative group included coefficient of determination (R²), RMSE, Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and ratio of RMSE to standard deviation of measured data (RSR). These error metrics were calculated as follows:

$R^{2} = \left( \frac{\sum_{i = 1}^{n} (q_{Bed_{M}} - {\bar{q}}_{Bed_{M}}) (q_{Bed_{P}} - {\bar{q}}_{Bed_{P}})}{\sqrt{\sum_{i = 1}^{n} (q_{Bed_{M}} - {\bar{q}}_{Bed_{M}})^{2} \times \sum_{i = 1}^{n} (q_{Bed_{P}} - {\bar{q}}_{Bed_{P}})^{2}}} \right)^{2}, \quad 0 \leq R^{2} \leq 1, \quad \text{optimum} = 1$ (4)

$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (q_{Bed_{P}} - q_{Bed_{M}})^{2}}, \quad 0 \leq RMSE \leq + \infty, \quad \text{optimum} = 0$ (5)

$NSE = 1 - \frac{\sum_{i = 1}^{n} (q_{Bed_{P}} - q_{Bed_{M}})^{2}}{\sum_{i = 1}^{n} (q_{Bed_{M}} - {\bar{q}}_{Bed_{M}})^{2}}, \quad - \infty \leq NSE \leq 1, \quad \text{optimum} = 1$ (6)

$PBIAS = \left( \frac{\sum_{i = 1}^{n} (q_{Bed_{M}} - q_{Bed_{P}})}{\sum_{i = 1}^{n} q_{Bed_{M}}} \right) \times 100, \quad - \infty \leq PBIAS \leq + \infty, \quad \text{optimum} = 0$ (7)

$RSR = \sqrt{\frac{\sum_{i = 1}^{n} (q_{Bed_{P}} - q_{Bed_{M}})^{2}}{\sum_{i = 1}^{n} (q_{Bed_{M}} - {\bar{q}}_{Bed_{M}})^{2}}}, \quad 0 \leq RSR \leq + \infty, \quad \text{optimum} = 0$ (8)

where $q_{Bed_{M}}$ and $q_{Bed_{P}}$ are the measured and predicted bedload transport rates, respectively, ${\bar{q}}_{Bed_{M}}$ and ${\bar{q}}_{Bed_{P}}$ are the mean measured and mean predicted q_b values, respectively, and n is the number of data points.
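The five metrics can be computed with a few short functions. The sketch below is illustrative: the function names and toy values are assumptions, and NSE is written in its conventional form, with the variance of the measured data in the denominator.

```python
from math import sqrt

def _mean(values):
    return sum(values) / len(values)

def r_squared(measured, predicted):
    mm, mp = _mean(measured), _mean(predicted)
    cov = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    var_m = sum((m - mm) ** 2 for m in measured)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return (cov / sqrt(var_m * var_p)) ** 2

def rmse(measured, predicted):
    return sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def nse(measured, predicted):
    mm = _mean(measured)
    sse = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return 1.0 - sse / sum((m - mm) ** 2 for m in measured)

def pbias(measured, predicted):
    return 100.0 * sum(m - p for m, p in zip(measured, predicted)) / sum(measured)

def rsr(measured, predicted):
    mm = _mean(measured)
    sse = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return sqrt(sse / sum((m - mm) ** 2 for m in measured))

# Toy values only (not from the study's dataset).
q_measured = [1.0, 2.0, 3.0, 4.0]
q_predicted = [1.1, 1.9, 3.2, 3.8]
for metric in (r_squared, rmse, nse, pbias, rsr):
    print(metric.__name__, metric(q_measured, q_predicted))
```

With these definitions RSR² = 1 − NSE, so the two metrics rank models identically; PBIAS adds information on the direction of systematic error that the others do not carry.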

The qualitative/visual approaches used in the comparison of model performance were scatter plots, line-variation graphs, Taylor diagrams, and violin plots, allowing the model fit to be seen across the full range of bedload transport values, particularly at the extreme end of the range. One distinct advantage of the Taylor diagram is that it combines two common statistics: correlation and standard deviation (SD) (Taylor, 2001). The measured data point in the Taylor diagram is considered the reference point. The closer the predicted value to this reference value in terms of correlation and SD, the higher the prediction capability.

The Friedman test was applied to the different model outputs. If the test was significant, an additional post hoc Nemenyi test was carried out to check for statistically significant differences between pairs of models. The null hypothesis was that there was no statistically significant difference between the models, tested at α = 0.05. At p-value < 0.05 the null hypothesis was rejected.
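The Friedman statistic itself is straightforward to compute from within-block ranks. The sketch below is a minimal illustration with toy values: it assumes that each row holds the errors of the competing models for one observation, applies no tie correction, and omits the p-value and the Nemenyi post hoc step.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for j in order[pos:end + 1]:
            ranks[j] = mean_rank
        pos = end + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square statistic for n blocks (rows) and k treatments (columns)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Toy absolute errors for three hypothetical models on four observations.
errors = [[0.1, 0.4, 0.9],
          [0.2, 0.5, 0.7],
          [0.1, 0.3, 0.8],
          [0.3, 0.6, 1.0]]
statistic = friedman_statistic(errors)
print(statistic)  # 8.0
```

For three models the statistic is compared with a chi-square distribution with two degrees of freedom, whose 5% critical value is 5.991, so the toy value of 8.0 would lead to rejection of the null hypothesis.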

3. Results

3.1. Variable importance

The effectiveness and importance of each potential input variable in q_b prediction was explored through a correlation coefficient and relief attribute evaluator (RAE) approach (Figure 1). RAE evaluates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instance of the same and different class.

Figure 1. Radar-chart of variable importance, determined by (a) correlation coefficient and (b) relief attribute evaluator (RAE). Variables: River bed slope (S), river width (w), river discharge (Q), bed surface sediment size (D₁₆, D₅₀, D₈₄, D₉₀).

According to the correlation coefficient, presented in terms of a radar-chart (Figure 1a), river bed slope (S) had the largest impact on q_b prediction, followed by D₈₄, D₅₀, D₉₀, D₁₆, w, and Q. The results from the RAE approach broadly agreed, with D₉₀ shown as the most effective variable, followed by D₈₄, D₅₀, Q, D₁₆, S, and w (Figure 1b).

3.2. Best input combination

On adding more input variables to the input combination, the prediction accuracy of the different models increased (Figure 2). According to IAER-AMT (the most reliable model), the best input combination gave 32.9% and 39.3% higher performance (lower RMSE) during the training and testing phase, respectively, than the worst-performing input combination. The best input scenario (generated manually) had around 28% and 29% higher predictive power than the scenarios proposed by CSE and PCA ML-based methods, respectively, in terms of RMSE during the training phase. In the testing phase, this equated to 30% and 4% higher predictive power, respectively. These RMSE values were only used to explore the best input combination, and model hyperparameter tuning for tree-based models was not implemented in this step; tuning should only occur once the most efficient input scenario has been determined.

Figure 2. Change in model performance with input combination scenarios for (a) training data and (b) testing data (dashed red boxes show the best input scenario).

3.3. Model performance evaluation

The scatter plots and R² values showed that the new ensemble tree-based algorithm IAER-AMT had the highest prediction capability (R² = 0.80), with the data points being more closely distributed around the line of equality across a fuller range of q_b values (Figure 3). The second best performer was also a new ensemble tree-based model, IAER-DPCT (R² = 0.76), followed by AMT (R² = 0.73), DPCT (R² = 0.72), LSTM-GWO (R² = 0.69), and RNN-GWO (R² = 0.67). The two lowest performing models by some margin were the empirical equations, Einstein (1950) (R² = 0.09) and Recking (2013) (R² = 0.08). According to the R² values, IAER-AMT, IAER-DPCT, AMT, and DPCT all achieved ‘very good’ performance ( $0.7 \leq R^{2} \leq 1$ ), LSTM-GWO, RNN-GWO, LSTM, and RNN ‘good’ performance ( $0.6 \leq R^{2} \leq 0.7$ ), and Einstein (1950) and Recking (2013) ‘unsatisfactory’ performance ( $R^{2} \leq 0.5$ ).

Figure 3. Scatter plot of measured and predicted q_b within the testing phase for different modelling approaches tested.

According to the line-variation graphs (Figure 4), all tree-based models were able to predict q_b values well. In particular, the ensemble tree-based models predicted extreme values more accurately than the other models, while the empirical models overestimated the higher range of q_b values (Figure 4).

Figure 4. Line variation graph of measured and predicted bedload sediment transport rate per unit width (q_b) within the testing phase for different modelling approaches.

The Taylor diagram (Figure 5) revealed that the IAER-AMT model had the highest correlation ( $\approx 0.90$ ), with the predicted standard deviation in q_b being closest to the standard deviation of the observed data, followed by IAER-DPCT. The empirical equations had the lowest performance and higher standard deviation than the measured data. Although IAER-DPCT showed lower performance than IAER-AMT, the model produced a standard deviation closer to the measured value.

Figure 5. Taylor diagram displaying statistical comparison with observations of 10 model estimates of bedload sediment transport rate per unit width.

An examination of summary statistics of predicted q_b revealed that IAER-DPCT predicted the minimum, first quartile, and median q_b most accurately (Table 3). The LSTM-GWO model performed most strongly in predicting the third quartile and the DPCT model in predicting the maximum value.

Table 3. Summary statistics on predicted bedload sediment transport rate per unit width (q_b).


All quantitative error metrics showed that the IAER-AMT model had the highest predictive power (Table 4), followed by IAER-DPCT, AMT, DPCT, LSTM-GWO, RNN-GWO, Einstein (1950), and Recking (2013). According to the NSE values, the IAER-AMT and IAER-DPCT models had ‘very good’ performance ($0.75 \leq NSE \leq 1$), LSTM-GWO, RNN-GWO, AMT, and DPCT had ‘good’ performance ($0.65 \leq NSE \leq 0.75$), and the empirical equations had ‘unsatisfactory’ performance ($NSE \leq 0.5$). These differences in performance were statistically significant in most comparisons under the Friedman (chi-square statistic = 453; p-value < 0.001) and Nemenyi tests (Tables 4 and 5).

Table 4. Comparison of performance of the different models, based on root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and ratio of RMSE to standard deviation of measured data (RSR)
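For readers who wish to reproduce these metrics on their own data, the following Python sketch computes RMSE, NSE, PBIAS, and RSR from paired measured and predicted values. It assumes the commonly used definitions; the exact formulas adopted in the study are given in its model evaluation section and may differ, in particular in the sign convention for PBIAS (here positive values indicate underestimation). The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(obs, pred):
    """Common goodness-of-fit metrics for measured vs predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))  # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in obs)          # spread of the measurements
    rmse = math.sqrt(sse / n)
    nse = 1.0 - sse / sst
    # Assumed sign convention: positive PBIAS means the model underestimates.
    pbias = 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)
    # RSR = RMSE / standard deviation of measurements, i.e. sqrt(1 - NSE).
    rsr = math.sqrt(sse) / math.sqrt(sst)
    return {"RMSE": rmse, "NSE": nse, "PBIAS": pbias, "RSR": rsr}


# Hypothetical measured and predicted q_b values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(evaluate(measured, predicted))
```

Because RSR² equals 1 − NSE under these definitions, the two metrics rank models identically; reporting both mainly aids comparison with published rating thresholds.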


Table 5. The p-values of a Nemenyi test of model performance difference (yellow cells show a statistically significant difference between models at the 0.05 significance level, while green cells show there is no statistically significant difference)
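The Friedman statistic that precedes a Nemenyi post-hoc comparison can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each row is taken to be one test sample (a block) holding the absolute error of every model, ties receive average ranks, and the tie correction to the statistic is omitted. The function names and error values are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square (no tie correction) and mean rank per model."""
    n = len(blocks)      # number of blocks (e.g. test samples)
    k = len(blocks[0])   # number of models being compared
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [R / n for R in rank_sums]


# Hypothetical absolute errors: rows are samples, columns are three models.
abs_errors = [
    [0.10, 0.25, 0.90],
    [0.05, 0.20, 0.70],
    [0.30, 0.28, 1.10],
    [0.12, 0.40, 0.95],
]
chi2, mean_ranks = friedman_statistic(abs_errors)
print(chi2, mean_ranks)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom; if the null hypothesis of equal performance is rejected, a Nemenyi test then compares the mean ranks of each pair of models.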


4. Discussion

4.1. Comparison of prediction performance achieved by empirical equations, tree-based models, and optimized deep learning algorithms

A large dataset of bedload transport measurements collected from various field-based studies was used to investigate model efficiency. The empirical equations performed poorly, particularly at higher rates of bedload transport, for which accurate prediction is most needed to understand morphological change and forecast erosion hazards (Feeney et al., 2022; Li et al., 2021). This result indicates that these equations should be used with due caution when applied outside the conditions for which they were developed. The high degree of uncertainty associated with empirical equations in field-based studies arises because most were developed from flume experiments involving simplified flow and bed conditions, such as steady and uniform flow (Mao, 2012), equilibrium sediment transport conditions (Wainwright et al., 2015), and non-water-worked gravel beds (Cooper & Tait, 2009). Problems then arise in trying to scale flow and sediment properties correctly, and the magnitude of transport that can be reproduced is limited (Kleinhans et al., 2014). Therefore, an estimate of bedload transport rate for a field setting that is within the same order of magnitude as the measured value is often considered a ‘reasonable’ prediction for an empirical equation, and no single empirical formula can be applied to all datasets (Gomez & Church, 1989). This limitation arises because most empirical equations are linear and unable to capture non-linearity in input and output data.

In contrast, all tree-based models and optimized DL algorithms tested displayed ‘very good’ or ‘good’ performance. Among the standalone models, the tree-based models outperformed the optimized DL models for a number of reasons: (1) tree-based models have higher accuracy on tabular data (Shwartz-Ziv & Armon, 2022), because they require less tuning and processing effort; (2) DL models are biased to overly smooth solutions (Grinsztajn et al., 2022) and fit low-frequency functions (Rahaman et al., 2019), and thus they struggle to fit irregular target functions, such as those within the bedload datasets, compared with tree-based models; (3) tree-based models can handle data that are not normally distributed and therefore do not require scaling or normalization; and (4) tree-based models require little data preparation. The best performing standalone tree-based model was AMT, because the algorithm uses step-wise forward cumulative regression (statistical boosting version) and cross-validation techniques to reduce square error and limit tree development (Moayedi et al., 2020).

In all cases, the ensemble algorithms outperformed their standalone counterparts. This enhancement of performance occurred because hybridization produces a coupled model with higher flexibility that is better trained and has a non-linear structure (De’ath & Fabricius, 2000). High flexibility and non-linear structure are particularly important in the prediction of bedload transport rate because of the non-linearity between variables, the low correlation between individual variables and bedload transport rate, and the general complexity of bedload transport.

4.2. Effect of input variables on model prediction performance

The combination of input variables used in the models had a strong effect on predictive power, confirming that determination of the optimum combination of input variables is one of the most significant steps in producing accurate ML and DL models. Manual development of input variable combinations led to a more efficient and practical input scenario than the use of intelligent approaches (CSE and PCA). This advantage largely stemmed from being able to test the efficiency of numerous input combinations and the impact of adding each parameter on model performance. Thus, through this manual approach it was possible to determine the most sensitive hyperparameters and understand the hyperparameter reaction and trend of a model. When using this approach, inclusion of all input variables resulted in the highest performance. The intelligent approaches proposed an input scenario based only on the parameters that were most highly correlated with q_b (S, D₅₀, D₈₄, and Q), while ignoring parameters with a low degree of correlation (D₁₆, D₉₀, and w). As a result, the intelligent approaches produced models with an RMSE value in the testing phase that was 30% (CSE) and 4% (PCA) higher than that of the optimal input combination identified in the manual approach. This aspect further highlights the complex, non-linear nature of the interaction of bedload transport with flow mechanics and channel conditions, and the requirement for multiple input parameters to represent this interaction, even when some might have a low degree of correlation.

4.3. Applying ensemble tree-based models to predict bedload transport rate in rivers

Overall, the results showed that ensemble tree-based models have great potential to produce robust predictions of bedload transport in coarse-grained rivers. Unlike empirical equations, these models performed well over a range of flow and channel conditions. Unlike theoretical and numerical models, they also remain simple, easy, and inexpensive to build and run. Although other parameters, such as Shields stress and turbulent kinetic energy, have a significant impact on bedload transport rates, the aim was to find a model that could produce high-accuracy estimates of bedload transport based on a few readily available and measurable river properties, such as channel size (width and slope), flow discharge, and sediment size. Given that inclusion of all input variables produced the highest performance, addition of more variables can be expected to further improve performance. However, while a model with a high degree of complexity might be able to capture more of the variation in the data (reduce the training error), it will be more difficult to train and more prone to overfitting (model fitting to the noise in the data rather than the underlying pattern). Overfitting can be a significant issue for bedload prediction because measured data are noisy due to the stochastic behaviour of bedload entrainment and transport, the difficulty in obtaining representative samples, and the highly non-linear relationship of bedload with river properties. Thus, a higher-complexity model could perform poorly when applied to new and unseen data, causing loss of model generalization. With these considerations in mind and noting the very good performance of the ensemble tree-based models using readily available parameters, the models developed in this study appear to strike the correct balance between model complexity, generalization, and performance.

The major disadvantages of the types of model developed here are two-fold. First, like all statistical methods, they relate directly only to the rivers considered, and their application to other rivers may prove inappropriate. The input parameter range in other rivers will also likely be wider than the range examined in this paper, despite the use of datasets compiled from a large variety of sources. Thus, future studies should develop and apply ensemble tree-based models to rivers with differing flow and channel conditions, to test their wider applicability. Second, due to their ‘black-box’ structure, these models provide poor explanatory power, and are thus unable to improve understanding of the physical processes that determine bedload entrainment and transport.

This study has shown that incorporating just seven controlling parameters (channel slope, channel width, flow discharge, and four key bed surface grain size percentiles) can produce very good predictions of bedload transport rate. Future studies should examine the potential of other tree-based models, such as Random Forest and M5 model tree, as well as models that combine ML methods with the seasonal adjustment method (Li & Yang, 2022). Where data are available, future studies should assess how other factors affect the performance of these models, such as grain-size sorting (e.g. Recking et al., 2023) and grain shelter-exposure (armour ratio D_x/D₅₀; Fu et al., 2023), while trying not to make the developed model overly complex, and continuing to use readily available and easily measured data. Such an approach would help determine the most influential parameters in bedload transport and why they vary between rivers with differing flow and channel properties.

5. Conclusions

The morphodynamics of coarse-grained rivers depend predominantly on bedload transport rate. Due to the non-linear interactions between channel and flow mechanics, tree-based models and optimized deep learning algorithms have great potential to produce accurate predictions of bedload transport rate. Using 926 datasets from 20 rivers, this study explored this potential by examining the predictive power of (1) standalone tree-based models (Alternating Model Tree (AMT) and Dual Perturb and Combine Tree (DPCT)); (2) ensemble tree-based models, in which Iterative Absolute Error Regression (IAER) was ensembled with AMT and DPCT (IAER-AMT and IAER-DPCT); and (3) optimized deep learning models, in which Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN) were combined with the Grey Wolf Optimizer (LSTM-GWO and RNN-GWO). Their performance was benchmarked against two commonly used empirical equations. The main findings were as follows:

Sensitivity analysis identified D₉₀ as the most influential variable in bedload transport prediction, followed by D₈₄, D₅₀, Q, D₁₆, S, and w.
All algorithms tested performed best when all input parameters were used in building the model. Variables with a low correlation coefficient with bedload transport rate still enhanced the predictive power. Thus, a range of different input variable combinations must be considered in the optimization of tree-based and optimized deep learning models.
Assessment of model performance showed that all tree-based models and optimized deep learning algorithms displayed ‘very good’ or ‘good’ performance and outperformed empirical equations, which had ‘unsatisfactory’ performance. The tree-based algorithms were more efficient and reliable than the deep learning models.
In all cases, ensemble algorithms outperformed their standalone counterparts, with the ensemble tree-based model IAER-AMT being the best performing model overall.

Together, these findings reveal that ensemble tree-based models have great potential for predicting bedload transport rates based on a few readily available and easily measured flow and channel variables. These algorithms could play a particularly important role in predicting morphological change and assessing erosion hazards in coarse-grained rivers where an understanding of the physical processes may be lacking. Thus, investigating the potential of other tree-based models across a wide range of different flow and channel conditions can be an important future research direction for river scientists. In addition, the results obtained in the present study indicate that tree-based models can be a promising tool for decision makers and beneficial for stakeholders that manage the impacts of river erosion.

Data

Data related to this study are available upon request. In addition, they are publicly available in BedloadWeb.

Author contributions

KK: Conceptualization, methodology, software, writing – original draft, review, and editing; AAF: Conceptualization, methodology, supervision, review, funding, and editing; SMB and CJ: Methodology, review, and editing; DM: Conceptualization, writing – original draft; ZK and JRC: Conceptualization, methodology, review, and editing.


Acknowledgements

We thank the creators of BedloadWeb for providing free access to the bedload data used in this publication, and Hosseiny et al. (2023) for providing free access to their input data through https://doi.org/10.5281/zenodo.7641313 under a GNU General Public License v2.0 or later. James Cooper was supported by a UK Natural Environment Research Council grant (NE/V008404/1).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This study was funded in part by a grant from the National Research Foundation of Korea (NRF) (grant number NRF-2022R1A4A3032838), supported by the Ministry of Science and ICT (MSIT) of the Korean government (grant number c). Additional support was provided by the Korea Environmental Industry & Technology Institute (KEITI) as part of the Wetland Ecosystem Value Evaluation and Carbon Absorption Value Promotion Technology Development Project, funded by the Korea Ministry of Environment (MOE) (grant number 2022003640001). In addition, the authors are thankful to Dr. Mahdi Panahi for his assistance in implementing the deep learning models and the Nemenyi test.

References

Ahmad, M. W., Reynolds, J., & Rezgui, Y. (2018). Predictive modelling for solar thermal energy systems: A comparison of support vector regression, random forest, extra trees and regression trees. Journal of Cleaner Production, 203, 810–821. https://doi.org/10.1016/j.jclepro.2018.08.207
Asheghi, R., & Hosseini, S. A. (2020). Prediction of bed load sediments using different artificial neural network models. Frontiers of Structural and Civil Engineering, 14(2), 374–386. https://doi.org/10.1007/s11709-019-0600-0
Azzouni, A., & Pujolle, G. (2017). A long short-term memory recurrent neural network framework for network traffic matrix prediction.
Bagnold, R. A. (1966). An approach to the sediment transport problem from general physics. US Geol. Surv. Prof. Paper, 422(1), 231–291.
Barry, J. J., Buffington, J. M., & King, J. G. (2004). A general power equation for predicting bed load transport rates in gravel bed rivers. Water Resources Research, 40(10), W10401. https://doi.org/10.1029/2004WR003190
Barzgaran, M., Mahdizadeh, H., & Sharifi, S. (2019). Numerical simulation of bedload sediment transport with the ability to model wet/dry interfaces using an augmented Riemann solver. Journal of Hydroinformatics, 21(5), 834–850. https://doi.org/10.2166/hydro.2019.046
Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26(3), 801–849.
Bui, D., Pradhan, B., Nampak, H., Bui, Q-H., Tran, Q-A., & Nguyen, Q-P. (2016). Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. Journal of Hydrology, 540, 317–330. https://doi.org/10.1016/j.jhydrol.2016.06.027
Cooper, J. R., & Tait, S. J. (2009). Water-worked gravel beds in laboratory flumes - a natural analogue? Earth Surface Processes and Landforms, 34(3), 384–397. https://doi.org/10.1002/esp.1743
De’ath, G., & Fabricius, K. E. (2000). Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81(11), 3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2
Einstein, H. A. (1950). The bed-load function for sediment transportation in open channel flows (No. 1026). US Department of Agriculture, Washington, D.C.
Elhakeem, M., & Imran, J. (2016). Bedload model for nonuniform sediment. Journal of Hydraulic Engineering, 142(6), 06016004. https://doi.org/10.1061/(ASCE)HY.1943-7900.0001139
Feeney, C. J., Godfrey, S., Cooper, J. R., Plater, A. J., & Dodds, D. (2022). Forecasting riverine erosion hazards to electricity transmission towers under increasing flow magnitudes. Climate Risk Management, 36, 100439. https://doi.org/10.1016/j.crm.2022.100439
Fisher, S. G., Gray, L. J., Grimm, N. B., & Busch, D. E. (1982). Temporal succession in a desert stream ecosystem following flash flooding. Ecological Monographs, 52(1), 93–110. https://doi.org/10.2307/2937346
Frank, E., Mayo, M., & Kramer, S. (2015). Alternating model trees. In SAC ’15: Proceedings of the 30th Annual ACM Symposium on Applied Computing (pp. 871–878). ACM, New York.
Fu, H., Shan, Y., & Liu, C. (2023). A model for predicting the grain size distribution of an armor layer under clear water scouring. Journal of Hydrology, 623, 129842. https://doi.org/10.1016/j.jhydrol.2023.129842
Gao, P. (2011). An equation for bed-load transport capacities in gravel-bed rivers. Journal of Hydrology, 402(3–4), 297–305. https://doi.org/10.1016/j.jhydrol.2011.03.025
Gao, W., Guirao, J. L. G., Abdel-Aty, M., & Xi, W. (2019). An independent set degree condition for fractional critical deleted graphs. Discret Contin Dyn Syst S, 12(4–5), 877–886.
Geurts, P., & Wehenkel, L. (2005, October 3–7). Segment and combine approach for non-parametric time-series classification. Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal (pp. 478–485). LNCS.
Ghorbanzadeh, O., Meena, S. R., Blaschke, T., & Aryal, J. (2019). UAV-based slope failure detection using deep-learning convolutional neural networks. Remote Sensing, 11(17), 2046. https://doi.org/10.3390/rs11172046
Gomez, B., & Church, M. (1989). An assessment of bed load sediment transport formulae for gravel bed rivers. Water Resources Research, 25(6), 1161–1186.
Graf, W. H. (1971). Hydraulics of Sediment Transport. McGraw-Hill.
Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on tabular data? http://arxiv.org/abs/2207.08815.
Hall, M. A. (1999). Correlation-based feature selection for machine learning, no. April.
Hinton, D., Hotchkiss, R., & Cope, M. (2018). Comparison of calibrated empirical and semi-empirical methods for bedload transport rate prediction in gravel bed streams. Journal of Hydraulic Engineering, 144(7), 1–17. https://doi.org/10.1061/(ASCE)HY.1943-7900.0001474
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hosseiny, H., Masteller, C., Dale, J., & Phillips, C. (2023). Development of a machine learning model for river bed load. Earth Surface Dynamics, 11(4), 681–693. https://doi.org/10.5194/esurf-11-681-2023
Hussain, D., & Khan, A. A. (2020). Machine learning techniques for monthly river flow forecasting of Hunza River, Pakistan. Earth Science Informatics, 13(3), 939–949. https://doi.org/10.1007/s12145-020-00450-z
Jilani, A. N., & Hashemi, S. U. (2013). Numerical investigations on bed load sediment transportation using SPH method. Iranian Journal of Science, 20(2), 294–299.
Khosravi, K., Cooper, J. R., Daggupati, P., Pham, B., & Bui, D. (2020). Bedload transport rate prediction: Application of novel hybrid data mining techniques. Journal of Hydrology, 585, 124774. https://doi.org/10.1016/j.jhydrol.2020.124774
Khosravi, K., Panahi, M., & Tien Bui, D. (2018). Spatial prediction of groundwater spring potential mapping based on an adaptive neuro-fuzzy inference system and metaheuristic optimization. Hydrology and Earth System Sciences, 22(9), 4771–4792. https://doi.org/10.5194/hess-22-4771-2018
Kisi, O., Dailr, A. H., Cimen, M., & Shiri, J. (2012). Suspended sediment modeling using genetic programming and soft computing techniques. Journal of Hydrology, 450–451, 48–58. https://doi.org/10.1016/j.jhydrol.2012.05.031
Kisi, O., Genc, O., Dinc, S., & Zounemat-Kermani, M. (2016). Daily pan evaporation modeling using chi-squared automatic interaction detector, neural networks, classification and regression tree. Computers and Electronics in Agriculture, 122, 112–117. https://doi.org/10.1016/j.compag.2016.01.026
Kisi, O., & Yaseen, Z. M. (2019). The potential of hybrid evolutionary fuzzy intelligence model for suspended sediment concentration prediction. Catena, 174, 11–23. https://doi.org/10.1016/j.catena.2018.10.047
Kleinhans, M. G., van Dijk, W. M., van de Lageweg, W. I., Hoyal, D. C. J. D., Markies, H., van Maarseveen, M., Roosendaal, C., van Weesep, W., van Breemen, D., Hoendervoogt, R., & Cheshier, N. (2014). Quantifiable effectiveness of experimental scaling of river- and delta morphodynamics and stratigraphy. Earth-Science Reviews, 133, 43–61. https://doi.org/10.1016/j.earscirev.2014.03.001
Kouadio, L., Deo, R. C., Byrareddy, V., Adamowski, J. F., Mushtaq, S., & Phuong Nguyen, V. (2018). Artificial intelligence approach for the prediction of Robusta coffee yield using soil fertility properties. Computers and Electronics in Agriculture, 155, 324–338. https://doi.org/10.1016/j.compag.2018.10.014
Latif, S. D., Chong, K. L., Ahmed, A. N., Huang, Y. F., Sherif, M., & El-Shafie, A. (2023). Sediment load prediction in Johor river: Deep learning versus machine learning models. Applied Water Science, 13(3), 79. https://doi.org/10.1007/s13201-023-01874-w
Li, S., & Yang, J. (2022). Modelling of suspended sediment load by Bayesian optimized machine learning methods with seasonal adjustment. Engineering Applications of Computational Fluid Mechanics, 16(1), 1883–1901. https://doi.org/10.1080/19942060.2022.2121944
Li, X., Cooper, J. R., & Plater, A. J. (2021). Quantifying erosion hazards and economic damage to critical infrastructure in river catchments: Impact of a warming climate. Climate Risk Management, 32, 100287. https://doi.org/10.1016/j.crm.2021.100287
Manning, R. (1891). On the flow of water in open channels and pipes. Transactions of the Institution of Civil Engineers of Ireland, 20, 161–207.
Mao, L. (2012). The effect of hydrographs on bed load transport and bed sediment spatial arrangement. Journal of Geophysical Research: Earth Surface, 117(F3). http://doi.org/10.1029/2012JF002428
Melesse, A., Ahmad, S., McClain, M. E., Wang, X., & Lim, Y. H. (2011). Suspended sediment load prediction of river systems: An artificial neural network approach. Agricultural Water Management, 98(5), 855–866. https://doi.org/10.1016/j.agwat.2010.12.012
Meyer-Peter, E., & Müller, R. (1948). Formulas for bed-load transport. In Proceedings of the 2nd Meeting of the International Association for Hydraulic Structures Research, 39–64.
Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey wolf optimizer. Advances in Engineering Software, 69, 46–61.
Moayedi, H., Aghel, B., Foong, L. K., & Bui, D. T. (2020). Feature validity during machine learning paradigms for predicting biodiesel purity. Fuel, 262, 116498. https://doi.org/10.1016/j.fuel.2019.116498
Moayedi, H., Ahmadi Dehrashid, A., & Nguyen Le, B. (2024). A novel problem-solving method by multi-computational optimization of artificial neural network for modelling and prediction of the flow erosion processes. Engineering Applications of Computational Fluid Mechanics, 18(1), 2300456. https://doi.org/10.1080/19942060.2023.2300456
Mustafa, A. S., Sulaiman, S. O., & Al_Alwani, K. M. (2017). Application of HEC-RAS model to predict sediment transport for Euphrates River from Haditha to Heet. Journal of Engineering Sciences, 20(3), 570–577.
Nones, M. (2019). Dealing with sediment transport in flood risk management. Acta Geophysica, 67(2), 677–685. https://doi.org/10.1007/s11600-019-00273-7
Parker, G. (1990). Surface-based bedload transport relation for gravel rivers. Journal of Hydraulic Research, 28(4), 417–436. https://doi.org/10.1080/00221689009499058
Poorhosein, M., Afzalimehr, H., Sui, J., Singh, V. P., & Azareh, S. (2014). Empirical bed load transport equations. International Journal of Hydraulic Engineering, 3(3), 93–101. https://doi.org/10.5923/j.ijhe.20140303.03
Qiao, Q., Yunusa-Kaltungo, A., & Edwards, R. (2022). Feature selection strategy for machine learning methods in building energy consumption prediction. Energy Reports, 8, 13621–13654. https://doi.org/10.1016/j.egyr.2022.10.125
Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F., Bengio, Y., & Courville, A. (2019). On the spectral bias of neural networks, International Conference on Machine Learning, May 2019.
Rahmati, O., Falah, F., Naghibi, A., Biggs, T., Soltani, M., Deo, R. C., Cerdà, A., Mohammadi, F., & Tien Bui, D. (2019). Land subsidence modelling using tree-based machine learning algorithms. Science of The Total Environment, 672, 239–252. https://doi.org/10.1016/j.scitotenv.2019.03.496
Recking, A. (2013). Simple method for calculating reach-averaged bed-load transport. Journal of Hydraulic Engineering, 139(1), 70–75. https://doi.org/10.1061/(ASCE)HY.1943-7900.0000653
Recking, A. (2019). BedloadWeb. Retrieved April 25, 2022, from https://en.bedloadweb.com/.
Recking, A., Vázquez Tarrío, D., & Piton, G. (2023). The contribution of grain sorting to the dynamics of the bedload active layer. Earth Surface Processes and Landforms, 48(5), 979–996. https://doi.org/10.1002/esp.5530
Riahi-Madvar, H., & Seifi, A. (2018). Uncertainty analysis in bed load transport prediction of gravel bed rivers by ANN and ANFIS. Arabian Journal of Geosciences, 11(21), 688. https://doi.org/10.1007/s12517-018-3968-6
Rosgen, D. L., Silvey, H. L., & Frantila, D. (2006). Watershed assessment of river stability and sediment supply (WARSSS). Wildland Hydrology.
Roushangar, K., & Koosheh, A. (2015). Evaluation of GA-SVR method for modeling bed load transport in gravel-bed rivers. Journal of Hydrology, 527, 1142–1152. https://doi.org/10.1016/j.jhydrol.2015.06.006
Samadianfard, S., Jarhan, S., Salwana, E., Mosavi, A., Shamshirband, S., & Akib, S. (2019). Support vector regression integrated with fruit fly optimization algorithm for river flow forecasting in Lake Urmia Basin. Water, 11(9), 1934. https://doi.org/10.3390/w11091934
Schlossmacher, E. J. (1973). An iterative technique for absolute deviations curve fitting. Journal of the American Statistical Association, 68(344), 857–859. https://doi.org/10.1080/01621459.1973.10481436
Shahiri, P., Noori, M., Heydari, M., & Rashidi, M. (2016). Floodplain zoning simulation by using HEC-RAS and CCHE2D Models in the Sungai Maka River. Air Soil Water Res., 9(9), 55–62.
Shakya, D., Deshpande, V., Kumar, B., & Agarwal, M. (2023). Predicting total sediment load transport in rivers using regression techniques, extreme learning and deep learning models. Artificial Intelligence Review, 56(9), 10067–10098. https://doi.org/10.1007/s10462-023-10422-6
Shwartz-Ziv, R., & Armon, A. (2022). Tabular data: Deep learning is not all you need. Information Fusion, 81, 84–90. https://doi.org/10.1016/j.inffus.2021.11.011
Taylor, K. E. (2001). Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres, 106(D7), 7183–7192. https://doi.org/10.1029/2000JD900719
Vakharia, V., Gupta, V. K., & Kankar, P. K. (2016). A comparison of feature ranking techniques for fault diagnosis of ball bearing. Soft Computing, 20(4), 1601–1619. https://doi.org/10.1007/s00500-015-1608-6
Vakharia, V., Shah, M., Nair, P., Borade, H., Sahlot, P., & Wankhede, V. (2023). Estimation of Lithium-ion battery discharge capacity by integrating optimized explainable-AI and stacked LSTM model. Batteries, 9(2), 125. https://doi.org/10.3390/batteries9020125
Vu, M. T., Jardani, A., Massei, N., & Fournier, M. (2021). Reconstruction of missing groundwater level data by using Long Short-Term Memory (LSTM) deep neural network. Journal of Hydrology, 597, 125776. https://doi.org/10.1016/j.jhydrol.2020.125776
Wainwright, J., Parsons, A. J., Cooper, J. R., Gao, P., Gillies, J. A., Mao, L., Orford, J., & Knight, P. G. (2015). The concept of transport capacity in geomorphology. Reviews of Geophysics, 53(4), 1155–1202. https://doi.org/10.1002/2014RG000474
Wilcock, P. R. (2001). Toward a practical method for estimating sediment transport rates in gravel bed rivers. Earth Surface Processes and Landforms, 26(13), 1395–1408. https://doi.org/10.1002/esp.301
Wilcock, P. R., & Crowe, J. C. (2003). Surface-based transport model for mixed-size sediment. Journal of Hydraulic Engineering, 129(2), 120–128. https://doi.org/10.1061/(ASCE)0733-9429(2003)129:2(120)
Zhang, K., Wang, Z. Y., & Liu, L. (2010). The effect of riverbed structure on bed load transport in mountain streams. In Dittrich, Koll, Aberle, & Geisenhainer (Eds.), River Flow 2010. Bundesanstalt für Wasserbau. ISBN 978-3-939230-00-7.

Download PDF

Share icon
Back to Top

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Your download is now in progress and you may close this window

Did you know that with a free Taylor & Francis Online account you can gain access to the following benefits?

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Have an account?
Login now Don't have an account?
Register for free

Login or register to access this feature

Have an account?
Login now Don't have an account?
Register for free

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research
