Research Article

Monthly rainfall forecasting modelling based on advanced machine learning methods: tropical region as case study

Article: 2243090 | Received 31 Mar 2023, Accepted 27 Jul 2023, Published online: 10 Aug 2023

Abstract

Existing rainfall forecasting methods face many limitations because the underlying mathematical procedures have difficulty capturing and reproducing the patterns in rainfall data. This study attempts to provide a robust methodology for detecting the nonlinearity of the rainfall pattern by integrating several optimizer algorithms with an Artificial Neural Network (ANN). The Artificial Bee Colony, Particle Swarm Optimization, and Imperialism Competitive Algorithm were integrated to optimize the internal parameters of the ANN. A real-world case study was set up in Malaysia, and the ANN model was built using 54 years (1967–2020) of local monthly data. The resulting model is used for rainfall forecasting in real time. A variety of network types were evaluated with various input combinations with the goal of producing accurate rainfall forecasts. Several statistical indicators were used to evaluate the model’s accuracy in forecasting rainfall. The study revealed that the model integrating the Imperialism Competitive Algorithm with the Artificial Neural Network (ICA-ANN) outperformed the other predictive models. The results confirmed that the proposed ICA-ANN model is a promising predictive model for forecasting monthly rainfall with high accuracy.

1. Introduction

For the planning and management of water resources, accurate rainfall data are critical. Rainfall also has a significant impact on transportation, sewage systems, and other human activities in metropolitan settings. Nonetheless, owing to the complexity of the climatic processes that generate rainfall and its wide variability across multiple spatiotemporal scales, rainfall is one of the most challenging parts of the hydrological cycle to comprehend and analyze (Asadi et al., 2013; Kashiwao et al., 2017). Weather prediction has improved considerably in the last few decades, but reliable rainfall forecasting remains one of the most difficult tasks in operational hydrology (Ramal et al., 2022; Whigham & Crapper, 2001).

Malaysia is located in the tropical region, so rainfall is a determining factor in many fields such as agriculture and water resources management. As a result, efforts should be made to improve rainfall forecasts using robust predictive models. However, rainfall forecasting is currently unsatisfactory, owing to inaccuracies in baseline conditions, parameterization techniques for subscale phenomena, and spatial resolution limitations (Zin et al., 2013).

Researchers have put forward a number of forecasting strategies, but these remain inadequate. As a result, it is essential to understand the weather from a wider angle. Forecasting rainfall is difficult because of the seasonality of rainfall and its quantity (Adham et al., 2019). Empirical and dynamical methods are the two main approaches used to forecast rainfall. The first makes predictions using relationships found in historical data; regression and artificial neural networks have been used in this way to predict climate (Kadhim et al., 2022). The second forecasts seasonal rainfall using statistical and physical models. Researchers in many fields have recently developed successful prediction and classification models using machine learning (ML) methods such as regression, support vector machines (SVM), and k-nearest neighbours (KNN) (Coulibaly et al., 2005; Kibler & Langley, 1988).

Despite significant technological improvements, reliable and precise forecasting remains a major concern, owing to the magnitude of the problem. To improve forecasting accuracy, a range of strategies have been used, among them dynamical, empirical, and combined techniques (Davolio et al., 2008). Dynamical approaches rely on physical methods based on a system of equations, like the Multi-variable Polynomial Regression (MPR) developed by Zaw and Naing (2008). Scientists have also devised mathematical methods to estimate pressure and temperature variations. Empirical techniques depend on the analysis of historical data and its association with numerous atmospheric factors across a certain zone (Poornima & Pushpalatha, 2019).

Artificial Neural Networks (ANN), regression, fuzzy logic and deep-learning methods have been widely employed in past studies for data fitting and time series analysis. Predicting climatic variables and hydrological parameters using ANN has become increasingly popular over the past few decades. French et al. (1992) were the first to use a feed-forward ANN to forecast rainfall. Kajornrit et al. (2012) created a fuzzy inference system for an ANN model to estimate missing monthly rainfall; it was tested on multiple stations in Thailand’s northeast region and provided satisfactory results. Patel and Joshi (2017) created a rainfall-runoff simulation model using an efficient ANN model. Other studies (Sharda & Patil, 1992; Toth et al., 2000; Wu et al., 2018) suggest that employing an appropriate ANN model could improve prediction. Hamzaçebi (2008) created the Seasonal Artificial Neural Network model (SANN), which was designed exclusively for predicting seasonal time series. Compared with the other techniques employed in that study, the improved network structure increased accuracy and reduced prediction error. Researchers have also used a variety of approaches to create hybrid models (Solgi et al., 2014; Venkata Ramana et al., 2013).

Non-stationary behaviour in a time series can overwhelm a single ANN model if the input variables are not pre-processed (Noori et al., 2011). Among the many pre-processing techniques available, the Wavelet Transformation (WT) is widely used to reveal hidden patterns in a time series, and this pre-processed input improves the effectiveness of neural network methods for forecasting rainfall. For modelling rainfall and runoff, Shoaib et al. (2014) developed wavelet-based hybrid network techniques, which they successfully applied to daily data spanning 10 years from the Brosna watershed in Ireland. To estimate water table depth, Kisi and Shiri (2011) created the wavelet-based neuro-fuzzy technique called DWT. Venkata Ramana et al. (2013) used a hybrid model that combines ANN and wavelet approaches to forecast rainfall and applied it to monthly rainfall data. Wang et al. (2022b) improved the performance of machine-learning methods for forecasting river streamflow; five different models were proposed, and the study concluded that integrating the wavelet algorithm with the Long Short-Term Memory method could provide highly accurate streamflow forecasts. Wang et al. (2022a) employed the Extreme Learning Machine (ELM) method to forecast hydrological drought, and the results indicated that the suggested model outperformed other predictive models.

Duong and Bui (2018) proposed the Long Short-Term Memory (LSTM) model for forecasting rainfall measured at Vietnam’s Ca Mau meteorological station. When LSTM was compared with seasonal artificial neural network (SANN) and ANN models, it was found to be the most accurate. Poornima and Pushpalatha (2019) developed an Intensified LSTM to forecast rainfall records accurately. Hu et al. (2018) simulated rainfall-runoff in the Fen River basin from 1973 to 2003 using ANN and LSTM models. To predict univariate meteorological parameters using intermediary parameters, Salman et al. (2018) developed single-layer and multi-layer LSTM methods for the Hang Nadim Indonesia Airport weather station. Chen et al. (2022) applied machine learning based on the LSTM method to forecast rainfall distribution; the statistical indicators revealed that the proposed model forecasts monthly rainfall reliably. The Adaptive Neuro Fuzzy Inference System (ANFIS) has also been employed to forecast rainfall (Choubin et al., 2017). Choubin et al. (2018) used a classification and regression trees (CART) model for rainfall forecasting, and the results showed that the CART model attained satisfactory forecasts compared to other models. In addition, Singh et al. (2022) integrated several machine learning models with a genetic algorithm and found that the hybrid model improved forecasting accuracy compared to the single models.

Data-driven models and artificial intelligence methods, including neural networks, are widely used in a variety of fields for estimating and solving various geotechnical problems (Bozorg-Haddad et al., 2016; Rajaee et al., 2009). Despite its capacity to solve a variety of difficult engineering problems, the ANN has certain drawbacks, including slow progress in training and the possibility of becoming stuck in a local minimum. It has been suggested that optimization algorithms be used to adjust the weights and biases of the ANN in order to improve its performance. The current research attempts to enhance the ANN model’s performance by integrating it with reliable optimizers such as the artificial bee colony (ABC), genetic algorithm (GA), and particle swarm optimization (PSO).

The main novelty of the current study is the introduction of a new hybrid model that integrates the ANN with the Imperialism Competitive Algorithm (ICA) to provide reliable rainfall forecasts, as the ICA has proven effective in other engineering applications. As an optimization algorithm, the ICA follows a search procedure aimed at reaching the global optimum rather than becoming trapped in a local optimum of the search domain. It has demonstrated exceptional performance as an optimizer in a variety of applications and technical specialties. Combining the ICA with machine learning allows various factors to be investigated to enhance the model: the ICA can be used to generate an initial solution or to encourage exploration or exploitation, which could improve the search for the optimal solution. Using the ICA simultaneously should also be considered in order to speed up the search and produce higher-quality solutions.

The combination of these methods has been proposed to optimize the ANN as far as feasible, and such methods have proven effective in several sectors of engineering and science. The current article employs three hybrid intelligent systems to forecast rainfall. They have been adapted so that they can forecast rainfall data across several zones. Additionally, this study compares the results produced by these three hybrid models in order to determine which are best suited for rainfall data in the tropical region. The case study and rainfall database are described next, followed by the fundamentals of the intelligent techniques used.

2. Case study

Peninsular Malaysia, often known as West Malaysia (100°E–104°E, 1°N–7°N), has an area of about 131,598 square kilometres (AlDahoul et al., 2022). Figure 1 shows the case study area and the location of the rainfall stations. Malaysia has two main monsoons, the Northeast and the Southwest. The first runs from November to February, while the second lasts from May to August. Rainfall often falls in the central region during the inter-monsoon periods (March to April and September to October). This analysis used monthly rainfall data from the Malaysian Department of Irrigation and Drainage (DID) (Figure 1). The observed records for the Ldg Getah Kukup Pontian station span 54 years, from January 1967 to December 2020. The station is situated at latitude 1°21′00″N and longitude 103°27′36″E in the Pontian Kecil River Basin in Johor, southern Malaysia. Rainfall in the 135 square kilometre catchment area is difficult to forecast because of the large amounts of rainfall during the southwest and northeast monsoons, the two main monsoon seasons (Irwanto et al., 2014; Zainal et al., 2002).

Figure 1. The case study region and rainfall gauging stations.


3. Methodology

3.1 Artificial neural network

In essence, an artificial neural network (ANN) is a mathematical model that imitates the cognitive processes of the human brain. In practice, an ANN simulates one or more outputs in order to reveal the complex interrelationships between variables (Allawi & El-Shafie, 2016; Tabari et al., 2010). An ANN model is composed of three main components: an activation function, connecting links and mathematical rules. These components can be selected based on the type of problem being studied, and the link weights are adjusted to train the network. A multi-layer perceptron (MLP), which is a kind of feed-forward neural network, is formed of three successive types of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each layer consists of its own collection of neurons with its own mathematical relationships (Allawi et al., 2018).

Numerous methods have been developed to train neural networks iteratively (Ali et al., 2022). Back propagation (BP) is the learning algorithm most frequently employed with the MLP (Kamel et al., 2021). By adjusting the weights between the nodes of successive layers, this method produces a single output at the end of each epoch. Each node’s net weighted input $\mathrm{net}_j$ is calculated as follows:

$$\mathrm{net}_j = \sum_{i=1}^{n} x_i w_i - \theta \tag{1}$$

where $x_i$ and $w_i$ are the incoming signal and the weight for the ith input, $n$ is the total number of input variables, and $\theta$ is the threshold used in the layer. An activation function, such as a sigmoid, linear or step function, is applied to this net input. In technical terms, this stage is referred to as the forward pass. The output error is then calculated by comparing the predicted outputs to the actual outputs (Sulaiman et al., 2019; Wu et al., 2014; Yaseen et al., 2015). The resulting error is propagated backwards through the network in order to fine-tune the individual weights; this stage is called the backward pass. The performance of the network is calculated using a suitable statistical function, such as the root-mean-square error (RMSE), in each training epoch. The weight update process continues until the RMSE falls below a predetermined threshold. It is important to note that an insufficient amount of data can cause overfitting during the ANN model’s training procedure (Hatem, 2022; Wu et al., 2005).
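As a minimal sketch of the net input in Equation (1) followed by a sigmoid activation, the snippet below evaluates a single node. The input values, weights and threshold are hypothetical illustration values and are not taken from the study; the function names are likewise assumptions.

```python
import math


def net_input(x, w, theta):
    """Weighted sum of the inputs minus the threshold, as in Equation (1)."""
    return sum(xi * wi for xi, wi in zip(x, w)) - theta


def sigmoid(z):
    """Logistic activation applied to the net input."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values: three scaled lagged-rainfall inputs, their weights, and a threshold.
x = [0.2, 0.5, 0.1]
w = [0.4, -0.3, 0.8]
theta = 0.05

net_j = net_input(x, w, theta)   # 0.08 - 0.15 + 0.08 - 0.05
out_j = sigmoid(net_j)           # node output passed on to the next layer
```

In a full MLP this calculation is repeated for every node in every layer, and back propagation then adjusts `w` and `theta` from the output error.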

3.2 Imperialism competitive algorithm

Atashpaz-Gargari and Lucas (2007) developed the imperialism competitive algorithm (ICA), a population-based global search algorithm that can be used to solve science and engineering problems. Like other such methods, ICA begins with a random set of countries (the initial population) to solve an optimization problem. Once N countries (Ncountry) have been generated, those with the best objective function values, i.e. the lowest costs such as RMSE, become imperialists (Nimp), while the rest become colonies (Ncol). The colonies are distributed among the empires according to the imperialists’ initial power, so the more powerful imperialists (lowest RMSE) receive more colonies. ICA relies on several operators, including assimilation, revolution and competition. Through assimilation and revolution, a colony can reach a better state than its imperialist and thereby take control of the entire empire in place of the previous imperialist.

The competition operator, on the other hand, reflects each empire’s attempt to gain additional colonies by taking them from other empires. At this stage, any empire may take possession of at least one colony of the weakest empire, so the power of each empire is the most important factor at this point. The process ends when all empires except the strongest have collapsed, or when a user-specified termination criterion has been reached (for example, a desired RMSE or a maximum number of decades). It is worth noting that the number of decades in ICA, denoted Ndecade, is conceptually equivalent to the number of iterations in the PSO method. This work, however, will not present a mathematical description of ICA.
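The operators described above can be illustrated with a heavily simplified sketch. It is not the formulation used in the study: colonies are dealt out round-robin instead of in proportion to imperialist power, empire power is taken as the imperialist's cost alone, and the strongest empire always wins the competition. The parameter names `beta` and `p_revolution`, the bounds, and the sphere test function are assumptions made for illustration.

```python
import random


def ica_minimize(cost, dim, n_country=30, n_imp=3, n_decade=100,
                 beta=2.0, p_revolution=0.1, bounds=(-5.0, 5.0), seed=0):
    """Simplified imperialist-competition search that minimizes `cost`."""
    rng = random.Random(seed)
    lo, hi = bounds

    def rand_country():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    # Initial population: the lowest-cost countries become imperialists.
    countries = sorted((rand_country() for _ in range(n_country)), key=cost)
    empires = [{"imp": c, "cols": []} for c in countries[:n_imp]]
    # Simplification: colonies are dealt out round-robin, not by imperialist power.
    for k, col in enumerate(countries[n_imp:]):
        empires[k % n_imp]["cols"].append(col)

    for _ in range(n_decade):
        for emp in empires:
            for k, col in enumerate(emp["cols"]):
                # Assimilation: move the colony toward its imperialist.
                col = [c + beta * rng.random() * (i - c)
                       for c, i in zip(col, emp["imp"])]
                # Revolution: occasionally replace the colony with a random country.
                if rng.random() < p_revolution:
                    col = rand_country()
                emp["cols"][k] = col
                # Position exchange: a better colony takes over the empire.
                if cost(col) < cost(emp["imp"]):
                    emp["imp"], emp["cols"][k] = col, emp["imp"]
        # Competition (simplified): the weakest empire loses a colony to the strongest.
        if len(empires) > 1:
            empires.sort(key=lambda e: cost(e["imp"]))
            strongest, weakest = empires[0], empires[-1]
            if weakest["cols"]:
                strongest["cols"].append(weakest["cols"].pop())
            else:
                # Collapsed empire: its imperialist becomes a colony of the strongest.
                strongest["cols"].append(weakest["imp"])
                empires.pop()
    return min((e["imp"] for e in empires), key=cost)


def sphere(v):
    """Toy cost function with its minimum at the origin."""
    return sum(x * x for x in v)


best = ica_minimize(sphere, dim=2)
```

In a hybrid ICA-ANN setting, `cost` would instead return the training RMSE of the network for a given vector of weights and biases.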

3.3 Particle swarm optimization

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart (1997) for continuous optimization problems. PSO is an approach for nonlinear problems inspired by social systems such as fish shoals. It operates on a number of randomly initialized particles. At each iteration, the algorithm searches for the optimum value of the objective, and the particles modify their positions based on their own experience as well as that of other particles (Kennedy & Eberhart, 1997).

To reach the ideal position, each particle tracks its own personal best position (PBEST) and the global best position (GBEST) found by the other particles. Throughout training, each particle tries to move closer to its PBEST and to GBEST by means of a new velocity term, which is determined by the distance between its current position and these best positions. The new velocity then determines the new position of the particle in the subsequent iteration (Armaghani et al., 2014). Equation (2) adjusts the velocity vector towards PBEST and GBEST, and Equation (3) gives the resulting movement of the particle:

$$V_{new} = V + C_1 \times (PBEST - P) + C_2 \times (GBEST - P) \tag{2}$$

$$P_{new} = P + V_{new} \tag{3}$$

where $V_{new}$ and $V$ are the new and current velocities of the particle, $P_{new}$ and $P$ are its new and current positions, and $C_1$ and $C_2$ are the PBEST and GBEST coefficients for each particle.
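The update in Equations (2) and (3) can be sketched for a single particle as follows. The function implements the equations exactly as written, without the random factors or inertia weight that many PSO implementations add; the numerical values are hypothetical.

```python
def pso_step(p, v, pbest, gbest, c1, c2):
    """One velocity and position update for a single particle (Equations 2 and 3)."""
    # Equation (2): pull the velocity toward the personal and global best positions.
    v_new = [vi + c1 * (pb - pi) + c2 * (gb - pi)
             for vi, pi, pb, gb in zip(v, p, pbest, gbest)]
    # Equation (3): move the particle with the new velocity.
    p_new = [pi + vn for pi, vn in zip(p, v_new)]
    return p_new, v_new


# Hypothetical two-dimensional particle.
p_new, v_new = pso_step(p=[1.0, 2.0], v=[0.0, 0.5],
                        pbest=[0.5, 1.0], gbest=[0.0, 0.0],
                        c1=0.5, c2=0.5)
# v_new is [-0.75, -1.0] and p_new is [0.25, 1.0]
```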

3.4 Artificial bee colony

The artificial bee colony (ABC), introduced by Karaboga (2005), is a common optimization technique based on the social life of bees, with each bee being a simple element. These individuals combine to form a complex colony that works as a collective system for finding nectar from flowers. Every colony consists of three groups of bees, each with its own set of responsibilities. The first group, the scouts, is on the lookout for fresh sources. Their responsibilities include exploring the surroundings and looking for food sources; once they discover a source, they keep it in their memory.

After returning to the hive, each bee performs a waggle dance to share information about the sources with the other bees and to recruit some of them to exploit those sources. The second group of bees in the hive are the employed bees, which are responsible for exploiting the available food sources. Onlooker bees are the last group. At the end of the waggle dance, in which the bees exchange information, the onlookers choose a source to exploit depending on its fitness (Nozohour-leilabady & Fazelabdolabadi, 2016). ABC algorithms have been used in science and engineering to solve a variety of problems, including the optimization of ANN weights and biases to reduce system errors (de Oliveira et al., 2009; Nozohour-leilabady & Fazelabdolabadi, 2016).

3.5 Hybrid model

Back propagation is a local search learning technique, so the ANN’s search for optimal weights may not produce good outcomes. Using optimization algorithms (OAs) to adjust the weights and biases of the ANN may therefore enhance its forecasting performance. ANNs are prone to converging to local minima, whereas OAs are able to find global minima. Hybrid systems such as ICA-ANN, PSO-ANN and ABC-ANN therefore combine the search characteristics of both methods. In these hybrid models, ICA, PSO and ABC search for the minimum in the search space, and the ANN uses it to find the best outcomes for the system.
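One common way to couple an optimizer with an ANN, sketched here under stated assumptions, is to flatten all weights and biases into one vector and let the optimizer minimize the training RMSE as its cost. The one-hidden-layer architecture, the linear output node and the helper names below are illustrative assumptions, not the configuration reported in the study; each bias `b` plays the role of $-\theta$ in Equation (1).

```python
import math


def n_params(n_in, n_hidden):
    """Length of the flat vector: hidden weights and biases plus output weights and bias."""
    return n_hidden * (n_in + 1) + n_hidden + 1


def mlp_forecast(params, x, n_hidden):
    """Forward pass of a one-hidden-layer MLP read from a flat parameter vector."""
    n_in = len(x)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        w = params[idx:idx + n_in]
        b = params[idx + n_in]
        idx += n_in + 1
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid hidden node
    w_out = params[idx:idx + n_hidden]
    b_out = params[idx + n_hidden]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out  # linear output


def rmse_cost(params, inputs, targets, n_hidden):
    """Cost seen by the optimizer: training RMSE for one candidate weight vector."""
    sq = [(mlp_forecast(params, x, n_hidden) - t) ** 2
          for x, t in zip(inputs, targets)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical scaled inputs and targets, used only to exercise the functions.
inputs = [[0.1, 0.2], [0.3, 0.4]]
targets = [3.0, 4.0]
zero_cost = rmse_cost([0.0] * n_params(2, 3), inputs, targets, n_hidden=3)
```

Each country (ICA), particle (PSO) or food source (ABC) is then one such parameter vector, and the optimizer's objective is `rmse_cost`.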

3.6 Evaluation indicators

The efficacy of the methods built for rainfall forecasting in this research can be determined using a variety of statistical measures. The choice of performance evaluation indicators is significantly influenced by the frameworks employed in a work and their results. This study employed a number of statistical indicators, including the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Coefficient of Determination ($R^2$), BIAS, percentage of relative error (%RE) and Nash Sutcliffe Efficiency (NSE), to evaluate the effectiveness of the proposed approach (Nash & Sutcliffe, 1970).

$$R^2 = \left[\frac{\sum_{t=1}^{n}\left(R_a - \overline{R_a}\right)\left(R_f - \overline{R_f}\right)}{\sqrt{\sum_{t=1}^{n}\left(R_a - \overline{R_a}\right)^2 \sum_{t=1}^{n}\left(R_f - \overline{R_f}\right)^2}}\right]^2 \tag{4}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(R_f - R_a\right)^2} \tag{5}$$

$$NSE = 1 - \frac{\sum_{t=1}^{n}\left(R_a - R_f\right)^2}{\sum_{t=1}^{n}\left(R_a - \overline{R_a}\right)^2} \tag{6}$$

$$BIAS = \frac{1}{n}\sum_{t=1}^{n}\left(R_f - R_a\right) \tag{7}$$

$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|R_f - R_a\right| \tag{8}$$

$$\%RE = \frac{R_f - R_a}{R_a} \times 100 \tag{9}$$

where $\overline{R_a}$ and $\overline{R_f}$ are the average actual and forecast rainfall, $n$ is the number of data points, and $R_a$ and $R_f$ are the actual rainfall records and the forecasts obtained by the proposed models, respectively.

The $R^2$ value expresses the degree of correlation between the forecasted and observed values. A higher score (close to 1) indicates greater agreement, whereas a lower value (close to 0) indicates the opposite. The RMSE, the square root of the mean of the squared errors, measures the average distance between the forecasted and observed values. The NSE assesses a model’s ability to forecast measured data and ranges from −∞ to 1 (Cinar et al., 2018; Shoaib et al., 2014, 2016).
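The indicators can be computed with a few lines of code. The sketch below assumes the standard definitions (with $R^2$ taken as the squared correlation coefficient), and the sample values are hypothetical monthly totals in mm, not data from the study.

```python
import math


def mean(v):
    return sum(v) / len(v)


def rmse(ra, rf):
    return math.sqrt(mean([(f - a) ** 2 for a, f in zip(ra, rf)]))


def mae(ra, rf):
    return mean([abs(f - a) for a, f in zip(ra, rf)])


def bias(ra, rf):
    return mean([f - a for a, f in zip(ra, rf)])


def nse(ra, rf):
    ma = mean(ra)
    return 1.0 - (sum((a - f) ** 2 for a, f in zip(ra, rf))
                  / sum((a - ma) ** 2 for a in ra))


def r2(ra, rf):
    ma, mf = mean(ra), mean(rf)
    cov = sum((a - ma) * (f - mf) for a, f in zip(ra, rf))
    var_a = sum((a - ma) ** 2 for a in ra)
    var_f = sum((f - mf) ** 2 for f in rf)
    return cov ** 2 / (var_a * var_f)


def pct_re(a, f):
    """Percentage relative error for a single actual/forecast pair."""
    return (f - a) / a * 100.0


# Hypothetical actual and forecast values (mm/month).
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 310.0]

scores = {
    "RMSE": rmse(actual, forecast),   # 10.0
    "MAE": mae(actual, forecast),     # 10.0
    "BIAS": bias(actual, forecast),   # about 3.33
    "NSE": nse(actual, forecast),     # 0.985
    "R2": r2(actual, forecast),
}
```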

4. Results and discussion

Four different data-driven models were applied to forecast monthly rainfall data. The testing period comprised eleven summer and winter seasons in Malaysia, from 2010 to 2020. To help the data-driven models learn the different monsoon rainfall mechanisms individually, the time series was divided into two seasons. The winter season includes the months of December, January and February, while the months of June, July and August are considered the summer season in Malaysia.
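The seasonal split described above can be sketched as a simple filter on calendar months. The record layout `(year, month, rainfall_mm)`, the placeholder zero values, and the assumption that all years before 2010 form the training period are illustrative choices; how December is assigned to a winter season year is not specified here and is not modelled.

```python
WINTER_MONTHS = {12, 1, 2}   # December, January, February
SUMMER_MONTHS = {6, 7, 8}    # June, July, August


def split_by_season(records):
    """Split (year, month, rainfall_mm) records into summer and winter subsets."""
    summer = [r for r in records if r[1] in SUMMER_MONTHS]
    winter = [r for r in records if r[1] in WINTER_MONTHS]
    return summer, winter


def split_train_test(records, first_test_year=2010):
    """Assumed split: years before `first_test_year` train, the rest test."""
    train = [r for r in records if r[0] < first_test_year]
    test = [r for r in records if r[0] >= first_test_year]
    return train, test


# Placeholder series covering January 1967 to December 2020 (values are dummies).
records = [(year, month, 0.0)
           for year in range(1967, 2021)
           for month in range(1, 13)]

summer, winter = split_by_season(records)
summer_train, summer_test = split_train_test(summer)
winter_train, winter_test = split_train_test(winter)
```

With this layout each season contributes three months per year, so the 2010 to 2020 testing period holds 33 summer and 33 winter records.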

To create an efficient machine-learning model, it is necessary to find the optimal internal parameters and the pertinent input variables used to build and evaluate the final model, as there is no rule of thumb for selecting them. In practice, this optimization depends on the problem under examination.

Figures 2–7 compare actual rainfall with forecasts made using the ANN, PSO-ANN, ABC-ANN, and ICA-ANN models. Only one meteorological station was chosen for each summer and winter month under study to qualitatively analyze the forecast.

Figure 2. Actual rainfall versus forecasted data obtained by predictive models for the month of June.


Figure 3. Actual rainfall versus forecasted data obtained by predictive models for the month of July.


Figure 4. Actual rainfall versus forecasted data obtained by predictive models for the month of August.


Figure 5. Actual rainfall versus forecasted data obtained by predictive models for the month of December.


Figures 2–3 present the pattern of the actual rainfall data compared to the forecasted data for June and July. The ICA-ANN, PSO-ANN, and ABC-ANN methods forecast the actual rainfall periods but not necessarily their intensity. The proposed models accurately forecast the actual periods of continued rainfall in June and July, while the ANN and ABC-ANN models underestimated or overestimated the values. In June, however, the PSO-ANN estimate was much higher than the actual value.

The pattern of the actual rainfall against the data forecasted by the proposed models for the month of August is presented in Figure 4. All predictive models struggled to forecast August 2016 and 2018, for which they produced their worst forecasts. This may be due to the lack of such values during the training period. The results reveal that ANN and PSO-ANN produced overestimated forecasts. Overall, the ICA-ANN model was better than the other models at forecasting rainfall for the month of August.

Figure 5 displays the forecasted values versus actual data for the month of December. The pattern of the data forecasted by the ANN model is somewhat far from the actual data. It should be noted that ABC-ANN attained good forecasts at times during the testing period. Figure 5 shows that ICA-ANN has a high ability to simulate and follow the actual pattern.

During the study period, January had an exceptionally high amount of rainfall (Figure 6). In this case, the ICA-ANN model significantly underestimated the rainfall most of the time, including the extreme peak. On the other hand, it correctly forecasted an isolated event that the other models missed. The PSO-ANN method captured only a few rainy periods and showed high overestimation in the first three Januaries of the testing period (Jan 2010, Jan 2011 and Jan 2012), while the ICA-ANN produced more accurate forecasts than the other predictive methods.

Figure 6. Actual rainfall versus forecasted data obtained by predictive models for the month of January.


Figure 7 depicts the forecasted February rainfall during the study period compared with the actual data. The ANN overestimated the rainfall more than the other models. Both the ANN and ABC-ANN methods yielded poor results, particularly in February 2015 and February 2017. The ICA-ANN, however, forecasted the observed rainfall periods, though their levels were slightly overstated. Overall, the ICA-ANN model forecasted winter rainfall with higher accuracy than the other predictive models.

Figure 7. Actual rainfall versus forecasted data obtained by predictive models for the month of February.


It is important to note that cold fronts were responsible for the prolonged spells of rainfall that Malaysia experienced during the summer. In this type of scenario, both the ANN and ABC-ANN models captured the observed rainfall data but not the isolated peaks. The ICA-ANN and PSO-ANN are expected to capture the rainy periods, even though the data used to train and validate the proposed models were complex. Moreover, because of its nonlinearity, the ICA-ANN was able to forecast several isolated peaks and more rainy days than the other suggested models.

The statistical indicators are analysed to give an overall picture of the ability of ANN, ABC-ANN, PSO-ANN and ICA-ANN to forecast rainfall in the different months. For each summer month of the study period, Tables 1 and 2 provide the NSE and BIAS values computed between the actual and forecasted rainfall data.

Table 1. Nash Sutcliffe Efficiency indicator for the predictive models during summer season in Malaysia.

Table 2. BIAS indicator for the predictive models during summer season in Malaysia.

In most cases, the values of the Nash coefficient were small when using ICA-ANN compared to those obtained from other predictive algorithms for the summertime (Table 1). Only in July did PSO-ANN have a smaller Nash coefficient than the ANN model. The ABC-ANN and ANN models showed relatively similar performance and behaviour during July and August. Overall, the forecasting results based on the Nash index showed that ICA-ANN is a reliable tool for rainfall forecasting.

The BIAS represents the mean forecast error (MFE) across all forecasts. It reflects the overall average tendency (not the magnitude) of a set of forecasts relative to either verified analyses or observed data. Positive (negative) average errors indicate overestimation (underestimation). Because each forecast may be negatively or positively biased, individual errors may cancel each other out; for this reason, the BIAS does not by itself represent forecast accuracy. The MAE, by contrast, is a non-negative quantity that measures the mean deviation of the forecasts from the observed data. The RMSE is also non-negative and is more sensitive to large forecasting errors than the MAE. For a good forecast, the MAE, RMSE and BIAS magnitudes should be near zero.
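A short worked example may make the cancellation effect concrete. The values below are hypothetical monthly totals in mm and are not taken from the study; the helper name `error_stats` is an assumption.

```python
import math


def error_stats(actual, forecast):
    """Return (BIAS, MAE, RMSE) for paired actual and forecast values."""
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, mae, rmse


# Case 1: errors of +50 and -50 cancel in the BIAS but not in the MAE or RMSE.
bias_1, mae_1, rmse_1 = error_stats([200.0, 200.0], [250.0, 150.0])
# bias_1 is 0.0 while mae_1 and rmse_1 are both 50.0

# Case 2: one large error (+90) among small ones (+10) raises the RMSE above the MAE.
bias_2, mae_2, rmse_2 = error_stats([100.0] * 4, [110.0, 110.0, 110.0, 190.0])
# mae_2 is 30.0 while rmse_2 is sqrt(2100), about 45.8
```

The first case shows why a BIAS near zero does not guarantee accurate forecasts, and the second shows the greater sensitivity of the RMSE to a single large error.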

The RMSE and MAE for the ANN, ABC-ANN, PSO-ANN and ICA-ANN models throughout the dry summer are compared in Figure 8. The PSO-ANN model errors were significant, averaging 100 mm/month. Compared with the other models, the ICA-ANN model exhibits lower RMSE and MAE values. According to the statistics, this model provided reduced errors in the majority of stations. It achieved low MAE and RMSE values during August.

Figure 8. The statistical characteristics including RMSE and MAE obtained by predictive models.


In certain instances, the MAE and RMSE values were generally lower, and the Nash Sutcliffe Efficiency greater, than those of some models. Good forecasts are indicated by a low BIAS, low RMSE, and low MAE. However, a model can have a low BIAS value but large MAE and RMSE values (if the forecasted values are not closely related to the actual data) or a relatively large BIAS value but lower MAE and RMSE values (if the forecasts are closely associated with the actual data).

Table 3 presents the Nash Sutcliffe Efficiency (NSE) values of the proposed methods during the winter season. The NSE values for the winter period were, on average, higher in both December and February when using the ICA-ANN and PSO-ANN models, which performed better in these months. Of the two, the ICA-ANN model achieved the higher NSE value for the wintertime.

Table 3. Nash Sutcliffe Efficiency indicator for the predictive models during winter season in Malaysia.

The BIAS values for each month of the winter season differed, as shown in Table . For the ANN and ABC-ANN models, the December and January values were generally positive, indicating overestimation. The BIAS values for both the ANN and ABC-ANN models were smaller in December than in January and February, most likely as a result of lower rainfall in that month. Compared with the other models, the BIAS of the ICA-ANN forecasts was generally lower (for both underestimation and overestimation).

Table 4. BIAS indicator for the predictive models during summer season in Malaysia.

The performance of the proposed models during the winter season was evaluated using common indicators (RMSE and MAE), as shown in Figure . The worst forecasting results were obtained by PSO-ANN in all months. In terms of MAE, the performance of ICA-ANN and ABC-ANN is relatively close for December. Overall, based on the RMSE and MAE indicators, the ICA-ANN model forecasted rainfall with a higher level of accuracy than the other predictive models.

Figure 9. The statistical characteristics including RMSE and MAE obtained by predictive models.

The skill score for these seasons shows that the RMSE and MAE of the ICA-ANN were lower during the summer. June saw the biggest improvement during the summer, with both RMSE and MAE errors reduced. The average forecasting errors (RMSE and MAE) of the ICA-ANN forecasts were 17.5 mm/month during the winter season.

The skill score graphs for the ICA-ANN model over the summer and winter are shown in Figures . A positive skill score indicates an improvement in ICA-ANN performance over all other predictive models, in both the summer and the winter. The ICA-ANN forecasts were close to the actual rainfall data for most values. The model’s predictive ability is better during the winter than during the summer season.
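The exact skill score formula used in the study is not restated in this section. As an illustration only, the Python sketch below assumes a commonly used error-based form, one minus the ratio of the candidate model error to a reference model error, under which a positive score means the candidate improves on the reference. The function name and the RMSE values are hypothetical.

```python
def skill_score(error_model, error_reference):
    """Fractional error reduction relative to a reference forecast."""
    if error_reference == 0:
        raise ValueError("reference error must be non-zero")
    return 1.0 - error_model / error_reference

# Hypothetical RMSE values in mm/month (not results from the study).
rmse_reference = 40.0  # e.g. a baseline model
rmse_candidate = 30.0  # e.g. an optimized model

print(skill_score(rmse_candidate, rmse_reference))  # 0.25
print(skill_score(50.0, rmse_reference))            # -0.25
```

Under this assumed definition, a score of 0.25 corresponds to a 25% reduction in error relative to the reference, and a negative score indicates that the candidate performs worse than the reference.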

Figure 10. The correlation coefficient between actual and forecasted values during winter and summer season for ICA-ANN model.

Trend analysis is a standard method for identifying changes in climatic and hydrologic data series, and nonparametric tests are frequently employed to find trends in time series (Uddin et al., Citation2022). The suggested model achieved a good trend and correlation between the forecasted and actual data. The percentage relative error and trend line are shown in Figure . The results showed that ICA-ANN achieved satisfactory prediction results, with a maximum error of +28.01% in February. The best results were achieved for June, where the error was less than 15%. The relative error indicator confirmed the reliability and ability of ICA-ANN in forecasting rainfall data.
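A minimal Python sketch of a percentage relative error calculation is given below. It assumes the signed form, forecast minus actual divided by actual, which is consistent with the signed value reported above but is not confirmed by this section. The function name and the sample values are hypothetical.

```python
def relative_error_percent(actual, forecast):
    """Signed relative error in percent; positive means overestimation."""
    if actual == 0:
        raise ValueError("relative error is undefined for zero actual rainfall")
    return 100.0 * (forecast - actual) / actual

# Hypothetical monthly values in mm/month (not data from the study).
actual = [200.0, 160.0, 250.0]
forecast = [250.0, 150.0, 275.0]

errors = [relative_error_percent(a, f) for a, f in zip(actual, forecast)]
print(errors)                # [25.0, -6.25, 10.0]
print(max(errors, key=abs))  # 25.0, the largest error in magnitude
```

Because the error is normalized by the actual rainfall, months with small totals can show large percentages even when the absolute error in mm/month is modest.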

Figure 11. The percentage relative error between the actual and forecasted data using ICA-ANN model.

To assess the reliability of the models proposed in this study, a comparison was conducted against a previous study. The study of (Beheshti et al., Citation2016) combined the Chaos Adaptive Particle Swarm Optimization algorithm (CAPSO) with the MLP to forecast rainfall within the Johor river basin. The comparison between the current study and the study by (Beheshti et al., Citation2016) was carried out using common statistical indicators, as shown in Table . The RMSE, MAE and NSE were adopted to compare the performance of the best model in the current study with the best model proposed in the previous study. Forecast results for January and June, which are examples of winter and summer months, respectively, were used for the comparison. It can be seen that the current model (i.e. ICA-ANN) achieved lower RMSE and MAE than the previous model (i.e. CAPSO-MLP). Furthermore, the NSE indicator revealed that the ICA-ANN had higher agreement between forecasted and actual rainfall data than the CAPSO-MLP model. This comparison supports the reliability and capability of the ICA-ANN as a robust predictive model for rainfall forecasting in the Johor river basin.

Table 5. Comparison of the best results obtained with those of the previous study.

5. Conclusion

The study proposed integrating the ANN model with three different optimization algorithms to forecast rainfall. The forecasting ability of the ANN, ABC-ANN, PSO-ANN and ICA-ANN models was compared, and the models’ performance was analysed using several statistical indicators. The suggested predictive models were applied to rainfall forecasting in Malaysia during the summer (June, July and August) and winter (December, January and February) seasons.

An analysis of the models’ performance shows that the ANN and ABC-ANN achieved acceptable forecasting results in several cases. The results revealed that the PSO-ANN did not succeed in providing good and accurate forecasts. On the other hand, the evaluation indicators imply that ICA-ANN yields more accurate forecasts.

It can be seen that the optimization algorithms improved the forecasting results. In fact, the behaviour of a standalone ANN as a forecasting model makes it insufficient to produce adequate forecasts for a parameter with extremely nonlinear physics. By integrating it with an optimization algorithm, the predictive model became a more reliable and suitable tool for rainfall forecasting.

Although the proposed model (ICA-ANN) provided satisfactory forecasting results in the current research, the study has some limitations and shortcomings that can be addressed in future studies. The suggested model sometimes struggled to reproduce extreme values. Accordingly, the present methodology needs further enhancement to capture the global optimum. The following recommendations may improve the model’s performance and its capability for rainfall forecasting. According to previous studies, it is necessary to conduct a study using different time scales to forecast the rainfall coefficient. Physical and climatic data, whose variability has a significant impact on the rainfall pattern, could also be incorporated into the proposed modelling. For some regions, hydrological parameters such as runoff, evapotranspiration, and temperature have a significant influence. Thus, such parameters should be considered in rainfall modelling for a better understanding of the rainfall coefficient pattern. Considering other machine-learning methods, such as SVR, LSTM and CNN, for rainfall forecasting may also yield satisfactory results.

Declarations

Ethical Approval

The manuscript was prepared in the ethical manner advised by the targeted journal.

Consent to publish

The authors consent to the publication of this research.

Authors contributions

Conceptualization: first author; Methodology: first, sixth and seventh authors; Formal analysis and investigation: second, third and fourth authors; Review and editing: fifth and eighth authors.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Adham, A., Wesseling, J. G., Abed, R., Riksen, M., Ouessar, M., & Ritsema, C. J. (2019). Assessing the impact of climate change on rainwater harvesting in the Oum Zessar watershed in Southeastern Tunisia. Agricultural Water Management, 221, 131–140. https://doi.org/10.1016/j.agwat.2019.05.006
  • AlDahoul, N., Ahmed, A. N., Allawi, M. F., Sherif, M., Sefelnasr, A., Chau, K.-w., & El-Shafie, A. (2022). A comparison of machine learning models for suspended sediment load classification. Engineering Applications of Computational Fluid Mechanics, 16(1), 1211–1232. https://doi.org/10.1080/19942060.2022.2073565
  • Ali, A., bin Waheed, U., Ashiq, M., AL Asta, M. S., & Khorram, M. (2022). Machine learning model for estimation of local scour depth around cylindrical bridge piers. Iraqi Journal of Civil Engineering, 16, 1–13. https://doi.org/10.37650/ijce.2022.160201
  • Allawi, M. F., & El-Shafie, A. (2016). Utilizing RBF-NN and ANFIS methods for multi-lead ahead prediction model of evaporation from reservoir. Water Resources Management, 30(13), 4773–4788. https://doi.org/10.1007/s11269-016-1452-1
  • Allawi, M. F., Jaafar, O., Mohamad Hamzah, F., Abdullah, S. M. S., & El-shafie, A. (2018). Review on applications of artificial intelligence methods for dam and reservoir-hydro-environment models. Environmental Science and Pollution Research, 25(14), 13446–13469. https://doi.org/10.1007/s11356-018-1867-8
  • Armaghani, D. J., Hajihassani, M., Mohamad, E. T., Marto, A., & Noorani, S. A. (2014). Blasting-induced flyrock and ground vibration prediction through an expert artificial neural network based on particle swarm optimization. Arabian Journal of Geosciences, 7(12), 5383–5396. https://doi.org/10.1007/s12517-013-1174-0
  • Asadi, S., Shahrabi, J., Abbaszadeh, P., & Tabanmehr, S. (2013). A new hybrid artificial neural networks for rainfall-runoff process modeling. Neurocomputing, 121, 470–480. https://doi.org/10.1016/j.neucom.2013.05.023
  • Atashpaz-Gargari, E., & Lucas, C. (2007). Imperialist competitive algorithm: An algorithm for optimization inspired by imperialistic competition. 2007 IEEE Congress on Evolutionary Computation, Singapore, (pp. 4661–4667). https://doi.org/10.1109/CEC.2007.4425083
  • Beheshti, Z., Firouzi, M., Shamsuddin, S. M., Zibarzani, M., & Yusop, Z. (2016). A new rainfall forecasting model using the CAPSO algorithm and an artificial neural network. Neural Computing and Applications, 27(8), 2551–2565. https://doi.org/10.1007/s00521-015-2024-7
  • Bozorg-Haddad, O., Zarezadeh-Mehrizi, M., Abdi-Dehkordi, M., Loáiciga, H. A., & Mariño, M. A. (2016). A self-tuning ANN model for simulation and forecasting of surface flows. Water Resources Management, 30(9), 2907–2929. https://doi.org/10.1007/s11269-016-1301-2
  • Chen, C., Zhang, Q., Kashani, M. H., Jun, C., Bateni, S. M., Band, S. S., Dash, S. S., & Chau, K.-W. (2022). Forecast of rainfall distribution based on fixed sliding window long short-term memory. Engineering Applications of Computational Fluid Mechanics, 16, 248–261. https://doi.org/10.1080/19942060.2021.2009374
  • Choubin, B., Malekian, A., Samadi, S., Khalighi-Sigaroodi, S., & Sajedi-Hosseini, F. (2017). An ensemble forecast of semi-arid rainfall using large-scale climate predictors. Meteorological Applications, 24(3), 376–386. https://doi.org/10.1002/met.1635
  • Choubin, B., Zehtabian, G., Azareh, A., Rafiei-Sardooi, E., Sajedi-Hosseini, F., & Kişi, Ö. (2018). Precipitation forecasting using classification and regression trees (CART) model: A comparative study of different approaches. Environmental Earth Sciences, 77(8). https://doi.org/10.1007/s12665-018-7498-z
  • Cinar, Y. G., Mirisaee, H., Goswami, P., Gaussier, E., & Aït-Bachir, A. (2018). Period-aware content attention RNNs for time series forecasting with missing values. Neurocomputing, 312, 177–186. https://doi.org/10.1016/j.neucom.2018.05.090
  • Coulibaly, P., Haché, M., Fortin, V., & Bobée, B. (2005). Improving daily reservoir inflow forecasts with model combination. Journal of Hydrologic Engineering / American Society of Civil Engineers, Water Resources Engineering Division. https://doi.org/10.1061/(ASCE)1084-0699(2005)10:2(91)
  • Davolio, S., Miglietta, M. M., Diomede, T., Marsigli, C., Morgillo, A., & Moscatello, A. (2008). A meteo-hydrological prediction system based on a multi-model approach for precipitation forecasting. Natural Hazards and Earth System Sciences, 8(1), 143–159. https://doi.org/10.5194/nhess-8-143-2008
  • de Oliveira, I. M. S., Schirru, R., & de Medeiros, J. A. C. C. (2009). On the performance of an artificial Bee colony optimization algorithm applied to the accident diagnosis in a PWR nuclear power plant. Int Nucl Atl Conf – Ina 2009.
  • Duong, T. A., & Bui, M. D. (2018). Long short term memory for monthly rainfall prediction in Camau, VIETNAM.
  • French, M. N., Krajewski, W. F., & Cuykendall, R. R. (1992). Rainfall forecasting in space and time using a neural network. Journal of Hydrology, 137(1–4), 1–31. https://doi.org/10.1016/0022-1694(92)90046-X
  • Hamzaçebi, C. (2008). Improving artificial neural networks’ performance in seasonal time series forecasting. Information Sciences, 178(23), 4550–4559. https://doi.org/10.1016/j.ins.2008.07.024
  • Hatem, R. A. A. S. M. U. (2022). Forecasting the water level of the Euphrates river in western Iraq using artificial neural networks (ANN). Int J Des Nat Ecodynamics, 303–309. https://doi.org/10.18280/ijdne.170218
  • Hu, C., Wu, Q., Li, H., Jian, S., Li, N., & Lou, Z. (2018). Deep learning with a long short-term memory networks approach for rainfall-runoff simulation. Water, 10, 1543. https://doi.org/10.3390/w10111543
  • Irwanto, M., Gomesh, N., Mamat, M. R., & Yusoff, Y. M. (2014). Assessment of wind power generation potential in Perlis, Malaysia. Renewable and Sustainable Energy Reviews, 38, 296–308. https://doi.org/10.1016/j.rser.2014.05.075
  • Kadhim, S., Mansor, K., & Abbood, M. (2022). Prediction of surface quality in electrical discharge machining process for 7024 AL alloy using artificial neural network model. Anbar Journal of Engineering Sciences, 13, 106–113. https://doi.org/10.37649/aengs.2022.176364
  • Kajornrit, J., Wong, K. W., & Fung, C. C. (2012, November 18–21). A comparative analysis of soft computing techniques used to estimate missing precipitation records. 19th Biennial Conference of the International Telecommunications Society (ITS): ‘Moving Forward with Future Technologies: Opening a Platform for All’, Bangkok, Thailand.
  • Kamel, A. H., Afan, H. A., Sherif, M., Ahmed, A. N., & El-Shafie, A. (2021). RBFNN versus GRNN modeling approach for sub-surface evaporation rate prediction in arid region. Sustainable Computing: Informatics and Systems, 30, 100514. https://doi.org/10.1016/j.suscom.2021.100514
  • Karaboga, D. (2005). An idea based on honey bee swarm for numerical optimization (Technical Report-TR06, October, 2005). Univ Press Erciyes.
  • Kashiwao, T., Nakayama, K., Ando, S., Ikeda, K., Lee, M., & Bahadori, A. (2017). A neural network-based local rainfall prediction system using meteorological data on the internet: A case study using data from the Japan Meteorological Agency. Applied Soft Computing, 56, 317–330. https://doi.org/10.1016/j.asoc.2017.03.015
  • Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, (vol. 5, pp. 4104–4108). Orlando, FL, USA. https://doi.org/10.1109/ICSMC.1997.637339
  • Kibler, D., & Langley, P. (1988). Machine learning as an experimental science. 81–92.
  • Kisi, O., & Shiri, J. (2011). Precipitation forecasting using wavelet-genetic programming and wavelet-neuro-fuzzy conjunction models. Water Resour Manag, 25. https://doi.org/10.1007/s11269-011-9849-3
  • Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual model. Part 1–A discussion of principles. Journal of Hydrology, 10, 282–290.
  • Noori, R., Karbassi, A. R., Moghaddamnia, A., Han, D., Zokaei-Ashtiani, M. H., Farokhnia, A., & Gousheh, M. G. (2011). Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. Journal of Hydrology, 401(3-4), 177–189. https://doi.org/10.1016/j.jhydrol.2011.02.021
  • Nozohour-leilabady, B., & Fazelabdolabadi, B. (2016). On the application of artificial bee colony (ABC) algorithm for optimization of well placements in fractured reservoirs; efficiency comparison with the particle swarm optimization (PSO) methodology. Petroleum, 2(1), 79–89. https://doi.org/10.1016/j.petlm.2015.11.004
  • Patel, A., & Joshi, G. (2017). Annual rainfall-runoff modeling of Harnav watershed of a Sabarmati river basin, India using artificial neural network. International Journal of Advance Engineering and Research Development, 4. https://doi.org/10.21090/ijaerd.ncan01
  • Poornima, S., & Pushpalatha, M. (2019). Prediction of rainfall using intensified LSTM based recurrent neural network with weighted linear units. Atmosphere, 10, 668. https://doi.org/10.3390/atmos10110668
  • Rajaee, T., Mirbagheri, S. A., Zounemat-Kermani, M., & Nourani, V. (2009). Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models. Science of The Total Environment, 407(17), 4916–4927. https://doi.org/10.1016/j.scitotenv.2009.05.016
  • Ramal, M. M., Jalal, A. D., Sahab, M. F., & Yaseen, Z. M. (2022). River water turbidity removal using new natural coagulant aids: Case study of Euphrates river, Iraq. Water Supply, 22(3), 2721–2737. https://doi.org/10.2166/ws.2021.441
  • Salman, A. G., Heryadi, Y., Abdurahman, E., & Suparta, W. (2018). Single layer & multi-layer long short-term memory (LSTM) model with intermediate variables for weather forecasting. Procedia Computer Science.
  • Sharda, R., & Patil, R. B. (1992). Connectionist approach to time series prediction: An empirical test. Journal of Intelligent Manufacturing, 3(5), 317–323. https://doi.org/10.1007/BF01577272
  • Shoaib, M., Shamseldin, A. Y., & Melville, B. W. (2014). Comparative study of different wavelet based neural network models for rainfall-runoff modeling. Journal of Hydrology, 515. https://doi.org/10.1016/j.jhydrol.2014.04.055
  • Shoaib, M., Shamseldin, A. Y., Melville, B. W., & Khan, M. M. (2016). Hybrid wavelet neuro-fuzzy approach for rainfall-runoff modeling. Journal of Computing in Civil Engineering, 30(1), 4014125. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000457
  • Singh, V. K., Panda, K. C., Sagar, A., Al-Ansari, N., Duan, H.-F., Paramaguru, P. K., Vishwakarma, D. K., Kumar, A., Kumar, D., Kashyap, P. S., Singh, R. M., & Elbeltagi, A. (2022). Novel genetic algorithm (GA) based hybrid machine learning-pedotransfer function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity. Engineering Applications of Computational Fluid Mechanics, 16, 1082–1099. https://doi.org/10.1080/19942060.2022.2071994
  • Solgi, A., Nourani, V., & Pourhaghi, A. (2014). Forecasting daily precipitation using hybrid model of wavelet-artificial neural network and comparison with adaptive neurofuzzy inference system (case study: Verayneh station, Nahavand). Advances in Civil Engineering, 2014, 1–12. https://doi.org/10.1155/2014/279368
  • Sulaiman, S. O., Kamel, A. H., Sayl, K. N., & Alfadhel, M. Y. (2019). Water resources management and sustainability over the Western desert of Iraq. Environmental Earth Sciences, 78(16), 495. https://doi.org/10.1007/s12665-019-8510-y
  • Tabari, H., Marofi, S., & Sabziparvar, A. (2010). Estimation of daily pan evaporation using artificial neural network and multivariate non-linear regression. Irrig Sci.
  • Toth, E., Brath, A., & Montanari, A. (2000). Comparison of short-term rainfall prediction models for real-time flood forecasting. Journal of Hydrology, 239(1–4), 132–147. https://doi.org/10.1016/S0022-1694(00)00344-9
  • Uddin, M. J., Li, Y., Tamim, M. Y., Miah, M. B., & Ahmed, S. M. S. (2022). Extreme rainfall indices prediction with atmospheric parameters and ocean–atmospheric teleconnections using a random forest model. J Appl Meteorol Climatol, 61, 651–667. https://doi.org/10.1175/JAMC-D-21-0170.1
  • Venkata Ramana, R., Krishna, B., Kumar, S. R., & Pandey, N. G. (2013). Monthly rainfall prediction using wavelet neural network analysis. Water Resources Management, 27(10), 3697–3711. https://doi.org/10.1007/s11269-013-0374-4
  • Wang, G. C., Zhang, Q., Band, S. S., Dehghani, M., Chau, K. w., Tho, Q. T., Zhu, S., Samadianfard, S., & Mosavi, A. (2022). Monthly and seasonal hydrological drought forecasting using multiple extreme learning machine models. Engineering Applications of Computational Fluid Mechanics, 16, 1364–1381. https://doi.org/10.1080/19942060.2022.2089732
  • Wang, K., Band, S. S., Ameri, R., Biyari, M., Hai, T., Hsu, C.-C., Hadjouni, M., Elmannai, H., Chau, K.-W., & Mosavi, A. (2022). Performance improvement of machine learning models via wavelet theory in estimating monthly river streamflow. Engineering Applications of Computational Fluid Mechanics, 16, 1833–1848. https://doi.org/10.1080/19942060.2022.2119281
  • Whigham, P. A., & Crapper, P. F. (2001). Modelling rainfall-runoff using genetic programming. Mathematical and Computer Modelling, 33(6–7), 707–721. https://doi.org/10.1016/S0895-7177(00)00274-0
  • Wu, J. S., Han, J., Annambhotla, S., & Bryant, S. (2005). Artificial neural networks for forecasting watershed runoff and stream flows. Journal of Hydrologic Engineering, 10(3), 216–222. https://doi.org/10.1061/(ASCE)1084-0699(2005)10:3(216)
  • Wu, W., Dandy, G. C., & Maier, H. R. (2014). Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environmental Modelling & Software, 54. https://doi.org/10.1016/j.envsoft.2013.12.016
  • Wu, Y., Tan, H., Qin, L., Ran, B., & Jiang, Z. (2018). A hybrid deep learning based traffic flow prediction method and its understanding. Transportation Research Part C: Emerging Technologies, 90, 166–180. https://doi.org/10.1016/j.trc.2018.03.001
  • Yaseen, Z. M., El-shafie, A., Jaafar, O., Afan, H. A., & Sayl, K. N. (2015). Artificial intelligence based models for stream-flow forecasting: 2000–2015. Journal of Hydrology, 530, 829–844. https://doi.org/10.1016/j.jhydrol.2015.10.038
  • Zainal, A. R., Glover, I. A., & Watson, P. A. (2002). Rain rate and drop size distribution measurements in Malaysia. In Proceedings of IGARSS ‘93 – IEEE International Geoscience and Remote Sensing Symposium (vol. 1, pp. 309–311). Tokyo, Japan: IEEE. https://doi.org/10.1109/IGARSS.1993.322560
  • Zaw, W., & Naing, T. (2008). Empirical statistical modeling of rainfall prediction over Myanmar. World Academy of Science, Engineering and Technology, 2(10), 500–504.
  • Zin, W. Z. W., Jemain, A. A., & Ibrahim, K. (2013). Analysis of drought condition and risk in peninsular Malaysia using standardised precipitation index. Theoretical and Applied Climatology, 111(3-4), 559–568. https://doi.org/10.1007/s00704-012-0682-2