Mathematical and Computer Modelling of Dynamical Systems
Methods, Tools and Applications in Engineering and Related Sciences
Volume 30, 2024 - Issue 1
Research Article

An iterative approach for deriving and solving an accurate regression equation

Pages 73-90 | Received 04 Sep 2023, Accepted 25 Jan 2024, Published online: 03 Mar 2024

ABSTRACT

This paper introduces a method for deriving an accurate regression equation from any set of paired data, together with a technique for solving that equation. As a practical example, we used 571 pairs of sediment concentration and river flow data to derive an accurate sediment rating equation. The graphs of the measured and predicted sediment concentrations matched each other, and the goodness-of-fit measures were: Nash–Sutcliffe efficiency (NSE) of 0.9999860, coefficient of determination (R²) of 0.99998679, root mean square error (RMSE) of 0.0345, mean absolute error (MAE) of 0.0067, volume error (VE) of 1, and sum of squared errors (SSE) of 0.678631. To explain how the accurate regression equation is derived and solved, a video presentation and an Excel spreadsheet are provided as supplementary materials. In general, the method can be used to model any process, and any calibration and validation process can be addressed with it.

1. Introduction

The relationship between independent and dependent variables is either governed by accepted scientific laws (Seber and Wild 2003) or expressed by mathematical, statistical, empirical, analytical, or numerical models. To find the model that best fits the measured data, the model parameters can be estimated either through calibration or by regression analysis, and the performance of the model is then evaluated with different statistical indicators.

Regression analysis, a technique for finding the relationship among variables, is important to all scientific work in which interpretations must be drawn from measured data sets (Wu and Yen 1992). Several authors (Seal 1967; Finney 1996; Barnes 1998; Stanton 2001) have described the history of regression analysis, and Fernández-Delgado et al. (2019) provided an extensive experimental survey of regression methods.

If the relationship between the dependent and independent variables is known, or is defined by a chosen model, the parameters of the model can be determined by parametric regression. The results of analysing data with a parametric model may depend heavily on the chosen regression and variance functions, and also on any preliminary transformation of the variables (Bunke et al. 1999). Non-parametric regression methods, on the other hand, generally have a slower rate of convergence but need no explicit specification of the form of the regression function; the resulting curve is therefore completely determined by the data themselves (Glad 1998). Different types of parametric and non-parametric regression methods, with their descriptions or applications, are given in the literature (Linnet 1998; Seber and Wild 2003; Qian and Reckhow 2005; Li and Yin 2009; Lolli and Gasperini 2012; Wang and Du 2014; Yong 2014; Özsoy and Örkçü 2016; Fernández-Delgado et al. 2019).

Artificial neural networks (Specht 1991; Wu and Yen 1992; Zhang et al. 1998) and fuzzy regression (Bárdossy et al. 1993; Yang and Lin 2002; Hao and Chiang 2008) can be used for both parametric and non-parametric regression analysis. Several comparisons have found artificial neural networks more appropriate than other regression approaches (Wiese and Schaper 1993; Pao 2008; Rahman and Asadujjaman 2021). Artificial neural networks were designed to study the behaviour of real, nonlinear, complex systems, and they are particularly effective for problems in which the dependent and independent variables are known to be correlated but a precise description of that correlation by classical mathematical methods is too complicated, too simplified, or impossible (Du and Swamy 2014; Kopal et al. 2022). However, they also embody considerable uncertainty and difficulty (Masters and Land 1997; Zhang et al. 1998; Tomandl and Schober 2001; Morala et al. 2021). A neural network model could be a more useful nonlinear regression tool if it successfully incorporated human knowledge (heuristics) and other regression techniques (Wang 1999).

In actual modelling, the underlying processes are generally complex and not well understood, which means that we have little or no idea of the form of the relationship (Seber and Wild 2003). For example, several authors indicate that the power function is a commonly used nonlinear regression approach for modelling the sediment rating curve (e.g. Asselman 2000; Heng and Suetsugi 2014; Hapsari et al. 2019); however, the error of such a regression equation can be very large. Finding a regression method that yields an accurate equation from any set of paired data is therefore important for representing processes accurately. In this paper, we provide a procedure to derive a complex equation expressing the relationship between the dependent and independent variables from any set of paired data.

2. Methodology

2.1. An iterative approach to derive an accurate regression equation

To set up the iteration steps, we begin with the following definition.

Definition 2.1.

For given values of the paired variables $S$ and $Q$, the variables $x$ and $y$ are defined by

$$y = i\,(bS)^{1/u} + jQ \tag{1}$$

$$x = k\,(hQ)^{1/w} + t \tag{2}$$

where $i$, $b$, $u$, $j$, $k$, $h$, $w$ and $t$ are constants.

Let $y \approx y(x)$, where $y(x)$ is a function of $x$.

Since a polynomial can take negative values, positive values, or both, we let $y(x)$ be a polynomial of degree $n$:

$$y(x) = a_n x^{n} + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + c \tag{3}$$

Therefore,

$$y = y(x) + e \tag{4}$$

where $e$ is the error.

Substituting Equation (1) into Equation (4) gives

$$i\,(bS)^{1/u} + jQ = y(x) + e \tag{5}$$

Rearranging Equation (5) for $S$ gives

$$S = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + e\bigr)^{u} \tag{6}$$
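Equations (1), (2) and (6) are closed-form maps, so they can be checked numerically. The sketch below is a hypothetical illustration in Python with NumPy (the function names `to_y`, `to_x` and `back_to_S` and the constants in the example are ours, not the authors'); it assumes the base $y(x) - jQ + e$ is non-negative whenever $u$ is not an integer, since the real power is otherwise undefined.

```python
import numpy as np


def to_y(S, Q, i, b, u, j):
    """Equation (1): y = i * (b*S)**(1/u) + j*Q."""
    S = np.asarray(S, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return i * (b * S) ** (1.0 / u) + j * Q


def to_x(Q, k, h, w, t):
    """Equation (2): x = k * (h*Q)**(1/w) + t."""
    return k * (h * np.asarray(Q, dtype=float)) ** (1.0 / w) + t


def back_to_S(y_total, Q, i, b, u, j):
    """Equation (6) with y_total = y(x) + e: S = (y_total - j*Q)**u / (b * i**u)."""
    base = np.asarray(y_total, dtype=float) - j * np.asarray(Q, dtype=float)
    return base ** u / (b * i ** u)


# Round trip with illustrative (hypothetical) constants: because y = y(x) + e,
# passing the exact y back through Equation (6) must recover S.
S = np.array([120.0, 850.0, 2300.0])
Q = np.array([15.0, 60.0, 140.0])
params = dict(i=0.5, b=2.0, u=2.0, j=0.3)
y = to_y(S, Q, **params)
S_back = back_to_S(y, Q, **params)
print(np.allclose(S_back, S))  # True
```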

In Equation (6), the terms $y(x)$, $Q$ and $e$ are combined by addition and subtraction, so each of them has an individual effect on the value of $S$: if $y(x)$ were used in place of $Q$, or $Q$ in place of $e$ (and vice versa), the value of $S$ would change. This is why $x$ and $y$ were defined as above to arrive at Equation (6).

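Let

$$e = e_1 + e_2 + e_3 + \dots + e_{p-1} + e_p \tag{7}$$

Substituting Equation (7) into Equation (6) gives

$$S = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + e_1 + e_2 + e_3 + \dots + e_{p-1} + e_p\bigr)^{u} \tag{8}$$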

Equation (8) represents the actual value of $S$. If the error $e_p$ in Equation (8) is a minimum tolerable error that can be ignored, then the sum $e_1 + e_2 + e_3 + \dots + e_{p-1}$ approximates the total error $e$. The predicted value of $S$, denoted $S_p$, is therefore

$$S_p = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + e_1 + e_2 + e_3 + \dots + e_{p-1}\bigr)^{u} \tag{9}$$

The difference between $S$ and $S_p$ is an error $E_p = S - S_p$, where $p-1$ is the number of error terms $e$ required to derive an accurate regression equation in $p$ iteration steps. For the $p-1$ error terms $e_1, e_2, e_3, \dots, e_{p-1}$, there are also $p-1$ corresponding errors $E_1, E_2, E_3, \dots, E_{p-1}$.

The underlying idea is that if the error $e$ can be expressed as a function of the error $E$, an accurate regression equation can be derived, because both errors are functions of the variables $Q$ and $S$. We therefore define an iterative procedure that approximates each error $e$ from the corresponding error $E$. Let the approximate values of $e_1, e_2, e_3, \dots, e_{p-1}$ be $r_1, r_2, r_3, \dots, r_{p-1}$, respectively. Based on Equation (9) and the explanation above, the iteration steps are defined as follows.

For the first iteration step ($p = 1$), $e_0 = 0$, $E_0 = 0$ and $r_0 = 0$. The first predicted value of $S$, $S_1$, is therefore

$$S_1 = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ\bigr)^{u} \tag{10}$$

If $S_1 \approx S$, there is no need to proceed to the next iteration step; otherwise, we proceed to the next step.

For the second iteration step ($p = 2$), $e_1$, $E_1$ and $r_1$ are determined by

$$e_1 = y - y(x) \tag{11}$$

$$E_1 = S - S_1 \tag{12}$$

$$r_1 = f_1(E_1) \tag{13}$$

where $f_1$ is the polynomial regression function relating the values of $e_1$ to those of $E_1$. At the second iteration step, the second predicted value of $S$ is therefore

$$S_2 = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1\bigr)^{u} \tag{14}$$

If $S_2 \approx S$, there is no need to proceed to the next iteration step; otherwise, we proceed to the next step.

For the third iteration step ($p = 3$), $e_2$, $E_2$ and $r_2$ are determined by

$$e_2 = y - \bigl(y(x) + r_1\bigr) \tag{15}$$

$$E_2 = S - S_2 \tag{16}$$

$$r_2 = f_2(E_2) \tag{17}$$

where $f_2$ is the polynomial regression function relating the values of $e_2$ to those of $E_2$. At the third iteration step, the third predicted value of $S$ is therefore

$$S_3 = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2\bigr)^{u} \tag{18}$$

If $S_3 \approx S$, there is no need to proceed to the next iteration step; otherwise, we proceed to the next step.

For the fourth iteration step ($p = 4$), $e_3$, $E_3$ and $r_3$ are determined by

$$e_3 = y - \bigl(y(x) + r_1 + r_2\bigr) \tag{19}$$

$$E_3 = S - S_3 \tag{20}$$

$$r_3 = f_3(E_3) \tag{21}$$

where $f_3$ is the polynomial regression function relating the values of $e_3$ to those of $E_3$. At the fourth iteration step, the fourth predicted value of $S$ is therefore

$$S_4 = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2 + r_3\bigr)^{u} \tag{22}$$

If $S_4 \approx S$, there is no need to proceed to the next iteration step; otherwise, we proceed to the next step.

For the $(p-1)$th iteration step, $e_{p-2}$, $E_{p-2}$ and $r_{p-2}$ are determined by

$$e_{p-2} = y - \bigl(y(x) + r_1 + r_2 + r_3 + \dots + r_{p-3}\bigr) \tag{23}$$

$$E_{p-2} = S - S_{p-2} \tag{24}$$

$$r_{p-2} = f_{p-2}(E_{p-2}) \tag{25}$$

where $f_{p-2}$ is the polynomial regression function relating the values of $e_{p-2}$ to those of $E_{p-2}$. At the $(p-1)$th iteration step, the $(p-1)$th predicted value of $S$ is therefore

$$S_{p-1} = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2 + r_3 + \dots + r_{p-2}\bigr)^{u} \tag{26}$$

For the $p$th iteration step, $e_{p-1}$, $E_{p-1}$ and $r_{p-1}$ are determined by

$$e_{p-1} = y - \bigl(y(x) + r_1 + r_2 + r_3 + \dots + r_{p-2}\bigr) \tag{27}$$

$$E_{p-1} = S - S_{p-1} \tag{28}$$

$$r_{p-1} = f_{p-1}(E_{p-1}) \tag{29}$$

where $f_{p-1}$ is the polynomial regression function relating the values of $e_{p-1}$ to those of $E_{p-1}$. At the $p$th iteration step, the $p$th predicted value of $S$ is therefore

$$S_p = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2 + r_3 + \dots + r_{p-2} + r_{p-1}\bigr)^{u} \tag{30}$$
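All the iteration steps above follow a single pattern. As a compact restatement (our summary in the paper's notation, not an additional equation of the method), for a generic index $m = 1, \dots, p-1$, with an empty sum taken as zero:

$$
\begin{aligned}
S_1 &= \frac{1}{b\,i^{u}}\bigl(y(x) - jQ\bigr)^{u},\\
e_m &= y - \Bigl(y(x) + \sum_{k=1}^{m-1} r_k\Bigr), \qquad E_m = S - S_m, \qquad r_m = f_m(E_m),\\
S_{m+1} &= \frac{1}{b\,i^{u}}\Bigl(y(x) - jQ + \sum_{k=1}^{m} r_k\Bigr)^{u}.
\end{aligned}
$$

Setting $m = 1$ recovers Equations (11) to (14), and setting $m = p-1$ recovers Equations (27) to (30).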

2.2. Determining the final form of the accurate regression equation

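Suppose that $S \approx S_p$ at the $p$th iteration step. The final form of the accurate regression equation is then obtained through successive substitutions.

Substituting Equation (10) into Equation (12) gives

$$E_1 = S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ\bigr)^{u} \tag{31}$$

Substituting Equation (14) into Equation (16) gives

$$E_2 = S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1\bigr)^{u} \tag{32}$$

Substituting Equation (18) into Equation (20) gives

$$E_3 = S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2\bigr)^{u} \tag{33}$$

Substituting Equation (26) into Equation (28) gives

$$E_{p-1} = S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2 + r_3 + \dots + r_{p-2}\bigr)^{u} \tag{34}$$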

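Substituting Equation (31) into Equation (13) gives

$$r_1 = f_1\Bigl(S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ\bigr)^{u}\Bigr) \tag{35}$$

Substituting Equation (32) into Equation (17) gives

$$r_2 = f_2\Bigl(S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1\bigr)^{u}\Bigr) \tag{36}$$

Substituting Equation (33) into Equation (21) gives

$$r_3 = f_3\Bigl(S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2\bigr)^{u}\Bigr) \tag{37}$$

Substituting Equation (34) into Equation (29) gives

$$r_{p-1} = f_{p-1}\Bigl(S - \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + r_1 + r_2 + r_3 + \dots + r_{p-2}\bigr)^{u}\Bigr) \tag{38}$$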

Equation (35) is substituted into Equation (36); Equations (35) and (36) into Equation (37); Equations (35), (36) and (37) into Equation (38); and so on. Once all the substitutions have been made in turn, the resulting equation is very long. However, each of $r_1, r_2, r_3, \dots, r_{p-1}$ is a function of the variables $Q$ and $S$ only, because for given values of the paired variables, $i$, $b$, $u$, $j$, $k$, $h$, $w$ and $t$ are all constants. Writing $f_k(Q,S)$ for $r_k$ expressed as a function of $Q$ and $S$,

$$r_1 + r_2 + r_3 + \dots + r_{p-1} = f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{p-1}(Q,S) \tag{39}$$

Substituting Equation (39) into Equation (30) gives

$$S_p = \frac{1}{b\,i^{u}}\bigl(y(x) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{p-1}(Q,S)\bigr)^{u} \tag{40}$$

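Substituting Equation (2) into Equation (3) gives

$$y(x) = a_n\bigl(k(hQ)^{1/w} + t\bigr)^{n} + a_{n-1}\bigl(k(hQ)^{1/w} + t\bigr)^{n-1} + a_{n-2}\bigl(k(hQ)^{1/w} + t\bigr)^{n-2} + \dots + c \tag{41}$$

From Equation (41), $y(x)$ is a function of $Q$ only. Therefore,

$$y(x) = f(Q) \tag{42}$$

Substituting Equation (42) into Equation (40) gives

$$S_p = \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{p-1}(Q,S)\bigr)^{u} \tag{43}$$

Suppose that $S \approx S_p$ at the $p$th iteration step. Equation (43) then becomes

$$S \approx \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + \dots + f_{p-1}(Q,S)\bigr)^{u} \tag{44}$$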
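To make the nesting in Equation (44) concrete, consider the smallest non-trivial case, $p = 2$ (a worked illustration of ours). Here $f_1$ denotes the fitted polynomial of Equation (13), so the term $f_1(Q,S)$ of Equation (44) is $f_1$ evaluated at $E_1$. Substituting Equations (35) and (42) into Equation (44) gives

$$
S \approx \frac{1}{b\,i^{u}}\left(f(Q) - jQ + f_1\!\left(S - \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ\bigr)^{u}\right)\right)^{u}
$$

$S$ appears both on the left-hand side and inside $f_1$, so even this shortest case is implicit in $S$; each further iteration step adds one more level of nesting of the same form.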

Equation (44) is the shorthand form of a very long equation. The exponents $u$, $w$ and $n$ of the substituted equations make the full equation complex and difficult to simplify. However, the substituted equations that make up the complex equation are easily linked together in an Excel spreadsheet or programmed in MATLAB. Equation (44) contains only two variables, $Q$ and $S$, so it can be solved for one of them given the other; because $S$ appears on both sides, the equation is implicit in $S$. A procedure for solving it is given in Section 2.5.

2.3. Determining initial values for deriving an accurate regression equation

Sections 2.1 and 2.2 showed the steps to derive and determine the final form of the accurate regression equation from values of the paired variables $S$ and $Q$. To start the derivation, we must first determine the constants of Equations (1) and (2). The polynomial function of Equation (3) describes the relationship between $x$ and $y$ directly, but the relationship between $S$ and $Q$ only indirectly. Therefore, for given values of the paired variables $S$ and $Q$, we find values of the constants $i$, $b$, $u$, $j$, $k$, $h$, $w$ and $t$ in Equations (1) and (2) such that the plot of $x$ versus $y$ yields a smooth polynomial curve. Once all the constants are known, the initial and final values of the variables are determined by following the iteration steps above.
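The paper chooses the constants so that the $x$–$y$ plot is a smooth polynomial curve, but it does not prescribe a search procedure. One hypothetical way to automate part of that choice is a small grid search over the exponents $u$ and $w$, holding the other constants fixed and scoring each candidate by how well a degree-$n$ polynomial fits $y$ against $x$. The sketch below (Python with NumPy; `screen_exponents` and its scale-free scoring rule are our assumptions) shows the idea only; the text does not state that the paper's constants were obtained this way.

```python
import itertools

import numpy as np


def normalized_poly_rmse(x, y, degree):
    """Fit y ~ polynomial(x) as in Equation (3); return (coefficients, RMSE / std(y))."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return coeffs, float(np.sqrt(np.mean(resid ** 2)) / np.std(y))


def screen_exponents(S, Q, u_values, w_values, degree,
                     i=1.0, b=1.0, j=0.0, k=1.0, h=1.0, t=0.0):
    """Try each (u, w) pair and keep the one whose x-y data a polynomial fits best.

    The score is divided by the spread of y because changing u rescales y;
    the remaining constants of Equations (1) and (2) are held fixed.
    """
    S = np.asarray(S, dtype=float)
    Q = np.asarray(Q, dtype=float)
    best = None
    for u, w in itertools.product(u_values, w_values):
        y = i * (b * S) ** (1.0 / u) + j * Q   # Equation (1)
        x = k * (h * Q) ** (1.0 / w) + t       # Equation (2)
        coeffs, score = normalized_poly_rmse(x, y, degree)
        if best is None or score < best["score"]:
            best = {"u": u, "w": w, "score": score, "coeffs": coeffs}
    return best


# Synthetic data with S = Q**2: u = 2 and w = 1 make x versus y exactly linear.
Q = np.linspace(1.0, 100.0, 50)
S = Q ** 2
best = screen_exponents(S, Q, u_values=[1.0, 2.0], w_values=[1.0, 2.0], degree=1)
print(best["u"], best["w"])  # 2.0 1.0
```

In practice the paper also adjusts $i$, $b$, $j$, $k$, $h$ and $t$, which would enlarge the search; a visual check of the $x$–$y$ plot remains advisable.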

2.4. Deriving an accurate sediment rating equation

The sections above described in general terms how to derive and determine the final form of the accurate regression equation, and how to determine the initial values needed to start the derivation. As a practical example, we use sediment concentration and corresponding river or streamflow data (see Tables 1 and 2) to derive an accurate sediment rating equation. In the tables, suspended sediment concentration is represented by the variable $S$ and flow by the variable $Q$.

Table 1. Sediment concentration versus river or streamflow data.

Table 2. Sediment concentration versus river or streamflow data.

Concretely, the following steps derive an accurate sediment rating equation from the above pairs of sediment concentration and river or streamflow data.

  1. For the given values of the paired variables $S$ and $Q$, estimate the constants $i$, $b$, $u$, $j$, $k$, $h$, $w$ and $t$ such that the plot of $x$ versus $y$ yields a smooth polynomial curve (see Equations (1) and (2)).

  2. Choose a polynomial regression function that fits the plot of $x$ versus $y$.

  3. From the regression equation in step 2, find the constants of Equation (3).

  4. Calculate $y$ using Equation (1).

  5. Calculate $x$ using Equation (2).

  6. Calculate $y(x)$ from steps 3 and 5.

  7. Calculate $S_1$, the first predicted value, using Equation (10). Plot the measured ($S$) and predicted ($S_1$) values. If the graphs do not match, proceed to the next iteration step.

  8. Calculate $e_1$ using Equation (11).

  9. Calculate $E_1$ using Equation (12).

  10. Fit a polynomial regression function relating $e_1$ to $E_1$.

  11. Calculate $r_1$ using the regression equation from step 10 (see Equation (35)).

  12. Substitute the value of $r_1$ from step 11 into Equation (14).

  13. Calculate $S_2$, the second predicted value, using Equation (14). Plot the measured ($S$) and predicted ($S_2$) values. If the graphs do not match, proceed to the next iteration step.

  14. Substitute the value of $r_1$ from step 11 into Equation (15).

  15. Calculate $e_2$ using Equation (15).

  16. Calculate $E_2$ using Equation (16).

  17. Fit a polynomial regression function relating $e_2$ to $E_2$.

  18. Calculate $r_2$ using the regression equation from step 17 (see Equation (36)).

  19. Substitute the value of $r_1$ from step 11 and the value of $r_2$ from step 18 into Equation (18).

  20. Calculate $S_3$, the third predicted value, using Equation (18). Plot the measured ($S$) and predicted ($S_3$) values. If the graphs do not match, proceed to the next iteration step, and so on.

The same procedure is repeated to calculate $S_p$ using Equation (30), where the subscript $p$ is the number of iteration steps. At each iteration step, the measured ($S$) and predicted ($S_p$) values are plotted, and the iteration ends when the two graphs almost match.
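The steps above translate almost line by line into a loop. The following is a minimal sketch in Python with NumPy under stated assumptions: the constants of Equations (1) and (2) are supplied by the user, each $f_p$ is an ordinary least-squares polynomial of degree `m` fitted with `numpy.polyfit`, and the visual check that the graphs almost match is replaced by a numerical tolerance `rel_tol` on $\max|E_p|$. The function name and its parameters are hypothetical; this is not the authors' spreadsheet.

```python
import numpy as np


def iterative_rating_fit(S, Q, consts, n=2, m=2, max_steps=15, rel_tol=1e-9):
    """Sketch of the iteration in Sections 2.1 and 2.4.

    consts: dict with keys i, b, u, j, k, h, w, t (Equations (1) and (2)).
    n: degree of the polynomial y(x) in Equation (3).
    m: degree of each correction polynomial f_p relating e_p to E_p.
    Returns the final prediction S_p and the list of fitted f_p coefficients.
    """
    c = consts
    S = np.asarray(S, dtype=float)
    Q = np.asarray(Q, dtype=float)
    y = c["i"] * (c["b"] * S) ** (1.0 / c["u"]) + c["j"] * Q   # Equation (1)
    x = c["k"] * (c["h"] * Q) ** (1.0 / c["w"]) + c["t"]       # Equation (2)
    y_of_x = np.polyval(np.polyfit(x, y, n), x)                # Equation (3)

    def predict(r_sum):
        # Equations (10), (14), (18), ..., (30); assumes the base is >= 0
        # whenever u is not an integer.
        base = y_of_x - c["j"] * Q + r_sum
        return base ** c["u"] / (c["b"] * c["i"] ** c["u"])

    r_sum = np.zeros_like(S)
    S_p = predict(r_sum)                       # S_1
    models = []
    for _ in range(max_steps):
        E = S - S_p                            # Equations (12), (16), (20), ...
        if np.max(np.abs(E)) <= rel_tol * np.max(np.abs(S)):
            break                              # S_p matches S closely enough
        e = y - (y_of_x + r_sum)               # Equations (11), (15), (19), ...
        coeffs = np.polyfit(E, e, m)           # f_p: regression of e_p on E_p
        r_sum = r_sum + np.polyval(coeffs, E)  # add r_p = f_p(E_p)
        models.append(coeffs)
        S_p = predict(r_sum)
    return S_p, models


# Deliberately easy, hypothetical check: with u = 1 and i = b = 1, Equation (6)
# gives E = e exactly, so a single correction recovers S.
Q = np.linspace(1.0, 50.0, 40)
S = 3.0 * Q + 0.2 * Q ** 1.5 + np.sin(Q)
consts = dict(i=1.0, b=1.0, u=1.0, j=0.0, k=1.0, h=1.0, w=1.0, t=0.0)
S_pred, models = iterative_rating_fit(S, Q, consts)
print(len(models), np.allclose(S_pred, S))  # 1 True
```

With $u \neq 1$ the relation between $e_p$ and $E_p$ is only approximately a single-valued function, so several correction steps are usually needed (the paper's example used 14).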

From the paired data in Tables 1 and 2, the values of the required constants ($i$, $b$, $u$, $j$, $k$, $h$, $w$ and $t$) and functions ($y(x)$, $r_1$, $r_2$, $r_3$, $r_4$, …, $r_{14}$) were determined by following the steps above; they are given below. Figure 1 shows the original river or streamflow ($Q$) versus sediment concentration ($S$) data and the transformed data ($x$ versus $y$) (see Section 2.3).

Figure 1. The original and transformed data according to step one.

(45) k=0.000001(45)
(46) h=1(46)
(47) w=2(47)
(48) t=0(48)
(49) i=0.0000111(49)
(50) b=0.0022(50)
(51) u=2(51)
(52) j=2(52)
(53) y(x)=1999999995091.56x2+0.10499849x+0.0000001(53)
(54) r1=(0.00008E12+1.011E1+0.004E1)105(54)
(55) r2=(1.052E2+0.006E2)106(55)
(56) r3=(0.002E32+1.089E30.018E3)106(56)
(57) r4=(1.042E4E4)106(57)
(58) r5=(0.007E52+1.096E50.006E5)106(58)
(59) r6=(0.001E62+1.047E6E6)106(59)
(60) r7=(0.02E72+1.143E70.004E7)106(60)
(61) r8=(0.005E82+1.066E8E8)106(61)
(62) r9=(0.013E92+1.075E9E9)106(62)
(63) r10=(0.026E102+1.09E100.001E10)106(63)
(64) r11=(0.026E112+1.102E11E11)106(64)
(65) r12=(0.001E122+1.04E12E12)106(65)
(66) r13=(0.115E132+1.139E13E13)106(66)
(67) r14=(0.007E142+1.048E14E14)106(67)

Since the values of the above constants and functions have been determined, the final form of the accurate regression equation is obtained by direct substitution (see Section 2.2). The final form of the accurate sediment rating equation is therefore

$$S \approx \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + \dots + f_{14}(Q,S)\bigr)^{u} \tag{68}$$

For this final form of the equation, the graphs of the measured ($S$) and predicted ($S_p$) sediment concentrations matched each other (see Figure 2).

Figure 2. Graphs of measured (S) and predicted (Sp) sediment concentration.


Since the final form of the equation is very long and complex, the above values are most easily linked together in an Excel spreadsheet or programmed in MATLAB. A video presentation and an Excel spreadsheet are provided as supplementary files.

2.5. Solving the accurate sediment rating equation

The section above showed the procedure for deriving the accurate sediment rating equation. For the paired suspended sediment concentration ($S$) and flow ($Q$) data, we calculated each value of $E_1, E_2, E_3, \dots, E_{14}$ and the corresponding values of $e_1, e_2, e_3, \dots, e_{14}$. At the fifteenth iteration step, after $E_{14}$ and $e_{14}$, we found that $S_{15} \approx S$. The last remaining errors are therefore $E_{15}$ and $e_{15}$. According to the steps in Section 2.1, $E_{15}$ is determined by

$$E_{15} = S - S_{15} \tag{69}$$

Based on Equation (43),

$$S_{15} = \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{14}(Q,S)\bigr)^{u} \tag{70}$$

Therefore,

$$E_{15} = S - \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{14}(Q,S)\bigr)^{u} \tag{71}$$

Based on Equations (1), (27) and (39),

$$e_{15} = f_0(Q,S) - \bigl(f(Q) + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{14}(Q,S)\bigr) \tag{72}$$

where $f_0(Q,S) = i\,(bS)^{1/u} + jQ$ is $y$ from Equation (1) written as a function of $Q$ and $S$.

For each pair of values of $S$ and $Q$, there are corresponding values of $E_{15}$ and $e_{15}$. We now take the values of $E_{15}$ and $e_{15}$ as paired input data and derive another equation relating them by following the steps above, and so on. Deriving this equation from the paired values of $E_{15}$ and $e_{15}$ produces a further series of errors $E_p$ and $e_p$ (see the steps above); to avoid confusion, we denote these new errors by $E'_p$ and $e'_p$, respectively. We then define the following relationship:

$$\text{If } \lim_{E'_p \to 0}\bigl(E_1 + E'_p\bigr) = E_1, \text{ then } E_{14} + E'_p \approx E_{14} \tag{73}$$

For the given paired data ($S$ and $Q$), we found that $S_{15} \approx S$ after $E_{14}$. According to the steps in Section 2.1, $E_{14}$ is determined by

$$E_{14} = S - S_{14} \tag{74}$$

From Equation (43),

$$S_{14} = \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{13}(Q,S)\bigr)^{u} \tag{75}$$

Therefore,

$$E_{14} = S - \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + f_3(Q,S) + \dots + f_{13}(Q,S)\bigr)^{u} \tag{76}$$

For each pair of values of $S$ and $Q$, there is a corresponding, unique value of $E_{14}$. That is, for a given value of $Q$, only one value of $S$ yields a given value of $E_{14}$, including its minimum or zero value; the same pair of data cannot produce two different values of $E_{14}$. Using the relationship in Equation (73) to approximate $E_{14}$ for an unknown value of suspended sediment concentration or flow, we keep deriving a series of equations until $E'_p$ is approximately zero or negligible compared with $E_1$; the value of $E'_p$ then determines the accuracy of the approximation. To estimate an unknown suspended sediment concentration for a given flow, the value of suspended sediment concentration that yields the minimum value of $E'_p$ is taken as the solution.
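Section 2.5 solves the equation by searching for the value of $S$ that makes the remaining error smallest. A minimal sketch of that search idea is a refined grid search, under the simplifying assumption that the quantity to minimise is the residual $S - \mathrm{RHS}(Q,S)$ of Equation (44) rather than the full error series described above. In the Python sketch below, `solve_for_S` and `rhs` are hypothetical names (`rhs` stands in for the chained spreadsheet calculation), and a grid search can miss a minimum that falls between grid points.

```python
import numpy as np


def solve_for_S(Q, rhs, S_lo, S_hi, n_grid=2001, n_refine=3):
    """Search [S_lo, S_hi] for the S that minimises |S - rhs(Q, S)| on a grid.

    rhs(Q, S) is a hypothetical callable for the right-hand side of
    Equation (44); it must accept a NumPy array of candidate S values.
    The grid is refined n_refine times around the best candidate.
    """
    lo, hi = float(S_lo), float(S_hi)
    for _ in range(n_refine + 1):
        grid = np.linspace(lo, hi, n_grid)
        residual = np.abs(grid - rhs(Q, grid))
        best = float(grid[np.argmin(residual)])
        step = (hi - lo) / (n_grid - 1)
        # Zoom in around the current best, staying inside the original bounds.
        lo, hi = max(best - step, S_lo), min(best + step, S_hi)
    return best


# Toy stand-in for Equation (44): S = 0.5*S + Q has the solution S = 2Q.
print(round(solve_for_S(10.0, lambda Q, S: 0.5 * S + Q, 0.0, 100.0), 6))  # 20.0
```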

Because the systems of equations forming the complex equation are very long, a video presentation and an Excel spreadsheet on deriving and solving the accurate sediment rating equation are provided as supplementary files.

3. Results

The iterative approach for deriving an accurate regression equation from values of paired variables is given in Section 2.1, and the procedure for determining the final form of that equation is given in Section 2.2. The shorthand form of the final accurate regression equation is

$$S \approx \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + \dots + f_{p-1}(Q,S)\bigr)^{u} \tag{77}$$

where $S$ and $Q$ are the variables, and $b$, $i$, $u$ and $j$ are constants for the given paired data.

The accurate sediment rating equation derived from 571 records of suspended sediment concentration and flow data is

$$S \approx \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + \dots + f_{14}(Q,S)\bigr)^{u} \tag{78}$$

The graphs of the measured and predicted suspended sediment concentrations matched each other (see Figure 2), and the statistical measures for the data correlation are given in . The procedure for solving the accurate regression equation is given in Section 2.5.
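The goodness-of-fit measures quoted in the Abstract can be computed as below. This is a sketch in Python with NumPy; the paper does not define the measures, so the formulas are assumptions. In particular, R² is taken as the squared Pearson correlation, and VE as the ratio of predicted to observed totals, which equals 1 for a perfect volume balance (consistent with the reported VE of 1). The function name `goodness_of_fit` is hypothetical.

```python
import numpy as np


def goodness_of_fit(observed, predicted):
    """Return NSE, R2, RMSE, MAE, VE and SSE for paired observed/predicted values.

    Assumed definitions: R2 = squared Pearson correlation;
    VE = sum(predicted) / sum(observed).
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    sse = float(np.sum(resid ** 2))
    return {
        "NSE": 1.0 - sse / float(np.sum((obs - obs.mean()) ** 2)),
        "R2": float(np.corrcoef(obs, pred)[0, 1] ** 2),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "VE": float(pred.sum() / obs.sum()),
        "SSE": sse,
    }


print(goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

For this toy input, NSE is 0.8, RMSE is 0.5, MAE is 0.25 and VE is 1.1; a perfect prediction gives NSE, R² and VE of 1 and zero error terms.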

4. Discussions

The relationship between sediment concentration and flow was given by a complex equation, not by a polynomial or any other standard function. This equation may reflect the complex relationship between the dynamic behaviour of flow and sediment transport.

A power function is a commonly used nonlinear regression approach for predicting sediment from flow data; however, its regression error can be very large. The sediment prediction accuracy of the proposed regression equation and of the power function are compared in Figures 3–6. The proposed regression equation is very accurate, and its regression error can be reduced further by increasing the number of iteration steps.
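For comparison, the power-function rating curve $S = aQ^{\beta}$ is commonly fitted by linear least squares on $\log S$ versus $\log Q$. The paper does not state how the power-function coefficients in Figures 3–6 were obtained, so the sketch below (Python with NumPy; `fit_power_rating_curve` is a hypothetical name) illustrates only this common approach. Back-transforming from log space tends to underestimate mean concentrations unless a bias correction is applied.

```python
import numpy as np


def fit_power_rating_curve(Q, S):
    """Fit S = a * Q**beta by ordinary least squares on log(S) versus log(Q).

    Requires Q > 0 and S > 0. Returns (a, beta).
    """
    slope, intercept = np.polyfit(np.log(np.asarray(Q, dtype=float)),
                                  np.log(np.asarray(S, dtype=float)), 1)
    return float(np.exp(intercept)), float(slope)


# Noise-free check, using the coefficients in the Figure 3 caption only as a
# data generator: the fit recovers them.
Q = np.array([1.0, 2.0, 5.0, 10.0, 50.0, 200.0])
S = 0.069 * Q ** 0.9576
a, beta = fit_power_rating_curve(Q, S)
print(round(a, 4), round(beta, 4))  # 0.069 0.9576
```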

Figure 3. Comparison of sediment prediction accuracy of the proposed regression equation and the power function ($S = 0.069Q^{0.9576}$) for the Hombole watershed in Ethiopia.

Figure 4. Comparison of sediment prediction accuracy of the proposed regression equation and the power function ($S = 0.2036Q^{0.5475}$) for the Gumera watershed in Ethiopia, with all data records taken into account without any preconditions.

Figure 5. Comparison of sediment prediction accuracy of the proposed regression equation and the power function ($S = 0.659Q^{0.839}$) for the Mojo watershed in Ethiopia, with all data records taken into account without any preconditions.

Figure 6. Comparison of sediment prediction accuracy of the proposed regression equation and the power function ($S = 0.1901Q^{0.1916}$) for the Gilgel Gibe 1 watershed in Ethiopia, with all data records taken into account without any preconditions.

Model calibration and validation are challenging tasks, both when applying a model for a particular purpose and when improving it further. For example, for the Modified Universal Soil Loss Equation (MUSLE) or the improved MUSLE, finding the coefficient and the soil erodibility, cover, and conservation practice factors through calibration is not feasible (Tsige et al. 2022a, 2022b), because only the product of the coefficient and these factors, rather than their individual effects, is reflected during the calibration of sediment yield (Tsige et al. 2022a, 2022b). What matters is the individual effect of each model variable on the physical processes involved, not their product effect. It is therefore essential to express the relationship between model variables in a way that shows their individual effects on the physical process, and the proposed regression method may play a significant role in this regard.

5. Conclusions

The accurate sediment rating equation was derived by following the proposed iteration steps. For the paired values of suspended sediment concentration ($S$) and flow ($Q$) data, the shorthand form of the final accurate sediment rating equation is

$$S \approx \frac{1}{b\,i^{u}}\bigl(f(Q) - jQ + f_1(Q,S) + f_2(Q,S) + \dots + f_{14}(Q,S)\bigr)^{u} \tag{79}$$

where $b$, $i$, $u$ and $j$ are constants for the given paired data.

In this paper, polynomial regression functions were used to derive the very long and complex accurate regression equation, but any other known functions could be used instead. Likewise, the variables $x$ and $y$ were defined so that the individual effects of the other variables are reflected in $S$ (see Section 2.1); they could be defined in another way, and the proposed iterative approach could still be followed to derive an accurate regression equation.

The proposed iterative approach can be used to derive an accurate regression equation from given values of paired variables. It can therefore be used to model any process, and any calibration and validation process can be addressed with it.

This paper provides an iterative procedure for solving the accurate regression equation; an analytical solution of the equation is recommended for further research.

Supplemental material

MS Excel spreadsheet (5.6 MB)

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/13873954.2024.2313014

Additional information

Funding

This research was funded by the German Academic Exchange Service and Universität der Bundeswehr München.

References

  • Asselman NEM. 2000. Fitting and interpretation of sediment rating curves. J Hydrol. 234(3):228–248. doi:10.1016/S0022-1694(00)00253-5.
  • Bárdossy A, Bogárdi I, Duckstein L. 1993. Theory and methodology: fuzzy nonlinear regression analysis of dose-response relationships. Eur J Oper Res. 66(1):36–51. doi:10.1016/0377-2217(93)90204-Z.
  • Barnes TJ. 1998. A history of regression: actors, networks, machines, and numbers. Environ Plan A. 30(2):203–223. doi:10.1068/a300203.
  • Bunke O, Droge B, Polzehl J. 1999. Model selection, transformations and variance estimation in nonlinear regression. Stat: A J Theo & Appl Stat. 33(3):197–240. doi:10.1080/02331889908802692.
  • Du KL, Swamy MNS. 2014. Neural networks and statistical learning. London: Springer. doi:10.1007/978-1-4471-5571-3.
  • Fernández-Delgado M, Sirsat MS, Cernadas E, et al. 2019. An extensive experimental survey of regression methods. Neural Networks. 111:11–34. doi:10.1016/j.neunet.2018.12.010.
  • Finney DJ. 1996. A note on the history of regression. J Appl Stat. 23(5):555–557. doi:10.1080/02664769624099.
  • Stanton JM. 2001. Galton, Pearson, and the peas: a brief history of linear regression for statistics instructors. J Stat Educ. 9(3).
  • Glad IK. 1998. Parametrically guided non-parametric regression. Scand J Stat. 25(4):649–668. Available from https://www.jstor.org/stable/4616530.
  • Hao P, Chiang J. 2008. Fuzzy regression analysis by support vector learning approach. IEEE Trans Fuzzy Syst. 16(2):428–441. doi:10.1109/TFUZZ.2007.896359.
  • Hapsari D, Onishi T, Imaizumi F, et al. 2019. The use of sediment rating curve under its limitations to estimate the suspended load. Rev Agric Sci. 7:88–101. doi:10.7831/ras.7.0_88.
  • Heng S, Suetsugi T. 2014. Comparison of regionalization approaches in parameterizing sediment rating curve in ungauged catchments for subsequent instantaneous sediment yield prediction. J Hydrol. 512:240–253. doi:10.1016/j.jhydrol.2014.03.003.
  • Kopal I, Labaj I, Vršková J, et al. 2022. A generalized regression neural network model for predicting the curing characteristics of carbon black-filled rubber blends. Polymers. 14(4):653. doi:10.3390/polym14040653.
  • Li H, Yin G. 2009. Generalized method of moments estimation for linear regression with clustered failure time data. Biometrika. 96(2):293–306. doi:10.1093/biomet/asp005.
  • Linnet K. 1998. Performance of deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin Chem. 44(5):1024–1031. doi:10.1093/clinchem/44.5.1024.
  • Lolli B, Gasperini P. 2012. A comparison among general orthogonal regression methods applied to earthquake magnitude conversions. Geophy J Int. 190(2):1135–1151. doi:10.1111/j.1365-246X.2012.05530.x.
  • Masters T, Land W. 1997. A new training algorithm for the general regression neural network. In: 1997 IEEE International Conference on Systems, Man, and Cybernetics; Orlando, USA.
  • Morala P, Cifuentes JA, Lillo RE, et al. 2021. Towards a mathematical framework to inform neural network modelling via polynomial regression. Neural Networks. 142:57–72. doi:10.1016/j.neunet.2021.04.036.
  • Özsoy VS, Örkçü HH. 2016. Estimating the parameters of nonlinear regression models through Particle Swarm optimization. Gazi Univ J Sci. 29:187–199.
  • Pao H. 2008. A comparison of neural network and multiple regression analysis in modeling capital structure. Expert Syst Appl. 35(3):720–727. doi:10.1016/j.eswa.2007.07.018.
  • Qian SS, Reckhow KH. 2005. Nonlinear regression modeling of nutrient loads in streams: a Bayesian approach. Water Resour Res. 41.
  • Rahman M, Asadujjaman M. 2021. Implementation of artificial neural network on regression analysis. In: 2021 5th Annual Systems Modelling Conference; Canberra, Australia. doi:10.1109/SMC53803.2021.9569881.
  • Seal HL. 1967. Studies in the history of probability and statistics. XV: the historical development of the gauss linear model. Biometrika. 54(1):1–24.
  • Seber GAF, Wild CJ. 2003. Nonlinear regression. Hoboken (NJ): John Wiley & Sons.
  • Specht DF. 1991. A general regression neural network. IEEE Trans Neural Net. 2(6):568–576. doi:10.1109/72.97934.
  • Tomandl D, Schober A. 2001. A modified general regression neural network (MGRNN) with new, efficient training algorithms as a robust ‘black box’-tool for data analysis. Neural Networks. 14(8):1023–1034. doi:10.1016/S0893-6080(01)00051-X.
  • Tsige MG, Malcherek A, Seleshi Y. 2022a. Estimating the best exponent and the best combination of the exponent and topographic factor of the modified universal soil loss equation under the hydro-climatic conditions of Ethiopia. Water. 14(9):1501. Available from https://www.mdpi.com/2073-4441/14/9/1501.
  • Tsige MG, Malcherek A, Seleshi Y. 2022b. Improving the modified universal soil loss equation by physical interpretation of its factors. Water. 14(9):1450. Available from https://www.mdpi.com/2073-4441/14/9/1450.
  • Wang F, Du T. 2014. Implementing support vector regression with differential evolution to forecast motherboard shipments. Expert Syst Appl. 41:3850–3855.
  • Wang S. 1999. Nonlinear regression: a hybrid model. Comput Oper Res. 26(8):799–817. doi:10.1016/S0305-0548(98)00088-4.
  • Wiese M, Schaper KJ. 1993. Application of neural networks in the QSAR analysis of percent effect biological data: comparison with adaptive least squares and nonlinear regression analysis. SAR and QSAR in Envir Res. 1(2–3):137–152. doi:10.1080/10629369308028825.
  • Wu FY, Yen KK. 1992. Applications of neural network in regression analysis. Comput Ind Eng. 23(1–4):93–95. doi:10.1016/0360-8352(92)90071-Q.
  • Yang M, Lin T. 2002. Fuzzy least-squares linear regression analysis for fuzzy input–output data. Fuzzy Sets Syst. 126(3):389–399. doi:10.1016/S0165-0114(01)00066-5.
  • Yong L. 2014. Novel global harmony search algorithm for least absolute deviation. J Appl Math. 2014:1–6. doi:10.1155/2014/632975.
  • Zhang G, Patuwo BE, Hu MY. 1998. Forecasting with artificial neural networks: the state of the art. Int J Forecasting. 14(1):35–62. doi:10.1016/S0169-2070(97)00044-7.