ABSTRACT
ABSTRACT
This paper introduces a method for deriving an accurate regression equation from any set of paired data, and a technique for solving the equation. As a practical example, we used 571 pairs of sediment concentration and river flow data to derive an accurate sediment rating equation. The graphs of the measured and predicted sediment concentrations matched each other, and the data correlation showed a Nash–Sutcliffe efficiency (NSE) of 0.9999860, coefficient of determination (R²) of 0.99998679, root mean square error (RMSE) of 0.0345, mean absolute error (MAE) of 0.0067, volume error (VE) of 1, and sum of squared errors (SSE) of 0.678631. To explain the technique of deriving and solving the accurate regression equation, a video presentation and an Excel spreadsheet are provided as supplementary materials. In general, the method can be used to model any process, and calibration and validation can be addressed.
1. Introduction
The relationship between independent and dependent variables is governed by accepted scientific laws (Seber and Wild 2003) or is expressed by mathematical, statistical, empirical, analytical, or numerical models. To find the model that best fits the measured data, the parameters of the model can be estimated either through calibration or by regression analysis. The performance of the model is then evaluated using different statistical indicators.
Regression analysis, a technique for finding the relation among variables, is important to all scientific work where interpretations need to be drawn from measured data sets (Wu and Yen 1992). Several authors (Seal 1967; Finney 1996; Barnes 1998; Galton 2001) have reviewed the history of regression analysis, and Fernández-Delgado et al. (2019) provided an extensive experimental survey of regression methods.
If the relationship between dependent and independent variables is known, or is defined by a chosen model, the parameters of the model can be determined by parametric regression. The results of analysing data with a parametric model may depend heavily on the model chosen for the regression and variance functions, and also on any preliminary transformation of the variables (Bunke et al. 1999). Non-parametric regression methods, on the other hand, generally have a slower rate of convergence, but need no explicit specification of the form of the regression function; the resulting curve is hence completely determined by the data themselves (Glad 1998). Different types of parametric and non-parametric regression methods, and their descriptions or applications, are given in Linnet (1998), Seber and Wild (2003), Qian and Reckhow (2005), Li and Yin (2009), Lolli and Gasperini (2012), Wang and Du (2014), Yong (2014), Özsoy and Örkçü (2016), and Fernández-Delgado et al. (2019).
Regression methods that can be used for both parametric and non-parametric analysis include the artificial neural network (Specht 1991; Wu and Yen 1992; Zhang et al. 1998) and the fuzzy regression method (Bárdossy et al. 1993; Yang and Lin 2002; Hao and Chiang 2008). Compared to other regression approaches, the artificial neural network is often the more appropriate choice (Wiese and Schaper 1993; Pao 2008; Rahman and Asadujjaman 2021). Artificial neural networks were designed to study the behaviour of real, nonlinear, complex systems, and they are particularly effective for problems where the correlations between the dependent and independent variables are well known (Kopal et al. 2022), but where a precise description by classical mathematical methods is too complicated, too simplified, or impossible (Du and Swamy 2014; Kopal et al. 2022); they also embody much uncertainty and difficulty (Masters and Land 1997; Zhang et al. 1998; Tomandl and Schober 2001; Morala et al. 2021). The neural network model could be a more useful nonlinear regression tool if it successfully incorporated human knowledge (heuristics) and other regression techniques (Wang 1999).
In actual modelling, the underlying processes are generally complex and not well understood, which means that we have little or no idea about the form of the relationship (Seber and Wild 2003). For example, different authors indicate that the power function is a commonly used nonlinear regression approach to model the sediment rating curve (e.g., Asselman 2000; Heng and Suetsugi 2014; Hapsari et al. 2019). However, the error of such a regression equation is very large. Therefore, finding a regression method that can derive an accurate regression equation from any set of paired data is important for the most accurate representation of any process. In this paper, we provide a procedure to derive a complex equation expressing the relationship between dependent and independent variables based on any set of paired data.
2. Methodology
2.1. An iterative approach to derive an accurate regression equation
To arrive at the iteration steps, let us begin with the following definition.
Definition 2.1.
For given values of paired variables and , variables and are defined by
Let , where is the function.
Since a polynomial function can accommodate negative values, positive values, or both, let us consider a polynomial function
Therefore,
where is the error value
Substituting equation (1) into equation (4) gives
Rearranging equation (5) gives
In equation (6), the variables are connected by plus and minus signs. This shows that each variable has an individual effect on the value of the dependent variable (i.e. if the variables are interchanged, the resulting value will be different). This is the reason why we defined the variables in the above way to arrive at equation (6).
Let
Substituting equation (7) into equation (6) gives
Equation (8) represents the actual value of the dependent variable. In equation (8), if an error value is the minimum tolerable error that can be ignored, then the sum of the remaining error values represents an approximate value of the total error. Therefore, the predicted value of the variable is given by
Therefore, the difference between the actual and predicted values is an error, where the subscript refers to the number of error values required to derive an accurate regression equation over the corresponding number of iteration steps. If there is a given number of error values of one kind, there is the same number of corresponding error values of the other kind.
The logic is now that if we are able to express one error as a function of the corresponding error, we can derive an accurate regression equation, because both errors are functions of the paired variables. Therefore, we define an iterative procedure to approximate the value of each error from the value of the corresponding error. The following iteration steps are defined based on equation (9) and the explanations above.
For the first iteration step (), , and . Therefore, the first predicted value of variable (i.e. ) is determined by
If , there is no need to proceed to the next iteration step. If , we proceed to the next iteration step.
For the second iteration step (), , and is determined by
where is the polynomial regression function between the values of and . Therefore, at the second iteration step, the second predicted value of variable (i.e. ) is determined by
If , there is no need to proceed to the next iteration step. If , we proceed to the next iteration step.
For the third iteration step (), , and is determined by
where is the polynomial regression function between the values of and . Therefore, at the third iteration step, the third predicted value of variable (i.e. ) is determined by
If , there is no need to proceed to the next iteration step. If , we proceed to the next iteration step.
For the fourth iteration step (), , and is determined by
where is the polynomial regression function between the values of and . Therefore, at the fourth iteration step, the fourth predicted value of variable (i.e. ) is determined by
If , there is no need to proceed to the next iteration step. If , we proceed to the next iteration step.
For the th iteration step, , and is determined by
where is the polynomial regression function between the values of and . Therefore, at the th iteration step, the th predicted value of variable (i.e. ) is determined by
For the th iteration step, , and is determined by
where is the polynomial regression function between the values of and . Therefore, at the th iteration step, the th predicted value of variable (i.e. ) is determined by
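The iteration steps above can be sketched in code. The following Python snippet is a minimal, simplified analogue, not the paper's exact scheme: the transformation constants are omitted, and the correction polynomials are regressed on the running prediction rather than on the paper's paired error series, since those equations are not reproduced here. The function name and structure are illustrative assumptions.

```python
import numpy as np

def iterative_regression(x, y, degree=3, tol=1e-8, max_iter=20):
    """Simplified analogue of the iterative scheme in Section 2.1:
    fit a base polynomial, then repeatedly fit a correction polynomial
    to the remaining residuals (here regressed on the running
    prediction) and add it, until the error is tolerably small."""
    base = np.polyfit(x, y, degree)           # initial regression (first iteration step)
    y_hat = np.polyval(base, x)
    corrections = []
    for _ in range(max_iter):
        resid = y - y_hat                     # current error values
        if np.sqrt(np.mean(resid ** 2)) < tol:
            break                             # measured and predicted match
        g = np.polyfit(y_hat, resid, degree)  # polynomial regression of the error
        corrections.append(g)
        y_hat = y_hat + np.polyval(g, y_hat)  # next predicted values
    return base, corrections, y_hat
```

Each pass can only reduce the in-sample sum of squared errors, since the zero polynomial is always an admissible correction; as with the paper's procedure, driving the training error toward zero in this way says nothing by itself about performance on unseen data.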
2.2. Determining the final form of the accurate regression equation
Suppose at the th iteration step, . Then, the final form of an accurate regression equation is obtained through substitutions.
Substituting equation (10) into equation (12) gives
Substituting equation (14) into equation (16) gives
Substituting equation (18) into equation (20) gives
Substituting equation (26) into equation (28) gives
Substituting equation (31) into equation (13) gives
Substituting equation (32) into equation (17) gives
Substituting equation (33) into equation (21) gives
Substituting equation (34) into equation (29) gives
Substituting equation (35) into equation (36); equations (35) and (36) into equation (37); equations (35), (36) and (37) into equation (38); and so on. After all substitutions have been made one after the other, the final resulting equation is very long. However, we can see that the error terms are functions of the paired variables, and for given values of the paired variables the constants are all fixed. Therefore,
Substituting equation (39) into equation (30) gives
Substituting equation (2) into equation (3) gives
From equation (41), the term is a function of the variable. Therefore,
Substituting equation (42) into equation (40) gives
Suppose that at the th iteration step, . Then, equation (43) is given by
Equation (44) is the shorthand form of a very long equation. The power constants of the substituting equations make the equation complex and difficult to simplify. However, the substituting equations that form the complex equation are easily interconnected in an Excel spreadsheet or programmed in MATLAB. As we can see from equation (44), there are only two variables, so we can solve the equation for a given value of either one. A procedure for solving the equation is provided in Section 2.5.
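Although the fully substituted equation is very long, it is naturally evaluated as a chain, exactly as one would interconnect the substituting equations in a spreadsheet. A minimal sketch, assuming a simplified structure in which a base polynomial is followed by correction polynomials applied to the running prediction (the actual equation's transformation constants and power terms are not reproduced here):

```python
import numpy as np

def evaluate_chain(base_poly, correction_polys, x):
    """Evaluate a shorthand chained equation: a base polynomial
    followed by correction polynomials, each applied to the running
    prediction, mirroring the substitutions of Section 2.2."""
    y = np.polyval(base_poly, x)
    for g in correction_polys:
        y = y + np.polyval(g, y)
    return y
```

This is the spreadsheet idea in miniature: each correction step consumes the previous step's output as its input column.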
2.3. Determining initial values for deriving an accurate regression equation
In Sections 2.1 and 2.2, we showed the steps to derive and determine the final form of the accurate regression equation based on values of the paired variables. To start deriving the equation, we first have to determine the constants (see equations (1) and (2)). The polynomial function (see equation (3)) directly describes the relationship between the transformed variables, but only indirectly describes the relationship between the original variables. Therefore, for given values of the paired variables, we find values of the constants in equations (1) and (2) such that a plot of the transformed variables yields a smooth polynomial curve. Once all the constants are known, the initial and final values of the variables are determined by following the iteration steps above.
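The search for suitable transformation constants can be automated. As a hedged sketch (the paper's transformation and its constants are not reproduced here; the power-law transforms and grid bounds below are assumptions for illustration only), one can scan candidate transforms and keep the one under which a polynomial fits the transformed pairs best:

```python
import numpy as np

def poly_r2(X, Y, degree=3):
    """R^2 of a polynomial fit of Y on X."""
    c = np.polyfit(X, Y, degree)
    resid = Y - np.polyval(c, X)
    return 1.0 - np.sum(resid ** 2) / np.sum((Y - Y.mean()) ** 2)

def choose_transform(x, y, degree=3):
    """Grid-search hypothetical power exponents (p, q) so that the
    transformed pairs (x**p, y**q) are well described by a polynomial."""
    best = None
    for p in np.linspace(0.1, 2.0, 20):
        for q in np.linspace(0.1, 2.0, 20):
            r2 = poly_r2(x ** p, y ** q, degree)
            if best is None or r2 > best[0]:
                best = (r2, p, q)
    return best
```

In the paper the constants are chosen so that the transformed scatter plot looks like a smooth polynomial curve; maximising the polynomial-fit R² is one way to express that visual criterion numerically.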
2.4. Deriving an accurate sediment rating equation
In the above sections, we indicated the general directions for deriving and determining the final form of the accurate regression equation, as well as how to determine the initial values needed to start the derivation. For a practical example, we use sediment concentration and corresponding river or streamflow data (see the table) to derive an accurate sediment rating equation. In the table, one variable represents the suspended sediment concentration data and the other represents the flow data.
To make it clear, we use the following steps to derive an accurate sediment rating equation based on the above pairs of sediment concentration and river or streamflow data.
For given values of the paired variables, estimate the constants such that a plot of the transformed variables yields a smooth polynomial curve (refer to equations (1) and (2))
Choose a polynomial regression function that fits the plots of versus
From the regression equation in step 2, find the constants of equation (3)
Calculate by using equation (1)
Calculate by using equation (2)
Calculate based on steps 3 and 5
Calculate the first predicted value by using equation (10). Plot the graphs of the measured and predicted values. If the graphs do not match each other, proceed to the next iteration step.
Calculate by using equation (11)
Calculate by using equation (12)
Consider a polynomial regression function to correlate and
Calculate by using the regression equation from step 10 (i.e. refer to equation (35))
Replace the calculated value from step 11 in equation (14)
Calculate the second predicted value by using equation (14). Plot the graphs of the measured and predicted values. If the graphs do not match each other, proceed to the next iteration step.
Replace the calculated value from step 11 in equation (15)
Then, calculate by using equation (15)
Calculate by using equation (16)
Consider a polynomial regression function to correlate and
Calculate by using the regression equation from step 17 (i.e. refer to equation (36))
Replace the calculated value from step 11, and the calculated value from step 18, in equation (18)
Then, calculate the third predicted value by using equation (18). Plot the graphs of the measured and predicted values. If the graphs do not match each other, proceed to the next iteration step, and so on.
We repeat the same procedure to calculate further predicted values by using equation (30), where the subscript stands for the number of iteration steps. During each iteration step, we plot the graphs of the measured and predicted values. The iteration procedure ends when the graphs almost match each other.
Based on the paired data given in the table, the values of the required constants and variables were determined by following the above steps; they are given below. The figure shows the graph of the original river or streamflow versus sediment concentration data, and the graph of the transformed data (see Section 2.3).
As the values of the above constants and variables were already determined, the final form of the accurate regression equation is obtained by direct substitutions (refer to Section 2.2). Therefore, the final form of the accurate sediment rating equation is given by
For the final form of the equation, the graphs of the measured and predicted sediment concentrations matched each other (see the figure).
Since the final form of the equation is very long and complex, the above values of the variables are easily interconnected in an Excel spreadsheet or programmed in MATLAB. A video presentation and an Excel spreadsheet are provided as supplementary files.
2.5. Solving the accurate sediment rating equation
In the above section, we showed the procedure for deriving the accurate sediment rating equation. For the paired suspended sediment concentration and flow data, we calculated each value of the error and the corresponding error. At the fifteenth iteration step, we found that . Therefore, the last remaining errors are and . According to the steps in Section 2.1, the value of is determined by
Based on equation (43),
Therefore,
Based on equations (1), (27) and (39),
For each pair of values of the variables, there are corresponding values of the errors. Now, we take these error values as paired input data to derive another equation that relates them, by following the above steps, and so on. To derive this equation, we calculate further error values (see the steps above). To avoid confusion, let us express these further error values with new symbols. Therefore, we define the following relationship.
For the given paired data, at the value of , we found that . According to the steps in Section 2.1, the value of is determined by
Consider equation (43)
Therefore,
For each paired value of the variables, there is a corresponding unique value of the error. That is, for a given value of one variable, there is only one value of the other that results in the minimum or zero value of the error (i.e. there is no possibility of two different values for the same paired data). From this relationship (refer to equation (73)), to approximate the value for an unknown suspended sediment concentration or flow, we keep deriving a series of equations until the error value is approximately zero or is far from the previous value. The remaining error value determines the accuracy of the approximation. Therefore, to estimate an unknown suspended sediment concentration for a given flow value, the suspended sediment concentration that results in the minimum error value is the solution.
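The "minimum remaining error" criterion of this section can be sketched as a one-dimensional search. In the snippet below, `F` is a stand-in for the derived implicit rating relation (the actual chained equation is not reproduced), and the grid bounds and resolution are illustrative assumptions:

```python
import numpy as np

def solve_for_c(F, q, c_lo, c_hi, n_grid=2000):
    """Given an implicit rating relation F(q, c) ~ 0, scan candidate
    sediment concentrations and keep the one with the smallest |F|,
    mirroring the minimum-remaining-error criterion of Section 2.5."""
    cs = np.linspace(c_lo, c_hi, n_grid)
    errs = np.abs([F(q, c) for c in cs])
    return cs[np.argmin(errs)]
```

A bracketing method such as bisection or golden-section search would converge faster once the error is known to be unimodal in the concentration; the grid scan above is simply the most direct translation of "pick the value with the minimum error".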
Since the system of equations forming the complex equation is very long, a video presentation and an Excel spreadsheet on deriving and solving the accurate sediment rating equation are provided as supplementary files.
3. Results
The iterative approach for deriving an accurate regression equation based on values of paired variables is given in Section 2.1. The procedures to determine the final form of the accurate regression equation are given in Section 2.2. Accordingly, the shorthand form of the final accurate regression equation is given by
where and are variables, and , , and are constants for given values of paired data.
The accurate sediment rating equation, which was derived from 571 records of suspended sediment concentration and flow data, is given by
The graphs of measured and predicted suspended sediment concentration matched each other (see the figure), and statistical measures for the data correlation are given in the table. The procedures to solve the accurate regression equation are given in Section 2.5.
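The statistical indicators reported above can be computed directly from the measured and predicted series. A sketch follows; note that the volume error is expressed here as the ratio of predicted to observed totals, one common convention, and the paper's exact definition may differ:

```python
import numpy as np

def goodness_of_fit(obs, pred):
    """Statistical indicators used to compare measured and predicted
    suspended sediment concentrations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    resid = obs - pred
    sse = float(np.sum(resid ** 2))
    return {
        "NSE": 1.0 - sse / float(np.sum((obs - obs.mean()) ** 2)),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "SSE": sse,
        "VE": float(np.sum(pred) / np.sum(obs)),  # assumed ratio convention
    }
```

For a perfect fit, NSE = 1, RMSE = MAE = SSE = 0, and VE = 1, which is the pattern the reported values approach.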
4. Discussion
The relationship between sediment concentration and flow was given by the complex equation (it is not a polynomial or any other kind of known function). This equation may reflect the complex relationship between the dynamic behaviour of flow and sediment transport.
A power function is a commonly used nonlinear regression approach for predicting sediment concentration from flow data. However, its regression error is very large. A comparison of the sediment prediction accuracy of the proposed regression equation and the power function is given in the table. The proposed regression equation is very accurate, and the regression error can be made as small as desired by increasing the number of iteration steps.
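For comparison, the conventional power-function rating curve, C = aQ^b, is usually fitted by ordinary least squares in log-log space. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def fit_power_rating(q, c):
    """Fit the conventional sediment rating curve C = a * Q**b by
    ordinary least squares on ln(C) = ln(a) + b * ln(Q)."""
    b, ln_a = np.polyfit(np.log(q), np.log(c), 1)
    return np.exp(ln_a), b
```

Fitting in log space minimises multiplicative rather than additive error, and the back-transformed predictions are known to be biased low, which is one reason power-function rating curves carry large errors in the original units.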
Model calibration and validation are challenging tasks when applying a model for a particular purpose, and even for further improvement of the model. For example, if we consider the Modified Universal Soil Loss Equation (MUSLE) or the improved MUSLE, finding the coefficient, soil erodibility, cover, and conservation practice factors through calibration is not a feasible approach (Tsige et al. 2022a, 2022b). This is because only the product effect of the coefficient and these factors, rather than their individual effects, is reflected during the calibration of sediment yield (Tsige et al. 2022a, 2022b). The individual effect of model variables on the engaged physical processes is therefore what matters, and expressing the relationship between model variables in such a way that their individual effects can be seen is essential. The proposed regression method may play a significant role in this regard.
5. Conclusions
The accurate sediment rating equation was derived by following the proposed iteration steps. For the paired values of suspended sediment concentration and flow data, the shorthand form of the final accurate sediment rating equation is given by
where , , and are constants for given values of paired data.
In this paper, polynomial regression functions were used to derive the very long and complex accurate regression equation; however, any other known functions can be used. Likewise, the variables were defined in such a way that the individual effects of the other variables are reflected in the dependent variable (refer to Section 2.1); however, they can be defined in another way, and the proposed iterative approach can still be followed to derive an accurate regression equation.
The proposed iterative approach can be used to derive an accurate regression equation from given values of paired variables. Therefore, it can be used to model any process, and calibration and validation can be addressed.
In this paper, an iterative procedure is provided to solve the accurate regression equation. For further research, an analytical solution of the equation is recommended.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/13873954.2024.2313014
References
- Asselman NEM. 2000. Fitting and interpretation of sediment rating curves. J Hydrol. 234(3):228–248. doi:10.1016/S0022-1694(00)00253-5.
- Bárdossy A, Bogárdi I, Duckstein L. 1993. Theory and methodology: fuzzy nonlinear regression analysis of dose-response relationships. Eur J Oper Res. 66(1):36–51. doi:10.1016/0377-2217(93)90204-Z.
- Barnes TJ. 1998. A history of regression: actors, networks, machines, and numbers. Envir & Plan. 30(2):203–223. doi:10.1068/a300203.
- Bunke O, Droge B, Polzehl J. 1999. Model selection, transformations and variance estimation in nonlinear regression. Stat: A J Theo & Appl Stat. 33(3):197–240. doi:10.1080/02331889908802692.
- Du KL, Swamy MNS. 2014. Neural networks and statistical learning. London: Springer. doi:10.1007/978-1-4471-5571-3.
- Fernández-Delgado M, Sirsat MS, Cernadas E, et al. 2019. An extensive experimental survey of regression methods. Neural Networks. 111:11–34. doi:10.1016/j.neunet.2018.12.010.
- Finney DJ. 1996. A note on the history of regression. J Appl Stat. 23(5):555–557. doi:10.1080/02664769624099.
- Galton SJM. 2001. Pearson, and the Peas: a brief history of linear regression for statistics instructors. J Stat Educ, 9.
- Glad IK. 1998. Parametrically guided non-parametric regression. Scand J Stat. 25(4):649–668. Available from https://www.jstor.org/stable/4616530.
- Hao P, Chiang J. 2008. Fuzzy regression analysis by support vector learning approach. IEEE Trans Fuzzy Syst. 16(2):428–441. doi:10.1109/TFUZZ.2007.896359.
- Hapsari D, Onishi T, Imaizumi F, et al. 2019. The use of sediment rating curve under its limitations to estimate the suspended load. Rev Agric Sci. 7:88–101. doi:10.7831/ras.7.0_88
- Heng S, Suetsugi T. 2014. Comparison of regionalization approaches in parameterizing sediment rating curve in ungauged catchments for subsequent instantaneous sediment yield prediction. J Hydrol. 512:240–253. doi:10.1016/j.jhydrol.2014.03.003.
- Kopal I, Labaj I, Vršková J, et al. 2022. A generalized regression neural network model for predicting the curing characteristics of carbon black-filled rubber blends. Polymers. 14(4):653. doi:10.3390/polym14040653
- Li H, Yin G. 2009. Generalized method of moments estimation for linear regression with clustered failure time data. Biometrika. 96(2):293–306. doi:10.1093/biomet/asp005.
- Linnet K. 1998. Performance of deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin Chem. 44(5):1024–1031. doi:10.1093/clinchem/44.5.1024.
- Lolli B, Gasperini P. 2012. A comparison among general orthogonal regression methods applied to earthquake magnitude conversions. Geophy J Int. 190(2):1135–1151. doi:10.1111/j.1365-246X.2012.05530.x.
- Masters T, Land W. 1997. A new training algorithm for the general regression neural network. In: Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics; Orlando, FL, USA.
- Morala P, Cifuentes JA, Lillo RE, et al. 2021. Towards a mathematical framework to inform neural network modelling via polynomial regression. Neural Networks. 142:57–72. doi:10.1016/j.neunet.2021.04.036.
- Özsoy VS, Örkçü HH. 2016. Estimating the parameters of nonlinear regression models through Particle Swarm optimization. Gazi Univ J Sci. 29:187–199.
- Pao H. 2008. A comparison of neural network and multiple regression analysis in modeling capital structure. Expert Syst Appl. 35(3):720–727. doi:10.1016/j.eswa.2007.07.018.
- Qian SS, Reckhow KH. 2005. Nonlinear regression modeling of nutrient loads in streams: a Bayesian approach. Water Resour Res. 41.
- Rahman M, Asadujjaman M. 2021. Implementation of artificial neural network on regression analysis. In: 2021 5th Annual Systems Modelling Conference; Canberra, Australia. doi:10.1109/SMC53803.2021.9569881.
- Seal HL. 1967. Studies in the history of probability and statistics. XV: the historical development of the gauss linear model. Biometrika. 54(1):1–24.
- Seber GAF, Wild CJ. 2003. Nonlinear regression. Hoboken (NJ): John Wiley & Sons.
- Specht DF. 1991. A general regression neural network. IEEE Trans Neural Net. 2(6):568–576. doi:10.1109/72.97934.
- Tomandl D, Schober A. 2001. A modified general regression neural network (MGRNN) with new, efficient training algorithms as a robust ‘black box’-tool for data analysis. Neural Networks. 14(8):1023–1034. doi:10.1016/S0893-6080(01)00051-X.
- Tsige MG, Malcherek A, Seleshi Y. 2022a. Estimating the best exponent and the best combination of the exponent and topographic factor of the modified universal soil loss equation under the hydro-climatic conditions of Ethiopia. Water. 14(9):1501. Available from https://www.mdpi.com/2073-4441/14/9/1501.
- Tsige MG, Malcherek A, Seleshi Y. 2022b. Improving the modified universal soil loss equation by physical interpretation of its factors. Water. 14(9):1450. Available from https://www.mdpi.com/2073-4441/14/9/1450.
- Wang F, Du T. 2014. Implementing support vector regression with differential evolution to forecast motherboard shipments. Expert Syst Appl. 41(8):3850–3855.
- Wang S. 1999. Nonlinear regression: a hybrid model. Comput Oper Res. 26(8):799–817. doi:10.1016/S0305-0548(98)00088-4.
- Wiese M, Schaper KJ. 1993. Application of neural networks in the QSAR analysis of percent effect biological data: comparison with adaptive least squares and nonlinear regression analysis. SAR and QSAR in Envir Res. 1(2–3):137–152. doi:10.1080/10629369308028825.
- Wu FY, Yen KK. 1992. Applications of neural network in regression analysis. Comput Ind Eng. 23(1–4):93–95. doi:10.1016/0360-8352(92)90071-Q.
- Yang M, Lin T. 2002. Fuzzy least-squares linear regression analysis for fuzzy input–output data. Fuzzy Sets Syst. 126(3):389–399. doi:10.1016/S0165-0114(01)00066-5.
- Yong L. 2014. Novel global harmony search algorithm for least absolute deviation. J Appl Math. 2014:1–6. doi:10.1155/2014/632975.
- Zhang G, Patuwo BE, Hu MY. 1998. Forecasting with artificial neural networks: the state of the art. Int J Forecasting. 14(1):35–62. doi:10.1016/S0169-2070(97)00044-7.