Suspended sediment modelling by SVM and wavelet

Recent advances in artificial intelligence as a forecasting tool for hydrological events have led to numerous changes in forecasting practice. The wavelet support vector machine (WSVM) model is obtained by combining wavelet analysis with the support vector machine (SVM). Suspended sediment (SS) and daily stream flow (Q) data from the Iowa River in the USA were used for training and testing. The results indicate that the WSVM could reasonably be used to estimate the suspended sediment load.


Introduction
In most rivers, the main part of the sediment is transported in suspension. Increasing importance is given to accurate sediment prediction, particularly in flood-prone areas [1]. The proper estimation of suspended sediment (SS) is essential for the design and operation of canals, dams and diversions. Forecasting river suspended sediment is a complex task in water resources operation, environmental engineering and hydrology [2]. A great number of uncertain factors are involved in river SS modelling. Theoretical equations may not be adequate for describing the whole process, as they have a restricted capability to capture non-linearities and non-stationarities in environmental and hydrologic data. Over the past decade, particular attention has been paid to the application of artificial intelligence in environmental engineering and water resources management. Several studies have been made in order to develop artificial intelligence methods for modelling processes with limited knowledge [3][4][5][6]. Some well-established techniques from the literature for modelling water resources and river suspended sediment load, such as the wavelet transform, the artificial neural network (ANN), and the support vector machine (SVM), are presented in this paper. The capability of ANNs to find nonlinear relations between inputs and outputs makes them proper tools for modelling hydraulic and hydrological phenomena [7]. ANN simulation has been increasingly applied in many countries, and its use for time series modelling has recently been explored. Wavelet-transformed data of observed time series enhance forecasting capabilities by capturing valuable information at different resolution levels [8]. The performance of the multi-layer feed-forward (MLFF) network and the radial basis function (RBF) network in forecasting the discharge of suspended sediments has been compared [9]. The use of ANNs for river SS prediction has been studied by many researchers [10,11]. The ANN, neuro-fuzzy (NF), multiple linear regression (MLR) and sediment rating curve (SRC) models were examined for the one-day-ahead simulation of SS at two hydrometric stations. A comparison of the modelling results established that the NF model is better suited for SS forecasting [12]. An ANN model was proposed as a means for simulating the monthly suspended sediment flux in China [9]. In that model, the suspended sediment flux was correlated with temperature, average rainfall, rainfall intensity, and flow discharge. The results show that the ANN model is capable of simulating the monthly suspended sediment flux [13].
Other investigators have defined a model combining the wavelet transform and the neuro-fuzzy (NF) technique to predict the daily suspended sediment [14]. The ANN approach was used to model the SS concentration at two sites on the Mississippi River. The corresponding results revealed that the ANN technique performs better than conventional methods [15].
A new wavelet artificial neural network model was used for daily SS forecasting in the Yadkin River at the Yadkin College station in the USA. The wavelet transform was coupled with an artificial neural network (ANN). A comparison of the prediction accuracies of the wavelet artificial neural network (WANN) and other models revealed that the proposed WANN model was able to predict the SS effectively. The best model performance was achieved by the Meyer wavelet, with a determination coefficient of 0.83 [16]. A combined neural-wavelet model was proposed for predicting the precipitation of the Ligvanchai watershed at Tabriz, Iran. The main time series was decomposed into several multi-frequency time series by the wavelet technique, and these were used as input data to the ANN to predict the precipitation one month ahead [17].
A combined wavelet-ANN model was recommended for SS forecasting. Daily stream flow and SS data from the Iowa River station in the United States were employed to train and test the ANN, WANN, MLR and SRC models. According to the results obtained, the WANN model performed better than the other models [18]. The relation between the SS and stream flow discharge was identified by a genetic algorithm (GA). This model performed better than the SRC technique [19]. Another study highlights the usefulness of wavelet analysis and the power of this tool for studying time series in karst hydrosystems [20]. Wavelet analysis and NF were applied for daily SS forecasting [21]. The support vector machine (SVM) is a supervised learning technique that produces input-output mapping functions from a set of training data [22].
In training support vector machines, the decision boundaries are determined directly from the training data. This learning strategy is based on statistical learning theory and minimizes the classification errors of both the training data and the unknown data [23]. The SVM has also been used for predicting the chlorophyll concentration in reservoirs [24]. The WSVR model has been investigated as a means of predicting monthly stream flows and daily precipitation. These models were developed by coupling two techniques, discrete wavelet analysis and support vector regression. The comparison results illustrate that the WSVR enhances the model performance [25,26]. The SVM has been used as a pattern-recognition predictor to simulate daily, weekly and monthly runoff and sediment yield from an Indian watershed [27]. Two input variable pre-processing methods for the SVM model, principal component analysis and the gamma test, have been explored. The proposed methods provide a more reliable performance in monthly stream flow forecasting [28]. The least square support vector machine (LSSVM) has been compared with artificial neural networks (ANNs) and the sediment rating curve (SRC) in separate predictions of sediment data at upstream and downstream stations. The comparison shows that the LSSVM model performs better than the ANN techniques [29]. Artificial neural network and support vector machine models were applied to predict the suspended sediment load in the Doiraj river basin, situated in the west of Iran. The stream flow and rainfall data were considered as input variables, together with the length of the training data set. The best input combination for the models was identified by use of the gamma test (GT) [30]. The genetic programming (GP) method was applied for estimating the daily SS at two stations on the Cumberland River in the U.S. The results show that the GP is superior to other models [31]. These surveys show that wavelet analysis is an effective method for analysing irregularly distributed multi-scale features of climate elements in space and time. The goal of combining wavelet analysis with the SVM technique is to improve the SS forecasting accuracy.

Support vector machine
New artificial intelligence techniques with a great variety of applications have been identified over the past decade. One of them is the support vector machine (SVM), which is used for both classification and regression. Support vector regressors, which are extensions of support vector machines, have shown good generalization ability for various function approximation and time series prediction problems [23]. Much research has been done on the theory of the SVM [32][33][34][35]. Therefore, only a brief explanation of the ε-SVM model is given. Suppose we are given training data $\{(x_1, y_1), \ldots, (x_l, y_l)\} \subset X \times \mathbb{R}$, where $X$ denotes the space of input patterns and $y_i$ are the target values. The aim of the SVM is to find a function $f(x)$ that has at most ε deviation from the actual targets $y_i$ for all training data, and that is as flat as possible. In other words, errors are not considered as long as they are less than ε, but any deviation larger than ε is not accepted. In this paper, ε-SVM regression is used to predict the next-day SS [36]. Linear functions $f(x)$ are described as follows:

$$f(x) = \langle w, x \rangle + b, \quad w \in X,\ b \in \mathbb{R} \qquad (1)$$

where $\langle w, x \rangle$ denotes the dot product in $X$. Flatness in the case of Eqn (1) means that a small $w$ is sought. One way to ensure this is to minimise the norm, i.e. $\|w\|^2 = \langle w, w \rangle$. This problem can be written as a convex optimization problem:

$$\text{minimize} \quad \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon \\ \langle w, x_i \rangle + b - y_i \le \varepsilon \end{cases} \qquad (2)$$

The assumption in Eqn (2) is that a function $f(x)$ that approximates all pairs $(x_i, y_i)$ with ε precision actually exists, so that the convex optimization problem can be solved. On occasion this may not be the case, or some errors may be allowed. Slack variables $\xi_i, \xi_i^*$ can be introduced to cope with the otherwise infeasible constraints of the optimization problem in Eqn (2). The formulation then becomes:

$$\text{minimize} \quad \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \qquad (3)$$

The constant $C > 0$ determines the trade-off between the flatness of $f(x)$ and the amount up to which deviations larger than ε are tolerated. This corresponds to the so-called ε-insensitive loss function $|\xi|_\varepsilon$ given in Eqn (4):

$$|\xi|_\varepsilon = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases} \qquad (4)$$

The optimization problem in Eqn (3) can be solved in its dual formulation. The key idea is to construct a Lagrange function from the primal objective function and the corresponding constraints by introducing a dual set of variables. This function has a saddle point with respect to the primal and dual variables at the solution:

$$L = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) - \sum_{i=1}^{l} \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) \qquad (5)$$

Here $L$ is the Lagrangian and $\eta_i, \eta_i^*, \alpha_i, \alpha_i^*$ are Lagrange multipliers. The dual variables therefore have to satisfy the positivity constraints:

$$\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0 \qquad (6)$$

It follows from the saddle point condition that the partial derivatives of $L$ with respect to the primal variables $(w, b, \xi_i, \xi_i^*)$ have to vanish for optimality:

$$\partial_b L = \sum_{i=1}^{l} (\alpha_i^* - \alpha_i) = 0 \qquad (7)$$

$$\partial_w L = w - \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i = 0 \qquad (8)$$

$$\partial_{\xi_i^{(*)}} L = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0 \qquad (9)$$

Substituting Eqns (7)-(9) into Eqn (5) yields the dual optimization problem and eliminates the dual variables $\eta_i, \eta_i^*$:

$$\text{maximize} \quad -\tfrac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \quad \text{subject to} \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0,\ \alpha_i, \alpha_i^* \in [0, C] \qquad (10)$$

Eqn (8) can be rewritten as follows:

$$w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i, \quad \text{thus} \quad f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \qquad (11)$$

This is the so-called support vector expansion: $w$ can be described completely as a linear combination of the training patterns $x_i$, and the algorithm depends only on dot products between patterns.

Maedeh Sadeghpour Haji, Seyed A. Mirbagheri, Amir H. Javid, Mostafa Khezri, Ghasem D. Najafpour

A linear model is not suitable for many hydrological events. It can be made suitable by using a kernel to map the data into a higher-dimensional space and then applying the standard support vector regression algorithm. The observations above are convenient for the formulation of this nonlinear extension. It can be achieved by pre-processing the training patterns $x_i$ with a map $\varphi : X \to \mathcal{F}$ into some feature space $\mathcal{F}$. It is then sufficient to know the kernel function $k(x, x') := \langle \varphi(x), \varphi(x') \rangle$ rather than $\varphi$ explicitly. The kernel function makes it possible to use a function that is nonlinear in input space but linear in feature space, without treating the high-dimensional feature space explicitly. This technique is named the kernel trick and is written as follows [22,30,36,37]:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b$$

The standard kernel functions most often used in regression and modelling are given in Table 1 [22].
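The ε-insensitive loss and the kernel form of the support vector expansion described in this section can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the RBF kernel is written in the form exp(-γ‖x - z‖²) that LIBSVM uses, and the support vectors, coefficients and bias are hypothetical, hand-picked values rather than the result of solving the dual problem.

```python
import math


def eps_insensitive_loss(residual, eps):
    """|xi|_eps: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Support vector expansion f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b.

    dual_coefs[i] holds the difference (alpha_i - alpha_i*) for support vector i.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical values chosen by hand for illustration (not fitted to any data).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [0.5, -0.25]

print(eps_insensitive_loss(0.05, eps=0.07))  # residual inside the tube
print(eps_insensitive_loss(0.20, eps=0.07))  # residual outside the tube
print(svr_predict((0.0, 0.0), support_vectors, dual_coefs, b=1.0, gamma=0.5))
```

Only the support vectors, the coefficient differences and the bias are needed at prediction time, which is why the feature map never has to be evaluated explicitly.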

The Continuous wavelet transform (CWT)
The theory of wavelet analysis is founded on Fourier analysis [40]. Wavelet decomposition is well suited to analysing transient signals and to obtaining a better characterization and a more reliable discrimination technique [41]. The signal is multiplied by a function similar to a window function, and the transform is calculated for different segments of the time-domain signal. The continuous wavelet transform is described as follows:

$$CWT_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt$$

The wavelet transform is a function of two parameters, translation (τ) and scale (s). ψ(t) is named the mother wavelet.
The term wavelet means a small wave. The smallness refers to the condition that the window function is compactly supported. The term translation relates to the location of the window, as the window is shifted from the beginning to the end of the signal. The scale parameter (s) is defined as s = 1/frequency [42].

Discrete Wavelet Transform (DWT)
The concept of the DWT is similar to that of the continuous wavelet transform (CWT). The time-scale representation of a digital signal is obtained by digital filtering methods. The CWT is a correlation between a wavelet at different scales and the signal, with the scale (or the frequency) being used as a measure of similarity. In the discrete case, filters of different cutoff frequencies are used to analyze the signal at different scales. The signal is passed through a series of highpass filters to analyze the high frequencies, and through a series of lowpass filters to analyze the low frequencies. Subsampling by a factor n reduces the number of samples in the signal n times. Up-sampling a signal corresponds to increasing the sampling rate of the signal by adding new samples to it. Since the signal is a discrete time function, the terms function and sequence are used interchangeably. This sequence is denoted by x[n], where n is an integer. The process begins with passing this signal through a half band lowpass filter with impulse response h[n]. Filtering a signal corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter. The convolution is defined as follows:

$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k]$$

A half band lowpass filter removes all frequencies that are above half of the highest frequency in the signal. The lowpass filtering halves the resolution, but leaves the scale unchanged. The signal is then subsampled by 2, since half of the samples are redundant.
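The filtering and subsampling step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the two-tap Haar filters stand in for the half band filters h[n] and g[n] (the study itself reports using a sym3 wavelet), the signal is assumed to have even length, and keeping the odd-indexed samples of the full convolution is a phase convention chosen here so that the Haar filters act on the pairs (x[0], x[1]), (x[2], x[3]), and so on.

```python
import math


def convolve(x, h):
    """Full discrete convolution: y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            y[k + j] += xk * hj
    return y


def dwt_level(x, lowpass, highpass):
    """One subband-coding step: filter the signal, then subsample by 2."""
    approx = convolve(x, lowpass)[1::2]   # low-frequency (approximation) part
    detail = convolve(x, highpass)[1::2]  # high-frequency (detail) part
    return approx, detail


# Haar filters, used here only as an assumed example of a half band pair.
S = 1.0 / math.sqrt(2.0)
HAAR_LOW = [S, S]
HAAR_HIGH = [S, -S]

signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = dwt_level(signal, HAAR_LOW, HAAR_HIGH)
print(approx)  # half as many samples as the input
print(detail)

# Repeating the step on the approximation gives the next decomposition level.
approx2, detail2 = dwt_level(approx, HAAR_LOW, HAAR_HIGH)
print(approx2, detail2)
```

Each level halves the number of samples, and for an orthonormal filter pair such as Haar the energy of the signal is split between the approximation and detail outputs.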

Table 1. Kernel functions and type of classifier, including the inverse multiquadric function. Notes: * only for certain values of b; (C)PD = (conditionally) positive definite.

These procedures double the scale and can be expressed as follows:

$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[2n - k]$$

The DWT decomposes the signal into approximation and detail information.
The original signal x[n] is first passed through a half band highpass filter g[n] and a lowpass filter h[n]. These operations can be expressed as follows:

$$y_{high}[k] = \sum_{n} x[n]\, g[2k - n]$$

$$y_{low}[k] = \sum_{n} x[n]\, h[2k - n]$$

where $y_{high}[k]$ and $y_{low}[k]$ are the outputs of the highpass and lowpass filters after subsampling by 2. This procedure, which is known as subband coding, can be repeated for further decomposition. The method is illustrated in Figure 1.

The autocorrelation and cross-correlation of the Q and SS data were investigated to predict the daily SS. This procedure was used in several research papers [14,16,18,21,26,29,30]. Cross-correlation coefficients between the observed $SS_t$ and the stream flow time series ($Q_t$, $Q_{t-1}$, $Q_{t-2}$, ..., $Q_{t-10}$), and autocorrelation coefficients from the lag 1 day coefficient ($R_1$) to the lag 10 days coefficient ($R_{10}$), are illustrated for Q and SS in Figure 3. It can be seen that the autocorrelation coefficients of stream flow are higher than those of suspended sediment in both the training and testing data sets. The autocorrelation between $SS_t$ and $SS_{t-4}$, $SS_{t-5}$, $SS_{t-6}$, ... is quite low, and the same holds for the cross-correlation coefficients between $SS_t$ and $Q_{t-4}$, $Q_{t-5}$, $Q_{t-6}$, .... Consequently, models whose inputs were the SS and Q of the three previous days were examined. Different lagged time series of stream flow and SS, reaching at most three time steps into the past, were taken as input data. The semilog-scale scatter plots of this station's data are given in Figure 4. It can be noticed that there is a nonlinear and complex relationship between the stream flow and suspended sediment. This figure demonstrates the existence of an outlier in the data range. An SS value of 4780 kg/s was observed, while the other values were below 2245.37 kg/s. Such values add extra complexity to the calculations for the models.

A number of conventional evaluation measures, such as the coefficient of determination (R²), the sum of square error (SSE), and the root mean square error (RMSE), were considered [44,45]. In this study, the performance of the models was evaluated by R², RMSE, the mean of the error (Error mean), and the error standard deviation (Error STD), each computed over the n data points.

In Figure 5, SSDet1 and QDet1 are the wavelet coefficients of SS and Q in detail mode at level 1. The SS and Q time series were decomposed at different levels by wavelet analysis and used as inputs to the SVM method for forecasting one-day-ahead SS. Through this decomposition, the periodic property of SS was taken into account. Each subsignal plays a different role in the original time series, and the behaviour of each subsignal is distinct. The study examined different combinations of Q and SS, with a maximum of three time steps into the past, as inputs to the models. The input combinations investigated included Combination 1 ($Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $SS_{t-1}$, $SS_{t-2}$, $SS_{t-3}$) and Combination 2 ($Q_{t-1}$, $Q_{t-2}$, $SS_{t-1}$, $SS_{t-2}$).
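The lag analysis and the construction of lagged inputs described above can be sketched in Python as follows. This is a hedged illustration, not the procedure used in the study: the function names and the short series are hypothetical, and the lag correlation is computed as a Pearson correlation over the overlapping samples, which is one common estimator among several.

```python
import math


def lag_correlation(x, y, lag):
    """Pearson correlation between y[t] and x[t - lag] for lag >= 0.

    Passing the same series for x and y gives the lag-k autocorrelation R_k.
    """
    a = x[:len(x) - lag]
    b = y[lag:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_inputs(q, ss, n_lags):
    """Build rows [Q_{t-1..t-n}, SS_{t-1..t-n}] with target SS_t."""
    rows, targets = [], []
    for t in range(n_lags, len(ss)):
        q_lags = [q[t - k] for k in range(1, n_lags + 1)]
        ss_lags = [ss[t - k] for k in range(1, n_lags + 1)]
        rows.append(q_lags + ss_lags)
        targets.append(ss[t])
    return rows, targets


# Hypothetical short series, for illustration only.
q = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
ss = [1.0, 1.4, 2.1, 1.2, 0.9, 1.8]

print(lag_correlation(ss, ss, 1))  # lag 1 day autocorrelation of SS
print(lag_correlation(q, ss, 1))   # cross-correlation of SS_t with Q_{t-1}

# Inputs in the style of Combination 2: two lags of Q and two lags of SS.
rows, targets = lagged_inputs(q, ss, n_lags=2)
print(rows[0], targets[0])
```

In the wavelet variant, each lagged column would be replaced by its approximation and detail subsignals, which is where the (i+1) × 2 input count per lag comes from.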

The optimum parameters of the SVM models were defined by minimizing the objective function (RMSE) between the calculated and observed suspended sediment values in the test period for each input combination. In the application of the SVM, C and ε were the parameters that needed to be specified. If good care is not taken in parameter selection, the resulting regression model may yield large prediction errors on unobserved future data [46]. The parameter selection tool assumes that the RBF (Gaussian) kernel is used, although extensions to other kernels and SVR could easily be made.
Values for C, ε and γ were needed for building an ε-SVM model from the training data, and the RBF kernel function was used [47].
The γ parameter, which appears in Table 1, is significant in the RBF model, and a poor choice can lead to underfitting or overfitting in prediction [48].
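The selection step described above, choosing the (C, ε) pair with the smallest RMSE, can be sketched in Python together with the evaluation criteria. Several points are assumptions made here for illustration: R² is taken as 1 - SSE/SST (a form that can go negative, consistent with the negative R² reported for the high-SS cases), the error is defined as observed minus predicted, the standard deviation is the population form, and the candidate predictions are hypothetical numbers rather than outputs of trained models.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Assumed form: 1 - SSE / SST (can be negative for poor fits)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def error_mean_std(obs, pred):
    """Mean and (population) standard deviation of the errors obs - pred."""
    errors = [o - p for o, p in zip(obs, pred)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(var)


def select_best(obs, candidates):
    """Return the (C, eps) key whose predictions give the smallest RMSE."""
    return min(candidates, key=lambda params: rmse(obs, candidates[params]))


observed = [100.0, 250.0, 400.0, 150.0]
# Hypothetical predictions from models trained with different (C, eps) pairs.
candidates = {
    (1, 0.10): [140.0, 200.0, 330.0, 190.0],
    (6, 0.07): [110.0, 240.0, 390.0, 160.0],
    (10, 0.01): [90.0, 280.0, 360.0, 120.0],
}

best = select_best(observed, candidates)
print(best, rmse(observed, candidates[best]), r_squared(observed, candidates[best]))
print(error_mean_std(observed, candidates[best]))
```

In practice each entry of `candidates` would come from fitting an ε-SVM with that parameter pair, for example with LIBSVM, and predicting over the evaluation period.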
In this research, the γ parameter was left at its default value in the LIBSVM software, 1/num_features, while the C and ε parameters were set to several values and various SVM models were developed. The optimum parameters for each combination in the WSVM model with the Symlet (sym3) wavelet are given in Table 3; the combination that gave the minimum RMSE, mean error and STD error, and the maximum R², in the test period was chosen. Before applying the WSVM to the data, the training input and output values were normalized using the logarithm function. According to Table 3, the WSVM model provided the best performance criteria for Combination 2 ($Q_{t-1}$, $Q_{t-2}$, $SS_{t-1}$, $SS_{t-2}$). It is clear from the table that the minimum RMSE of 70.8 kg/s and the highest R² value of 0.847 were obtained with C = 6 and ε = 0.07. To characterize periodic properties, the observed Q and SS time series were decomposed into several multi-frequency time series by wavelet transform. Each subsignal showed a different and distinct behaviour in the original time series. These time series were used as input data to the SVM technique to forecast the next-day SS. The results obtained by the WSVM model demonstrate that the performance increases if the decomposition levels for the stream flow and suspended sediment time series are equal, while the performance decreases when the decomposition level is increased beyond 1. This research also investigated the effects of the mother wavelet type and the decomposition level on the model predictions.

Figure 6 shows a good agreement between the observed and predicted values. High and low SS values were better forecast by the WSVM model. The predicted values were close to the measured values, and the results were closer to the 45° straight line in the scatter plots than those of the other model. Improvements in prediction can clearly be observed if these methods are compared with a more primitive one such as the sediment rating curve (SRC). Generally, the SRC has the form $SS = aQ^b$, where a and b are constants. Thus, the SRC has an important bearing on the correct assessment of SS. The power equation covers both the effect of increased stream power at higher Q and the extent to which new sources of SS become available in weather conditions that cause high Q. Despite its general use, several problems are recognized with regard to the accuracy of the fitted curve and the physical meaning of its regression coefficients [49]. The SRC was fitted to the training data for this station to obtain the constants a and b. The predictive ability of the SRC model was also tested with the same data sets that were employed to test the SVM and WSVM models, thus making the results comparable. To support the previous results by visual assessment, the semilog-scaled time series of the measured and predicted SS, and the residuals of errors of the SRC model for the test period, are depicted in Figure 8. The RMSE and R² for the model were 119.89 kg/s and 0.28 in the test period, respectively. The SVM model was in better agreement with the original SS data than the SRC model. However, the errors show that some contributions of the physics were not considered. Therefore, the WSVM model was more efficient and gave better predictions than the SRC and SVM models. The SS calculation is essential in any reservoir problem. Furthermore, correct assessment of SS is needed for the design and operation of canals, dams and diversions. The cumulative SS was estimated by these models in the test period.
The WSVM and SVM methods underestimated the cumulative SS by about 2.48 % and 13.4 %, respectively. The prediction of high SS values, greater than 810.18 kg/s, was also examined in this paper. Twelve high SS cases were selected for the analysis in the test period, and the time of each occurrence is shown in Figures 8 and 9.
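The sediment rating curve and the cumulative load comparison used in this section can be sketched in Python as follows. The paper does not state how the constants a and b were fitted; ordinary least squares on the log-transformed data is assumed here because it is a common choice for a power law, and it requires strictly positive Q and SS values. The data are synthetic and noise-free, and the function names are hypothetical.

```python
import math


def fit_src(q, ss):
    """Least-squares fit of log(SS) = log(a) + b * log(Q); returns (a, b)."""
    log_q = [math.log(v) for v in q]
    log_ss = [math.log(v) for v in ss]
    n = len(log_q)
    mean_q, mean_ss = sum(log_q) / n, sum(log_ss) / n
    b = (sum((x - mean_q) * (y - mean_ss) for x, y in zip(log_q, log_ss))
         / sum((x - mean_q) ** 2 for x in log_q))
    a = math.exp(mean_ss - b * mean_q)
    return a, b


def predict_src(q, a, b):
    """Evaluate the rating curve SS = a * Q**b for each flow value."""
    return [a * v ** b for v in q]


def cumulative_error_percent(obs, pred):
    """Signed error of the cumulative load in percent (negative = underestimate)."""
    return 100.0 * (sum(pred) - sum(obs)) / sum(obs)


# Synthetic training data generated from SS = 0.02 * Q**1.5 (illustration only).
q_train = [50.0, 100.0, 200.0, 400.0]
ss_train = [0.02 * v ** 1.5 for v in q_train]

a, b = fit_src(q_train, ss_train)
print(a, b)

# Cumulative comparison on hypothetical observed and predicted loads.
print(cumulative_error_percent([100.0, 200.0], [90.0, 180.0]))
```

A fit in log space minimises relative rather than absolute errors, which is one reason a rating curve can underestimate the largest loads.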

Conclusion
In this research, WSVM and SVM models were applied to predict the daily suspended sediment load. The stream flow and suspended sediment time series were decomposed into several multi-frequency time series by wavelet transform. Each subsignal showed a different and distinct behaviour in the original time series. Observed Q and SS data were considered as input variables, and the best input combination for the models was identified by using the autocorrelation and cross-correlation. The performances of these models were compared by R², RMSE, mean error and error standard deviation. In this investigation, the RBF kernel function was found to give more reasonable and efficient results than other kernel functions.
The results revealed that the WSVM model with Combination 2 ($Q_{t-1}$, $Q_{t-2}$, $SS_{t-1}$, $SS_{t-2}$) performed better than the other model: the minimum RMSE of 70.83 kg/s and the highest R² value of 0.847 were obtained with C = 6 and ε = 0.07. In general, both the SVM and WSVM models gave better predictions than the SRC model, while the WSVM forecast was more reliable. Comparing these intelligent methods with the sediment rating curve shows clear improvements in prediction. The WSVM and SVM methods underestimated the cumulative SS by about 2.48 % and 13.4 %, respectively. The prediction of high suspended sediment load values (greater than 810.18 kg/s) was also compared with the observed values. The results revealed that the WSVM model could reasonably be used to estimate cumulative and high suspended sediment loads.

In Figure 1, x[n] is the original signal, and h[n] and g[n] are lowpass and highpass filters, respectively. The decomposition of the original signal into three levels is illustrated in Figure 1 [43].

Gauging station and statistical analysis of data
Data collected over a period of 9 years (01-October-1978 to 30-September-1987) at the Iowa River at Wapello, IA gauging station (USGS station No: 05465500, Basin Area (sq.mi.): 12,499, longitude: 091°10'57" and latitude: 41°10'48"), operated by the U.S. Geological Survey (USGS), were used for training and testing the models. Uninterrupted time series of Q and SS are needed for modelling. Data from October 1, 1978 to September 30, 1985 (7 years) and from October 1, 1985 to September 30, 1987 (2 years) were used as the training and validation data sets, respectively. The time series of daily Q and SS are shown in Figure 2. Statistical parameters for stream flow and suspended sediment, such as the maximum, minimum, mean and standard deviation (SD), are shown in Table 2.

Figure 2. Stream flow and suspended sediment time series for a period of 9 years
Table 2. Statistical parameters for stream flow and suspended sediment

Suspended sediment and stream flow time series were decomposed (by wavelet analysis) into several multi-frequency time series of details (different resolution levels) and an approximation. They were linked to the SVM technique as inputs for predicting one-day-ahead SS. The number of inputs is (i+1) × 2, because the WSVM combination technique uses two variables (suspended sediment and stream flow), and each time series is decomposed into i (i = 1, 2, 3) detail time series and one approximation time series. The wavelet technique can separate the properties of the SS time series into various scales of the wavelet transform at the same time. Q and SS were decomposed with different mother wavelets at levels from 1 to 3. For these data, one-level decomposition of the SS and Q signals with the sym3 wavelet, which yields two subsignals for each series, gave the best performance. The details of the subsignals are presented in Figure 5. In this figure, SSApp and QApp are the wavelet coefficients of SS and Q in approximation mode, and SSDet1 and QDet1 are the wavelet coefficients of SS and Q in detail mode at level 1.

Figure 3. The autocorrelation for Q and SS, cross-correlation of SS with Q

Figure 6. The semilog-scaled time series and log-scaled scatter plots of observed and estimated SS, residuals and normal distribution of errors by best WSVM model in test period

The WSVM model results were nearer to the 1:1 line. The RMSE and R² values for this model were 251.42 kg/s and -0.09. The RMSE decreased by 57 % in comparison with the best SVM model. The SVM technique underestimated the SS in most cases. The semilog-scaled and log-scaled scatter plots of observed and predicted SS for values greater than 810.18 kg/s, together with the residuals and normal distribution of errors, are shown in Figures 9 and 10.

Figure 7. The semilog-scaled time series and log-scaled scatter plots of observed and estimated SS, residuals and normal distribution of errors by best SVM model in test period

Figure 8. The semilog-scaled time series and log-scaled scatter plots of observed and estimated SS, residuals and normal distribution of errors by best SRC model in test period

Figure 10. The semilog-scaled time series and log-scaled scatter plots of observed and estimated SS for values of more than 810.18 kg/s, residuals and normal distribution of errors by best SVM model in test period

It is clear from Table 2 that the stream flow and suspended sediment data are widely spread. The maximum observed Q and SS values were also included in the test period. These statistical parameters indicate the highly complex behaviour of the stream flow and suspended sediment phenomena. The selection of the correct combination of pre-processed input factors is one of the complex procedures in modelling. The autocorrelation and cross-correlation between the Q and SS data were therefore investigated.

For this station, the SVM model presented the best performance criteria for Combination 1 ($Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $SS_{t-1}$, $SS_{t-2}$, $SS_{t-3}$), and the RMSE and R² for the model were 102.96 kg/s and 0.678 in the test period (Table 4). Compared to the best SVM model, the optimal WSVM model increased R² by almost 20 %, while the RMSE decreased by 31.2 %. To support the previous results by visual assessment, the semilog-scaled time series of the measured and predicted SS, and the residuals of errors of the SVM and WSVM models for the test period, are depicted in Figures 6 and 7. The log-scaled scatter plots of observed and estimated SS load values and the normal distribution of errors are also shown.