Research Article

A multi-feature stock price prediction model based on multi-feature calculation, LASSO feature selection, and Ca-LSTM network

Article: 2286188 | Received 04 Dec 2022, Accepted 16 Nov 2023, Published online: 19 Jan 2024

Abstract

This paper addresses the crucial realm of stock price prediction, highly coveted by individual investors and institutions for its substantial economic implications. The inherent non-stationary and intricate nature of stock market fluctuations, coupled with real-time transactions, poses a formidable challenge for accurate and swift prediction. Unlike prevailing research that predominantly focuses on forecasting methods, our approach places emphasis on processing the original data, introducing 57 technical indicators to better represent the economic aspects relevant to stock price prediction. To weigh the importance of each feature, we employ the LASSO algorithm to derive an optimal feature combination. Additionally, our methodology utilises the Ca-LSTM (cascade long short-term memory) technique, enhancing information extraction from individual features. Experimental results, gauged by mean error, underscore the superiority of the Ca-LSTM model over other time series prediction models and conventional long short-term memory approaches. Notably, integrating our model into the accumulation-based VMD-LSTM model further improves forecasting accuracy. The proposed method holds considerable potential to refine stock price prediction, thereby delivering heightened value to investors in the dynamic financial landscape.

1. Introduction

The stock market is a crucial component of the investment market, attracting significant attention from investors and enterprises alike. In the financial field, stock price prediction is an essential and challenging task, and research on it is believed to carry great theoretical significance and application potential (Ding & Qin, Citation2020). Stock prices exhibit nonlinearity, volatility (Bontempi et al., Citation2012), high noise levels and data intensity (Chang et al., Citation2009), all of which pose significant obstacles to accurate prediction. For investors, accurate predictions bring tangible benefits: they allow people to properly anticipate a company's situation and make informed decisions (Soni et al., Citation2022). Current studies have shown that it is reasonable to identify temporal patterns in past data and use them to make a benchmark forecast of future stock prices. To effectively reduce investment risk and obtain stable returns, researchers have explored a number of methods, for example, Random Forest (RF), Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbour (KNN), Softmax, RNN and LSTM.

In the field of financial time series prediction, deep learning methods have gained widespread acceptance due to their superior performance compared to traditional econometric models and their ability to build forecasting models without expert knowledge (Hoseinzadeh et al., Citation2022). Studies have found that deep learning models can extract non-linear relationships between input and output data (Sohani et al., Citation2021) and that they outperform linear models on both statistical and economic criteria (Matias & Reboredo, Citation2012). Based on a literature review of financial time series prediction, Qiu et al. (Citation2020) concluded that deep learning models have good nonlinear data-fitting ability and adaptive self-learning ability, making them suitable for random and non-smooth data. This highlights the effectiveness of deep learning methods for forecasting tasks in the financial domain.

Early deep learning methods proved to have the advantage of fitting nonlinear data, but they suffered from over-fitting (Wang, Cui et al., Citation2022). This limitation was alleviated by recurrent neural networks (RNN), which not only learn the current features of temporal data through fully connected hidden neurons but also consider features gathered in the past by an internal feedback mechanism. However, RNNs suffer from the vanishing gradient problem on long-term data. To solve this problem, the long short-term memory (LSTM) network was proposed, which introduces a memory cell to replace the hidden-layer memory unit in RNN; as data flows through the network, the output of each layer depends not only on the past state but also on the current weights. Several studies have demonstrated the feasibility of using LSTM networks to predict stock prices (Chen & Ge, Citation2019; Kumar et al., Citation2021; Sun et al., Citation2021). Furthermore, two important aspects of long short-term memory are the unit-level neural organisation and the encoding of information (Su et al., Citation2021), and researchers must design experiments that meet these two requirements. Since the parameters of the model are adjustable, researchers have devised their own tuning methodologies; parameter adjustment can address various aspects, such as the number of layers (Gao et al., Citation2021), the time step and the weight of the attention mechanism (Kumar et al., Citation2021). Although deep neural networks can capture the features of complex stock price data, the original features of opening price, closing price, highest price, lowest price and trading volume may not fully meet the input requirements of the model. Studies have conducted comparison experiments on each combination of features, on the stocks remaining after filtering by price fluctuation (Song et al., Citation2019), and on the input features of various deep learning models. These studies lead to the conclusion that novel input features are necessary for more precise predictions of stock price fluctuations.

The innovations and contributions of this study are as follows:

  1. Stock technical indicators are calculated from the raw price data: 25 technical indicators are computed, bringing the final feature set to 57 characteristics. The feature combination screened by LASSO is fed into the cascaded LSTM, yielding a model with short training time and high accuracy.

  2. A cascaded LSTM model is proposed that accepts multiple features separately. The model outperforms XGBoost and the sequential LSTM model, and when the cascade structure is applied inside the VMD-LSTM model, that model also performs better.

  3. To improve the accuracy of the proposed model, the rule of time division is studied. Time series data is divided into four granularities, and experimental results show that the model's accuracy increases as finer-grained time intervals are used as input.

2. Related works

In recent years, deep learning has flourished and has been shown to be effective in dealing with time series data, including financial data. While much attention goes to developing effective deep learning models for financial analysis, it is important to recognise that high-quality data is equally critical for achieving accurate results. Prior to entering the network, data must be pre-processed to ensure that it is in a suitable form for analysis. Therefore, careful consideration of data quality and appropriate pre-processing techniques is essential for the success of deep learning applications in finance.

Forecasting models often benefit from multiple features, including transactional data, news, social media and search behaviour (Li et al., Citation2020). According to Wang, Cui et al. (Citation2022), the analysis of multiple factors offers more effective information and improves prediction accuracy in stock price prediction. Furthermore, investors are increasingly inclined to utilise multi-source data for decision-making purposes (Wu et al., Citation2022), primarily due to its interpretability. Moreover, some models cannot extract and recognise features well by themselves and require additional feature pre-processing (Li et al., Citation2023); such methods can be feature selection algorithms or an integrated network structure that extracts features for specific data (Zhao & Yang, Citation2023). However, including multiple features can also introduce noise and increase training time; for example, an inappropriate introduction of positive and negative sentiment indicators leads to poor prediction results. In the field of time series prediction, an effective way to address this is to select the optimal feature combination. Li et al. (Citation2022) observed the holdings and returns of Hong Kong-funded institutions over the years and found that different institutions predicted and preferred stocks differently, given their diverse trading characteristics. Multi-objective optimisation algorithms are also widely used to find the optimal solution among a large number of features and parameters (Mahmoudan et al., Citation2021, Citation2022; Sohani, Pedram et al., Citation2021). In terms of stock price prediction, different feature screening methods have advantages and disadvantages. It is believed that classic linear dimensional reduction methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) cannot achieve good results on nonlinear problems and are therefore unsuitable for reducing the multi-dimensional indicators of stock prices. Recently, linearisation of nonlinear manifold learning (LML) techniques has led to effective and efficient algorithms (Chen et al., Citation2022), and the nonlinear dimensional reduction method locally linear embedding (LLE) can break through the limitations of principal component analysis on nonlinear data (Yu et al., Citation2020). In addition to selecting methods based on the linear or nonlinear nature of the data, researchers also consider methods that first extract and then cluster the data, making it more suitable for time series prediction models, such as using the XGBoost method to extract features from the original data, clustering them with the K-means method, and finally inputting the data into a deep neural network model (Wang & Zhu, Citation2022). Although this approach performs better than other methods, it still has limitations: the difficulty of determining screening parameters, incomplete representation of information, and long training time. In terms of evaluation metrics, its error values are also higher than those reported in other literature.

Regarding the choice of prediction method, the techniques used by researchers are constantly evolving. Traditional financial time series forecasting methods, such as the autoregressive moving average model (ARMA) and generalised autoregressive conditional heteroskedasticity (GARCH) (Bollerslev, Citation1986), require highly stationary data and have been supplemented by machine learning techniques, including the support vector machine (SVM), applied to stock price prediction by Xia et al. (Citation2013), and the artificial neural network (ANN). Moreover, an increasing number of researchers have explored deep learning models in combination with data screening methods. In addition to the PCA and LLE dimensional reduction methods mentioned above, researchers have combined the VMD (variational mode decomposition) method with deep neural networks. Yujun et al. (Citation2021) used the VMD method and the EEMD (ensemble empirical mode decomposition) method to divide the index price into components of different bandwidths, named intrinsic mode functions (IMFs). They then input each IMF into a separate LSTM model; since the sum of the IMFs equals the original value, the components carry real significance, and the authors pointed out that the method rests on the divide-and-conquer idea for complex problems. Most methods based on modal decomposition use independent architectures and have proven effective in their results (Fu et al., Citation2022). Although the VMD method has improved the fit of models to time series to some extent and made the input data more suitable (Niu et al., Citation2020), it remains deficient in two main respects: (1) the method mainly performs modal decomposition of a single time series (i.e. a single feature, mostly the closing price), thus ignoring the influence of other features and losing information; (2) the components predicted by the model typically derive the final result by accumulation, introducing noise and imprecision into the results.

Regarding the inner structure of deep learning networks, researchers are constantly exploring new ways to optimise network design, and one popular approach is to experiment with different parameter combinations or layer structures. While the RNN/LSTM network lacks the advantage of early stopping (Kumar et al., Citation2021), limiting the number of epochs can prevent the model from over-fitting the training data, and by controlling the number of recursions the LSTM can be adapted to countless scenarios with different timing rules (Tang et al., Citation2022). Additionally, a well-designed network should use proper activation functions and parameters to enhance performance. In terms of structure, attention-based models, especially classification LSTM models, have shown great success (Yu & Kim, Citation2019). Hollis et al. (Citation2018) combined the LSTM network with attention mechanisms, with the idea of leveraging developments in attention to improve the performance of promising LSTM RNN architectures currently in use for financial time series (FTS) forecasting. Using the loss function as the criterion for hyperparameter adjustment, the attention LSTM achieves 60% prediction accuracy, higher than the 58% of the baseline LSTM. It is worth noting that the method in that paper is sensitive to hyperparameters: small adjustments can lead to very different results, with a large impact on model training. Another successful example is the work of Li et al. (Citation2022), which placed an attention-based LSTM network into a transfer learning framework, adopted the idea of adversarial learning, and ultimately obtained good performance.

Based on the research above, we focus on the input data and address the issue of insufficient features in previous studies by introducing technical and economic indicators. We also employ feature selection methods to optimise the feature combination. Moreover, an architecture similar to the VMD approach, namely the Ca-LSTM network, is proposed to introduce the features that VMD fails to capture.

3. Methodologies

In this paper, the LASSO feature selection method is combined with the cascaded LSTM network to improve the precision of stock price prediction. This section introduces the basic principle of each method employed in the proposed model.

3.1. LASSO

The LASSO method is a regression analysis method that performs feature selection as part of estimation, aiming to enhance the prediction accuracy and interpretability of statistical models. It is a penalised least squares method that imposes an L1-penalty on the coefficients (Tibshirani, Citation1996). It can be viewed as a constrained optimisation problem that keeps the sum of the absolute weights below a constant t (Coelho et al., Citation2020), as expressed in Equation (1). In the formula, y is the dependent variable, x the independent variables, α the regression constant, and β the coefficients of the p independent variables. The t on the right-hand side is the penalty bound: when t < t₀, some coefficients in the regression model shrink to 0 and are eliminated, thus achieving feature screening.

(1) $(\hat{\alpha},\hat{\beta}) = \arg\min\Big\{\sum_{i=1}^{N}\Big[y_i - \alpha - \sum_{j=1}^{p}\beta_j x_{ij}\Big]^2\Big\}, \quad \text{s.t. } \sum_{j=1}^{p}|\beta_j| \le t$
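For concreteness, the screening step can be sketched with scikit-learn's Lasso, where the regularisation strength alpha plays the role of the penalty bound t. This is a minimal sketch under our own assumptions; the variable names and the alpha value are illustrative, not the paper's released code.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler

def lasso_select(X: np.ndarray, y: np.ndarray, feature_names, alpha: float = 0.01):
    """Return the names of features whose LASSO weights survive the penalty."""
    X_scaled = MinMaxScaler().fit_transform(X)   # put all 57 features on one scale
    model = Lasso(alpha=alpha).fit(X_scaled, y)  # alpha acts like the penalty bound t
    keep = model.coef_ > 0                       # weights shrunk to 0 are eliminated
    return [name for name, k in zip(feature_names, keep) if k]
```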

3.2. LSTM

As a time-series-friendly model, LSTM has been widely applied in forecasting fields. As a variant of RNN, it alleviates the vanishing gradient problem, and its unique architecture (Figure 1) allows it to retain or forget temporal data information better than other deep learning models.

Figure 1. Structure of LSTM network.

As Figure 1 shows, the LSTM model contains three gates: the input gate ($i_t$), the forget gate ($f_t$) and the output gate ($o_t$). The input gate determines whether a new input should be added to the layer, the forget gate determines whether past information is discarded from the layer, and the output gate is mainly used to produce the output data. The main processes of the LSTM model are as follows:

(2) $i_t = \sigma(\omega_i[h_{t-1}, x_t] + b_i)$
(3) $f_t = \sigma(\omega_f[h_{t-1}, x_t] + b_f)$
(4) $o_t = \sigma(\omega_o[h_{t-1}, x_t] + b_o)$
(5) $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
(6) $\tilde{C}_t = \tanh(\omega_c[h_{t-1}, x_t] + b_c)$
(7) $h_t = o_t \odot \tanh(C_t)$
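As a worked illustration of Equations (2)-(7), one forward step of an LSTM cell can be written out in plain numpy; the weight layout and helper names below are our own assumptions for exposition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold one weight matrix / bias vector per gate."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (2)
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (3)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (4)
    c_cand = np.tanh(W["c"] @ z + b["c"])    # candidate state, Eq. (6)
    c_t = f_t * c_prev + i_t * c_cand        # cell state update, Eq. (5)
    h_t = o_t * np.tanh(c_t)                 # hidden output, Eq. (7)
    return h_t, c_t
```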

3.3. Ca-LSTM

Based on LSTM, we propose a cascade LSTM model (Figure 2). In this model, the input gate is divided into six gates, each receiving an independent feature. After the data has been fully processed by the two LSTM hidden layers, we add a fully connected layer to collect the results, which are finally passed to the output layer. We chose 80 neurons for each hidden layer, as this number performed well on most datasets during manual parameter tuning. After the fully connected layer, we use the Adam optimiser, a variant of stochastic gradient descent, to optimise the model's parameters during training.

Figure 2. Structure of Ca-LSTM.
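To make the cascade concrete, a minimal sketch of such an architecture in the Keras functional API might look as follows. The branch-per-feature wiring is our reading of Figure 2; beyond the details stated in the text (six input branches, two LSTM hidden layers of 80 neurons, a fully connected layer and the Adam optimiser), everything here is an assumption rather than the authors' released implementation.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ca_lstm(timesteps: int, n_branches: int = 6) -> keras.Model:
    inputs, branches = [], []
    for k in range(n_branches):
        inp = keras.Input(shape=(timesteps, 1), name=f"feature_{k}")
        h = layers.LSTM(80, return_sequences=True)(inp)  # first hidden layer
        h = layers.LSTM(80)(h)                           # second hidden layer
        inputs.append(inp)
        branches.append(h)
    merged = layers.Concatenate()(branches)              # join the cascaded branches
    dense = layers.Dense(80, activation="relu")(merged)  # fully connected layer
    output = layers.Dense(1)(dense)                      # predicted (normalised) price
    model = keras.Model(inputs=inputs, outputs=output)
    model.compile(optimizer="adam", loss="mse")
    return model
```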

The training process of a stock prediction model fits differently across different stocks. Applying the same parameters to different stocks can lead to overfitting or underfitting, especially when forecasting the SH000001 (Shanghai Composite Index) with its high stock price. To ensure accurate forecasts, we use the loss tendency as the evaluation criterion and adjust parameters when the loss no longer shows a downward trend. In our study, we employed manual hyperparameter tuning based on loss tendency and performance on the test datasets of various stocks. This optimisation approach allowed us to address overfitting or underfitting issues and improve model performance. We adjusted the number of epochs, batch size, activation function and dropout rate based on the model's validation-set performance. During training, we closely monitored loss and accuracy metrics, stopping when the model reached a performance plateau on the validation set, preventing overfitting and ensuring generalisation. To comprehensively assess model performance, we compared predictions on validation datasets across different stocks using evaluation metrics such as MSE, RMSE, MSLE and MAPE. This multi-metric evaluation provided a comprehensive understanding of the model's predictive capabilities. By employing manual hyperparameter tuning and comprehensive evaluation, we aimed to strike a balance between model complexity and generalisation, ultimately enhancing the accuracy and robustness of our stock prediction model, particularly for challenging series like the SH000001 (Figure 3).

Figure 3. Tendency of loss (30 epochs).

After the model is fully trained, a sliding-window scheme in the prediction stage better simulates a real trading scenario: the data of the last three to four months is input to predict the stock price at the next time point, with that time point serving as the sliding step. Once the time point has passed, the actual stock price is added to the window to correct the model, and so on; the specific process is shown in Figure 4.

Figure 4. Flow chart of the LASSO + Ca-LSTM model.
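The prediction loop itself can be sketched as below, assuming a trained model and a normalised price series; the names and the single-feature input shape are illustrative assumptions.

```python
import numpy as np

def sliding_window_forecast(model, series: np.ndarray, window: int, horizon: int):
    """Predict `horizon` points, appending the realised price after each step."""
    history = list(series[:window])
    predictions = []
    for t in range(horizon):
        x = np.asarray(history[-window:]).reshape(1, window, 1)
        predictions.append(float(model.predict(x, verbose=0)[0, 0]))
        history.append(series[window + t])  # slide: the actual price corrects the window
    return np.asarray(predictions)
```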

3.4. Description and transformation of data

The original data contains stocks issued on the Shanghai and Shenzhen stock exchanges. Each stock has five features, four stock price features and one volume feature, and the time period runs from 2016 to 2020. To make a sound selection, we chose volatility (defined below, where p refers to price) as the stock selection criterion, since volatility better reflects sequence characteristics and there is a significant relation between volatility and expected stock returns (Bali & Hovakimian, Citation2009). We finally selected 30 stocks as the experimental data set, as shown in Table 1. To better exploit the data, technical indicators used for stock evaluation in economics are calculated from the original five characteristics, 25 items in total, as shown in Table 2; it is worth noting that since some of the items in the table contain sub-items, the number of derived features is 52. Taking the original five features into account, the final number of features in the constructed dataset is 57.

(8) $R = r_t = \dfrac{p_t - p_{t-1}}{p_{t-1}}$
(9) $\text{Volatility} = \sigma = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(r_i - \bar{r})^2}$

Table 1. The datasets of the 30 selected stocks from 1 January 2016 to 30 November 2020.

Table 2. The technical indicators of the stocks.
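Equations (8) and (9) translate directly into a short routine for ranking candidate stocks by volatility; this is a plain restatement of the formulas with illustrative names.

```python
import numpy as np

def volatility(prices: np.ndarray) -> float:
    """Sample standard deviation of simple returns, Eqs. (8)-(9)."""
    returns = np.diff(prices) / prices[:-1]   # r_t = (p_t - p_{t-1}) / p_{t-1}
    return float(np.std(returns, ddof=1))     # ddof=1 gives the 1/(n-1) form
```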

After fully calculating the data, a set of characteristics describing the stock is obtained. For the LSTM network, however, the selection of features affects the accuracy, the training time and the output of the model, and the differences between features are very large. Feature selection prevents choosing too many or too few features (that is, more or fewer than necessary) and implements data reduction to speed up training and improve computational efficiency (Piramuthu, Citation2004). Features negatively correlated with the regression label make the model perform extremely poorly, while features of low correlation lengthen the training time even when they improve accuracy, a significant disadvantage in today's world of real-time trading. Therefore, feature selection is extremely important. In this study, the LASSO method is used to screen features with weights greater than 0, with the aim of obtaining the optimal combination of features as the input for LSTM network training (Table 3). A few representative indicator formulas are:

(10) $\mathrm{Up} = \max(x_{t-n}, x_{t-n+1}, \ldots, x_t)$
(11) $\mathrm{Low} = \min(x_{t-n}, x_{t-n+1}, \ldots, x_t)$
(12) $\mathrm{mid} = \mathrm{avg}(\mathrm{Up}, \mathrm{Low})$
(13) $\mathrm{VR} = \sum_{t=1}^{N}\mathrm{IF}(p_t - p_{t-1} > 0, 1, 0) + \frac{1}{2}\sum_{t=1}^{N}\mathrm{IF}(p_t - p_{t-1} = 0, 1, 0)$
(14) $\mathrm{TYP} = \mathrm{avg}(\mathrm{HIGH}, \mathrm{LOW}, \mathrm{CLOSE})$
(15) $V_1 = \mathrm{SUM}(\mathrm{IF}(\mathrm{TYP} > \mathrm{REF}(\mathrm{TYP},1), \mathrm{TYP}\times\mathrm{VOL}, 0), N)\,/\,\mathrm{SUM}(\mathrm{IF}(\mathrm{TYP} \le \mathrm{REF}(\mathrm{TYP},1), \mathrm{TYP}\times\mathrm{VOL}, 0), N)$
(16) $\mathrm{MFI} = 100 - (100 / (1 + V_1))$

To reduce the impact of noise and ease the optimisation of the solving process, each sample of the data set is normalised to the range [0, 1] by the following max-min method:

(17) $x'(t) = \dfrac{x(t) - \min x(t)}{\max x(t) - \min x(t)}$

To apply the LSTM network as a regression model, the original time series data of samples and features must be transformed into supervised-learning form with an added time dimension, so that a predicted value is obtained when it is input into the LSTM network. We add one timestep to the original data and transform it as follows:

(18) $[x_0, x_1, \ldots, x_t] \Rightarrow \big[[x_0, x_1, \ldots, x_{t-1}],\ [x_1, x_2, \ldots, x_t]\big]$
(19) $\begin{bmatrix} x_{00} & x_{01} & \cdots & x_{0n} \\ x_{10} & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{t0} & x_{t1} & \cdots & x_{tn} \end{bmatrix} \Rightarrow \left[\begin{bmatrix} x_{00} & x_{01} & \cdots & x_{0n} \\ \vdots & \vdots & & \vdots \\ x_{(t-1)0} & x_{(t-1)1} & \cdots & x_{(t-1)n} \end{bmatrix},\ \begin{bmatrix} x_{10} & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{t0} & x_{t1} & \cdots & x_{tn} \end{bmatrix}\right]$

As the model takes normalised time series data as input, its output (i.e. the predicted value) is also normalised. To obtain the actual predicted stock price, the output must be inverse-normalised: multiply the predicted value by the range of the original stock price and add the minimum stock price, which gives a prediction on the same scale as the original data.

(20) $x(t) = x'(t)\,(\max x(t) - \min x(t)) + \min x(t)$

Table 3. Features remaining after using the LASSO method.
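A compact sketch of the pipeline of Equations (17)-(20), normalisation, supervised reshaping and inverse normalisation, is given below; the helper names are illustrative assumptions.

```python
import numpy as np

def normalise(x: np.ndarray):
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi          # Eq. (17)

def to_supervised(x: np.ndarray, y: np.ndarray, timesteps: int = 1):
    """Stack `timesteps` consecutive feature rows per sample, next value as target."""
    X, targets = [], []
    for t in range(len(x) - timesteps):
        X.append(x[t:t + timesteps])             # windowed inputs, Eqs. (18)-(19)
        targets.append(y[t + timesteps])
    return np.asarray(X), np.asarray(targets)    # shape: (samples, timesteps, features)

def denormalise(y_norm: np.ndarray, lo: float, hi: float):
    return y_norm * (hi - lo) + lo               # Eq. (20)
```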

3.5. Evaluation criteria of experimental results

A natural way to measure the quality of a model's time series output is to compare it with the actual stock price (i.e. the test set) via squared error. However, simple curve fitting is not sufficient: in stock price forecasting, each newly arrived price changes what follows, so we adopt the sliding-window approach in the model, predicting the price at one time point and then, after the window slides, adding the actual price at that point into the model before predicting the next, until the full predicted price series is obtained. Here we predict the stock price at 30 points and then compare the predicted prices with the actual stock prices as the evaluation. Specifically, we calculate several commonly used metrics, RMSE (root mean square error), MAE (mean absolute error), MAPE (mean absolute percentage error) and MSLE (mean squared logarithmic error), as follows:

(21) $\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{t=1}^{N}(x_t - \hat{x}_t)^2}$
(22) $\mathrm{MAE} = \dfrac{1}{N}\sum_{t=1}^{N}|x_t - \hat{x}_t|$
(23) $\mathrm{MAPE} = \dfrac{1}{N}\sum_{t=1}^{N}\left|\dfrac{x_t - \hat{x}_t}{x_t}\right|$
(24) $\mathrm{MSLE} = \dfrac{1}{N}\sum_{t=1}^{N}\big(\log(\hat{x}_t + 1) - \log(x_t + 1)\big)^2$
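These four metrics are straightforward to compute; the sketch below writes Equations (21)-(24) out in numpy.

```python
import numpy as np

def evaluate(actual: np.ndarray, predicted: np.ndarray) -> dict:
    err = actual - predicted
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),                              # Eq. (21)
        "MAE": float(np.mean(np.abs(err))),                                     # Eq. (22)
        "MAPE": float(np.mean(np.abs(err / actual))),                           # Eq. (23)
        "MSLE": float(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)),  # Eq. (24)
    }
```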

3.6. Model complexity and overhead analysis

In this section, we provide a comprehensive analysis of the complexity and overhead associated with our proposed Ca-LSTM network for stock price prediction. We discuss the computational requirements at different stages of our approach, including data preprocessing, model training, and prediction. Additionally, we examine the practical implications of implementing the Ca-LSTM model, such as computational time and memory consumption.

To evaluate the computational complexity of our proposed approach, we analyse the time and space complexity at each stage of the Ca-LSTM model. Table 4 presents a summary of the complexity analysis.

Table 4. Complexity analysis of the Ca-LSTM model.

In the data preprocessing stage, tasks such as data normalisation, feature engineering, and sequence partitioning have a linear time complexity of O(N), where N represents the number of data points in the input sequence. The space complexity remains O(1), as no additional storage requirements are introduced in this step.

For model training, the Ca-LSTM network involves forward and backward propagation, weight updates, and gradient computations. The time complexity is proportional to the number of epochs (E), the number of samples (S), and the number of layers (L). Hence, the overall time complexity is approximately O(E * S * L). The space complexity depends on the number of layers and hidden units, resulting in a space complexity of O(L * H), where H represents the number of hidden units in each layer.

During the prediction phase, the Ca-LSTM model performs forward propagation to generate future price predictions. The time complexity for prediction is O(S * L), where S denotes the number of steps in the input sequence. The space complexity remains O(L * H), similar to the training phase.

To assess the prediction-time overhead of the Ca-LSTM model, we measured the average time required to generate predictions and compared it with the LSTM method. Due to the nature of the cascade, the running time of the Ca-LSTM method is roughly a constant multiple of that of the LSTM method.
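One way such an overhead measurement can be sketched, assuming trained Keras models and a prepared input batch (both hypothetical here), is to average wall-clock prediction time over repeated runs.

```python
import time

def average_predict_time(models: dict, X, repeats: int = 20) -> dict:
    """Average seconds per predict() call for each named model."""
    timings = {}
    for name, model in models.items():
        start = time.perf_counter()
        for _ in range(repeats):
            model.predict(X, verbose=0)
        timings[name] = (time.perf_counter() - start) / repeats
    return timings
```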

In summary, our analysis demonstrates that the Ca-LSTM model exhibits a reasonable computational complexity throughout the various stages of data preprocessing, model training, and prediction.

4. Experiment description and results

4.1. Comparison of major forecasting models

In this part, two types of experiments are designed to evaluate the performance of the cascade LSTM network. First, the cascade LSTM network is compared with other time series prediction models to demonstrate its superiority. Second, the input data is divided into three time levels, 1 h, 30 min and 10 min, to test under which circumstances the cascade LSTM network performs best.

Table 5 shows the experimental results of the models on 3 stocks, 1 index and the average over the 30 stocks. We compared our proposed model with various existing models, and all results were obtained on the same dataset used for our proposed model. For each compared model, we followed the literature to replicate the model's structure and parameter adjustment methods and ensure a fair comparison of the results. Table 5 covers a range of commonly used prediction methods spanning various fields of study. Specifically, ARMAX is a popular statistical approach, while SVR and XGBoost are widely used in machine learning. The deep learning methods BP and LSTM are also included. Besides, since Ca-LSTM is likewise based on the gate approach to overcome the shortcomings of RNN, in order to compare deep learning models from the gradient perspective we refer to a gradient-free model, PSO-SRNN, based on the particle swarm optimisation (PSO) algorithm in the literature (Bas et al., Citation2022) and compare it with the other models. To compare feature selection algorithms, we selected LLE, PCA and LASSO. Notably, LASSO reduced the number of features to six, so we also included ARMAX (a variant of ARMA) with six features for comparison.

Table 5. Predictive performance evaluation of different models on different stocks.

We divide our dataset into three subsets: the training set, validation set, and test set. The training set comprises three years of historical stock price data, allowing the model to learn long-term patterns and trends. We allocate six months of data to the validation set, which helps fine-tune the model's hyperparameters during training. The remaining six months of data form the test set, serving as an independent benchmark to evaluate the model's performance on unseen data. Additionally, we utilise a sliding-window scheme to create sequential input-output pairs from the time series data, enabling the model to capture both short-term and long-term dependencies; this also helps the model simulate a real scene and ensures that new data is not left out of account. Take the trading-day prediction scenario as an example: the model runs daily to predict the stock price for the following trading day. If, at the end of the trading day, the price data generated by that day's trading were not taken into account, the model would be stuck in the past data, resulting in inaccurate predictions. By adding each trading day's data through the sliding window, the model can adjust its predictions in time to achieve better accuracy.
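Under the stated proportions, the chronological split can be sketched as follows; the boundary dates are illustrative assumptions, since only the three-years/six-months/six-months scheme is specified, and `df` is assumed to be a date-indexed DataFrame of daily rows.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame):
    """Split a date-indexed frame into train / validation / test by time."""
    train = df.loc[:"2018-12-31"]            # three years of history
    val = df.loc["2019-01-01":"2019-06-30"]  # six months for hyperparameter tuning
    test = df.loc["2019-07-01":]             # held-out six months
    return train, val, test
```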

The cascaded LSTM (Ca-LSTM) network consistently demonstrates superior performance compared to other methods across all four datasets, showcasing its advantages. Notably, as the volatility decreases, the Ca-LSTM model's performance further improves. Even in the case of the SH000001 dataset, which exhibits high value and challenging prediction characteristics, the Ca-LSTM model outperforms alternative methods in terms of RMSE and MAE metrics. While exploring the non-gradient Particle Swarm Optimisation-based SRNN (PSO-SRNN) algorithm, we observed its commendable performance surpassing that of the gradient-based LSTM method. Specifically, it excelled in the prediction of the SZ300161 stock set. However, upon evaluating the average performance across the 30 stocks, the PSO-SRNN method slightly underperformed when compared to our proposed Ca-LSTM model. It is worth noting that the PSO-SRNN method carries the drawback of high time overhead. Despite this limitation, it highlights the potential of optimisation algorithms for enhancing model performance. In conclusion, the comprehensive evaluation of our proposed Ca-LSTM model, alongside comparative analyses of other methods, underscores its superior performance across diverse datasets and the average evaluation metrics. While the PSO-SRNN method demonstrates competitive results on one dataset, it falls short of our model's overall performance. Furthermore, the PSO-SRNN method's high time overhead serves as a valuable reminder of the optimisation algorithm's potential for enhancing model outcomes. These findings reinforce the efficacy and potential of the Ca-LSTM model in the field of deep learning stock price prediction.

4.2. Prediction performance of Ca-LSTM

Using the sliding-window method, we can also obtain the predicted curve and the actual curve for the three stocks and the SH index, as shown in Figure 5.

Figure 5. Predicted price of LSTM and sliding window.

The dataset includes three stocks with the lowest, middle, and highest levels of volatility. Upon examining the predictions, it is evident that the first two stocks are predicted more accurately, with predicted prices aligning closely with actual price movements, particularly during the first 12 points in the top left subfigure and the first 24 points in the top right subfigure of Figure 5. Notably, the model quickly adjusts to the trend after the discrete points in the first two charts and repeats accurate predictions. Conversely, the less stable third chart captures the trend in the actual data only a few days later. It is important to note that while future prices are predicted for all four examples, the magnitudes of their prices on the y-axis differ significantly. The first three examples, with smaller price magnitudes, allow for better observation of the model's performance, as the trends of pre_price and true_price coincide, leading to highly accurate predictions. The fourth example has larger price magnitudes, resulting in larger differences between pre_price and true_price; despite this, the overall trend remains consistent, demonstrating the model's value.

In order to assess the performance of the Ca-LSTM model in predicting stock prices over different time horizons, we conducted an experiment by adjusting the length of the prediction interval in the sliding window method. By varying the forecast horizon, we aimed to evaluate the model's capability to generate accurate predictions for short-, medium-, and long-term time series. The experiment involved training the Ca-LSTM model on historical stock price data and evaluating its performance for different prediction intervals. The prediction intervals considered in this experiment were categorised as short-term, medium-term, and long-term, representing 40, 60, and 80 days, respectively.

As is shown in Table 6, the experimental results demonstrate that the Ca-LSTM model exhibits consistent and reliable performance across short-, medium- and long-term time series forecasting. Despite the increased forecast horizon, the model shows only a minimal increase in error across the four evaluation indicators. These findings also indicate that our model maintains its stability and effectiveness in long-term time series forecasting, showcasing its ability to capture and forecast price movements over extended periods. The relatively small increase in error suggests that the model successfully captures the underlying trends and patterns in the stock market, enabling it to provide reliable predictions even for longer time horizons.

Table 6. Predictive performance evaluation of different stocks with different time lengths.

Fitting curves are useful for visualising the quality of the network's data fitting and for understanding the overall trend in the data, which differs from the sliding-window method mentioned above. Figure 6 shows the fitted curves for the time series of the three stocks and one index; it can be seen that the model fits the data well.

Figure 6. Fit curves of three stocks and one index. (a) Curve of the stock of lowest volatility, (b) curve of the stock of middle volatility, (c) curve of the stock of highest volatility and (d) curve of the Shanghai composite index.

The results for the 30 stocks are compared in Figure 7.

Figure 7. Error tendency of the 30 selected stocks. (a) RMSE/MAE tendency of 30 selected stocks, (b) MAPE tendency of 30 selected stocks and (c) MSLE tendency of 30 selected stocks.

We can roughly divide the stocks into three groups whose volatilities differ greatly between groups but only slightly within a group. It can be clearly seen from the figure that, discrete values aside, the evaluation indices trend upward with volatility, and for stocks with similar volatility the evaluation indices are also similar.

4.3. Comparison with VMD-LSTM method

The VMD-LSTM model (Figure 8), introduced by Niu et al. (Citation2020), is built on the idea of decomposing the signal of the stock data so that the model can learn more effectively. In this model, each IMF is predicted by an LSTM model, and the predicted components are finally accumulated to obtain the stock price.

The method in that paper is similar to our proposed cascade LSTM structure, so we consider substituting our structure into it. Table 7 shows the comparison of results before and after the replacement: all metrics of the replaced model are stronger than those of the original VMD model. The reason the MAPE values in Table 7 are all 100 is that, to allow comparison with the original paper, the data used for model training and prediction are normalised as in the original paper, so all values are less than 1; the MAPE values, which represent the percentage differences between the predictions and the actual values, are therefore all 100. In addition, Table 7 also compares the proposed model with other temporal prediction models.

Figure 8. VMD-LSTM structure of the paper (Niu et al., Citation2020).

Table 7. The comparison between the VMD+LSTM model in Ding and Qin (Citation2020) and our model.

4.4. Performance of the model at different time levels

To better leverage the model's advantages, we conducted a time-scale experiment and obtained some insights. When applying the model in a trading scenario, the benefits of dividing the time level deserve particular consideration. Table 8 shows the performance of the model at different time levels. We changed the length of the forecast index because the time granularity affects the forecast horizon: with finer time division, the measurement interval is enlarged to provide a more convincing evaluation. The correspondence of the measurement intervals is as follows: 1 day = 30, 1 h = 120, 30 min = 240, 10 min = 720. These numbers mean that for the day-level forecast we predict 30 data points into the future; since there are four trading hours in a day, at the hour level we forecast four times as many points as at the day level, and the 30-min and 10-min levels are each multiplied by a further factor. All levels are thus uniformly measured at the day scale and are therefore comparable.

Table 8. Performance evaluation of cascade LSTM of different time levels.

Table 8 shows that the finer the time partition, the more accurately the model performs. The stock SH600777 also performs better than SZ002323; the difference between the two is that they have the lowest and the highest volatility, respectively. The conclusion shown in the tables is consistent with the increasing trend of Figure 7 above. The larger data values of the SH000001 make its RMSE and MAE values larger as well, but the excellent performance on MSLE and MAPE proves that the model is still applicable to it.

5. Conclusion

In this paper, a LASSO-based feature selection approach is combined with the cascaded LSTM (Ca-LSTM) network. Our proposed Ca-LSTM is based on the idea of reorganising data features and enhancing the impact of single features by cascading. Through a comprehensive comparison with other commonly used models, including LSTM, XGBoost, SVR, ARIMA, PSO-SRNN and the VMD-LSTM model, on four datasets with dissimilar properties (thereby guaranteeing fairness), we have demonstrated the superior forecasting ability of our proposed model, especially on minute-level data sets. Additionally, across the dataset of 30 selected stocks, we also identified the shortcomings of the proposed model.

The advantages and disadvantages of the model are as follows. Firstly, the shorter the time period of the data used for prediction, the better the model performs. Secondly, the model performs better for stocks with lower volatility. Thirdly, screening for features with LASSO weights greater than zero yields the optimal effect. Furthermore, when the cascaded LSTM network is applied to the components generated by the VMD network, it performs better than the method in the reference paper.

Our model not only contributes to stock price prediction through single-feature enhancement, but also points to a feasible direction: use a better algorithm to select features, then input them into the network model after feature enhancement. In the future, new models can be built on our proposed model, and we plan to explore further algorithms (for example, GA for parameter adjustment and attention mechanisms to explore the inner structure) to develop our model and to investigate the applicability of our approach to other financial time series data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The datasets used in this study are not publicly available but can be obtained from the corresponding author upon reasonable request.

References

  • Alam, W., Ray, M., Kumar, R. R., Sinha, K., Rathod, S., & Singh, K. (2018). Improved ARIMAX modal based on ANN and SVM approaches for forecasting rice yield using weather variables. The Indian Journal of Agricultural Sciences, 88(12), 1909–1913. https://doi.org/10.56093/ijas.v88i12.85446
  • Bali, T. G., & Hovakimian, A. (2009). Volatility spreads and expected stock returns. Management Science, 55(11), 1797–1812. https://doi.org/10.1287/mnsc.1090.1063
  • Bas, E., Egrioglu, E., & Kolemen, E. (2022). Training simple recurrent deep artificial neural network for forecasting using particle swarm optimization. Granular Computing, 7(2), 411–420. https://doi.org/10.1007/s41066-021-00274-2
  • Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327. https://doi.org/10.1016/0304-4076(86)90063-1
  • Bontempi, G., Ben Taieb, S., & Borgne, Y.-A. L. (2012). Machine learning strategies for time series forecasting. In European business intelligence summer school (pp. 62–77). Springer.
  • Chang, P.-C., Liu, C.-H., Lin, J.-L., Fan, C.-Y., & Ng, C. S. (2009). A neural network with a case based dynamic window for stock trading prediction. Expert Systems with Applications, 36(3), 6889–6898. https://doi.org/10.1016/j.eswa.2008.08.077
  • Chen, K., Le, C., Zhong, S., Guo, L., & Xu, G. (2022). NNNPE: Non-neighbourhood and neighbourhood preserving embedding. Connection Science, 34(1), 2615–2629. https://doi.org/10.1080/09540091.2022.2133082
  • Chen, S., & Ge, L. (2019). Exploring the attention mechanism in LSTM-based Hong Kong stock price movement prediction. Quantitative Finance, 19(9), 1507–1515. https://doi.org/10.1080/14697688.2019.1622287
  • Coelho, F., Costa, M., Verleysen, M., & Braga, A. P. (2020). Lasso multi-objective learning algorithm for feature selection. Soft Computing, 24(17), 13209–13217. https://doi.org/10.1007/s00500-020-04734-w
  • Ding, G., & Qin, L. (2020). Study on the prediction of stock price based on the associated network model of LSTM. International Journal of Machine Learning and Cybernetics, 11(6), 1307–1317. https://doi.org/10.1007/s13042-019-01041-1
  • Fu, L., Ding, X., & Ding, Y. (2022). Ensemble empirical mode decomposition-based preprocessing method with multi-LSTM for time series forecasting: A case study for hog prices. Connection Science, 34(1), 2177–2200. https://doi.org/10.1080/09540091.2022.2111404
  • Gao, Y., Wang, R., & Zhou, E. (2021). Stock prediction based on optimized LSTM and GRU models. Scientific Programming, 2021(4), 1–8. https://doi.org/10.1155/2021/4055281
  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
  • Hollis, T., Viscardi, A., & Yi, S. E. (2018). A comparison of LSTMs and attention mechanisms for forecasting financial time series. Preprint arXiv:1812.07699.
  • Hoseinzadeh, S., Sohani, A., & Ashrafi, T. G. (2022). An artificial intelligence-based prediction way to describe flowing a Newtonian liquid/gas on a permeable flat surface. Journal of Thermal Analysis and Calorimetry, 147(6), 4403–4409. https://doi.org/10.1007/s10973-021-10811-5
  • Kazem, A., Sharifi, E., Hussain, F. K., Saberi, M., & Hussain, O. K. (2013). Support vector regression with chaos-based firefly algorithm for stock market price forecasting. Applied Soft Computing, 13(2), 947–958. https://doi.org/10.1016/j.asoc.2012.09.024
  • Kumar, K., Haider, M., & Uddin, T. (2021). Enhanced prediction of intra-day stock market using metaheuristic optimization on RNN–LSTM network. New Generation Computing, 39(1), 231–272. https://doi.org/10.1007/s00354-020-00104-0
  • Li, J., Zhou, T., & Hu, X. (2022). Prediction algorithm of stock holdings of Hong Kong-funded institutions based on optimized PCA-LSTM model. International Journal of Innovative Computing, Information and Control, 18(3), 999–1008.
  • Li, Q., Tan, J., Wang, J., & Chen, H. (2020). A multimodal event-driven LSTM model for stock prediction using online news. IEEE Transactions on Knowledge and Data Engineering, 33(10), 3323–3337. https://doi.org/10.1109/TKDE.2020.2968894
  • Li, S., Tian, Z., & Li, Y. (2023). Residual long short-term memory network with multi-source and multi-frequency information fusion: An application to China's stock market. Information Sciences, 622, 133–147. https://doi.org/10.1016/j.ins.2022.11.136
  • Li, Y., Dai, H.-N., & Zheng, Z. (2022). Selective transfer learning with adversarial training for stock movement prediction. Connection Science, 34(1), 492–510. https://doi.org/10.1080/09540091.2021.2021143
  • Mahmoudan, A., Esmaeilion, F., Hoseinzadeh, S., Soltani, M., Ahmadi, P., & Rosen, M. (2022). A geothermal and solar-based multigeneration system integrated with a TEG unit: Development, 3E analyses, and multi-objective optimization. Applied Energy, 308, 118399. https://doi.org/10.1016/j.apenergy.2021.118399
  • Mahmoudan, A., Samadof, P., Hosseinzadeh, S., & Garcia, D. A. (2021). A multigeneration cascade system using ground-source energy with cold recovery: 3E analyses and multi-objective optimization. Energy, 233, 121185. https://doi.org/10.1016/j.energy.2021.121185
  • Matias, J. M., & Reboredo, J. C. (2012). Forecasting performance of nonlinear models for intraday stock returns. Journal of Forecasting, 31(2), 172–188. https://doi.org/10.1002/for.v31.2
  • Niu, H., Xu, K., & Wang, W. (2020). A hybrid stock price index forecasting model based on variational mode decomposition and LSTM network. Applied Intelligence, 50(12), 4296–4309. https://doi.org/10.1007/s10489-020-01814-0
  • Piramuthu, S. (2004). Evaluating feature selection methods for learning in data mining applications. European Journal of Operational Research, 156(2), 483–494. https://doi.org/10.1016/S0377-2217(02)00911-6
  • Qiu, J., Wang, B., & Zhou, C. (2020). Forecasting stock prices with long-short term memory neural network based on attention mechanism. PLoS One, 15(1), e0227222. https://doi.org/10.1371/journal.pone.0227222
  • Rounaghi, M. M., & Zadeh, F. N. (2016). Investigation of market efficiency and financial stability between S&P 500 and London stock exchange: Monthly and yearly forecasting of time series stock returns using ARMA model. Physica A: Statistical Mechanics and Its Applications, 456, 10–21. https://doi.org/10.1016/j.physa.2016.03.006
  • Sohani, A., Hoseinzadeh, S., Samiezadeh, S., & Verhaert, I. (2021). Machine learning prediction approach for dynamic performance modeling of an enhanced solar still desalination system. Journal of Thermal Analysis and Calorimetry, 147, 3919–3930. https://doi.org/10.1007/s10973-021-10744-z
  • Sohani, A., Pedram, M. Z., Berenjkar, K., Sayyaadi, H., Hoseinzadeh, S., Kariman, H., & Assad, M. E. H. (2021). Techno-energy-enviro-economic multi-objective optimization to determine the best operating conditions for preparing toluene in an industrial setup. Journal of Cleaner Production, 313, 127887. https://doi.org/10.1016/j.jclepro.2021.127887
  • Song, Y., Lee, J. W., & Lee, J. (2019). A study on novel filtering and relationship between input-features and target-vectors in a deep learning model for stock price prediction. Applied Intelligence, 49(3), 897–911. https://doi.org/10.1007/s10489-018-1308-x
  • Soni, P., Tewari, Y., & Krishnan, D. (2022). Machine learning approaches in stock price prediction: A systematic review. Journal of Physics: Conference Series, 2161, 012065.
  • Su, Z., Xie, H., & Han, L. (2021). Multi-factor RFG-LSTM algorithm for stock sequence predicting. Computational Economics, 57(4), 1041–1058. https://doi.org/10.1007/s10614-020-10008-2
  • Sun, L., Xu, W., & Liu, J. (2021). Two-channel attention mechanism fusion model of stock price prediction based on CNN-LSTM. Transactions on Asian and Low-Resource Language Information Processing, 20(5), 1–12. https://doi.org/10.1145/3453693
  • Tang, M., Chen, W., & Yang, W. (2022). Anomaly detection of industrial state quantity time-series data based on correlation and long short-term memory. Connection Science, 34(1), 2048–2065. https://doi.org/10.1080/09540091.2022.2092594
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
  • Wang, J., Cheng, Q., & Dong, Y. (2022). An XGBoost-based multivariate deep learning framework for stock index futures price forecasting. Kybernetes, 52(10), 4158–4177.
  • Wang, J., Cui, Q., Sun, X., & He, M. (2022). Asian stock markets closing index forecast based on secondary decomposition, multi-factor analysis and attention-based LSTM model. Engineering Applications of Artificial Intelligence, 113, 104908. https://doi.org/10.1016/j.engappai.2022.104908
  • Wang, J., & Zhu, S. (2022). A multi-factor two-stage deep integration model for stock price prediction based on intelligent optimization and feature clustering. Artificial Intelligence Review, 56, 7237– https://doi.org/10.1007/s10462-022-10352-9
  • Wang, J.-Z., Wang, J.-J., Zhang, Z.-G., & Guo, S.-P. (2011). Forecasting stock indices with back propagation neural network. Expert Systems with Applications, 38(11), 14346–14355. https://doi.org/10.1016/j.eswa.2011.04.222
  • Wu, S., Liu, Y., Zou, Z., & Weng, T.-H. (2022). S_i_lstm: Stock price prediction based on multiple data sources and sentiment analysis. Connection Science, 34(1), 44–62. https://doi.org/10.1080/09540091.2021.1940101
  • Xia, Y., Liu, Y., & Chen, Z. (2013). Support vector regression for prediction of stock trend. In 2013 6th International conference on information management, innovation management and industrial engineering (Vol. 2, pp. 123–126). IEEE.
  • Yu, Y., & Kim, Y.-J. (2019). Two-dimensional attention-based LSTM model for stock index prediction. Journal of Information Processing Systems, 15(5), 1231–1242.
  • Yu, Z., Qin, L., Chen, Y., & Parmar, M. D. (2020). Stock price forecasting based on LLE-BP neural network model. Physica A: Statistical Mechanics and Its Applications, 553, 124197. https://doi.org/10.1016/j.physa.2020.124197
  • Yujun, Y., Yimei, Y., & Wang, Z. (2021). Research on a hybrid prediction model for stock price based on long short-term memory and variational mode decomposition. Soft Computing, 25(21), 13513–13531. https://doi.org/10.1007/s00500-021-06122-4
  • Zahedi, J., & Rounaghi, M. M. (2015). Application of artificial neural network models and principal component analysis method in predicting stock prices on Tehran stock exchange. Physica A: Statistical Mechanics and Its Applications, 438, 178–187. https://doi.org/10.1016/j.physa.2015.06.033
  • Zhao, Y., & Yang, G. (2023). Deep learning-based integrated framework for stock price movement prediction. Applied Soft Computing, 133, 109921. https://doi.org/10.1016/j.asoc.2022.109921