# Predicting value stocks trend using ARIMA

Stock market investors need to forecast the movement of stocks before making an investment decision, since forecasts help build an optimum portfolio for better returns. Forecasting approaches include fundamental, technical, and analytical techniques (Almasarweh & Wadi, 2018; Dr C. Viswanatha Reddy, n.d.). Among these, the Autoregressive Integrated Moving Average (ARIMA) model is a linear technical model of prediction. This article identifies the forecasting models for value stocks, based on 303 stocks listed on the Bombay Stock Exchange for the period 2000 to 2020.

The previous article assessed the nature of the dataset for income stocks using the ARIMA model. Here, the same steps are applied to the average closing price and average return of value stocks. First, the stationarity of the dataset is assessed with the Augmented Dickey-Fuller (ADF) test. Next, the presence of serial correlation and partial serial correlation is examined with a correlogram and a partial correlogram. In this way, the basic assumptions of the ARIMA model are verified for the value stocks dataset at a 5% or 10% level of significance.

## Stationarity test of value stocks

A time-series dataset is said to be stationary if its statistical properties, such as mean and variance, are stable over time, i.e. unaffected by time. Before proceeding to the ARIMA model, it is important to ensure that the dataset is stationary, because a non-stationary dataset makes the results spurious and unreliable. Financial markets change continuously, and the prediction of future movement depends on the historical movement of stocks. Thus, before building a forecasting model, it is essential to examine the nature of the dataset.
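To make the idea of a unit-root test concrete, the sketch below computes a plain (non-augmented) Dickey-Fuller statistic in pure Python. It is a simplification of the ADF test used in this article, since it omits the lagged difference terms. The function name `dickey_fuller_stat` and the simulated series are assumptions for illustration, not the article's data. In practice a statistics package would be used.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in: diff(y)_t = alpha + rho * y_(t-1) + error_t."""
    x = series[:-1]                                   # lagged level y_(t-1)
    dy = [b - a for a, b in zip(series, series[1:])]  # first difference
    n = len(dy)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    rho = sxy / sxx                                   # OLS slope
    alpha = mean_dy - rho * mean_x                    # OLS intercept
    rss = sum((di - alpha - rho * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)          # standard error of rho
    return rho / std_err


# Simulated data (assumption): white noise and the random walk built from it.
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]      # stationary by construction
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)                                # random walk: has a unit root

CRITICAL_5PCT = -2.86  # 5% critical value reported in this article's tables
print("white noise:", round(dickey_fuller_stat(noise), 2))
print("random walk:", round(dickey_fuller_stat(walk), 2))
```

A statistic far below the critical value, as expected for the white-noise series, rejects the unit-root null. A random walk will typically produce a statistic above it.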

### Average closing price

An investor assesses the average closing price of a stock to predict its closing price in the future. Before making these predictions, however, it is essential to check whether the dataset is stable. The results of the stationarity analysis are shown in Table 1 below.

| Variable | Test statistic | 5% critical value | 10% critical value | p-value |
|---|---|---|---|---|
| AverageClose | -1.47 | -2.86 | -2.57 | 0.55 |
| D.AverageClose | -51.99 | -2.86 | -2.57 | 0.00 |
| D.DAverageClose | -75.92 | -2.86 | -2.57 | 0.00 |

The p-value for the average close is 0.55, which is higher than the required value of 0.05 or 0.10. The absolute test statistic value is also less than the absolute critical value. This shows that the null hypothesis of a unit root in the dataset is not rejected. These results are supported by the trend shown in Figure 1, wherein the upward and downward movement of the closing price is influenced by time. Thus, the average closing price dataset is not stationary, and we proceed to test stationarity at the 1st-order difference.

As shown in Table 1 above, the p-value of 0.00 at the 1st-order difference level (D.AverageClose) of the average closing price is less than the significance value of 0.05 or 0.10. The absolute test statistic value is also higher than the absolute critical value. This shows that the null hypothesis of a unit root in the dataset is rejected. Figure 2 below supports the statistical analysis, wherein the movement in the 1st-order difference level of the average closing price is not influenced by time. Thus, the stationary form is derived at the 1st-order difference.
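The differencing behind variables such as D.AverageClose can be sketched as follows. The helper `difference` and the sample prices are hypothetical and serve only to show the operation; they are not the article's data.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: D.x_t = x_t - x_(t-1)."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Hypothetical closing prices, for illustration only.
average_close = [100.0, 102.0, 105.0, 109.0, 114.0]

d1 = difference(average_close, order=1)   # analogous to D.AverageClose
d2 = difference(average_close, order=2)   # 2nd-order difference
print(d1)  # [2.0, 3.0, 4.0, 5.0]
print(d2)  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by one observation and removes one level of trend.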

The 2nd-order difference level assessment of the average closing price shows a p-value of 0.00, which is below 0.05 and 0.10. The absolute test statistic value also exceeds the absolute critical value, i.e. 75.92 > 2.86 or 2.57. Thus, the null hypothesis of a unit root in the dataset is rejected. Trend analysis supports the ADF results, wherein the movement of the average closing price at the 2nd-order difference level is not affected by time. Hence, the 2nd-order differenced average closing price is stationary.

The stationarity test of average closing prices for value stocks therefore shows a stationary nature at the 1st- and 2nd-order difference levels.
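The decision rule applied above can be written out as a short sketch. The figures are copied from the ADF results table for the average closing price. The dictionary layout and the function name `is_stationary` are assumptions made for illustration.

```python
# Values copied from the ADF results table for the average closing price.
ADF_RESULTS = {
    "AverageClose":    {"stat": -1.47,  "crit_5": -2.86, "crit_10": -2.57, "p": 0.55},
    "D.AverageClose":  {"stat": -51.99, "crit_5": -2.86, "crit_10": -2.57, "p": 0.00},
    "D.DAverageClose": {"stat": -75.92, "crit_5": -2.86, "crit_10": -2.57, "p": 0.00},
}


def is_stationary(result, level="5"):
    """Reject the unit-root null when the statistic is below the critical value
    and the p-value is below the matching significance level."""
    alpha = {"5": 0.05, "10": 0.10}[level]
    return result["stat"] < result[f"crit_{level}"] and result["p"] < alpha


for name, result in ADF_RESULTS.items():
    print(name, "stationary" if is_stationary(result) else "not stationary")
```

Under this rule the level series is not stationary, while both differenced series are.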

### Average return

Average return defines the earning capacity of a particular stock. Investors sometimes focus on the return generation capacity of a stock instead of its price when making decisions. Thus, it is necessary to know whether the dataset is stable enough to make predictions. The analysis results for the average return dataset are shown in Table 2 below.

| Variable | Test statistic | 5% critical value | 10% critical value | p-value |
|---|---|---|---|---|
| AverageReturn | -63.92 | -2.86 | -2.57 | 0.00 |
| D.AverageReturn | -89.98 | -2.86 | -2.57 | 0.00 |
| D.DaverageReturn | -89.64 | -2.86 | -2.57 | 0.00 |

The above table shows that the p-value of the average return is 0.00, which is below 0.05 and 0.10. The absolute test statistic value is also more than the absolute critical value. Thus, the null hypothesis of a unit root in the dataset is rejected. Figure 4 below supports these results, wherein the movement of the average return is not influenced by variation in time. Hence, the dataset is stationary at level.

Table 2 shows that the p-value of the average return at the 1st-order difference level is 0.00, which is less than the required value of 0.05 or 0.10. A comparison of the absolute test statistic value with the absolute critical value shows that 89.98 > 2.86 or 2.57. Thus, the null hypothesis of a unit root in the dataset is rejected. Trend assessment of the 1st-order differenced average return also shows that the movement in return value is not affected by the change in time. Hence, the 1st-order difference level of average return is stationary.

The analysis shown in Table 2 for the 2nd-order difference level also has a p-value of 0.00, which is below 0.05 and 0.10. The absolute test statistic value is also higher than the absolute critical value, so the null hypothesis that a unit root is present is rejected. Graphical analysis of the data verifies the result, showing no major influence of time on the movement of average return at the 2nd-order difference level. Thus, the stationary form is also derived at the 2nd-order difference level.

Hence, the examination of average return shows that the dataset is stationary at level and at the 1st- and 2nd-order difference levels.

## Correlogram test for value stocks

Randomness in a dataset can cause errors to carry forward over time. In financial markets, future performance is strongly influenced by past performance, so errors in the past may affect future values. To account for this, the serial correlation of the dataset is assessed using a correlogram test. The sub-sections below present the analysis for the average closing price and average return datasets.

### Average closing price

After establishing stationarity, it is also essential to check for serial correlation in the closing price dataset. This helps determine whether the current value of the average closing price depends on its historical values. The serial correlation assessment of the variable is shown below.

The above sections show that the average closing price becomes stationary at the 1st- and 2nd-order difference levels, so the presence of serial correlation is assessed at these levels. In Figure 7, the shaded region represents the acceptance region, while the straight lines represent the autocorrelation value at different lags. Because the autocorrelation values at lags 5 and 12 fall outside the acceptance region, these lags are considered to have serial correlation. The forecasting model will therefore include a moving average (MA) term of 5 or 12 at the 1st-order difference level to remove this impact.

For the 2nd-order difference level of the average closing price, the autocorrelation value is outside the acceptance region only at lag 1. Thus, the moving average level considered for the value stocks forecasting model would be 1 at the 2nd-order difference level.
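A minimal sketch of how a correlogram flags lags is given below. It assumes the simple white-noise band of ±1.96/√n as the acceptance region, which may differ from the shaded region drawn by the software used for the figures. The function names and the demo series are hypothetical.

```python
import math


def autocorrelations(series, max_lag):
    """Sample autocorrelation r_k for k = 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    acf = autocorrelations(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Hypothetical series repeating every four observations (not the article's data).
demo = [1.0, 0.0, -1.0, 0.0] * 25
print(significant_lags(demo, max_lag=4))  # [2, 4]
```

The lags returned by such a check are the candidates for the moving average order of the model.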

### Average return

An effective prediction about the average return can be made by linking it with historical values. Thus, to assess the presence of serial correlation for the variable, the correlogram testing is shown in the figure below.

The average return dataset is stationary at level and at the 1st- and 2nd-order difference levels. Therefore, the presence of serial correlation is assessed for each of these levels. Figure 9 shows that lags 3 and 12 have autocorrelation values outside the acceptance region. Thus, to build the forecasting model and reduce the influence of serial correlation, the moving average values considered in the model at level are 3 and 12.

In the above figure, the autocorrelation values for lags 1, 4, 35, and 36 are outside the acceptance region. Thus, the moving average values at the 1st-order difference level in the forecasting model would be 1, 4, 35, and 36.

The above figure shows that the autocorrelation value for average return is outside the acceptance region only at lag 1. Thus, the moving average value at the 2nd-order difference level in the average return forecasting model is 1.

## Partial correlogram test for value stocks

A partial correlogram examines the presence of partial autocorrelation in a time series. Because the movement of stock values in financial markets depends on their historical data, it is essential to assess the existence of partial autocorrelation. The sub-sections below present the analysis for the average closing price and average return.

### Average closing price

Similar to the serial correlation, partial autocorrelation presence also needs to be examined to determine the linkage with historical values. The analysis result for the average closing price is shown below.

Since the average closing price becomes stationary at the 1st- and 2nd-order difference levels, the presence of partial autocorrelation is assessed at these levels. At the 1st-order difference level, the partial autocorrelation value at lag 2 lies outside the acceptance region, so the forecasting model of the average closing price would include an autoregressive (AR) term of 2.

The above figure shows that the partial autocorrelation value for all lags, i.e. 1 to 3, is outside the acceptance region. Thus, to remove the influence of partial autocorrelation, autoregressive terms of 1 to 3 would be considered in the forecasting model at the 2nd-order difference level.
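For readers who want to see how partial autocorrelations relate to ordinary autocorrelations, the sketch below uses the Durbin-Levinson recursion. It assumes the autocorrelations are already available as a list whose first element is the lag-0 value of 1. The function name `pacf_from_acf` and the demo values are hypothetical, and statistical software may use a different estimation method.

```python
def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..K from acf = [1, r_1, ..., r_K],
    computed with the Durbin-Levinson recursion."""
    max_lag = len(acf) - 1
    pacf = []
    phi_prev = []                      # coefficients phi_(k-1, 1..k-1)
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den             # partial autocorrelation at lag k
        phi_curr = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


# Theoretical autocorrelations of an AR(1) process with coefficient 0.5:
# r_k = 0.5 ** k, so only lag 1 should carry partial autocorrelation.
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
```

Lags whose partial autocorrelation lies outside the acceptance region are the candidates for the autoregressive order.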

### Average return

For the average return too, the presence of partial autocorrelation is assessed using partial correlogram testing. The results of the analysis are shown below.

For average return, the stationary form was derived at level and at the 1st- and 2nd-order difference levels, so the partial correlogram for average return is assessed at these levels. Figure 14 shows that lags 3 and 5 have partial autocorrelation values outside the acceptance region. Thus, autoregressive terms of 3 and 5 would be considered in the forecasting model at level to eliminate the influence of partial autocorrelation.

The above figure shows that for all the lags, i.e. 1 to 4, the partial autocorrelation value is outside the acceptance region. Thus, the forecasting model of average return at the 1st-order difference level would include autoregressive values of 1 to 4 to reduce the influence of partial autocorrelation.

The above figure shows that for lags 1 to 3, the partial autocorrelation value is outside the acceptance region. Thus, autoregressive values from 1 to 3 could reduce the influence of partial autocorrelation in the forecasting model of average return at the 2nd-order difference level.

## ARIMA-based forecasting model for value stocks

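The models listed in Table 3 are identified as effective forecasting models for the average closing price and average return. Each model is written as (p,d,q), where p is the autoregressive order, d the order of differencing, and q the moving average order.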

| Average Closing Price | Average Return |
|---|---|
| (2,1,5) | (3,0,3) |
| (2,1,12) | (5,0,3) |
| (1,2,1) | (3,0,12) |
| (2,2,1) | (5,0,12) |
| (3,2,1) | (1,1,1) |
| | (2,1,1) |
| | (3,1,1) |
| | (4,1,1) |
| | (1,1,4) |
| | (2,1,4) |
| | (3,1,4) |
| | (4,1,4) |
| | (1,1,35) |
| | (2,1,35) |
| | (3,1,35) |
| | (4,1,35) |
| | (1,1,36) |
| | (2,1,36) |
| | (3,1,36) |
| | (4,1,36) |
| | (1,2,1) |
| | (2,2,1) |
| | (3,2,1) |

The above table states all the possible ARIMA models for the value stocks. These models are derived from the differencing levels needed for stability, as identified by the ADF test, and from the lags identified by the correlogram and partial correlogram. This is necessary because effective prediction is possible only when the model is stable and free from serial correlation and partial autocorrelation.
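The way the candidate models follow from the earlier sections can be sketched by combining, for each differencing level, every identified autoregressive lag with every identified moving average lag. The dictionary layout and the function name `candidate_orders` are assumptions for illustration; the lag values are those reported in the correlogram and partial correlogram sections.

```python
from itertools import product

# Lags reported in the correlogram (MA) and partial correlogram (AR) sections,
# keyed by the order of differencing d.
CLOSING_PRICE = {
    1: {"ar": [2], "ma": [5, 12]},
    2: {"ar": [1, 2, 3], "ma": [1]},
}
AVERAGE_RETURN = {
    0: {"ar": [3, 5], "ma": [3, 12]},
    1: {"ar": [1, 2, 3, 4], "ma": [1, 4, 35, 36]},
    2: {"ar": [1, 2, 3], "ma": [1]},
}


def candidate_orders(lags_by_d):
    """All (p, d, q) combinations of the identified AR and MA lags."""
    orders = []
    for d, lags in sorted(lags_by_d.items()):
        for q, p in product(lags["ma"], lags["ar"]):
            orders.append((p, d, q))
    return orders


print(candidate_orders(CLOSING_PRICE))
# [(2, 1, 5), (2, 1, 12), (1, 2, 1), (2, 2, 1), (3, 2, 1)]
print(len(candidate_orders(AVERAGE_RETURN)))  # 23
```

Under this reading, the average closing price has 5 candidate models and the average return has 23.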

#### References

- Almasarweh, M., & Wadi, S. AL. (2018). ARIMA Model in Predicting Banking Stock Market Data. *Modern Applied Science*, *12*(11), 309. https://doi.org/10.5539/mas.v12n11p309
- Dr. C. Viswanatha Reddy. (n.d.). *Predicting the Stock Market Index Using Stochastic Time Series*. *2018*.
- Shah, D., Isah, H., & Zulkernine, F. (2019). Stock market analysis: A review and taxonomy of prediction techniques. *International Journal of Financial Studies*, *7*(2). https://doi.org/10.3390/ijfs7020026
