How to test time series autocorrelation in STATA?
The previous article showed how to perform heteroscedasticity tests of time series data in STATA. It also showed how to apply a correction for heteroscedasticity so as not to violate the Ordinary Least Squares (OLS) assumption of constant variance of errors. This article shows how to test for serial correlation of errors, also called time series autocorrelation, in STATA. An autocorrelation problem arises when the error terms in a regression model are correlated over time, that is, when they depend on each other.
Why test for autocorrelation?
It is one of the main assumptions of the OLS estimator according to the Gauss-Markov theorem that in a regression model:
$\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0 \quad \forall\, i, j,\ i \neq j$, where Cov is the covariance and $\epsilon$ is the error term (estimated in practice by the residual).
The presence of autocorrelation in the data causes ϵ_i and ϵ_j to correlate with each other, which violates this assumption. The OLS estimator is then no longer efficient and the usual standard errors are biased, so t and F tests become unreliable. It is therefore important to test for autocorrelation and apply corrective measures if it is present. This article focuses on two common tests for autocorrelation: the Durbin-Watson D test and the Breusch-Godfrey LM test. As in the previous article (Heteroscedasticity test in STATA for time series data), first run the regression of Gross Domestic Product (GDP) on Private Final Consumption (PFC) and Gross Fixed Capital Formation (GFC) for the time period 1997 to 2018.
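As a minimal sketch of that first step, the data can be declared as a time series and the regression run as below. The variable names gdp, gfcf and pfce are the ones used in the correction command later in this article; the time variable name qdate is hypothetical, so replace it with whatever the dataset actually uses.

```stata
* Declare the data as a time series.
* qdate is a hypothetical name for the dataset's time variable.
tsset qdate

* OLS regression of GDP on gross fixed capital formation and
* private final consumption
regress gdp gfcf pfce
```

The autocorrelation tests in STATA need the tsset declaration, because they rely on the time ordering of the observations.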
Durbin Watson test for autocorrelation
The Durbin-Watson test depends on two quantities: the number of observations (n) and the number of explanatory variables (k). In this dataset, n is 84 and k is 2 (GFC and PFC). The Durbin-Watson table gives two numbers for each combination, dl and du. These are the lower and upper “critical values” (figure below).
The Durbin-Watson statistic ranges from 0 to 4. As the above scale shows, a value between 0 and dl indicates positive serial correlation. A value between dl and du, or between 4-du and 4-dl, means the test is inconclusive. A value between du and 4-du indicates no autocorrelation. Finally, a value between 4-dl and 4 indicates negative serial correlation. These conclusions hold at the 5% significance level.
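For reference, the statistic behind this scale is conventionally defined from the OLS residuals. The notation below (e_t for the residual at time t, n for the number of observations, and ρ̂ for the first-order sample autocorrelation of the residuals) is introduced here for illustration and does not appear in the original article.

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
  \approx 2\,(1 - \hat{\rho})
```

Under this approximation, d close to 2 corresponds to ρ̂ close to 0, d approaches 0 as ρ̂ approaches 1, and d approaches 4 as ρ̂ approaches -1. This is why the two ends of the scale signal positive and negative serial correlation respectively.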
The command for the Durbin-Watson test is as follows:
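The command listing itself is not reproduced in this text. As a hedged sketch: in recent STATA releases the statistic is usually requested with estat dwatson immediately after regress, while older releases used dwstat. The fragment assumes the data have already been declared with tsset.

```stata
* Assumes the data are already tsset.
regress gdp gfcf pfce

* Durbin-Watson d statistic for the regression just fitted
estat dwatson
```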
However, STATA does not report a p-value for this statistic. To conclude whether serial correlation exists, compare the statistic with the critical values dl and du from the Durbin-Watson D table.
In the above figure, the rows show the number of observations and the columns represent “k”, the number of parameters. Here the number of parameters is 2 and the number of observations is 84.
Durbin Watson’s lower limit from the table (dl) = 1.600
Durbin Watson’s upper limit from the table (du) = 1.696
Therefore, when du and dl are plotted on the scale, the results are as follows (figure below).
The Durbin-Watson d statistic reported by STATA is 2.494. It lies between 4-dl (2.400) and 4, implying that there is negative serial correlation between the residuals in the model.
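The comparison against the scale can be sketched as a short do-file fragment. The scalar names d, dl and du are chosen here for illustration, and the values are the ones quoted in this article. The inconclusive bands are treated as closed intervals, which is one common convention.

```stata
* Values quoted in the article; dl and du come from the printed table
scalar d  = 2.494
scalar dl = 1.600
scalar du = 1.696

if scalar(d) < scalar(dl) {
    display "positive serial correlation"
}
else if scalar(d) <= scalar(du) {
    display "inconclusive"
}
else if scalar(d) < 4 - scalar(du) {
    display "no evidence of serial correlation"
}
else if scalar(d) <= 4 - scalar(dl) {
    display "inconclusive"
}
else {
    display "negative serial correlation"
}
```

With these values, 4 - du is 2.304 and 4 - dl is 2.400, so d = 2.494 falls in the last branch.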
Breusch-Godfrey LM test for autocorrelation
The Breusch-Godfrey LM test has advantages over the classical Durbin-Watson D test. The Durbin-Watson test relies on the assumption that the residuals are normally distributed, whereas the Breusch-Godfrey LM test is less sensitive to this assumption. Another advantage is that it allows researchers to test for serial correlation at several lags, that is, correlation between the residuals at time t and t-k (where k is the number of lags). The Durbin-Watson test, in contrast, only tests the correlation between t and t-1. If k is 1, both tests therefore address the same first-order serial correlation, although their test statistics are computed differently.
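A common textbook form of the test, given here as a sketch rather than a description of STATA's exact computation, regresses the OLS residuals on the original regressors and k lagged residuals. The symbols e_t, x_t, γ, ρ_j, v_t, n and R² are notation assumed for this illustration.

```latex
e_t = x_t'\gamma + \rho_1 e_{t-1} + \dots + \rho_k e_{t-k} + v_t,
\qquad H_0:\ \rho_1 = \dots = \rho_k = 0

LM = n R^2 \ \xrightarrow{d}\ \chi^2_k \quad \text{under } H_0
```

Here R² is taken from the auxiliary regression. Software implementations may differ in how they treat the first k observations, for which lagged residuals are missing.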
Use the following command for the Breusch-Godfrey LM test in STATA.
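The command listing is not reproduced in this text. As a hedged sketch, the test is usually run with estat bgodfrey after regress on tsset data. The lag choices below are illustrative assumptions, not the settings used for the results in this article.

```stata
* Assumes the data are already tsset.
regress gdp gfcf pfce

* Breusch-Godfrey LM test for first-order serial correlation
estat bgodfrey, lags(1)

* The same test at lags 1 to 4, for example with quarterly data
estat bgodfrey, lags(1/4)
```

Each requested lag order gets its own row in the output, with the chi2 statistic, its degrees of freedom and the p-value (Prob > chi2).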
The results appear as shown below.
The hypothesis in this case is:
- Null hypothesis: There is no serial correlation.
- Alternative hypothesis: There is serial correlation.
Since the p-value of the chi2 statistic (Prob > chi2) in the above table is less than 0.05 or 5%, the null hypothesis can be rejected. In other words, there is serial correlation between the residuals in the model. The violation of the no-serial-correlation assumption therefore needs to be corrected.
Correction for autocorrelation
To correct the autocorrelation problem, use the ‘prais’ command in place of ‘regress’, with the same variable list, and add the ‘corc’ option after the variable names. The ‘corc’ option requests the Cochrane-Orcutt estimator.
Below is the command for correcting autocorrelation.
prais gdp gfcf pfce, corc
The results below will appear.
At the end of the output, STATA reports the original and the new (transformed) Durbin-Watson statistics as follows.
The new D-W statistic is 2.0578, which lies between du and 4-du, implying that there is no longer evidence of autocorrelation. The problem has thus been corrected.
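As a hedged sketch, the same statistics can also be read from the results stored by prais instead of from the printed output. The names e(rho), e(dw_0) and e(dw) are given here as they are understood from STATA's prais documentation, so verify them with ereturn list on your installation.

```stata
* Cochrane-Orcutt estimation; assumes the data are already tsset
prais gdp gfcf pfce, corc

* Estimated AR(1) coefficient and the two Durbin-Watson statistics
display "rho             = " e(rho)
display "D-W original    = " e(dw_0)
display "D-W transformed = " e(dw)
```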
The next article discusses the issue of multicollinearity. Multicollinearity arises when two or more explanatory variables in the regression model are highly correlated with each other.