ARIMA modeling for time series analysis in STATA

In the previous article, the candidate Autoregressive Integrated Moving Average (ARIMA) specifications for the GDP time series were identified from the ACF and PACF graphs. They are listed in Table 1 below.

S. No.   ARIMA (p,d,q)
1        (1,1,1)
2        (1,1,2)
3        (1,1,3)
4        (1,1,4)
5        (1,1,5)
6        (1,1,6)
7        (1,2,1)
8        (4,2,1)
9        (9,2,1)

Table 1: ARIMA models as per ACF and PACF graphs.

Testing ARIMA models in STATA for time series analysis

The present article tests all of these ARIMA models and identifies the most appropriate one for forecasting the GDP time series. To start testing ARIMA models in STATA:

  1. Click on ‘Statistics’ in the menu bar
  2. Click on ‘Time series’
  3. Select ‘ARIMA and ARMAX models’ (Figure 1 below)
Figure 1: Path for ARIMA modeling in STATA

Test 1: ARIMA (1,1,1)

A dialogue box will appear, as shown in the figure below. Four options must be filled in to carry out the ARIMA test. First, select the time series variable to which the ARIMA model will be fitted. In the present case, the time series variable is GDP, so select ‘gdp’ in the ‘Dependent variable’ option. Second, enter the ARIMA specification identified in the previous article. For the first model, ARIMA (1,1,1) (Table 1 above), select ‘1’ in ‘Autoregressive order (p)’, ‘1’ in ‘Integrated order (d)’, and ‘1’ in ‘Moving-average order (q)’.

Figure 2: Dialogue box for ARIMA modeling in STATA

After selecting the values for the ARIMA model specification, click on ‘OK’ to proceed to the results (Figure 3 below).

Figure 3: Dialogue box for ARIMA modeling in STATA

Now ARIMA (1, 1, 1) results will appear, as the figure below shows.

Figure 4: ARIMA (1,1,1) results for time series GDP
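
The menu steps above correspond to a single command, which can be typed in the Command window or saved in a do-file. The lines below are a minimal sketch, assuming the dataset from the previous article is in memory, the series is stored as gdp, and the time variable has already been declared with tsset.

```stata
* Assumption: gdp is the series and the data have already been tsset.
tsset                        // with no arguments, shows the current time-series settings
arima gdp, arima(1,1,1)      // p = 1, d = 1, q = 1

* Equivalent form: difference the series explicitly and fit an ARMA(1,1).
arima D.gdp, ar(1) ma(1)
```

Typing the command instead of using the dialogue box makes it easy to rerun the model with a different (p,d,q).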

ARIMA results can be analysed through several components.

Log likelihood: The log likelihood of the fitted ARIMA model should be as high as possible. Here the reported value is negative, with a magnitude of about 554, so a higher log likelihood means a value closer to zero. The number has no absolute scale and is only meaningful in comparison: among the candidate ARIMA models, prefer the one with the highest log likelihood.

Coefficient of AR: The AR coefficient should be less than 1 in absolute value and significant at the 5% level (p-value below 0.05). Here, the AR coefficient is significant (p-value 0.000) but close to 1 (0.98967). This suggests that the differenced GDP series may still be non-stationary. Therefore, compare the ARIMA models on their AR and MA coefficients: their values (whether they are close to 1) and their significance.

AIC/BIC: The AIC and BIC should be the lowest among the ARIMA models compared. Both criteria move in the opposite direction to the log likelihood: AIC equals -2 times the log likelihood plus 2k, and BIC equals -2 times the log likelihood plus k ln(N), where k is the number of estimated parameters and N is the number of observations. Because they penalise extra parameters, compare the ARIMA models on AIC/BIC rather than on the log likelihood alone. The model with the lowest AIC/BIC is more appropriate for forecasting.
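
The ARIMA output shows the log likelihood directly; the information criteria can be requested after estimation. A minimal sketch, again assuming the series is stored as gdp and the data are tsset:

```stata
* Assumption: gdp is the series and the data have already been tsset.
quietly arima gdp, arima(1,1,1)
estat ic                     // reports N, the log likelihood, df, AIC and BIC
```

Running the same two lines for each candidate specification gives the values needed for the comparison.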

Next, estimate ARIMA (1,1,2) and compare it with ARIMA (1,1,1).

Test 2: ARIMA (1,1,2)

Again, fill in the ARIMA specification, this time as (1,1,2). After selecting the values, click on ‘OK’ to proceed to the results (Figure 5).

Figure 5: Dialogue box for ARIMA modeling in STATA

The figure below shows the results for ARIMA (1,1,2).

Figure 6: ARIMA (1,1,2) results for time series GDP

The ARIMA results in Figure 6 can be analysed through the same components:

Log likelihood: The value of the log likelihood (ignoring the negative sign) is 552, which is similar to that of the previous model, ARIMA (1,1,1).

Coefficients of AR and MA: The MA coefficients are significant, but the AR coefficient is insignificant at 5%. This suggests that the differenced GDP series may still be non-stationary. Therefore, like the previous model, ARIMA (1,1,2) is not appropriate for forecasting.

AIC/BIC: The AIC and BIC are lower than those of the previous model, but only by about 1 point. There is therefore no meaningful difference between ARIMA (1,1,1) and ARIMA (1,1,2), and both are inappropriate for forecasting the GDP time series.

Test the remaining ARIMA models in Table 1 by following the same procedure (Figures 1, 2 and 3), changing only the values of p, d and q, and click on ‘OK’ each time for the results.
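
Repeating the dialogue box nine times is tedious, so the same procedure can be sketched as a loop in a do-file. This is an illustration, not the procedure used in the article: it assumes the series is stored as gdp and the data are tsset, and the stored names m1 to m9 are arbitrary labels.

```stata
* Assumption: gdp is the series and the data have already been tsset.
* Fit every specification in Table 1 and store each set of results.
local i = 0
foreach spec in "1,1,1" "1,1,2" "1,1,3" "1,1,4" "1,1,5" "1,1,6" "1,2,1" "4,2,1" "9,2,1" {
    local ++i
    quietly arima gdp, arima(`spec')
    estimates store m`i'
}

* One table with N, log likelihood, df, AIC and BIC for all nine models.
estimates stats m1 m2 m3 m4 m5 m6 m7 m8 m9
```

Because quietly suppresses the iteration log, it is worth rerunning any model of interest without it to check that the estimation converged.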

Comparison of all ARIMA Models

This section presents a comparison of all ARIMA forecasting models mentioned in Table 1. Values of AR and MA coefficients, their significance and values of AIC and BIC are evaluated.

Table 2: Comparison of ARIMA models for time series GDP in STATA

As mentioned previously, the quantities of interest when choosing an ARIMA model are the AR and MA coefficients, their significance levels, and the AIC/BIC values. Table 2 above is organized by these quantities. Significant coefficients are marked with “*”.

To select the best ARIMA model, first identify the models whose AR and MA coefficients are significant and less than 1. In the table above, most ARIMA models have AR or MA coefficients that are either close to 1 (indicating non-stationarity) or insignificant at 5%. However, in ARIMA (9,2,1), the majority of the AR and MA coefficients are less than 1 and significant at 5%. Therefore, in terms of coefficients, ARIMA (9,2,1) is appropriate.

Second, identify the ARIMA models with the lowest AIC or BIC. As per Table 2, ARIMA (1,2,1) and ARIMA (9,2,1) have the lowest AIC/BIC values. However, in ARIMA (1,2,1) the MA coefficient is almost 1 and is insignificant at the 5% level, so this model cannot be used for estimating the GDP time series. ARIMA (9,2,1) is therefore the most appropriate model for the GDP time series.
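
The two low-AIC/BIC candidates can also be placed side by side in one table. The sketch below assumes the series is stored as gdp and the data are tsset; a121 and a921 are arbitrary names for the stored results.

```stata
* Assumption: gdp is the series and the data have already been tsset.
quietly arima gdp, arima(1,2,1)
estimates store a121
quietly arima gdp, arima(9,2,1)
estimates store a921

* Coefficients with significance stars, plus log likelihood, AIC, BIC and N.
estimates table a121 a921, star stats(ll aic bic N)
```

The star option marks coefficients by significance level, similar to the “*” used in Table 2.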

Thus, among the models tested, ARIMA (9,2,1) best captures the structure of the GDP data and can be used for forecasting GDP. The next article explains prediction and forecasting using ARIMA in STATA.

Priya Chetty

Partner at Project Guru

Discussions

7 Comments.

  1. All the discussions you have posted on time series analysis in STATA are excellent, brief and applicable. Thank You Very Much.

  2. Hello,

    Thank you for this information. I have found the ARIMA model for the time series I will be analyzing. I am not sure, however, how to perform the interrupted time series analysis in stata using these ARIMA models. I am not sure what commands I would use to perform this analysis in STATA with the ARIMA models.

    Thank you.

  3. Dear Colleen,

    Thanks for sharing your concern!
    For interrupted time series analysis (ITSA), it’s better to use OLS over ARIMA, as the former is more flexible and broadly applicable in an interrupted time-series context.

    The syntax (code) to use for itsa is:

    itsa depvar [indepvars] [if] [in] [weight], trperiod(numlist) [ single treatid(#) contid(numlist) prais lag(#) figure posttrend replace prefix(string) model_options ]

    Note, this code is data specific but I have presented the full form of the common codes below:

    trperiod(numlist) specifies the time period when the intervention begins. The values entered for time period must be in the same units as the panel time variable specified in tsset timevar; see [TS] tsset. More than one period may be specified.
    trperiod() is required.

    single indicates that itsa will be used for a single group analysis. Conversely, omitting single indicates that itsa is for a multiple group comparison.

    treatid(#) specifies the identifier of the single treated unit under study when the dataset contains multiple panels. The value entered must be in the same units as the panel variable specified in tsset panelvar timevar; see [TS] tsset. When the dataset contains data for only a single panel, treatid() must be omitted.

    contid(numlist) specifies a list of identifiers to be used as control units in the multiple group analysis. The values entered must be in the same units as the panel variable specified in tsset panelvar timevar; see [TS] tsset. If contid() is not specified, all non-treated units in the data will be used as controls.

    prais specifies that a prais model should be estimated. If prais is not specified, itsa will use newey as the default model.

    lag(#) specifies the maximum lag to be considered in the autocorrelation structure when a newey model is chosen. If the user specifies lag(0), the output is the same as regress, vce(robust); Default is lag(0). An error message will appear if both prais and lag() are specified, as prais implements an AR(1) model, by design.

    figure produces a line plot of the predicted depvar variable combined with a scatter plot of the actual values of depvar over time. In a multiple group analysis, figure plots the average values of all controls used in the analysis (more specifically, data for specified controls are collapsed and the monthly observations averaged).

    posttrend produces post-treatment trend estimates using lincom, for the specified model. In the case of a single-group ITSA, one estimate is produced. In the case of a multiple-group ITSA, an estimate is produced for the treatment group, the control group, and the difference. In the case of multiple treatment periods, a separate table is produced for each treatment period.

    replace replaces variables created by itsa if they already exist. If prefix() is specified, only variables created by itsa with the same prefix will be replaced.

    prefix(string) adds a prefix to the names of variables created by itsa. Short prefixes are recommended.

    model_options specifies all available options for prais when the prais option is chosen; otherwise all available options of newey other than lag().
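
    As a rough illustration of the syntax above for a single series, the sketch below uses placeholder names that do not come from the article: y for the outcome, year for a yearly time variable, and 2008 for the first period of the intervention. itsa is a community-contributed command, so it has to be installed first.

    ```stata
    ssc install itsa                 // one-time installation from SSC
    tsset year                       // hypothetical yearly time variable
    * Single-group analysis, Newey-West errors with one lag,
    * post-intervention trend estimate and a plot.
    itsa y, single trperiod(2008) lag(1) posttrend figure
    ```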

    Hope the above notes help. Do let us know!

  4. This is super helpful!
    One question I have is in extracting the data from Stata in my do file?
    I can find the value of, eg. AR(1) in the returned e(b), but can’t find the standard error or confidence interval in any of the returned lists (although they’re displayed on the screen I’d like to store the values in a local macro).
    Is there a way to do this?
    Thanks!

    • Hello Simon,

      Thanks, I am glad you found the article helpful. You can also use commands for ARIMA modeling in STATA instead of the drop-down menus shown in this article. That way you can store all your commands in a do file and simply run them next time to get all the results. The command for ARIMA modeling is “arima depvar, arima(p,d,q)”. For example, in this case, the command “arima gdp, arima(9,2,1)” gives the results for the ARIMA (9,2,1) model.

      Coming to your query, e(b) returns just the coefficient values, as you pointed out, because e(b) is the matrix of coefficients. That’s why it does not contain the standard errors or the confidence intervals. So, instead of using e(b), you can use either of these methods to store your results in a text file or a Word file.

      First, run the arima model. Then use these commands: “translate @Results (name of the text file).txt” followed by “type (name of the text file).txt”. This saves everything displayed in the Results window to the designated text file.

      Alternatively, first run the arima model. Then use the “outreg2” command to export the output to a Word file. This reports the standard errors in parentheses below the coefficient values. To use it, first install the command by typing “ssc install outreg2” in the Command window. Then use the command “outreg2 using (name of the word file).doc” to export the output.

      Or you can just highlight the results (including std errors and confidence intervals) you need to store, copy them and paste them in an excel file, create a table and then export them to a word file.

      Hope this helps. Let us know.
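
      For storing the values in local macros, as asked in the question, one possible approach is sketched below. It assumes the series is stored as gdp, the data are tsset, and the model has a single AR term. The coefficient name ARMA:L.ar is what arima is expected to use for that term; the coeflegend replay shows the exact names to use for any model. In recent Stata versions, r(table) also holds the full coefficient table.

      ```stata
      * Assumption: gdp is the series and the data have already been tsset.
      quietly arima gdp, arima(1,1,1)
      matrix T = r(table)              // rows include b, se, z, pvalue, ll and ul
      arima, coeflegend                // replay: lists the _b[...] name of each coefficient

      * Coefficient, standard error and a 95% normal-based interval for the AR(1) term.
      local b_ar  = _b[ARMA:L.ar]
      local se_ar = _se[ARMA:L.ar]
      local lo_ar = `b_ar' - invnormal(0.975) * `se_ar'
      local hi_ar = `b_ar' + invnormal(0.975) * `se_ar'
      display "AR(1): `b_ar'  se: `se_ar'  95% CI: [`lo_ar', `hi_ar']"

      matrix list T                    // the full table, as saved right after estimation
      ```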

  5. Hello,
    The above part is very helpful. Please let me know how to perform ARIMAX in Stata, how to do the prediction part in ARIMAX, and how many X variables can be used in ARIMAX.
