How to build the univariate ARIMA model for time series in STATA?

By Divya Narang & Priya Chetty on February 6, 2018

Autoregressive Integrated Moving Average (ARIMA) modelling is popularly known as the Box-Jenkins method. The method emphasizes analyzing the probabilistic, or stochastic, properties of a single time series. In a regression model, Y is explained by regressors X1, X2, …, XN; in the introductory case, for example, GDP is explained by GFC and PFC. ARIMA instead explains Y (GDP) by its own past, or lagged, values. Because ARIMA is performed on a single time series, it is termed ‘univariate ARIMA’. When the analysis includes independent variables (like GFC or PFC), multivariate ARIMA or ARIMAX models are suitable. This article focuses on the univariate ARIMA model for the single time series GDP.

ARIMA is made up of AR, MA and I, where:

  • AR (autoregressive): the variable is regressed on its own lagged, or prior, values.
  • MA (moving average): the regression error is a linear combination of error terms from the current and past periods.
  • I (integrated): the data values have been replaced with the difference between each value and the previous value. This differencing may have been performed more than once.
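The three components can be written together in one equation. The block below shows one common textbook form of an ARIMA(p, d, q) model. In this form, p is the AR order, d is the number of differences (I) and q is the MA order, which matches the (AR, I, MA) ordering used in the tables later in this article. The symbols are introduced here for illustration and are not part of the original article: L is the lag operator (L y_t = y_{t-1}), c is a constant and ε_t is a white-noise error.

```latex
\left(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p\right)(1 - L)^d \, y_t
  = c + \left(1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q\right)\varepsilon_t
```

For example, with d = 1 the term (1 − L) y_t equals y_t − y_{t−1}, which is the first difference of the series.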

Each of these features helps the model fit the data as well as possible. This article lists the procedures for assessing the values of AR, MA and I when building an ARIMA model for the time series GDP. The stationarity of GDP was covered in the previous articles, so the value of I can be either 1 (the 1st difference was stationary) or 2 (the 2nd difference was stationary). To explore the values of AR and MA, this article introduces the terms ‘autocorrelation’ and ‘partial autocorrelation’.
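The I component amounts to repeated first differencing. The minimal Python sketch below assumes the series is a plain list of observations. The GDP values are made up for illustration and are not the article's data. The names gdp_d1 and gdp_d2 simply mirror the Stata variables used later.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces y_t with y_t - y_(t-1)."""
    result = list(series)
    for _ in range(d):
        # Each pass drops one observation, since the first value has no predecessor.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Hypothetical GDP-like values, for illustration only.
gdp = [100.0, 104.0, 109.0, 115.0, 122.0]
gdp_d1 = difference(gdp, 1)  # 1st difference (I = 1)
gdp_d2 = difference(gdp, 2)  # 2nd difference (I = 2)
print(gdp_d1)  # [4.0, 5.0, 6.0, 7.0]
print(gdp_d2)  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by one observation. For that reason, a twice-differenced series has two fewer data points than the original.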

Correlogram (ac)

Correlograms are plots for extracting the autocorrelation in a particular time series. Autocorrelation is the presence of serial correlation in a time series data set: the time series (like GDP) is correlated with its own prior values. The ACF (autocorrelation function) plot shows the autocorrelation of a time series at its various lags. If the time series shows autocorrelation, Moving Average (MA) terms are applicable for further analysis. The value of MA therefore comes from the ACF plot. To construct ACF plots in STATA, refer to Fig 1 below:

  1. Click on ‘graphics’
  2. Click on ‘time series graphs’
  3. Select ‘correlogram (ac)’ 
Figure 1: STATA path for correlogram plots
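The correlogram plots the sample autocorrelation at each lag, and that quantity can be computed directly. The minimal Python sketch below uses the standard sample estimator r_k = c_k / c_0, where c_k is the lag-k autocovariance. Stata's ac should report comparable values, but this sketch is an illustration and not Stata's own implementation. The function name acf is hypothetical.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (the variance)
    result = []
    for k in range(1, max_lag + 1):
        # Lag-k autocovariance: pair each value with the value k periods earlier.
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        result.append(ck / c0)
    return result


print(acf([1, 2, 3, 4, 5], 2))  # [0.4, -0.1]
```

Because every autocovariance is divided by the same lag-0 value, each r_k lies between −1 and 1.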

A dialogue box as shown in the figure below will appear. Select the time series variable ‘GDP’. Stationarity was established and the differenced time series of GDP were created in the previous article, so consider the differenced series of GDP in this case. GDP was differenced twice, so review both differenced series to build the ARIMA model.

1st Differenced GDP

In the dialogue box for correlogram (ac), select the 1st differenced GDP variable, ‘gdp_d1’. Click on ‘OK’ to generate the ACF graph for ‘gdp_d1’ (figure below).

Figure 2: Dialogue box for autocorrelation (acf) graphs

A correlogram will appear, showing the autocorrelation of the 1st difference of GDP (gdp_d1) at different lags. The detailed version of the correlogram is shown in the figure below. Each line represents a lag. The shaded region is the acceptance region (the 95% confidence band). To determine autocorrelation, check which lines extend beyond the shaded region. The lines for the first six lags extend beyond it, so the series ‘gdp_d1’ is autocorrelated with its lagged values at lags 1, 2, 3, 4, 5 and 6. Therefore, the MA value of the ARIMA model can take a value from 1 to 6*.
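To our understanding, the shaded region in Stata's ac graph is a 95% confidence band based on Bartlett's formula. Under that formula, the standard error of r_k is sqrt((1 + 2(r_1² + … + r_(k−1)²)) / n). The minimal Python sketch below applies the "lines outside the shaded region" rule. It assumes the autocorrelations r_1, r_2, … and the sample size n are already known. The values in the example are hypothetical and were not read off Figure 3.

```python
import math


def bartlett_se(r, n):
    """Standard error of each r_k under Bartlett's formula for an MA(k-1) process."""
    ses = []
    cum = 0.0  # running sum r_1^2 + ... + r_(k-1)^2
    for rk in r:
        ses.append(math.sqrt((1 + 2 * cum) / n))
        cum += rk * rk
    return ses


def significant_lags(r, n, z=1.96):
    """Lags whose autocorrelation lies outside the +/- z * se band (z = 1.96 for 95%)."""
    ses = bartlett_se(r, n)
    return [k for k, (rk, se) in enumerate(zip(r, ses), start=1) if abs(rk) > z * se]


# Hypothetical autocorrelations for lags 1-4 from a series of 100 observations.
r = [0.50, 0.10, 0.05, -0.02]
print(significant_lags(r, 100))  # [1]
```

The running sum grows with each lag, so the band widens at later lags and a later lag needs a larger autocorrelation to count as significant. This is why the shaded region in the correlogram fans out.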

Figure 3: acf graph for first differenced GDP

2nd Differenced GDP

Similarly, for the 2nd differenced GDP, select the variable ‘gdp_d2’ (2nd differenced variable) as shown in figure 2, and create its ACF plot. A correlogram will appear, showing the autocorrelation of the 2nd difference of GDP (gdp_d2) at different lags (figure below). To determine autocorrelation, check which lines extend beyond the shaded region. Only the line for the first lag extends beyond the shaded region (acceptance region), so the series ‘gdp_d2’ is autocorrelated with its lagged values at lag 1 only. Therefore, the MA value of the ARIMA model for series ‘gdp_d2’ can take the value 1*.

Figure 4: acf graph for 2nd differenced GDP

There are now candidate MA values for each value of I. The next step is to estimate the values of AR to build the ARIMA model.


Partial correlogram (PAC)

A partial correlogram is a plot for extracting the partial autocorrelation in the selected time series. The partial autocorrelation at lag k is the correlation between the series and its k-th lag after removing the effect of the intermediate lags 1 to k−1. If the time series shows partial autocorrelation, AR terms are applicable for further analysis. The value of AR therefore comes from the PACF plot. To construct PACF plots, follow these steps:

  1. Click on ‘graphics’.
  2. Click on ‘Time series graphs’.
  3. Select ‘partial correlogram (PAC)’.
Figure 5: STATA path for partial correlogram (pacf) plots
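The partial autocorrelations described above can be derived from the ordinary autocorrelations. The minimal Python sketch below uses the Durbin-Levinson recursion, which solves the Yule-Walker equations one lag at a time. By default, Stata's pac estimates partial autocorrelations from a sequence of regressions rather than from this recursion, so its values may differ slightly from this sketch. The function names acf and pacf are hypothetical.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations phi_11, ..., phi_kk via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    phi_prev = []  # coefficients phi_(k-1,1), ..., phi_(k-1,k-1) from the previous step
    result = []
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients for use at the next lag.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


print(pacf([1, 2, 3, 4, 5], 2))
```

At lag 1 the partial autocorrelation equals the ordinary autocorrelation r_1, because there are no intermediate lags to remove.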

A dialogue box as shown in the figure below will appear. Here, select the time series variable ‘GDP’. Stationarity was established and the differenced time series of GDP were taken, so consider the differenced series of GDP in this case. Review both differenced series to build the ARIMA model.

1st differenced GDP

In the dialogue box for ‘partial correlogram (PAC)’, select the 1st differenced GDP variable ‘gdp_d1’. Click on ‘OK’ to generate the PACF graph for ‘gdp_d1’.

Figure 6: Dialogue box for partial autocorrelation graphs

A partial correlogram will appear, showing the partial autocorrelation of the 1st difference of GDP (gdp_d1) at different lags (Figure 7). To determine partial autocorrelation, check which lines extend beyond the shaded region. Only the line for the first lag extends beyond the acceptance region, so the series ‘gdp_d1’ is partially autocorrelated with its lagged values at lag 1. Therefore, the AR value of the ARIMA model can take the value 1*.
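As we understand it, the shaded region in Stata's pac graph is a 95% band of ±1.96/√n, which uses 1/√n as the standard error of each partial autocorrelation. The minimal Python sketch below applies the reading rule used above. The partial autocorrelations and sample sizes are hypothetical and are not the values behind Figure 7 or Figure 8.

```python
import math


def pacf_significant_lags(partials, n, z=1.96):
    """Lags whose partial autocorrelation lies outside +/- z / sqrt(n)."""
    band = z / math.sqrt(n)
    return [k for k, p in enumerate(partials, start=1) if abs(p) > band]


# Hypothetical partial autocorrelations for lags 1-5.
partials = [0.45, 0.05, -0.08, 0.21, 0.02]
print(pacf_significant_lags(partials, 100))  # [1, 4]
print(pacf_significant_lags(partials, 50))   # [1]
```

With 100 observations the band is 0.196, and the lag-4 value only just clears it. With fewer observations the band widens, and the same value is no longer significant.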

Figure 7: pacf for 1st differenced GDP in STATA

2nd differenced GDP

Similarly, for the 2nd differenced GDP, select the variable ‘gdp_d2’ (2nd differenced variable) as shown in figure 6, and create the PACF for ‘gdp_d2’. Only the lines for the first and fourth lags extend slightly beyond the shaded region (Fig 8). Therefore, the series ‘gdp_d2’ is partially autocorrelated with its prior values at lags 1 and 4, and the AR value of the ARIMA model for series ‘gdp_d2’ can take the value 1 or 4*.

Figure 8: pacf for 2nd differenced GDP in STATA

The ACF and PACF correlograms thus establish candidate values of AR and MA for each of the two values of I. These values can then be combined to frame the possible ARIMA models, which are listed in the tables below.

For the 1st order differenced GDP time series (I = 1)

S. No   AR   I   MA   ARIMA
1       1    1   1    (1,1,1)
2       1    1   2    (1,1,2)
3       1    1   3    (1,1,3)
4       1    1   4    (1,1,4)
5       1    1   5    (1,1,5)
6       1    1   6    (1,1,6)

For the 2nd order differenced GDP time series (I = 2)

S. No   AR   I   MA   ARIMA
1       1    2   1    (1,2,1)
2       4    2   1    (4,2,1)
3       9    2   1    (9,2,1)

Therefore, all possible ARIMA models for the time series GDP are:

S. No   ARIMA
1       (1,1,1)
2       (1,1,2)
3       (1,1,3)
4       (1,1,4)
5       (1,1,5)
6       (1,1,6)
7       (1,2,1)
8       (4,2,1)
9       (9,2,1)
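The tables pair every candidate AR value with every candidate MA value for each value of I. The minimal Python sketch below performs that enumeration. The candidate values are copied from the tables above; for I = 2, the AR values 1, 4 and 9 are taken from the table. The function name candidate_orders is hypothetical.

```python
from itertools import product


def candidate_orders(ar_values, d, ma_values):
    """Every (p, d, q) order formed from the candidate AR and MA values for one I."""
    return [(p, d, q) for p, q in product(sorted(ar_values), sorted(ma_values))]


models = (
    candidate_orders([1], 1, range(1, 7))   # I = 1: AR = 1, MA = 1 to 6
    + candidate_orders([1, 4, 9], 2, [1])   # I = 2: AR = 1, 4 or 9, MA = 1
)
for number, order in enumerate(models, start=1):
    print(number, order)
```

The printed list reproduces the nine (AR, I, MA) orders in the combined table, in the same sequence.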

Priya is the co-founder and Managing Partner of Project Guru, a research and analytics firm based in Gurgaon. She is responsible for the human resource planning and operations functions. Her expertise in analytics has been used in a number of service-based industries like education and financial services.

Her foundational education is from St. Xavier's High School (Mumbai). She also holds an MBA degree in Marketing and Finance from the Indian Institute of Planning and Management, Delhi (2008).

Some of the notable projects she has worked on include:

  • Using systems thinking to improve sustainability in operations: A study carried out in Malaysia in partnership with Universiti Kuala Lumpur.
  • Assessing customer satisfaction with in-house doctors of Jiva Ayurveda (a project executed for the company)
  • Predicting the potential impact of green hydrogen microgrids (a project executed for the Government of South Africa)

She is a key contributor to the in-house research platform Knowledge Tank.

She currently holds over 300 citations from her contributions to the platform.

She has also been a guest speaker at various institutes such as JIMS (Delhi), BPIT (Delhi), and SVU (Tirupati).

 
