# How to build the univariate ARIMA model for time series in STATA?

**Autoregressive Integrated Moving Average (ARIMA)** is popularly known as the Box-Jenkins method. The emphasis of this method is on analyzing the probabilistic, or stochastic, properties of a single time series. Unlike regression models, where Y is explained by regressors X_1, X_2, …, X_N (as in the introductory case, where GDP is explained by GFC and PFC), **ARIMA** allows Y (GDP) to be explained by its own past, or lagged, values. Because **ARIMA** is performed on a single time series, it is termed ‘univariate **ARIMA**’. Where the analysis includes independent variables (like GFC or PFC), a multivariate **ARIMA** model or an ARIMAX model is suitable. This article focuses on the univariate **ARIMA** model, using the single time series GDP.

**ARIMA** is made up of *AR*, **MA** and **I**, where:

- *AR*: the variable is regressed on its own lagged, or prior, values.
- **MA**: the regression error is a linear combination of error terms from earlier periods.
- **I**: indicates that the data values have been replaced with the difference between their values and the previous values (this differencing process may have been performed more than once).
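The three components can be combined into a single equation. The following is the standard textbook form of an ARIMA(p, d, q) model, given here as a sketch; the symbols p, d, q, w_t, c, φ, θ and ε are notation introduced for illustration and do not appear in the original article. Here y_t is the series (GDP), d is the number of times it is differenced (**I**), p is the number of *AR* lags and q is the number of **MA** lags:

$$
w_t = \Delta^{d} y_t, \qquad
w_t = c + \sum_{i=1}^{p} \phi_i \, w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
$$

With d = 1, w_t is simply y_t − y_{t−1}; with d = 2, the differencing is applied a second time to the differenced series.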

The purpose of each of these features is to make the model fit the data as well as possible. This article lists the procedures for assessing the values of *AR*, **MA** and **I** to build an **ARIMA** model for the time series GDP. As the stationarity of GDP has already been covered in the previous articles, the value of **I** can be either 1 (the 1st difference was stationary) or 2 (the 2nd difference was stationary). Furthermore, to explore the values of *AR* and **MA**, this article introduces the terms ‘autocorrelation’ and ‘partial autocorrelation’.

## Correlogram (ac)

A correlogram is simply a plot for extracting the autocorrelation in a particular time series. Autocorrelation is the presence of serial correlation in a time series data set. It implies that the time series (like GDP) can be serially correlated with its own prior values. The autocorrelation function (acf) plot lists the autocorrelation of a particular time series with its various lags. If the time series exhibits autocorrelation, then **Moving Averages (MA)** is applicable for further analysis. Thus the value of **MA** will come through acf plots. To construct acf plots in STATA, refer to Fig 1 below:

- Click on ‘graphics’
- Click on ‘time series graphs’
- Select ‘correlogram (ac)’

A dialogue box as shown in the figure below will appear. Select the time series variable. Because the previous article established that GDP becomes stationary only after differencing, use the differenced time series of GDP rather than ‘GDP’ itself. Two differenced series of GDP were taken, so review both of them to build the **ARIMA** model.
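What the two differenced series contain can be illustrated outside STATA with a minimal Python sketch. The function name `difference` and the input values are assumptions made up for illustration; they are not the article's GDP data, and only the variable names ‘gdp_d1’ and ‘gdp_d2’ come from the text.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (each pass drops one value)."""
    out = list(series)
    for _ in range(order):
        # Each value is replaced by its change from the previous value.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Made-up illustrative values, NOT the article's GDP data.
gdp = [100.0, 104.0, 110.0, 119.0, 131.0, 147.0]

gdp_d1 = difference(gdp, order=1)  # 1st difference, corresponds to I = 1
gdp_d2 = difference(gdp, order=2)  # 2nd difference, corresponds to I = 2

print(gdp_d1)
print(gdp_d2)
```

Each round of differencing shortens the series by one observation, which is why the second differenced series has one value fewer than the first.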

### 1st Differenced GDP

In the dialogue box for correlogram (ac), select the 1st differenced GDP variable, ‘gdp_d1’. Click on ‘OK’ to generate the acf graph for ‘gdp_d1’ (figure below).

A correlogram visualizing the autocorrelation of the 1st difference of GDP (gdp_d1) at different lags will appear. A detailed version of the correlogram is shown in the figure below. To determine autocorrelation, see which of the lines extend beyond the shaded region. The shaded region is the acceptance region, within which an autocorrelation is not significantly different from zero, and each line shows the autocorrelation at one lag. Since the lines extend beyond the shaded region for the first six lags, the series ‘gdp_d1’ is autocorrelated with its lagged series at lags 1, 2, 3, 4, 5 and 6. Therefore, the **MA** value of the **ARIMA** model can take any value from 1 to 6.
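The rule of reading off the lags whose lines leave the shaded region can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of STATA's output: the function names and the sample values are hypothetical, and the band used here is the simple white-noise approximation ±1.96/√n, whereas STATA's `ac` shades a band based on Bartlett's formula, so the two need not agree exactly.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_band(correlations, n, z=1.96):
    """Lags whose correlation lies outside the +/- z/sqrt(n) band (an assumed band)."""
    band = z / math.sqrt(n)
    return [k for k, r in enumerate(correlations, start=1) if abs(r) > band]


# Made-up differenced series, NOT the article's 'gdp_d1'.
sample_d1 = [4.0, 6.0, 9.0, 12.0, 16.0, 15.0, 13.0, 10.0, 8.0, 5.0,
             4.0, 6.0, 9.0, 13.0, 15.0, 16.0, 12.0, 9.0, 7.0, 5.0]

r = acf(sample_d1, max_lag=6)
print([round(v, 3) for v in r])
print("candidate MA lags:", lags_outside_band(r, n=len(sample_d1)))
```

The lags returned by `lags_outside_band` play the role of the lines that extend beyond the shaded region in the correlogram.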

### 2nd Differenced GDP

Similarly, for the 2nd difference of GDP, select the variable ‘gdp_d2’ (the 2nd differenced variable) as shown in figure 2, and create the acf plot for it. A correlogram visualizing the autocorrelation of the 2nd difference of GDP (gdp_d2) at different lags will appear (figure below). A detailed version of the correlogram is also shown (figure below). To determine autocorrelation, see which of the lines extend beyond the shaded region. Since the line extends beyond the shaded (acceptance) region only at the first lag, the series ‘gdp_d2’ is autocorrelated with its lagged series at lag 1. Therefore, the **MA** value of the **ARIMA** model of the series ‘gdp_d2’ can take the value 1.

There are now different values of **MA** for the two values of **I**. The next step is to estimate the values of *AR* to build the **ARIMA** model.

## Partial correlogram (pac)

A partial correlogram is simply a plot for extracting the partial autocorrelation in the selected time series. If the time series exhibits partial autocorrelation, then take *AR* for further analysis. Thus the value of *AR* will come through the pacf plot. To construct pacf plots, follow these steps:

- Click on ‘graphics’.
- Click on ‘Time series graphs’.
- Select ‘partial correlogram (pac)’.

A dialogue box as shown in the figure below will appear. Here, select the time series variable. Since stationarity was established only after differencing, use the differenced time series of GDP rather than ‘GDP’ itself. Review both differenced series to build the **ARIMA** model.

### 1st differenced GDP

In the dialogue box for ‘partial correlogram (pac)’, select the 1st differenced GDP variable, ‘gdp_d1’. Click on ‘OK’ to generate the pacf graph for ‘gdp_d1’.

A partial correlogram visualizing the partial autocorrelation of the 1st difference of GDP (gdp_d1) at different lags will appear. To determine partial autocorrelation, see which of the lines extend beyond the shaded region. Since the line extends beyond the acceptance region only at the first lag, the series ‘gdp_d1’ is partially autocorrelated with its lagged series at lag 1. Therefore, the *AR* value of the **ARIMA** model can take the value 1.
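How partial autocorrelations relate to ordinary autocorrelations can be sketched with the Durbin-Levinson recursion. This is a Python illustration under stated assumptions: the function name `pacf_from_acf` and the input values are hypothetical, and STATA's `pac` may compute its estimates differently (for example by regression), so the numbers on a real correlogram need not match this method.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r) via the Durbin-Levinson recursion.

    r[k - 1] is the autocorrelation at lag k.
    """
    pacf = []
    prev = []  # coefficients of the order-(k-1) fit: phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Made-up autocorrelations that decay geometrically, as an AR(1) process would produce.
example_acf = [0.5, 0.25, 0.125]
print(pacf_from_acf(example_acf))
```

For autocorrelations that decay geometrically like these, only the first partial autocorrelation is non-zero, which is the pattern the article reads as an *AR* value of 1.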

### 2nd differenced GDP

Similarly, for the 2nd difference of GDP, select the variable ‘gdp_d2’ (the 2nd differenced variable) as shown in figure 6, and create the pacf plot for ‘gdp_d2’. Only at the first and fourth lags do the lines extend slightly beyond the shaded region (Fig 8). Therefore the series ‘gdp_d2’ is partially autocorrelated with its prior values at lags 1 and 4, and the *AR* value of the **ARIMA** model of the series ‘gdp_d2’ can take the value 1, with lag 4 as a further candidate.

Following the acf and pacf graphs from the correlograms, different values of *AR* and **MA** are established for the two values of **I**. Using these values, one can frame the possible **ARIMA** models. Below are the tables of possible **ARIMA** models.

For 1st order differenced GDP time series (I = 1):

| S. No | AR | I | MA | ARIMA |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | (1,1,1) |
| 2 | 1 | 1 | 2 | (1,1,2) |
| 3 | 1 | 1 | 3 | (1,1,3) |
| 4 | 1 | 1 | 4 | (1,1,4) |
| 5 | 1 | 1 | 5 | (1,1,5) |
| 6 | 1 | 1 | 6 | (1,1,6) |

For 2nd order differenced GDP time series (I = 2):

| S. No | AR | I | MA | ARIMA |
|---|---|---|---|---|
| 1 | 1 | 2 | 1 | (1,2,1) |
| 2 | 4 | 2 | 1 | (4,2,1) |
| 3 | 9 | 2 | 1 | (9,2,1) |


Therefore, all possible **ARIMA** models for the time series GDP are:

| S. No | ARIMA |
|---|---|
| 1 | (1,1,1) |
| 2 | (1,1,2) |
| 3 | (1,1,3) |
| 4 | (1,1,4) |
| 5 | (1,1,5) |
| 6 | (1,1,6) |
| 7 | (1,2,1) |
| 8 | (4,2,1) |
| 9 | (9,2,1) |
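The list of candidate models is the combination of every *AR* value with every **MA** value for each value of **I**. A minimal Python sketch of that bookkeeping follows; the dictionary layout and the function name `candidate_orders` are assumptions made for illustration, while the lag values are the ones listed in the tables above.

```python
from itertools import product

# Candidate lags per differencing order I, taken from the article's tables.
candidates = {
    1: {"ar": [1], "ma": [1, 2, 3, 4, 5, 6]},
    2: {"ar": [1, 4, 9], "ma": [1]},
}


def candidate_orders(candidates):
    """Return every (AR, I, MA) combination, ordered by I."""
    orders = []
    for d, lags in sorted(candidates.items()):
        for p, q in product(lags["ar"], lags["ma"]):
            orders.append((p, d, q))
    return orders


models = candidate_orders(candidates)
for number, order in enumerate(models, start=1):
    print(number, order)
```

Keeping the candidates in one structure makes it easy to add or drop a lag later without retyping the whole table.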
