How to test normality in STATA?

By Rashmi Sajwan & Priya Chetty on October 31, 2018

The preceding articles showed how to conduct time series analysis in STATA on a range of univariate and multivariate models, including ARIMA, VAR (lag selection and stationarity in VAR with three variables in STATA) and VECM (VECM in STATA for two cointegrating equations). A fitted time series model also needs diagnostic tests to check whether its assumptions hold. One of these assumptions is that the residuals (error terms) of the model are normally distributed, which is referred to as ‘normality’. This article explains how to perform a normality test in STATA.

A normality test helps to determine how likely it is that the random variable underlying a data set is normally distributed. There are several normality tests, such as the Skewness Kurtosis test, the Jarque Bera test, the Shapiro Wilk test, the Kolmogorov-Smirnov test, and the Chen-Shapiro test. This article covers two of them, the Skewness Kurtosis test and the Jarque Bera test, because they are simple and popular.

This article uses quarterly data of the following variables of the Indian economy for the time period 1997-2018:

Gross Domestic Product (GDP), the dependent variable (‘gdp’ in the commands below).
Gross Fixed Capital Formation (GFC), an independent variable (‘gfcf’ in the commands below).
Private Final Consumption Expenditure (PFC), an independent variable (‘pcfe’ in the commands below).

The first step is to set the time series data. Thereafter proceed to the regression analysis.
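Declaring quarterly data as a time series could look like the sketch below. It assumes the data set contains two numeric variables named ‘year’ and ‘quarter’; these names, and the new variable ‘qdate’, are hypothetical and do not come from the article's data set.

```stata
* Sketch only: 'year', 'quarter' and 'qdate' are assumed names
* Build a quarterly date from the year and quarter number (1 to 4)
generate qdate = yq(year, quarter)

* Display the date as, for example, 1997q1
format qdate %tq

* Declare the data set to be a quarterly time series
tsset qdate
```

If the data set already holds a single date variable, only the ‘tsset’ step with that variable would be needed.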

Regression analysis

The normality tests are applied to the residuals of the regression, so the first task is to estimate the model and obtain its residuals.

Figure 2: Regression results of the dataset

Use the command below to run the regression of GDP on the two independent variables.

regress gdp gfcf pcfe

To predict the residuals from the regression model, use the command below.

predict resid, residuals

The ‘predict’ command generates new variables from the most recent estimation. With the ‘residuals’ option, it stores the residuals, that is, the estimated error terms of the model. This creates a new variable ‘resid’, which is visible in the data editor (figure below).
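As a quick check that the residuals were stored as expected, the new variable can be inspected from the command window. This is a minimal sketch that assumes the variable is named ‘resid’ as above.

```stata
* Confirm that the variable exists and show its storage type
describe resid

* For an OLS regression with a constant, the mean of the residuals
* should be zero up to rounding error
summarize resid
```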

Figure 3: Changes in data set after predicting regression residuals for performing normality in STATA

Conducting a normality test in STATA

To open the Skewness and Kurtosis test for the residuals, follow these steps (figure below):

Go to ‘Statistics’ in the main window.
Choose ‘Distributional plots and tests’.
Select ‘Skewness and kurtosis normality tests’.

Figure 4: Procedure for Skewness and Kurtosis test for normality in STATA

After performing the above procedure, the ‘sktest – Skewness and kurtosis test for normality’ dialog box will appear (figure below). Select the variable to test for normality (here it is ‘resid’).

Figure 5: Selection of variable for Skewness and Kurtosis test for normality in STATA
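The menu steps above can also be replaced by a single command in the command window. A minimal sketch, assuming the residuals are stored in ‘resid’:

```stata
* Skewness and kurtosis test for normality on the regression residuals
sktest resid
```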

This will give the following results (figure below). The null and alternative hypotheses for the normality test are:

Null hypothesis: The data follows a normal distribution.
Alternative hypothesis: The data does not follow a normal distribution.

Skewness Kurtosis test for normality

Skewness is a measure of the asymmetry of the probability distribution of a random variable about its mean. It represents the amount and direction of skew. On the other hand, Kurtosis represents the height and sharpness of the central peak relative to that of a standard bell curve. The figure below shows the results obtained after performing the Skewness and Kurtosis test for normality in STATA.

Figure 6: Result of Skewness and Kurtosis Test for normality in STATA

‘sktest’ reports the number of observations (84 here) and Pr(Skewness), which is 0.8035. Since this p-value is greater than 0.05, the skewness of the residuals is not significantly different from that of a normal distribution. Similarly, Pr(Kurtosis) is greater than 0.05, so the kurtosis is also consistent with normality. Finally, the p-value of the joint chi(2) test is 0.1426, which is greater than 0.05, so the joint test is not significant at the 5% level. Consequently, the null hypothesis cannot be rejected. Therefore, according to the Skewness Kurtosis test for normality, the residuals are consistent with a normal distribution.
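The p-values above do not show the skewness and kurtosis values themselves. One way to see them is sketched below, again assuming the residuals are stored in ‘resid’. A normal distribution has a skewness of 0 and a kurtosis of 3, and STATA reports kurtosis itself rather than excess kurtosis.

```stata
* Detailed summary statistics, including skewness and kurtosis
summarize resid, detail

* The values are also returned in r() for later use
display "Skewness = " r(skewness) "   Kurtosis = " r(kurtosis)
```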

Jarque Bera test for normality

The other test of normality is the Jarque Bera test. To perform this test, type the command ‘jb resid’ in the command window.

jb resid

The results will appear (figure below).

Figure 7: Results for Jarque Bera test for normality in STATA
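If STATA reports ‘jb’ as an unrecognized command, the likely reason is that ‘jb’ is a user-written command and not part of the base installation. The sketch below assumes the package is available from the SSC archive under the name ‘jb’ and that an internet connection is available.

```stata
* Check whether jb is already installed; install it from SSC if not
* (assumes the package is published on SSC under the name jb)
capture which jb
if _rc {
    ssc install jb
}

* Run the Jarque Bera test on the residuals
jb resid
```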

If the p-value of the chi(2) statistic is greater than 0.05, the null hypothesis cannot be rejected, and the residuals can be treated as normally distributed.

As per the above figure, the p-value of chi(2) is 0.1211, which is greater than 0.05. Therefore, the null hypothesis cannot be rejected. The assumption of normally distributed error terms is not violated, as the residuals are consistent with a normal distribution.
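The Jarque Bera statistic can also be computed by hand from the skewness and kurtosis of the residuals, which makes the link between the two tests visible. The sketch below uses the textbook formula and is not necessarily how the ‘jb’ command computes its result, so small differences are possible. The scalar names (‘jb_n’, ‘jb_skew’, ‘jb_kurt’, ‘jb_stat’, ‘jb_p’) are hypothetical.

```stata
* Sketch: Jarque Bera statistic from the residuals stored in resid
quietly summarize resid, detail
scalar jb_n    = r(N)
scalar jb_skew = r(skewness)
scalar jb_kurt = r(kurtosis)

* JB = n/6 * (S^2 + (K - 3)^2 / 4)
* Under the null of normality, JB is asymptotically chi-squared with 2 df
scalar jb_stat = (jb_n / 6) * (jb_skew^2 + ((jb_kurt - 3)^2) / 4)

* Upper-tail p-value of the chi-squared distribution with 2 df
scalar jb_p = chi2tail(2, jb_stat)

display "Jarque Bera statistic = " %9.4f scalar(jb_stat)
display "p-value               = " %9.4f scalar(jb_p)
```

A p-value above 0.05 leads to the same conclusion as before: the null hypothesis of normality cannot be rejected.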

Normality through histogram

A histogram plot also indicates the normality of residuals. A bell-shaped curve shows the normal distribution of the series. In order to generate the histogram plot, follow the below procedure.

Go to ‘Graphics’ in the main bar.
Select ‘histogram’.

The below figure will appear.

Figure 9: Procedure for generating histogram plot for checking normality in STATA

Then choose the main variable and choose ‘Density’ under the Y-axis section. Click on ‘OK’ (figure below).

Figure 10: Selecting variable in a histogram for checking normality in STATA

The below window will appear. Click on ‘Add normal density plot’.

Figure 11: Procedure for generating a histogram for checking normality in STATA


Finally, click on ‘OK’ to generate the histogram plot, which shows the distribution of the residuals together with a normal density curve (figure below).

Figure 12: Histogram plot indicating normality in STATA

The figure above shows a bell-shaped distribution of the residuals. The X-axis shows the residuals, whereas the Y-axis represents the density of the data set. Thus, this histogram plot supports the results of the two normality tests in this article.
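The histogram can also be produced directly from the command window. A minimal sketch, assuming the residuals are stored in ‘resid’; the title text is only an illustration.

```stata
* Histogram of the residuals on a density scale,
* with a normal density curve overlaid for comparison
histogram resid, density normal title("Histogram of regression residuals")
```

The ‘normal’ option corresponds to ‘Add normal density plot’ in the dialog box, and ‘density’ corresponds to the ‘Density’ choice for the Y-axis.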

The next article discusses the tests for heteroscedasticity. Heteroscedasticity is a violation of an important ordinary least squares (OLS) assumption that all residuals belong to a population that has a constant variance (homoscedasticity).

