How to graphically test normality?

By Priya Chetty and Riya Jain on March 12, 2020

A normality test identifies whether the data of a given variable follow a normal distribution. In a normal distribution, observations cluster symmetrically around the mean and become less frequent the further they lie from it. Plotted as a frequency curve, normally distributed data form a bell shape that is symmetrical about the mean. A normally distributed variable has the same mean, median, and mode, while the standard deviation describes how widely the observations spread around that common value. Because many inferential procedures assume normality, the test should ideally be conducted before undertaking an analysis such as a time series regression.

For example, suppose a dataset containing the height, blood pressure, and IQ scores of 100 fourteen-year-olds is normally distributed. The graph below shows that the mean height of the sample is 1.5 meters and the standard deviation is 0.07 meters.
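To make the height example concrete, the following minimal sketch uses Python's standard library to describe a normal distribution with the mean (1.5 m) and standard deviation (0.07 m) quoted above. The helper name share_within is hypothetical and chosen only for illustration; the sketch describes the theoretical distribution, not the 100 observations themselves.

```python
from statistics import NormalDist

# Parameters taken from the example in the text: mean 1.5 m, standard deviation 0.07 m.
heights = NormalDist(mu=1.5, sigma=0.07)

def share_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    lower = heights.mean - k * heights.stdev
    upper = heights.mean + k * heights.stdev
    print(f"within {k} sd ({lower:.2f} m to {upper:.2f} m): {share_within(heights, k):.1%}")

# For a normal distribution the mean, median, and mode coincide.
print("mean:", heights.mean, "median:", heights.median, "mode:", heights.mode)
```

Roughly two thirds of the heights are expected to fall within one standard deviation of the mean, which is what gives the curve its bell shape.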

Purpose of normality test

The assumption of normality matters not only for describing the nature of a distribution but also for performing various statistical procedures such as regression. Tests such as the t-test and one-way and two-way ANOVA assume a normally distributed sample. If the dataset is not normally distributed, the results derived from these tests can be unreliable. Thus, it is essential to test the normality of the variable before running any other inferential tests. A previous article discussed in detail the process of testing normality in STATA. This article explains how to test normality graphically with the SPSS software.

Methods of testing normality graphically

Broadly, normality tests fall into two categories: graphical and statistical.

A graphical test for normality is a visual method of deducing information from a graph of the data. The graphical method does not produce a precise, objective result; it depends on the judgement of the person reading the graph. Thus, on its own this method is unreliable and does not guarantee that a variable is normally distributed. Furthermore, this method is not well suited to a large sample size. Normality of data can be graphically tested in a number of ways. They are explained in the table below.

*The presence of symmetry in a distribution is often treated as a substitute for normality.

Visual tests of normality can be done in SPSS, STATA and, to some extent, E-Views software.

Case example of testing normality graphically

The graphical normality test of FDI inflows in India from 1994 to 2015 was conducted using SPSS software. SPSS offers several graphical checks of normality. The following are presented here:

• Histogram
• Stem-and-leaf plot
• Normal Q-Q plot
• De-trended Normal Q-Q Plot
• Boxplot
• Normal P-P Plot

Histogram

The histogram shows that there is a gap in the data. The dataset is not normally distributed, as the plotted frequencies do not form a bell-shaped curve. The frequency distribution of FDI inflows is not symmetrical; instead, it is positively skewed. Thus, the FDI inflows of India are not normally distributed for the time period 1994-2015.
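The same reading can be sketched numerically. The values below are approximations read off the stem-and-leaf display in this article (in units of the stem width, 1.0E+010, with leaves kept to one digit), so they are not the exact FDI figures. The bin width of one unit and the moment-based skewness formula are assumptions made for illustration, not SPSS's own settings.

```python
from statistics import mean, pstdev

# Approximate FDI inflows, read off the article's stem-and-leaf display
# (units of 1.0E+010; one-digit leaves, so values are rounded).
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

def bin_counts(values, width=1.0):
    """Count observations per bin; bin b covers [b*width, (b+1)*width)."""
    counts = {}
    for v in values:
        b = int(v // width)
        counts[b] = counts.get(b, 0) + 1
    return counts

def skewness(values):
    """Moment-based skewness: positive when the right tail is longer."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

counts = bin_counts(fdi)
# Text histogram: an empty bar marks a gap in the data.
for b in range(min(counts), max(counts) + 1):
    n = counts.get(b, 0)
    print(f"{b}-{b + 1}: {'#' * n} ({n})")

print("skewness:", round(skewness(fdi), 2))
```

An empty bin between the first and third bars corresponds to the gap, and a skewness above zero is consistent with the positive skew described in the text.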

Stem-and-Leaf Plot

Frequency    Stem    Leaf

   12.00        0 .  022223335557
     .00        1 .
    5.00        2 .  03578
    3.00        3 .  456
    2.00        4 .  34

Stem width: 1.0E+010
Each leaf: 1 case(s)

The stem-and-leaf results (SPSS) are similar to the histogram results, i.e. a gap exists in the dataset and the data are positively skewed. This plot too shows that the dataset is not normally distributed.
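As an illustration of how such a display is built, here is a minimal sketch that regenerates the rows from approximate values. The values are read back from the display above (units of the stem width, 1.0E+010), and the function name stem_and_leaf is hypothetical; SPSS's own routine may round or truncate differently.

```python
from collections import defaultdict

# Approximate FDI inflows recovered from the display above (units of 1.0E+010).
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

def stem_and_leaf(values):
    """Return one text row per stem: frequency, stem, and its leaves."""
    rows = defaultdict(list)
    for v in sorted(values):
        # Integer part is the stem, first decimal digit is the leaf.
        stem, leaf = divmod(round(v * 10), 10)
        rows[stem].append(str(leaf))
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(rows.get(stem, []))  # empty string for a gap
        lines.append(f"{len(leaves):>5}  {stem} . {leaves}")
    return lines

lines = stem_and_leaf(fdi)
for line in lines:
    print(line)
```

The row for stem 1 has no leaves, which is the gap noted in the text, and the long first row shows where most observations are concentrated.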

Normal Q-Q plot

The Q-Q plot shows that the FDI inflow points lie away from the diagonal line that represents a normal distribution. Hence, the dataset is not normally distributed.
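The points of a normal Q-Q plot can be sketched as follows: each sorted observation is paired with the value a normal distribution with the same mean and standard deviation would be expected to give at that rank. The data are approximations read off the stem-and-leaf display in this article (units of 1.0E+010). The plotting position (i - 0.375) / (n + 0.25), known as Blom's formula, is an assumption here; other conventions give slightly different points.

```python
from statistics import NormalDist, mean, stdev

# Approximate FDI inflows read off the article's stem-and-leaf display
# (units of 1.0E+010); not the exact source figures.
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

def qq_points(values):
    """Return (observed, expected-under-normality) pairs, sorted by observed."""
    xs = sorted(values)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.375) / (n + 0.25)          # Blom plotting position (assumed)
        expected = m + s * std_normal.inv_cdf(p)
        points.append((x, expected))
    return points

points = qq_points(fdi)
for observed, expected in points:
    print(f"observed {observed:4.1f}   expected {expected:6.2f}")
```

If the data were normal, the two columns would track each other closely and the points would fall near the diagonal. Here the smallest observations sit well above their expected values, which is one way the departure from the line shows up.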

De-trended normal Q-Q plot

The difference between the observed FDI values and the values expected under a normal distribution, shown in the above graph, is not close to zero. Thus, Indian FDI inflows data is not normally distributed for the time period 1994-2015.
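A de-trended Q-Q plot shows only the deviations, so a normal sample would scatter around a horizontal line at zero. The sketch below computes those deviations in standardized (z-score) units. The data are approximations read off the stem-and-leaf display in this article, and both the Blom plotting position and the choice to define the deviation as observed minus expected are assumptions; SPSS may define or scale the quantity differently.

```python
from statistics import NormalDist, mean, stdev

# Approximate FDI inflows read off the article's stem-and-leaf display
# (units of 1.0E+010); not the exact source figures.
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

def detrended_qq(values):
    """Return (observed z-score, deviation from expected normal z-score) pairs."""
    xs = sorted(values)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    pairs = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.375) / (n + 0.25)           # Blom plotting position (assumed)
        z_observed = (x - m) / s
        z_expected = std_normal.inv_cdf(p)
        pairs.append((z_observed, z_observed - z_expected))
    return pairs

pairs = detrended_qq(fdi)
largest = max(abs(dev) for _, dev in pairs)
for z_observed, dev in pairs:
    print(f"z = {z_observed:6.2f}   deviation = {dev:6.2f}")
print("largest absolute deviation:", round(largest, 2))
```

Deviations that stay far from zero, or that follow a clear pattern rather than scattering randomly, point away from normality.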

Boxplot

For the period 1994-2015, Indian FDI inflows are not symmetric, as the median line does not sit at the center of the inter-quartile range box. Thus, the data is not normally distributed.
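The position of the median inside the box can also be checked numerically. This sketch uses approximate values read off the stem-and-leaf display in this article (units of 1.0E+010) and the default quartile method of Python's statistics.quantiles, which is an assumption: SPSS boxplots are based on a different quartile definition, so the exact box edges may differ.

```python
from statistics import mean, median, quantiles

# Approximate FDI inflows read off the article's stem-and-leaf display
# (units of 1.0E+010); not the exact source figures.
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

q1, q2, q3 = quantiles(fdi, n=4)   # quartiles; q2 is the median
iqr = q3 - q1

# Where the median sits inside the box: 0 = lower edge, 0.5 = center, 1 = upper edge.
median_position = (q2 - q1) / iqr

print(f"Q1 = {q1:.3f}, median = {q2:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"median position within box: {median_position:.2f}")
print(f"mean = {mean(fdi):.3f} (above the median when the data are positively skewed)")
```

A median position close to 0.5 would suggest symmetry. A value well below 0.5, together with a mean above the median, matches the positive skew seen in the other plots.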