How to graphically test normality?

By Riya Jain & Priya Chetty on March 12, 2020

A normality test identifies whether the data of a given variable follow a normal distribution. In a normal distribution, observations cluster symmetrically around the mean, and values become less frequent the further they lie from it. The normality test should ideally be conducted before undertaking a time series regression analysis. When plotted, normally distributed data form a symmetrical, bell-shaped curve. A normally distributed variable has the same mean, median, and mode, while its standard deviation determines how widely the curve spreads around the mean; in the standard normal distribution the mean is zero and the standard deviation is one.

Variables such as height, blood pressure, and IQ scores are often approximately normally distributed. For example, suppose the heights of 100 fourteen-year-olds are normally distributed. The graph below shows that the mean of the sample is 1.5 meters and the standard deviation is 0.07 meters.

Figure 1: Illustration of normally distributed data (Frost, 2019)
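The properties described above can be checked numerically. The following is a minimal Python sketch, not part of the original analysis. It assumes heights follow a normal distribution with the mean (1.5 m) and standard deviation (0.07 m) quoted above; the sample is simulated and the seed value is arbitrary.

```python
from statistics import NormalDist, mean, median

# Assumed model: heights ~ Normal(mean = 1.5 m, sd = 0.07 m), as quoted in the text.
heights_model = NormalDist(mu=1.5, sigma=0.07)

# Theoretical share of observations within one standard deviation of the mean.
within_one_sd = heights_model.cdf(1.5 + 0.07) - heights_model.cdf(1.5 - 0.07)

# Simulated sample of 100 heights (illustrative data, arbitrary seed).
sample = heights_model.samples(100, seed=1)
sample_mean = mean(sample)
sample_median = median(sample)

print(f"share within one sd (theory): {within_one_sd:.3f}")
print(f"sample mean:   {sample_mean:.3f} m")
print(f"sample median: {sample_median:.3f} m")
```

For normally distributed data the sample mean and median should lie close together, and roughly two thirds of the observations fall within one standard deviation of the mean.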

Purpose of normality test

The assumption of normality is relevant not only for describing the distribution of a variable but also for performing various statistical tests such as regression. Tests such as the t-test and one-way and two-way ANOVA assume a normally distributed sample. If the data are not normally distributed, the results of these analyses can be unreliable. Thus, it is essential to test the normality of the variable before running other inferential tests. A previous article discussed in detail the process of testing normality in STATA. This article explains how to test normality graphically with the SPSS software.

Methods of testing normality graphically

Broadly, normality tests fall into two categories: graphical and statistical.

A graphical test for normality is a visual method of deducing information from a graph of the data. The graphical method does not produce a numerical result and relies on the analyst's judgement. Thus, this method is subjective and cannot by itself confirm that a variable is normally distributed. Furthermore, not every plot suits every sample size. Normality of data can be graphically tested in a number of ways, which are summarised in the table below.

Graph | Purpose | Benefit | Normality condition
Quantile-Quantile (Q-Q) plot | Compares the quantiles of the data with the quantiles of a normal distribution. | Magnifies deviations in the tails; suits large sample sizes. | The observed values should fall on the expected normal distribution line.
Probability-Probability (P-P) plot | Compares the cumulative probability of a variable with the cumulative probability of a normal distribution. | Identifies outliers, skewness, and kurtosis; magnifies deviations in the centre of the distribution; suitable for small sample sizes. | The observed values should fall on the expected normal distribution line.
Boxplot | Tests the presence of symmetry* in the data, not normality. | Identifies symmetry, spread, and outliers; focuses on the median and interquartile range only. | The interquartile range box should be symmetrical, with the mean and median at the centre.
Detrended probability plot | Analyses the deviation of the observed data from the expected normally distributed data. | Suitable for large sample sizes. | The values should cluster in a horizontal band around zero.
Histogram | Checks whether the frequency distribution of the observed values forms a bell-shaped curve. | Simple procedure; better visual representation; identifies gaps and outliers; depicts skewness and symmetry in the data. | The frequency distribution should be bell-shaped.
Stem-and-leaf plot | Represents the original data to check for a bell shape in the frequency distribution. | Represents actual data values; identifies outliers, gaps, symmetry, skewness, mean, highest value, lowest value, and the median. | The frequencies should form a bell-shaped curve.

*Symmetry of a distribution is often treated as a rough substitute for normality, although a symmetric distribution is not necessarily normal.

Visual tests of normality can be done in SPSS, STATA and to some extent E-Views software.

Case example of testing normality graphically

The graphical normality test of FDI inflows in India from 1994 to 2015 was conducted using SPSS software. SPSS offers several graphical normality tests. The following are presented here:

  • Histogram
  • Stem-and-leaf plot
  • Normal Q-Q plot
  • De-trended Normal Q-Q Plot
  • Boxplot
  • Normal P-P Plot

Histogram

Figure 2: Histogram test of normality in SPSS

The histogram shows a gap in the data. The dataset is not normally distributed, as the frequency distribution does not form a bell-shaped curve. The distribution of FDI inflows is not symmetrical; instead, it is positively skewed. Thus, the FDI inflows of India are not normally distributed for the period 1994-2015.
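The gap and the positive skew can also be seen in a rough text histogram. The sketch below is illustrative only: the values are approximations read off the stem-and-leaf display shown later in this article (in units of the stem width, 1.0E+10), and the bin width of 1 is an assumption that need not match the bins SPSS chose for Figure 2.

```python
# Approximate FDI inflows in units of the stem width (1.0E+10), read off the
# stem-and-leaf display; each leaf carries one digit, so values are approximate.
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

# Count observations in bins of width 1 (assumed bin choice).
counts = [0] * 5
for x in fdi:
    counts[int(x)] += 1

for start, count in enumerate(counts):
    print(f"[{start}, {start + 1}): {'#' * count} ({count})")

# Moment-based skewness: a positive value indicates a longer right tail.
n = len(fdi)
avg = sum(fdi) / n
m2 = sum((x - avg) ** 2 for x in fdi) / n
m3 = sum((x - avg) ** 3 for x in fdi) / n
skewness = m3 / m2 ** 1.5
print(f"skewness: {skewness:.2f}")
```

An empty bin between the first and third bins corresponds to the gap in the data, and a positive skewness value is consistent with the positive skew read from the histogram.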

Stem-and-Leaf Plot

Frequency       Stem  Leaf

12.00        0 .  022223335557
  .00        1 .
 5.00        2 .  03578
 3.00        3 .  456
 2.00        4 .  34

Stem width: 1.0E+010
Each leaf: 1 case(s)

The stem-and-leaf results (SPSS) are similar to the histogram results, i.e. a gap exists in the dataset and the data is positively skewed. This test too shows that the dataset is not normally distributed.
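To show how such a display is built, here is a minimal Python sketch that regenerates the rows above. It is not SPSS's own procedure; the input values are simply the stem and leaf digits read back from the display, stored as integer tenths of the stem width.

```python
from collections import defaultdict

# Values in tenths of the stem width (e.g. 23 means 2.3 x 1.0E+10),
# read back from the stem-and-leaf display above.
fdi_tenths = [0, 2, 2, 2, 2, 3, 3, 3, 5, 5, 5, 7,
              20, 23, 25, 27, 28, 34, 35, 36, 43, 44]

# Split each value into a stem (tens digit) and a leaf (units digit).
rows = defaultdict(str)
for value in sorted(fdi_tenths):
    stem, leaf = divmod(value, 10)
    rows[stem] += str(leaf)

print("Frequency  Stem  Leaf")
for stem in range(min(rows), max(rows) + 1):
    leaves = rows.get(stem, "")  # stems with no cases print as empty rows
    print(f"{len(leaves):9d}  {stem:4d} .  {leaves}")
```

The empty row for stem 1 is the gap in the data, and the long row for stem 0 shows where most observations are concentrated.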

Normal Q-Q plot

Figure 3: Normal Q-Q plot of normality in SPSS

The Q-Q plot shows that the observed FDI inflows deviate from the diagonal line representing the expected normal values. Hence, the dataset is not normally distributed.
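The points behind a normal Q-Q plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of Figure 3: the values are approximations read off the stem-and-leaf display earlier in this article (in units of 1.0E+10), and Blom's plotting positions, (rank - 3/8) / (n + 1/4), are assumed as the way to assign a probability to each rank.

```python
from statistics import NormalDist, mean, stdev

# Approximate FDI inflows (units of 1.0E+10), read off the stem-and-leaf display.
fdi = sorted([0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
              2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4])
n = len(fdi)
avg, sd = mean(fdi), stdev(fdi)

# Blom plotting positions (assumed convention) give each rank a probability.
probs = [(rank - 0.375) / (n + 0.25) for rank in range(1, n + 1)]

# Value each ordered observation would be expected to take under normality.
expected = [avg + sd * NormalDist().inv_cdf(p) for p in probs]

# Each pair is one point of the Q-Q plot; normal data lie close to the diagonal.
qq_points = list(zip(fdi, expected))
for observed, exp_value in qq_points:
    print(f"observed {observed:4.1f}   expected normal {exp_value:6.2f}")
```

Where the observed and expected columns differ widely, the corresponding point sits far from the diagonal line.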

De-trended normal Q-Q plot

Figure 4: Detrended normal Q-Q plot of normality in SPSS

The differences between the observed FDI values and the expected normal values in the above graph are not close to zero. Thus, Indian FDI inflows data is not normally distributed for the period 1994-2015.
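The deviations plotted in a detrended Q-Q plot can be approximated as the standardized observed value minus the expected normal quantile. The Python sketch below is illustrative: it uses approximate values read off the stem-and-leaf display (units of 1.0E+10) and assumes Blom's plotting positions, so the numbers need not match Figure 4 exactly.

```python
from statistics import NormalDist, mean, stdev

# Approximate FDI inflows (units of 1.0E+10), read off the stem-and-leaf display.
fdi = sorted([0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
              2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4])
n = len(fdi)
avg, sd = mean(fdi), stdev(fdi)

deviations = []
for rank, x in enumerate(fdi, start=1):
    z_observed = (x - avg) / sd  # standardized observed value
    # Expected standard normal quantile for this rank (Blom positions, assumed).
    z_expected = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    deviations.append(z_observed - z_expected)

# For normal data the deviations scatter in a narrow band around zero.
max_abs_deviation = max(abs(d) for d in deviations)
print(f"largest absolute deviation: {max_abs_deviation:.2f}")
```

A large maximum deviation, relative to a band around zero, points in the same direction as the visual reading of the detrended plot.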

Boxplot

Figure 5: Boxplot normality test in SPSS

For the period 1994-2015, Indian FDI inflows are not symmetric, as the mean and median are not at the centre of the interquartile range box. Thus, the data is not normally distributed.
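The quantities a boxplot draws can be computed directly. The following Python sketch is an approximation: it uses values read off the stem-and-leaf display (units of 1.0E+10) and the default quartile method of Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS uses. The 1.5 x IQR rule for flagging outliers is a common convention assumed here.

```python
from statistics import quantiles

# Approximate FDI inflows (units of 1.0E+10), read off the stem-and-leaf display.
fdi = [0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
       2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4]

# Quartiles: the box spans q1 to q3, with the median drawn inside it.
q1, med, q3 = quantiles(fdi, n=4)
iqr = q3 - q1

# Distance from the median to each edge of the box; equal for symmetric data.
lower_half = med - q1
upper_half = q3 - med

# Conventional 1.5 x IQR fences for flagging outliers (assumed rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in fdi if x < lower_fence or x > upper_fence]

print(f"q1={q1:.3f}  median={med:.3f}  q3={q3:.3f}")
print(f"median to q1: {lower_half:.3f}   q3 to median: {upper_half:.3f}")
print(f"outliers: {outliers}")
```

When the upper part of the box is much longer than the lower part, the median sits off-centre, which is the asymmetry described above.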

Normal P-P Plot

Figure 6: Normal P-P plot in SPSS

The cumulative probability distribution of FDI inflows of India for 1994-2015 shows that the observed values lie away from the diagonal line of the normal distribution. Thus, FDI inflows are not normally distributed.
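A P-P plot pairs the observed cumulative proportion of each ordered value with the cumulative probability a fitted normal distribution assigns to it. The Python sketch below illustrates the idea under assumptions: approximate values read off the stem-and-leaf display (units of 1.0E+10), a normal distribution fitted with the sample mean and standard deviation, and Blom's plotting positions for the observed proportions.

```python
from statistics import NormalDist, mean, stdev

# Approximate FDI inflows (units of 1.0E+10), read off the stem-and-leaf display.
fdi = sorted([0.0, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,
              2.0, 2.3, 2.5, 2.7, 2.8, 3.4, 3.5, 3.6, 4.3, 4.4])
n = len(fdi)

# Normal distribution with the same mean and standard deviation as the sample.
fitted = NormalDist(mean(fdi), stdev(fdi))

pp_points = []
for rank, x in enumerate(fdi, start=1):
    observed_prob = (rank - 0.375) / (n + 0.25)  # Blom position (assumed)
    expected_prob = fitted.cdf(x)                # normal cumulative probability
    pp_points.append((observed_prob, expected_prob))

# For normal data every point lies near the diagonal, so the gaps stay small.
max_gap = max(abs(obs - exp) for obs, exp in pp_points)
print(f"largest gap from the diagonal: {max_gap:.2f}")
```

A large gap between the two probabilities means the point lies well away from the diagonal line.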

Need for statistically testing normality

The graphical method of testing normality is suitable for an initial judgement but is not conclusive. A graph can show whether a bell-shaped curve is formed or whether the observed values lie close to the normal distribution line, but it does not produce a test statistic or significance value. Therefore, it is important to also conduct a statistical test of normality using statistical software. The next article focuses on the process of conducting statistical tests of normality.
