# Why is it important to test heteroskedasticity in a dataset?

By Riya Jain & Priya Chetty on March 23, 2020

Heteroskedasticity refers to systematic changes in the spread of the residuals, or error term, of a model. The presence of non-constant residual variance shows that the scattering of the model depends on at least one independent variable. This introduces bias into the model's standard errors and hence causes the model's results to deviate from efficient and accurate estimates.

For example, suppose a study aims to identify the factors that lead to emotional exhaustion in an organization. Job control, work pressure, and concentration requirements are some of the main factors affecting the emotional state of an employee; thus, they are the independent variables. Here, because job control plays the dominant role, its effect buffers the effect of work pressure on emotional exhaustion. This diminished work-pressure effect at high values of job control means that the model's results are strongly influenced by a single variable, i.e. job control; thus, heteroskedasticity is present in the model.

## Why is testing for heteroskedasticity essential?

Heteroskedasticity in a model can be present due to any of the below reasons:

• Existence of outliers in the dataset.
• Collection of data from variables measured on different scales.
• Incorrect specification of the model.
• Use of an incorrect transformation to represent the model.

Each of the above cases can make the results deviate from an efficient outcome. Thus, the presence of heteroskedasticity in the model violates an assumption of ordinary least squares (OLS) regression: the coefficient estimates become inefficient and their standard errors biased. Moreover, it renders the results of t-tests and F-tests unreliable.
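The consequence described above can be illustrated with a short simulation. The sketch below (in Python with NumPy, since the SPSS steps later in this article are menu-driven) generates data whose error spread grows with the predictor; the variable names and coefficients are hypothetical, chosen only to produce a visibly heteroskedastic pattern.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, n)
# The error's standard deviation grows with x: a classic heteroskedastic pattern
y = 2.0 + 1.5 * x + rng.normal(0, 0.5 * x, n)

# Residuals computed against the true line, for illustration
resid = y - (2.0 + 1.5 * x)

# Compare residual spread in the lower and upper halves of x
low, high = resid[x < 5.5], resid[x >= 5.5]
print(low.std(), high.std())  # the upper half shows a clearly larger spread
```

A homoskedastic model would show roughly equal spread in both halves; here the spread widens with `x`, which is exactly what the graphical and statistical tests below are designed to detect.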

## What are the different tests for examining the presence of heteroskedasticity in the model?

Heteroskedasticity in a model can be detected through two different forms of testing, i.e. graphical (visual) and statistical. The specifications of each test are shown in the table below.

Table 1: Tests for detecting Heteroskedasticity

Among all these tests, the scatter plot, Bartlett, Levene's, Breusch-Pagan, Cook-Weisberg, and White tests are the most widely used heteroskedasticity tests. SPSS, Stata, and R are software packages that support these tests (except the Bartlett test, which is unavailable in SPSS). However, for regression analysis in SPSS, the scatter plot and the F-test are the most commonly used methods for testing heteroskedasticity.

## Case example

In order to assess the presence of heteroskedasticity in the model stating the impact of job control, work pressure, and concentration requirements on the level of emotional exhaustion, the normality of the dataset is tested first. Both the normality and the heteroskedasticity testing are done in SPSS.

Statistical tests of normality are discussed here. The results of the Shapiro-Wilk test are shown below.

Table 2: Normality results (SPSS results)

The above table shows that since the significance value for each of the variables is greater than the significance level of 0.05, the null hypothesis of normally distributed data is not rejected. This shows that all the variables are normally distributed.
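As a rough non-SPSS counterpart, the Shapiro-Wilk test can be run in Python via SciPy. The sample below is simulated and purely illustrative; the location and scale are hypothetical, not taken from the study:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
# Hypothetical survey scores drawn from a normal distribution
sample = rng.normal(loc=50, scale=10, size=200)

stat, pval = shapiro(sample)
# If p > 0.05, we fail to reject the null hypothesis of normality
print(f"W = {stat:.4f}, p = {pval:.4f}")
```

The test statistic W lies between 0 and 1, with values near 1 indicating closer agreement with a normal distribution.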

### Heteroskedasticity test via scatter plot of residuals

Linear regression analysis was performed for the variables. Next, the scatter plot of the residuals was generated by following the below steps.

Step 1: Select Analyze > Regression > Linear. The dialog box shown below will appear.

Step 2: Allocate the variables to the dependent and independent fields, then click on ‘Plots’. The dialog box shown below will appear.

Step 3: Assign ZPRED as the ‘X’ variable and ZRESID as the ‘Y’ variable. Select the standardized residual plot in the form of a ‘Normal probability plot’, then click ‘Continue’.

Step 4: Click ‘OK’. The scatterplot shown below will be generated as output.

The figure above shows that the residual values are spread out. Although most of the values are concentrated between 0 and 1, there is no consistency in the spread of the residuals. Hence, heteroskedasticity is present in the model.

### Heteroskedasticity test via F-test

The procedure for regression analysis in SPSS is discussed here. The results from the ANOVA table are shown below.

Table 3: F-test results

The above table shows that the significance value of the F-test is 0.000, which is less than the study's significance level of 0.05. Thus, the null hypothesis of equal variance is rejected. Hence, heteroskedasticity is present in the model.

## How to overcome the problem of heteroskedasticity?

In order to remove the problem of heteroskedasticity from the model, it is often recommended to follow one of the methods below:

• Regression analysis with robust standard errors.
• Regression analysis based on the generalized least squares (GLS) method.
• Regression analysis based on the weighted least squares (WLS) method.

Apart from these methods, a log transformation of the dataset can also be applied, as log transformation helps reduce the spread of the errors in the model. By following any of the above methods, homoscedasticity can be achieved, and accurate, reliable results with minimal variability in the model can be derived.