# Why is it important to test heteroskedasticity in a dataset?

By Riya Jain & Priya Chetty on March 23, 2020

Heteroskedasticity refers to systematic changes in the spread of the residuals, or error term, of a model. When the residual variance is not constant, the scatter of the errors depends on at least one independent variable. This biases the estimated standard errors, so the conclusions drawn from the model can deviate from the actual results.

For example, suppose a study aims to identify the factors that lead to emotional exhaustion in an organization. Job control, work pressure, and concentration requirements are some of the main factors that affect the emotional state of an employee, so they are the independent variables. Here job control plays the dominant role, and its effect buffers the effect of work pressure on emotional exhaustion. Because the outcome of the model is highly influenced by a single variable, job control, the spread of the errors is likely to vary with that variable, which indicates the presence of heteroskedasticity in the model.

## Why is testing for heteroskedasticity essential?

Heteroskedasticity in a model can be present due to any of the below reasons:

• Existence of outliers in the dataset.
• Collection of data from different scales.
• Incorrect specification of the model.
• Usage of an incorrect transformation method to represent the model.

Each of the above-stated cases can cause the results to deviate from an efficient outcome. The presence of heteroskedasticity violates the constant-variance assumption of ordinary least squares (OLS) regression: the coefficient estimates remain unbiased but are no longer efficient, and their standard errors are biased. As a result, the t and F tests become unreliable.
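The effect on standard errors can be illustrated with a small simulation. This is a minimal sketch on hypothetical data, not the dataset used in this article: the error standard deviation is assumed to grow with the square of `x`, and the seed, sample size, and coefficients are all illustrative choices. It compares the standard error reported by the textbook OLS formula with the actual variability of the slope across repeated samples.

```python
import math
import random
import statistics

rng = random.Random(1)                        # fixed seed so the run is repeatable
n, reps, true_slope = 100, 2000, 0.5          # illustrative values (assumptions)
x = [1 + 9 * i / (n - 1) for i in range(n)]   # fixed design on [1, 10]
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

slopes, classical_ses = [], []
for _ in range(reps):
    # Error SD grows with x squared: heteroskedastic by construction.
    y = [2.0 + true_slope * xi + rng.gauss(0, 0.05 * xi ** 2) for xi in x]
    y_bar = statistics.fmean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slopes.append(slope)
    # Textbook OLS standard error, which assumes a constant error variance.
    classical_ses.append(math.sqrt(ssr / (n - 2) / sxx))

mean_slope = statistics.fmean(slopes)              # average estimate across samples
empirical_sd = statistics.stdev(slopes)            # actual sampling variability
mean_classical_se = statistics.fmean(classical_ses)

print(f"true slope:               {true_slope}")
print(f"mean estimated slope:     {mean_slope:.3f}")
print(f"actual SD of the slope:   {empirical_sd:.3f}")
print(f"average classical SE:     {mean_classical_se:.3f}")
```

In this setup the average slope stays close to the true value, while the classical standard error understates the actual variability, which is why t and F statistics built on it cannot be trusted.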

## What are the different tests for examining the presence of heteroskedasticity in the model?

Heteroskedasticity in a model can be detected through two forms of testing: graphical (visual) and statistical. The specifications of each test are shown in the table below.

Table 1: Tests for detecting Heteroskedasticity

Among all these tests, the scatter plot, Bartlett’s, Levene’s, Breusch-Pagan, Cook-Weisberg and White tests are the most used heteroskedasticity tests. SPSS, Stata, and R all support these tests (except the Bartlett test in SPSS). However, for regression analysis in SPSS, the scatter plot and the F-test are the most used methods for testing heteroskedasticity.
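To make one of the statistical tests concrete, the following is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan test for a regression with a single predictor. It is written in plain Python on hypothetical simulated data, and the helper names `ols_fit`, `r_squared`, and `breusch_pagan` are illustrative, not part of SPSS, Stata, or R. The squared residuals are regressed on the predictor, and the statistic LM = n × R² is compared with a chi-square distribution with one degree of freedom.

```python
import math
import random

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(x, y):
    """Studentized Breusch-Pagan LM statistic and p-value (one predictor only)."""
    a, b = ols_fit(x, y)
    squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # LM = n * R^2 of the auxiliary regression of squared residuals on x.
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = random.Random(0)  # fixed seed; all data below are hypothetical
x = [rng.uniform(1, 10) for _ in range(400)]
y_hetero = [1.0 + 2.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # spread grows with x
y_homo = [1.0 + 2.0 * xi + rng.gauss(0, 2.0) for xi in x]          # constant spread

lm_het, p_het = breusch_pagan(x, y_hetero)
lm_hom, p_hom = breusch_pagan(x, y_homo)
print(f"growing spread:  LM = {lm_het:.2f}, p = {p_het:.4f}")
print(f"constant spread: LM = {lm_hom:.2f}, p = {p_hom:.4f}")
```

A p-value below 0.05 rejects the null hypothesis of constant error variance. With several predictors, the degrees of freedom equal the number of predictors, so the one-degree-of-freedom shortcut used here no longer applies.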

## Case example

To assess the presence of heteroskedasticity in the model of the impact of job control, work pressure, and concentration requirements on the emotional exhaustion level, the normality of the dataset is tested first. Testing for both normality and heteroskedasticity is done using SPSS.

Statistical tests of normality are discussed here. The results of the Shapiro-Wilk test are shown below.

Table 2: Normality results (SPSS results)

The above table is read against the significance level of 0.05: the null hypothesis of a normally distributed variable is not rejected when its significance value is above 0.05. On that basis, the null hypothesis is not rejected for any of the variables, which shows that all the variables are normally distributed.

### Heteroskedasticity test via scatter plot of residuals

Linear regression analysis was performed for the variables. Next, the scatter plot of the residuals was generated by following the below steps.

Step 1: Select Analyze > Regression > Linear. The dialog box below will appear.

Step 2: Allocate the variables in dependent and independent form and then click on ‘Plots’. The dialog box shown below will appear.

Step 3: Allocate the ZPRED value as the ‘X’ variable and the ZRESID value as the ‘Y’ variable. Under the standardized residual plots, select ‘Normal probability plot’ and then click on ‘Continue’.

Step 4: Click on ‘Ok’. The scatterplot shown below will be generated as output.

The above figure shows that the residuals are unevenly spread. Although most of the values are concentrated close to 0 and 1, the spread of the residuals is not consistent across the predicted values. Hence, heteroskedasticity is present in the model.
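The same visual check can be approximated numerically. The sketch below uses hypothetical simulated data with a single predictor, not the study's dataset. It computes standardized predicted values and standardized residuals, assuming the usual definitions (fitted values centred and scaled, and residuals divided by the square root of the mean squared error), and then summarises the residual spread in three bands of the predicted values. The band cut-offs of ±0.5 are an arbitrary choice.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed; hypothetical data
n = 300
x = [rng.uniform(1, 10) for _ in range(n)]
y = [3.0 + 1.5 * xi + rng.gauss(0, 0.4 * xi) for xi in x]  # spread grows with x

# Simple OLS fit.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Standardized predicted values and standardized residuals.
mean_fit, sd_fit = statistics.fmean(fitted), statistics.stdev(fitted)
mse = sum(e * e for e in resid) / (n - 2)
zpred = [(fi - mean_fit) / sd_fit for fi in fitted]
zresid = [e / math.sqrt(mse) for e in resid]

# Spread of the standardized residuals within three bands of predicted values.
bands = {"low": [], "middle": [], "high": []}
for zp, zr in zip(zpred, zresid):
    key = "low" if zp < -0.5 else "high" if zp > 0.5 else "middle"
    bands[key].append(zr)
spread = {name: statistics.pstdev(values) for name, values in bands.items()}

for name, value in spread.items():
    print(f"{name:>6} predicted values: residual SD = {value:.2f}")
```

Roughly equal spreads across the bands would correspond to an even horizontal band in the scatterplot, while a clear increase or decrease corresponds to the funnel shape associated with heteroskedasticity.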

### Heteroskedasticity test via F-test

The procedure of regression analysis in SPSS is discussed here. The results shown in the ANOVA table are represented below.

Table 3: F-test results

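The above table shows that the significance value for the F-test is 0.000, which is less than the significance level of the study, i.e. 0.05. Thus, the null hypothesis of the test is rejected. Strictly, the F-test in the regression ANOVA table has the null hypothesis that all slope coefficients are zero, not that the error variance is equal, so this result is best read together with the residual scatter plot, which indicates the presence of heteroskedasticity in the model.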
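As a sketch of what the F value in a regression ANOVA table measures, the following plain-Python example computes it for a single-predictor model as the regression mean square divided by the residual mean square. The data are hypothetical and the function name `anova_f` is illustrative. Two datasets are simulated, one with constant error spread and one with spread that grows with `x`.

```python
import random

def anova_f(x, y):
    """F statistic of a simple regression: regression mean square / residual mean square."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    ss_reg = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # One predictor: 1 regression degree of freedom, n - 2 residual degrees of freedom.
    return (ss_reg / 1) / (ss_res / (n - 2))

rng = random.Random(3)  # fixed seed; hypothetical data
x = [rng.uniform(1, 10) for _ in range(200)]
y_const = [1.0 + 2.0 * xi + rng.gauss(0, 2.0) for xi in x]      # constant spread
y_grow = [1.0 + 2.0 * xi + rng.gauss(0, 0.4 * xi) for xi in x]  # spread grows with x

f_const, f_grow = anova_f(x, y_const), anova_f(x, y_grow)
print(f"F with constant spread: {f_const:.1f}")
print(f"F with growing spread:  {f_grow:.1f}")
```

Both values lie far above the 5% critical value of roughly 3.9 for 1 and 198 degrees of freedom, because the predictor is strongly related to the outcome in both datasets. A significant ANOVA F value therefore reflects the explanatory power of the model and is better paired with a residual plot or a dedicated test when the question is whether the error variance is constant.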

## How to overcome the problem of heteroskedasticity?

To address the problem of heteroskedasticity in the model, it is often recommended to follow any of the methods stated below:

• Regression analysis using robust standard errors.
• Generalized least squares (GLS) regression.
• Weighted least squares (WLS) regression.
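Two of these remedies can be sketched for a single-predictor model in plain Python. The data are hypothetical, and the weights `1 / x**2` rest on the assumption that the error standard deviation is proportional to `x`, which holds here only because the data are simulated that way. The robust standard error uses the White (HC0) formula, one of several robust variants.

```python
import math
import random

rng = random.Random(11)  # fixed seed; hypothetical data
n = 500
x = [rng.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # error SD proportional to x

# Ordinary least squares fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
ols_slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
ols_intercept = y_bar - ols_slope * x_bar
resid = [yi - ols_intercept - ols_slope * xi for xi, yi in zip(x, y)]

# Classical standard error of the slope: assumes one common error variance.
classical_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# White (HC0) robust standard error: each squared residual is weighted by the
# squared distance of its x value from the mean of x.
robust_se = math.sqrt(sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid))) / sxx

# Weighted least squares with weights 1 / x^2 (assumed variance structure).
w = [1.0 / xi ** 2 for xi in x]
sw = sum(w)
xw = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
yw = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
wls_slope = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))

print(f"OLS slope: {ols_slope:.3f}")
print(f"  classical SE: {classical_se:.4f}")
print(f"  robust SE:    {robust_se:.4f}")
print(f"WLS slope: {wls_slope:.3f}")
```

Robust standard errors leave the OLS coefficients unchanged and only correct the inference, whereas WLS changes the estimates themselves by giving less weight to the noisier observations. WLS depends on the variance structure being specified reasonably well.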

Other than these methods, a log transformation of the data is also applicable, as it compresses large values and can stabilise the variance of the errors. By following any of the above-stated methods, the effect of heteroskedasticity can be corrected and more accurate and reliable results can be derived from the model.
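The effect of a log transformation can be sketched on hypothetical data in which the noise is multiplicative, so that it scales with the level of the outcome. The model form, the seed, and the helper name `spread_ratio` are assumptions made for illustration. The helper fits a straight line and returns the residual standard deviation in the upper half of `x` divided by that in the lower half, so a value near 1 suggests an even spread.

```python
import math
import random
import statistics

def spread_ratio(x, y):
    """Residual SD for the upper half of x divided by that for the lower half."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    cut = statistics.median(x)
    low = [e for xi, e in zip(x, resid) if xi <= cut]
    high = [e for xi, e in zip(x, resid) if xi > cut]
    return statistics.pstdev(high) / statistics.pstdev(low)

rng = random.Random(5)  # fixed seed; hypothetical data
x = [rng.uniform(1, 10) for _ in range(400)]
# Multiplicative-error model: the noise scales with the level of y.
y = [2.0 * xi ** 1.5 * math.exp(rng.gauss(0, 0.3)) for xi in x]

ratio_levels = spread_ratio(x, y)
ratio_logs = spread_ratio([math.log(xi) for xi in x], [math.log(yi) for yi in y])

print(f"spread ratio, original scale: {ratio_levels:.2f}")
print(f"spread ratio, log scale:      {ratio_logs:.2f}")
```

The transformation helps here because taking logs turns the multiplicative noise into additive noise with a constant spread. It requires strictly positive values and will not help in the same way when the errors are additive to begin with.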
