# Why conduct a multicollinearity test in econometrics?

A multicollinearity test helps diagnose the presence of multicollinearity in a model. Multicollinearity refers to a situation in which two or more independent variables in a regression model are strongly inter-related. The presence of multicollinearity is problematic for four reasons:

1. It inflates the variance of the coefficient estimates.
2. It makes the estimates extremely sensitive to minor changes in the data.
3. It causes instability in the regression model.
4. It leads to misleading and unreliable results.

For example, suppose a study aims to determine which factors influence customer loyalty and identifies four candidate factors: customer satisfaction, product quality, service quality, and brand awareness. These serve as the independent variables. However, the study also finds that customer satisfaction is correlated with product quality and service quality. This interlinkage between independent variables signals the presence of multicollinearity in the model.

## When does multicollinearity arise?

The problem of multicollinearity arises mainly for two reasons:

• Poorly collected or manipulated data; or
• Structural problems, such as including a variable computed from other independent variables, repeating similar variables, or using dummy variables incorrectly.
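The structural case can be seen directly: if one predictor is computed from other predictors, the design matrix becomes rank-deficient, i.e. perfectly multicollinear. A minimal sketch in Python (the data and variable names are simulated for illustration, not taken from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # structural problem: a variable computed from other predictors

# Design matrix with an intercept column. Including x3 makes one column an
# exact linear combination of the others, so the matrix loses full rank.
X = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X))  # 3 rather than 4: rank-deficient
```

With rank deficiency, ordinary least squares cannot identify a unique coefficient for each of the three variables involved.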

## Different tests for examining the presence of multicollinearity

There are different ways to detect whether multicollinearity is present in a model. The table below lists these tests and explains when each applies.

Table 1: Multicollinearity tests

Among these, Pearson’s correlation coefficient and the variance inflation factor (VIF) are the most commonly used tests for detecting multicollinearity. Software such as SPSS, Stata, and R can be used for the computation.

Continuing the example above, the presence of multicollinearity is examined in the model in which customer loyalty is affected by customer satisfaction, product quality, service quality, and brand awareness. The analysis was performed using SPSS.

## Multicollinearity test via Pearson’s correlation coefficient

The Pearson correlation coefficient was computed for each pair of independent variables. The resulting correlation matrix is shown in the table below.

Table 2: Correlation Matrix (SPSS results)

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).

The table above shows that the correlation coefficients between customer satisfaction, product quality, and service quality are greater than 0.5. This indicates the presence of multicollinearity in the model.
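The same screening can be reproduced outside SPSS with a correlation matrix. A minimal sketch in Python using simulated data (the variable names and coefficients are illustrative and only mimic the pattern in Table 2, not the study’s actual figures):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Simulated predictors: satisfaction is constructed to co-move with the
# two quality scores; brand awareness is generated independently.
product_quality = rng.normal(size=n)
service_quality = 0.8 * product_quality + 0.6 * rng.normal(size=n)
satisfaction = 0.7 * product_quality + 0.5 * service_quality + 0.4 * rng.normal(size=n)
brand_awareness = rng.normal(size=n)

X = np.column_stack([satisfaction, product_quality, service_quality, brand_awareness])
corr = np.corrcoef(X, rowvar=False)  # 4x4 Pearson correlation matrix
print(np.round(corr, 3))
# Off-diagonal values above 0.5 among the first three columns flag multicollinearity.
```

The 0.5 cut-off used here follows the rule of thumb applied in the article; stricter thresholds (e.g. 0.7 or 0.8) are also common in practice.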

## Multicollinearity test via Variance Inflation Factor (VIF)

Step 1: Import data in SPSS.

Step 2: Select Analyze > Regression > Linear.

The dialogue box shown below will appear.

Step 3: Click ‘Statistics’, tick ‘Collinearity diagnostics’, and then click ‘Continue’.

Step 4: Assign the variables to the ‘Dependent’ and ‘Independent(s)’ fields and then click ‘OK’.

The VIF and collinearity diagnostics table shown below will appear.

Table 3: VIF results from collinearity statistics

The table above shows that the VIF is high for customer satisfaction (8.599 > 5), product quality (12.008 > 5), and service quality (5.099 > 5), while it is low for brand awareness (1.344 < 5). Thus, multicollinearity is present in the model: customer satisfaction, product quality, and service quality are inter-related.
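Outside SPSS, the VIF can be computed directly from its definition: regress each predictor on the remaining predictors and take 1/(1 − R²) of that auxiliary regression. A sketch in Python with simulated data (illustrative names and numbers, not the study’s):

```python
import numpy as np

def vif(X):
    """VIF per column of X: regress each column on the other columns
    (plus an intercept) and return 1 / (1 - R^2) of that regression."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
n = 300
product_quality = rng.normal(size=n)
service_quality = 0.8 * product_quality + 0.6 * rng.normal(size=n)
satisfaction = 0.7 * product_quality + 0.5 * service_quality + 0.4 * rng.normal(size=n)
brand_awareness = rng.normal(size=n)

X = np.column_stack([satisfaction, product_quality, service_quality, brand_awareness])
print(np.round(vif(X), 3))  # inflated for the correlated predictors, ~1 for brand_awareness
```

A VIF of 1 means a predictor is uncorrelated with the others; the article’s cut-off of 5 is one common rule of thumb, and some texts use 10 instead.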

## Multicollinearity test via P-value

Regression analysis reveals the significance (p-) value for each independent variable in the model. The procedure for regression analysis is explained here. Results of the analysis are shown in the table below.

Table 4: P-value results

The table above shows that the coefficient signs match the theoretical linkage between the dependent variable (customer loyalty) and the independent variables (customer satisfaction, product quality, service quality, and brand awareness), i.e. a positive relationship. Despite this, the p-values of customer satisfaction (0.148), product quality (0.727), and service quality (0.526) are insignificant, i.e. greater than 0.05. Thus, multicollinearity might exist in the model.
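The mechanism behind these insignificant p-values is that collinearity inflates the coefficients’ standard errors, which shrinks the t-statistics. A minimal sketch in Python (simulated data and names, purely illustrative): the standard error of the same coefficient is compared when its co-regressor is collinear versus independent.

```python
import numpy as np

def coef_se(X, y):
    """OLS coefficient standard errors: square roots of the diagonal of
    sigma^2 * (X'X)^{-1}, with sigma^2 estimated from the residuals."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2_collinear = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
x2_independent = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

# Standard error of x1's coefficient under each design (index 1 = x1).
se_collinear = coef_se(np.column_stack([np.ones(n), x1, x2_collinear]), y)[1]
se_independent = coef_se(np.column_stack([np.ones(n), x1, x2_independent]), y)[1]
print(se_collinear, se_independent)  # the collinear design yields a much larger SE
```

With a larger standard error, the same coefficient estimate produces a smaller t-statistic and hence a larger, often insignificant, p-value.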

## Multicollinearity test via ANOVA

The regression analysis procedure is shown here. Results of the ANOVA are represented in the below table.

Table 5: ANOVA results

The results above show that the model is jointly significant: the significance of the F-value is 0.000, which is less than the study’s significance level of 0.05. Moreover, the F-value is greater than 1 (27.328 > 1), indicating that including the independent variables has improved the prediction of customer loyalty. Thus, the overall model is appropriate, yet the individual coefficients are insignificant; this combination of joint significance with individually insignificant predictors points to possible multicollinearity in the model.
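The joint F-test can be reproduced from the regression R² via F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of independent variables. A hedged sketch in Python with simulated data (the numbers will not match Table 5; only the formula and the decision rule carry over):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 300, 4  # observations and number of independent variables (illustrative)
X = rng.normal(size=(n, k))
X[:, 1] = 0.9 * X[:, 0] + 0.4 * X[:, 1]  # make two predictors collinear
y = 1.0 + X @ np.array([0.5, 0.4, 0.3, 0.2]) + rng.normal(size=n)

# Fit the full model and compute R^2 from the residuals.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
r2 = 1.0 - resid.var() / y.var()

F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
print(round(F, 3))  # F >> 1: the regressors jointly improve the prediction
```

Even with collinear predictors the joint F-test can be strongly significant, which is exactly the pattern the article describes: a good overall fit paired with weak individual t-tests.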

## How to overcome the problem of Multicollinearity?

To remove the problem of multicollinearity from the model, it is recommended to drop the highly correlated independent variable. Alternatively, use a method designed to handle highly correlated independent variables, such as partial least squares regression or principal component analysis.

In the example above, all tests show that product quality is highly inter-related with the other variables:

• High Pearson correlation (0.934 with customer satisfaction, 0.896 with service quality).
• High VIF (12.008).
• High p-value (0.727).

Thus, to remove multicollinearity, product quality will be dropped from the model. The new analysis would be performed using customer satisfaction, service quality, and brand awareness as the independent variables.
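The effect of this remedy can be checked by recomputing the VIF after dropping the offending variable. A sketch in Python with simulated data (illustrative names and numbers, not the study’s):

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) of regressing it on the other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        out.append(y.var() / resid.var())  # equals 1 / (1 - R^2)
    return np.array(out)

rng = np.random.default_rng(1)
n = 300
product_quality = rng.normal(size=n)
service_quality = 0.8 * product_quality + 0.6 * rng.normal(size=n)
satisfaction = 0.7 * product_quality + 0.5 * service_quality + 0.4 * rng.normal(size=n)
brand_awareness = rng.normal(size=n)

full = np.column_stack([satisfaction, product_quality, service_quality, brand_awareness])
reduced = np.column_stack([satisfaction, service_quality, brand_awareness])
print(np.round(vif(full), 2))     # inflated VIFs for the correlated predictors
print(np.round(vif(reduced), 2))  # dropping product_quality brings them down
```

If the remaining VIFs are still high after dropping one variable, a dimension-reduction approach such as principal component analysis is the more suitable remedy.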