Application of multivariate regression analysis

By Riya Jain and Priya Chetty on June 9, 2021

Today businesses are more data-hungry than ever before. Data analysis techniques such as multivariate regression analysis help business executives make meaningful decisions. Data analysis is the process of applying logical and statistical techniques to describe, visualize, and assess useful information from raw data. Among the available analysis methods, regression analysis is one of the most widely used, as it helps examine the relationship between two or more variables.

Regression analysis is about building relationships between variables. Simple linear regression and multiple linear regression are the commonly practised forms. Both assume that there is one dependent (outcome) variable and one or more independent (predictor) variables. However, in the real world there are many situations in which a single variable does not represent the outcome. In those scenarios, the relationship has to be built with more than one dependent variable. Focusing on this aspect, this article explores the concept of multivariate regression analysis and discusses its assumptions and relevance.

Understanding multivariate regression analysis

Multivariate regression analysis is an extension of the simple regression model. Because it includes more than one outcome variable, the model is formulated with one or more predictor (independent) variables and two or more outcome (dependent) variables (UCLA, 2021). The model can be stated as (Helwig, 2017):

Figure 1: Multivariate regression analysis

where i ∈ {1, 2, …, n} and k ∈ {1, 2, …, m}

  • yik is the kth response variable for the ith observation
  • b0k is the regression intercept for the kth response
  • bjk is the slope of the jth predictor variable for the kth response
  • xij is the jth predictor variable for the ith observation
  • eik is the error term for the ith observation and the kth response
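
Since the equation itself appears only as a figure, the following is a reconstruction from the symbol definitions above rather than a copy of the original image. It assumes a symbol p for the number of predictor variables, which the text does not name, and it indexes the error term by observation i and response k.

```latex
y_{ik} = b_{0k} + \sum_{j=1}^{p} b_{jk}\, x_{ij} + e_{ik},
\qquad i \in \{1, \dots, n\}, \quad k \in \{1, \dots, m\}
```

With m = 1 this reduces to the familiar multiple linear regression model with a single response.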

Herein, the model is regarded as multivariate because m is greater than 1, i.e. there is more than one response (dependent) variable. It is linear because each response is modelled as a linear function of the parameters, given the predictor variables. Thus, it is called a multivariate linear regression model (Helwig, 2017; Spanos & Hendry, 2011).

Furthermore, multivariate regression can be categorized into two groups:

Figure 2: Categories of multivariate regression analysis

Herein, multivariate simple regression is the model with more than one response variable and just one predictor variable, while multivariate multiple regression is the model with more than one response variable and more than one predictor variable (Spanos & Hendry, 2011).

For example, a researcher has collected data on 600 students to determine the impact of three psychological variables, i.e. self-concept, motivation, and locus of control, on the standardized test scores in three streams: science, art, and commerce. Because there are several predictors and several responses, the model built to determine this impact is a multivariate multiple linear regression model.
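
A minimal sketch of how such a model could be fitted is given below. It uses synthetic data, not the researcher's dataset: the coefficient values, intercepts, noise level, and variable names are all assumptions made for illustration. The fit itself is ordinary least squares applied to all responses at once with NumPy's `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 600, 3, 3  # observations, predictors, responses

# Synthetic predictors standing in for the three psychological variables.
X = rng.normal(size=(n, p))

# Hypothetical "true" slopes (p x m) and intercepts (one per response).
B_true = np.array([[0.5, -0.2, 0.1],
                   [0.3, 0.4, 0.0],
                   [-0.1, 0.2, 0.6]])
b0_true = np.array([50.0, 52.0, 48.0])

# Responses: one column per test score, with Gaussian noise added.
Y = b0_true + X @ B_true + rng.normal(scale=1.0, size=(n, m))

# Add a column of ones so the first estimated row holds the intercepts.
X_design = np.column_stack([np.ones(n), X])

# Least squares solves for every response column in a single call.
B_hat, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
intercepts, slopes = B_hat[0], B_hat[1:]

print("Estimated intercepts:", np.round(intercepts, 2))
print("Estimated slopes (rows: predictors, columns: responses):")
print(np.round(slopes, 2))
```

Each column of `B_hat` matches what a separate multiple regression on that response would give; the multivariate treatment matters mainly for joint hypothesis tests across responses, which this sketch does not cover.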

Assumptions of multivariate regression analysis

For multivariate regression analysis to be applied successfully, the dataset should meet the following assumptions:

Figure 3: Assumptions of multivariate regression analysis
  1. Observation independence: There should be no relationship between observations or between the groups included in the dataset, i.e. each observation should come from a different respondent (Laerd, 2021).
  2. Adequate sample size: The dataset should include enough observations; the larger the sample, the more reliable the results (Laerd, 2021).
  3. No multivariate outliers: Multivariate outliers are observations whose combination of values lies far outside the normal range. Their presence distorts the results, so the dependent variables should be tested for outliers using the Mahalanobis distance. A p-value is computed for each observation's Mahalanobis distance, and if it is less than 0.001, that observation should be removed from the dataset (Hasan, 2020; Laerd, 2021).
  4. Multivariate normality: Each dependent variable should be approximately normally distributed, i.e. its distribution should be symmetric. To build an adequate model, this normality has to be validated. It can be assessed for each dependent variable with the Shapiro-Wilk test or by examining the skewness value: if the p-value of the Shapiro-Wilk test is greater than the significance level of the study (for example, 0.05) or the skewness is close to 0, the data are treated as normal (Laerd, 2021).
  5. Variable reliability: Reliability describes how well the variables measure their respective constructs. The dependent variables included in the model should be reliable, i.e. their Cronbach alpha value should be more than 0.7 (Laerd, 2021).
  6. Linearity between dependent and independent variables: The relationship between the dependent and independent variables should be linear. If the variables are not linearly related, the power of the test is reduced (Laerd, 2021).
  7. Absence of multicollinearity: The dependent variables should be moderately correlated with one another. If the correlation is low, it is better to run a separate model for each dependent variable. If the correlation is more than 0.9, multicollinearity is present. Hence, a collinearity (correlation) matrix should be built to validate the absence of multicollinearity (Laerd, 2021).
  8. Homogeneity of the variance-covariance matrix: The covariance in the dataset should be homogeneous. Box's M test of equality of covariance matrices or Levene's test of homogeneity of variance can be used to validate this assumption. If the significance value is more than the chosen level, such as 0.05, homogeneity is present (Hasan, 2020; Laerd, 2021).
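
The last assumption in the list can be checked roughly as follows. This is a sketch on synthetic data, assuming that the observations of one dependent variable can be split into two groups by some categorical factor; the group names and values are hypothetical. It uses `scipy.stats.levene` with `center="mean"`, which is the classical mean-based form of the test (SciPy's default is the median-based variant).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical scores of one dependent variable in two groups
# with deliberately different spreads.
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.0, scale=5.0, size=50)

# Levene's test: the null hypothesis is that the group variances are equal.
statistic, p_value = stats.levene(group_a, group_b, center="mean")

alpha = 0.05  # significance level used in the article
if p_value > alpha:
    print(f"p = {p_value:.4f}: homogeneity of variance is not rejected")
else:
    print(f"p = {p_value:.4f}: variances differ, assumption violated")
```

Levene's test looks at one dependent variable at a time. Box's M test, which compares whole covariance matrices, is not shown here.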

Relevance of multivariate regression analysis

Multivariate regression analysis makes it possible to examine complex datasets, which are increasingly common and which univariate methods cannot handle. Thus, it is essential when there is more than one response variable (Glen, 2021). Furthermore, multivariate analysis reduces the possibility of error and provides a more realistic picture, because the results do not rest on a single variable. Lastly, multivariate analysis is a powerful tool at a time when decisions depend not on one aspect but on multiple constructs. Purchasing a car, for example, is not only about price but also about quality, functionality, and safety (Jackson, 2018). Hence, because multivariate methods model a reality in which a situation, decision, or product involves more than one response variable, understanding and applying them is relevant.

A sample analysis

The researcher aims to determine the influence of strategic management (strategy formulation and strategy implementation) on the organization (business strategy effectiveness and organizational performance). For this, a sample of 100 employees was considered, and multivariate regression analysis is required. Because the assumptions have to be tested before the model is built, that analysis is carried out first.

As the sample was collected from different respondents, the observations are independent. The sample size is 100, which is not very large but is sufficient to measure responses. For multivariate outliers, the Mahalanobis distance was computed. The figure below shows the results.

Figure 4: Mahalanobis Distance results

As the p-values for both dependent variables are more than 0.001, no outliers are present.
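
The outlier check described above could be sketched as follows. The data are synthetic, with one extreme row planted on purpose, and the function name `mahalanobis_check` is hypothetical. The sketch assumes exactly two dependent variables, as in this sample; in that case the chi-square p-value with 2 degrees of freedom has the closed form exp(-D²/2). For other numbers of variables a chi-square survival function such as `scipy.stats.chi2.sf` would be needed instead.

```python
import numpy as np

def mahalanobis_check(Y):
    """Squared Mahalanobis distances and p-values for two dependent variables."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim != 2 or Y.shape[1] != 2:
        raise ValueError("this sketch handles exactly two dependent variables")
    diff = Y - Y.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    # Quadratic form diff_i' * cov_inv * diff_i for every row i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Chi-square survival function with 2 degrees of freedom.
    p_values = np.exp(-d2 / 2.0)
    return d2, p_values

rng = np.random.default_rng(42)
Y = rng.normal(loc=3.5, scale=0.6, size=(100, 2))  # hypothetical scores
Y[0] = [9.0, -2.0]                                 # planted extreme observation

d2, p_values = mahalanobis_check(Y)
outliers = np.flatnonzero(p_values < 0.001)        # cutoff used in the article
print("Rows flagged as multivariate outliers:", outliers)
```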

Furthermore, multivariate normality can be assessed with the Shapiro-Wilk test or by examining the skewness value. Herein, the skewness values were examined, with the following results.

Table 1: Multivariate normality

The above table shows that the skewness values are not close to 0; thus, the dataset is not normally distributed.
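
For reference, skewness can be computed along these lines. This is a sketch using the simple moment-based coefficient; statistical packages often report a bias-adjusted version, so values may differ slightly. The function name and the two example arrays are hypothetical and are not the study's data.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness: third central moment over variance**1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # evenly spread around the mean
right_skewed = [1, 1, 1, 1, 10]  # one large value pulls the tail right

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print("Skewness of symmetric data:", skew_symmetric)
print("Skewness of right-skewed data:", skew_right)
```

A value near 0 indicates a symmetric distribution, a clearly positive value a long right tail, and a clearly negative value a long left tail.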

The reliability examination for the variables is presented below.

Variable | Cronbach alpha value
Table 2: Reliability

The above table shows that the Cronbach alpha value of each variable is more than 0.7; thus, the variables are effective in measuring responses.
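
Cronbach's alpha for one construct can be computed from the item-level responses as sketched below, using the standard formula alpha = k/(k-1) · (1 - sum of item variances / variance of the total score). The simulated items, which share a common latent score plus noise, are an assumption for illustration and do not reproduce the study's questionnaire.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (observations x items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(7)
latent = rng.normal(size=(100, 1))                        # shared construct
items = latent + rng.normal(scale=0.5, size=(100, 4))     # four noisy items

alpha = cronbach_alpha(items)
print(f"Cronbach alpha: {alpha:.3f}")
print("Reliable (alpha > 0.7):", alpha > 0.7)
```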

Furthermore, strategic management has a direct association with the organization; thus, a linear relationship exists between the two sets of variables.

For the multicollinearity examination, the collinearity matrix of the dependent variables is presented below.

Table 3: Collinearity matrix

The above table shows that the correlation is above 0.2 but less than 0.9; thus, the dependent variables are moderately correlated and multicollinearity is absent.
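
The same check can be sketched in code. The two simulated dependent variables and the helper name `classify_dv_correlation` are hypothetical; the cutoffs of 0.2 and 0.9 are the ones used in this article.

```python
import numpy as np

def classify_dv_correlation(r, low=0.2, high=0.9):
    """Label a correlation between two dependent variables."""
    r = abs(r)
    if r > high:
        return "multicollinearity"
    if r < low:
        return "low"
    return "moderate"

rng = np.random.default_rng(1)
n = 100
bs = rng.normal(size=n)                    # stand-in for the first dependent variable
op = 0.6 * bs + 0.8 * rng.normal(size=n)   # second one, built to correlate with it

# Correlation matrix with variables in columns.
corr = np.corrcoef(np.column_stack([bs, op]), rowvar=False)
r = corr[0, 1]
print(np.round(corr, 3))
print("Correlation between dependent variables is:", classify_dv_correlation(r))
```

With more than two dependent variables, the same classification would be applied to every off-diagonal entry of the matrix.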

Lastly, to examine the homogeneity of the variance-covariance matrix, the results of Levene's test of equality of error variances are presented below.

Table 4: Homogeneity of Variance-Covariance matrix

The above table shows that, with a significance value less than 0.05, the null hypothesis of equal error variance is rejected for the dependent variable BS. Hence, homogeneity of variance is not present. Because the skewness and homogeneity conditions of multivariate regression analysis are not satisfied, the analysis cannot be carried out as it stands, and the data have to be adjusted before the model is built.

