How to conduct a generalized least squares test?

In statistics, generalized least squares (GLS) is a widely used method for estimating the unknown coefficients of a linear regression model when the errors have unequal variances or are correlated with one another. The ordinary least squares (OLS) method also estimates the parameters of a linear regression model: it seeks to minimize the sum of the squares of the differences between the observed responses in the given dataset and those predicted by a linear function. The main advantage of using OLS regression for estimating parameters is that it is easy to use. However, OLS gives reliable results only if the data are complete, there are no major outliers in the data set, and its assumptions about the errors hold. In particular, the OLS regression model does not take into account unequal variance, or ‘heteroskedastic errors’. With heteroskedastic errors the OLS coefficient estimates remain unbiased, but they are no longer efficient and the usual standard errors are biased, so significance tests can be misleading.

Therefore, generalized least squares is useful for tackling heteroskedasticity and correlated errors in data. When the covariance structure of the errors is known, GLS produces the ‘Best Linear Unbiased Estimator’ (BLUE). Under these conditions the GLS estimator is unbiased, consistent, efficient and asymptotically normal.
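For reference, the standard textbook form of the estimator can be written as follows. The symbols y, X, β, ε, σ² and Ω are notation introduced here for illustration and do not come from the case study; Ω is assumed to be a known, positive definite matrix describing the error covariance up to a scale factor.

```latex
y = X\beta + \varepsilon, \qquad
\operatorname{E}[\varepsilon \mid X] = 0, \qquad
\operatorname{Var}[\varepsilon \mid X] = \sigma^{2}\,\Omega

\hat{\beta}_{\mathrm{GLS}} = \left( X^{\top} \Omega^{-1} X \right)^{-1} X^{\top} \Omega^{-1} y
```

When Ω is the identity matrix, this expression reduces to the OLS estimator (XᵀX)⁻¹Xᵀy.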

Major assumptions for generalized least squares regression analysis

OLS assumes that the errors are independent and identically distributed. In particular:

  • The error variances are homoscedastic
  • The errors are uncorrelated
  • The errors are normally distributed

GLS relaxes the first two of these assumptions: the errors may have unequal variances and may be correlated, provided their covariance structure is known or can be estimated. When the OLS assumptions hold, the OLS and GLS estimators are the same. Thus, the difference between OLS and GLS lies in the assumptions made about the error term of the model. The GLS estimator can be understood from more than one perspective, including:

  • A generalization of OLS
  • A transformation of the model equation into a new model whose errors are uncorrelated and have equal variances (that is, homoskedastic), to which OLS can then be applied.
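The transformation perspective can be illustrated with a minimal NumPy sketch. It assumes the error covariance matrix is known exactly; the data are synthetic, and the names `omega`, `beta_gls` and `beta_transformed` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only, not the case-study dataset).
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])

# Assumed known error covariance: unequal variances plus AR(1)-style correlation.
sd = np.sqrt(rng.uniform(0.5, 4.0, size=n))
idx = np.arange(n)
corr = 0.5 ** np.abs(idx[:, None] - idx[None, :])
omega = corr * np.outer(sd, sd)

# Draw errors with covariance omega and build the response.
L = np.linalg.cholesky(omega)          # omega = L @ L.T
y = X @ beta_true + L @ rng.normal(size=n)

# View 1: direct GLS formula (X' Omega^-1 X)^-1 X' Omega^-1 y.
omega_inv = np.linalg.inv(omega)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# View 2: premultiply the model by L^-1, then run plain OLS on the result.
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)
beta_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print(beta_gls, beta_transformed)
```

Both routes give the same coefficients, because the transformed errors L⁻¹ε have an identity covariance matrix and so satisfy the OLS assumptions.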

Example of generalized least squares test

This section explains the process of applying GLS through a case study. The sample dataset contains data on 30 students. The aim is to review the impact of self-efficiency and ability (independent variables) on achievement (dependent variable). For this case study, a simple linear regression is performed first, and the results are then compared with the generalized least squares test.

Step 1: Linear regression

Table 1: Simple linear regression of case study

Since the dependent variable is continuous in nature, it is important to check whether the residuals follow a normal distribution. The distribution of the residuals of the dependent variable (achievement) is approximately normal, with skewness -0.18 and kurtosis 1.95. As the table above shows, linear regression was performed to check the relationship between achievement and the independent variables self-efficiency and ability. The parameter estimate for self-efficiency was 0.003 with a p value of 0.989. For the other independent variable, ability, the parameter estimate was -0.047 with a p value of 0.823. This shows that neither independent variable is statistically significant, as both p values are greater than 0.05.

The interpretation of coefficients of the independent variables is as follows:

  • The coefficient of the independent variable ‘self-efficiency’ is positive, while the coefficient of the other independent variable, ‘ability’, is negative. Neither relationship with the dependent variable ‘achievement’ is statistically significant.
  • The parameter estimates and p values suggest that the sample size may have been inadequate to demonstrate the true relationship.
  • Furthermore, for every one-unit rise in self-efficiency, the dependent variable is estimated to increase by 0.003 units, keeping all other factors the same.
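The Step 1 analysis can be sketched in Python with NumPy as follows. This is a rough illustration under stated assumptions, not the procedure used to produce Table 1: the data are randomly generated stand-ins for the 30-student dataset, the variable names are hypothetical, and the numbers it prints will not match the table.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 30-student dataset (all values are made up).
n = 30
self_efficiency = rng.normal(50, 10, size=n)
ability = rng.normal(100, 15, size=n)
achievement = 60 + 0.05 * self_efficiency + 0.02 * ability + rng.normal(0, 5, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), self_efficiency, ability])
y = achievement

# OLS estimates: minimise the residual sum of squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Classical standard errors and t statistics with n - k degrees of freedom.
k = X.shape[1]
sigma2 = residuals @ residuals / (n - k)
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / std_err

# Moment-based skewness and (non-excess) kurtosis of the residuals.
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)
kurtosis = np.mean(z ** 4)  # close to 3 for normally distributed residuals

print("coefficients:", beta)
print("t statistics:", t_stats)
print("skewness:", skewness, "kurtosis:", kurtosis)
```

Each coefficient is read as the estimated change in achievement for a one-unit change in that predictor, holding the other predictor fixed. P values would follow from comparing each t statistic with a t distribution on n - k degrees of freedom.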

Step 2: Weighted least squares regression

Table 2: Weighted least squares regression of generalized least squares case study

After performing the weighted analysis, self-efficiency was found to influence achievement more, with a beta coefficient of 0.045 and a p value of 0.021. This shows that the regression coefficient is statistically significant. Ability influenced achievement less, with a beta coefficient of 0.014 and a p value of 0.046. Both p values are below 0.05, which suggests that the weighted model describes these data better than the simple regression performed previously. Therefore, under the weighted model there is a statistically significant relationship between the dependent variable ‘achievement’ and the independent variables ‘self-efficiency’ and ‘ability’.
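A weighted least squares fit of this kind can be sketched with NumPy as below. The article does not state how the case-study weights were chosen, so this example makes its own assumption: the error standard deviation is proportional to the predictor, and the weights are the inverse error variances. The data are synthetic and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic heteroskedastic data (not the case-study dataset).
n = 30
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * x                               # assumption: error spread grows with x
y = X @ np.array([2.0, 0.5]) + rng.normal(size=n) * sd

# WLS weights are the inverse error variances.
w = 1.0 / sd ** 2

# Closed form (X' W X)^-1 X' W y with W = diag(w).
XtW = X.T * w                              # scales column j of X.T by w[j]
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Standard errors from the weighted residual sum of squares.
resid = y - X @ beta_wls
sigma2 = np.sum(w * resid ** 2) / (n - X.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(XtW @ X)))

# Equivalent route: scale each row by sqrt(w) and run ordinary least squares.
sw = np.sqrt(w)
beta_check, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_wls, std_err)
```

Weighted least squares is the special case of GLS in which the error covariance matrix is diagonal, so observations with noisier errors simply receive less weight.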

Application of generalized least squares

  • The GLS model is useful in the regionalization of hydrologic data.
  • GLS is also useful for handling autocorrelation, by choosing an appropriate weighting matrix.
  • It is a standard method for estimating regression models with autocorrelated disturbances and for testing for serial correlation (serial correlation and autocorrelation mean the same thing here).
  • The maximum likelihood technique can also be used to estimate regression models with autocorrelated disturbances.
  • The GLS procedure finds extensive use across various domains. One example is estimating the parameters of regional regression models of flood quantiles.
  • GLS is widely used in market response modelling, econometrics and time series analysis.
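The autocorrelation use case can be illustrated with a short NumPy sketch. It assumes first-order autoregressive, AR(1), disturbances with a known coefficient `rho` and applies the Prais-Winsten transformation; in practice `rho` is unknown and has to be estimated from the data (feasible GLS). The data are synthetic, and `prais_winsten` is a helper written for this example, not a library function.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200
rho = 0.7                                  # assumed known AR(1) coefficient
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Simulate AR(1) disturbances: e[t] = rho * e[t-1] + u[t].
u = rng.normal(size=n)
e = np.empty(n)
e[0] = u[0] / np.sqrt(1 - rho ** 2)        # start from the stationary distribution
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
y = X @ np.array([1.0, 2.0]) + e


def prais_winsten(a, rho):
    """Quasi-difference the rows of `a` so AR(1) errors become uncorrelated."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    out[0] = np.sqrt(1 - rho ** 2) * a[0]  # rescale the first observation
    out[1:] = a[1:] - rho * a[:-1]         # subtract rho times the previous row
    return out


# OLS on the transformed model gives the GLS estimate.
beta_pw, *_ = np.linalg.lstsq(prais_winsten(X, rho), prais_winsten(y, rho), rcond=None)

# Cross-check with the direct GLS formula, Omega[i, j] = rho ** |i - j|.
idx = np.arange(n)
omega_inv = np.linalg.inv(rho ** np.abs(idx[:, None] - idx[None, :]))
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print(beta_pw, beta_gls)
```

The two estimates agree because the transformation is, up to a constant scale factor, a square root of the inverse AR(1) correlation matrix.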

Several software packages support generalized least squares, including R, MATLAB, SAS, SPSS and Stata.
