In statistics, Generalised Least Squares (GLS) is one of the most popular methods for estimating the unknown coefficients of a linear regression model when the errors (residuals) are correlated with one another or have unequal variances. Ordinary Least Squares (OLS), by contrast, estimates the parameters of a linear regression model by minimizing the sum of the squared differences between the observed responses in the dataset and those predicted by the linear function. The main advantage of OLS is that it is easy to use. However, OLS gives reliable results only when its assumptions hold, and it is sensitive to major outliers in the data set. In particular, OLS does not take into account unequal error variances, known as heteroskedastic errors. With heteroskedastic errors, the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, so hypothesis tests and confidence intervals become unreliable.
Therefore, generalized least squares is crucial for tackling heteroskedastic and correlated errors. When the error covariance structure is known, GLS produces the Best Linear Unbiased Estimator (BLUE) of the coefficients. In that case the GLS estimator is unbiased and efficient, and under standard regularity conditions it is also consistent and asymptotically normal.
Major assumptions for generalized least squares regression analysis
Under the Gauss-Markov assumptions, the OLS estimators are BLUE. Two of those assumptions are that the errors are homoskedastic and serially uncorrelated. When either is violated, OLS is no longer BLUE and its usual standard errors are invalid, so impact assessment or inference about relationships cannot be carried out reliably. GLS, an improved model for dealing with these violations, rests on the following assumptions:
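- The error variances may be heteroskedastic (unequal across observations)
- Errors may be correlated with one another
- Errors are normally distributed (required for exact inference in small samples)
- Absence of perfect multicollinearity among the independent variables
- The error covariance structure is known up to a scale factor, or can be estimated from the data (feasible GLS)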
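In standard matrix notation (the symbols below are textbook conventions, not anything defined in the case study), the first two assumptions amount to letting the error covariance matrix be a general positive definite matrix $\Omega$ rather than a multiple of the identity. Unequal diagonal entries of $\Omega$ correspond to heteroskedasticity, and non-zero off-diagonal entries to correlated errors. Assuming $\Omega$ is known, the GLS estimator and its covariance are:

$$
y = X\beta + \varepsilon, \qquad \mathrm{E}[\varepsilon \mid X] = 0, \qquad \mathrm{Var}(\varepsilon \mid X) = \sigma^2 \Omega
$$

$$
\hat{\beta}_{\mathrm{GLS}} = \left(X^\top \Omega^{-1} X\right)^{-1} X^\top \Omega^{-1} y, \qquad \mathrm{Var}\left(\hat{\beta}_{\mathrm{GLS}} \mid X\right) = \sigma^2 \left(X^\top \Omega^{-1} X\right)^{-1}
$$

When $\Omega = I$ (homoskedastic, uncorrelated errors), both expressions reduce to their OLS counterparts, $\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y$ with variance $\sigma^2 (X^\top X)^{-1}$.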
When the first two assumptions do not apply (that is, the errors are homoskedastic and uncorrelated), the OLS and GLS estimators are identical. Thus, the difference between OLS and GLS lies in the assumptions about the model's error term. There are two perspectives from which one can understand the GLS estimator:
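- A generalization of OLS that minimizes a weighted sum of squared residuals, with weights given by the inverse of the error covariance matrix
- OLS applied to a transformed model whose errors are uncorrelated and have equal variances, that is, homoskedastic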
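The two perspectives can be checked numerically with a minimal pure-Python sketch. It assumes the error covariance matrix Ω is known; the function names (`cholesky`, `forward_solve`, `solve`, `ols`, `gls`, `wls`) and the small dataset are hypothetical illustrations, not part of the case study. `gls` factors Ω = LLᵀ (Cholesky decomposition), pre-multiplies both y and X by L⁻¹ so the transformed errors are uncorrelated with equal variances, and then runs OLS on the transformed data. With a diagonal Ω it reproduces weighted least squares, and with Ω = I it reproduces OLS.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for a lower-triangular L, i.e. compute L^{-1} b."""
    x = []
    for i in range(len(b)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / L[i][i])
    return x

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def gls(X, y, omega):
    """GLS as OLS on the transformed model L^{-1} y = L^{-1} X beta + L^{-1} e, where omega = L L^T."""
    L = cholesky(omega)
    p = len(X[0])
    # Transform each column of X by L^{-1}, then reassemble the rows.
    cols = [forward_solve(L, [row[j] for row in X]) for j in range(p)]
    X_star = [list(r) for r in zip(*cols)]
    y_star = forward_solve(L, y)
    return ols(X_star, y_star)

def wls(X, y, variances):
    """Weighted least squares: minimise sum((y_i - x_i beta)^2 / variances_i)."""
    p = len(X[0])
    w = [1.0 / v for v in variances]
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(p)] for i in range(p)]
    xtwy = [sum(wi * row[i] * yi for wi, row, yi in zip(w, X, y)) for i in range(p)]
    return solve(xtwx, xtwy)

# Hypothetical data: an intercept column plus one predictor.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
variances = [1.0, 1.0, 4.0, 4.0, 9.0]  # assumed known error variances
n = len(y)
omega = [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

print("GLS, diagonal omega:", gls(X, y, omega))
print("WLS                :", wls(X, y, variances))
print("GLS, omega = I     :", gls(X, y, identity))
print("OLS                :", ols(X, y))
```

In practice Ω is rarely known and has to be estimated (feasible GLS); the hand-written linear algebra here is only meant for small illustrative problems.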
Example of generalized least squares regression
This section explains the process of applying GLS through a case study. The sample dataset contains data from 30 students. The aim is to examine the impact of self-efficiency and ability (independent variables) on achievement (dependent variable). First, an ordinary least squares regression is performed, and its results are then compared with those of the generalized least squares model.
Step 1: Linear regression
Since the dependent variable is continuous, it is important to check whether the regression residuals follow a normal distribution. The distribution of the residuals for achievement is normal, with a skewness of -0.18 and a kurtosis of 1.95. As the table above shows, linear regression was performed to check the relationship between achievement and the two independent variables, self-efficiency and ability. For self-efficiency, the parameter estimate was 0.003 with a p-value of 0.989. For ability, the parameter estimate was -0.047 with a p-value of 0.823. Neither independent variable is statistically significant, since both p-values are greater than 0.05.
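The normality check on the residuals can be reproduced with moment-based formulas. The sketch below is a minimal pure-Python illustration on hypothetical residuals (not the case-study data), and `moment_skewness_kurtosis` is a hypothetical name. Software packages differ: some report Pearson kurtosis (3 for a normal distribution), others report excess kurtosis (0 for a normal distribution), and many apply small-sample corrections, so a reported kurtosis value depends on the package's definition.

```python
def moment_skewness_kurtosis(values):
    """Return (skewness, kurtosis) computed from population (1/n) moments.

    The kurtosis returned is Pearson's measure, which equals 3 for a normal
    distribution; subtract 3 to obtain excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical residuals from a fitted regression (not the case-study data).
residuals = [-1.2, 0.4, 0.9, -0.3, 0.2, -0.6, 1.1, -0.5]
skew, kurt = moment_skewness_kurtosis(residuals)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}, excess kurtosis = {kurt - 3:.2f}")
```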
The coefficients of the independent variables can be interpreted as follows:
- The independent variable ‘self-efficiency’ is positively related to the dependent variable ‘achievement’, whereas the other independent variable, ‘ability’, is negatively related to it. Neither relationship is statistically significant.
- The small parameter estimates and large p-values suggest that the sample may have been too small to reveal the true relationship.
- For every one-unit rise in self-efficiency, achievement is predicted to rise by 0.003 units, holding ability constant.
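Under the linear model, the predicted change in achievement is each coefficient multiplied by the change in its predictor, with the other predictors held fixed. The minimal sketch below uses the Step 1 estimates reported above; the dictionary keys and `predicted_change` are hypothetical names. Since neither estimate is statistically significant, the numbers are illustrative only.

```python
# Step 1 (OLS) estimates reported in the case study.
ols_coefficients = {"self_efficiency": 0.003, "ability": -0.047}

def predicted_change(coefficients, changes):
    """Change in predicted achievement when each predictor changes by changes[name].

    Predictors missing from `changes` are held fixed (a change of 0).
    """
    return sum(beta * changes.get(name, 0.0) for name, beta in coefficients.items())

# One-unit rise in self-efficiency, ability held constant.
print(predicted_change(ols_coefficients, {"self_efficiency": 1}))
# Ten-unit rise in self-efficiency combined with a two-unit rise in ability.
print(predicted_change(ols_coefficients, {"self_efficiency": 10, "ability": 2}))
```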
Step 2: Weighted least squares regression
Weighted least squares, the special case of GLS in which the errors are uncorrelated but may have unequal variances, was then applied. In the weighted analysis, self-efficiency had the stronger association with achievement, with a beta coefficient of 0.045 and a p-value of 0.021, so this coefficient is statistically significant. Ability had a weaker association with achievement, with a beta coefficient of 0.014 and a p-value of 0.046. Both p-values are below 0.05, which suggests that the GLS model describes the data better than the ordinary regression performed earlier. Therefore, under GLS, both independent variables, ‘self-efficiency’ and ‘ability’, have a statistically significant relationship with the dependent variable ‘achievement’.
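The case study does not state how the weights in Step 2 were obtained. One common approach, assumed here purely for illustration, is two-step feasible weighted least squares: fit OLS, model the logarithm of the squared residuals as a linear function of the predictors, and weight each observation by the inverse of its estimated variance. The function names (`solve`, `wls`, `fitted`, `feasible_wls`) and the data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def wls(X, y, weights):
    """Weighted least squares: minimise sum(w_i * (y_i - x_i beta)^2)."""
    p = len(X[0])
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, X)) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * row[i] * yi for w, row, yi in zip(weights, X, y)) for i in range(p)]
    return solve(xtwx, xtwy)

def fitted(X, beta):
    """Fitted values x_i beta for every row of X."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def feasible_wls(X, y, eps=1e-12):
    """Two-step feasible WLS under an assumed variance model log(sigma_i^2) = x_i gamma."""
    n = len(y)
    # Step 1: ordinary least squares (all weights equal to 1).
    beta_ols = wls(X, y, [1.0] * n)
    resid = [yi - fi for yi, fi in zip(y, fitted(X, beta_ols))]
    # Step 2: regress log squared residuals on X; eps guards against log(0).
    gamma = wls(X, [math.log(r * r + eps) for r in resid], [1.0] * n)
    variances = [math.exp(v) for v in fitted(X, gamma)]
    # Step 3: weight each observation by its inverse estimated variance.
    weights = [1.0 / v for v in variances]
    return wls(X, y, weights), weights

# Hypothetical data whose spread grows with the predictor.
X = [[1.0, float(x)] for x in range(1, 9)]
y = [2.2, 3.7, 6.5, 7.4, 11.0, 11.8, 15.9, 15.1]
beta, weights = feasible_wls(X, y)
print("Feasible WLS coefficients:", beta)
```

Because the weights are estimated rather than known, the resulting standard errors are only approximately valid, particularly in small samples such as the 30 students here.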
Applications of generalized least squares
- GLS is useful in the regionalisation of hydrologic data.
- GLS accounts for autocorrelated errors through an appropriate weighting matrix (the inverse of the error covariance matrix).
- It is one of the best methods for estimating regression models with autocorrelated disturbances, and it is often used alongside tests for serial correlation (autocorrelation and serial correlation refer to the same thing).
- Maximum likelihood estimation can also be used to fit regression models with autocorrelated disturbances.
- The GLS procedure is used across many domains; in hydrology, for example, it is used to estimate the parameters of regional regression models of flood quantiles.
- GLS is widely used in market response modelling, econometrics and time series analysis.
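For the autocorrelation applications above, a common case is first-order autoregressive (AR(1)) errors, where each error equals ρ times the previous error plus white noise. With ρ known, GLS reduces to a row transformation of the data, the Prais-Winsten transformation, followed by OLS. The sketch below is a minimal pure-Python illustration with hypothetical names (`prais_winsten`, `ar1_gls`) and hypothetical data; in practice ρ must be estimated, which gives a feasible GLS procedure.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def prais_winsten(X, y, rho):
    """Transform (X, y) so that AR(1) errors with known |rho| < 1 become uncorrelated and homoskedastic."""
    scale = math.sqrt(1.0 - rho ** 2)
    X_star = [[scale * v for v in X[0]]]
    y_star = [scale * y[0]]
    for t in range(1, len(y)):
        # Quasi-difference every column, including the intercept column.
        X_star.append([a - rho * b for a, b in zip(X[t], X[t - 1])])
        y_star.append(y[t] - rho * y[t - 1])
    return X_star, y_star

def ar1_gls(X, y, rho):
    """GLS for AR(1) errors with known rho: OLS on the Prais-Winsten transformed data."""
    X_star, y_star = prais_winsten(X, y, rho)
    return ols(X_star, y_star)

# Hypothetical time series with an intercept and a trend-like predictor.
X = [[1.0, float(t)] for t in range(1, 11)]
y = [1.3, 2.9, 3.2, 4.8, 6.1, 6.4, 7.9, 9.2, 9.6, 11.4]
print("OLS      :", ols(X, y))
print("AR(1) GLS:", ar1_gls(X, y, rho=0.5))
```

Setting ρ = 0 leaves the data unchanged, so the procedure falls back to OLS, consistent with GLS and OLS coinciding when errors are uncorrelated and homoskedastic.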
Many software packages support generalized least squares, including R, MATLAB, SAS, SPSS, and Stata.