How to conduct a generalized least squares test?

In statistics, Generalized Least Squares (GLS) is one of the most popular methods for estimating the unknown coefficients of a linear regression model when the errors (residuals) are correlated with one another or have unequal variances. Ordinary Least Squares (OLS) also estimates the parameters of a linear regression model. It minimizes the sum of the squares of the differences between the observed responses in the dataset and those predicted by a linear function. The main advantage of OLS is that it is easy to use. However, OLS gives reliable results only when its assumptions about the errors hold, and it is also sensitive to major outliers in the data. In particular, OLS does not take into account unequal error variances, or ‘heteroskedastic errors’. With heteroskedastic errors, the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, so significance tests can be misleading.
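For reference, the OLS criterion can be written in matrix form. The notation is ours, not the article's: y is the response vector, X the design matrix and β the coefficient vector.

```latex
\hat{\beta}_{\mathrm{OLS}}
  = \arg\min_{\beta}\,(y - X\beta)^{\top}(y - X\beta)
  = (X^{\top}X)^{-1}X^{\top}y
```

Every squared residual receives the same weight. This is why OLS is efficient only when all errors share one variance and are uncorrelated.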

Therefore, generalized least squares is crucial in tackling the problems of heteroskedasticity and correlated errors in data. When the error covariance structure is known, it produces the ‘Best Linear Unbiased Estimator’ (BLUE). Thus, under standard regularity conditions, the GLS estimator is unbiased, consistent, efficient and asymptotically normal.
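Here is a sketch of the GLS estimator in matrix notation (ours, not the article's) for the model y = Xβ + ε. It assumes the errors have covariance σ²Ω, where Ω is a known positive-definite matrix. GLS then minimizes a sum of squares weighted by Ω⁻¹:

```latex
\operatorname{Var}(\varepsilon \mid X) = \sigma^{2}\Omega,
\qquad
\hat{\beta}_{\mathrm{GLS}}
  = \arg\min_{\beta}\,(y - X\beta)^{\top}\Omega^{-1}(y - X\beta)
  = (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y,
\qquad
\operatorname{Var}(\hat{\beta}_{\mathrm{GLS}} \mid X) = \sigma^{2}(X^{\top}\Omega^{-1}X)^{-1}
```

With Ω = I this reduces to OLS. The BLUE property (Aitken's theorem) requires Ω to be known. In practice Ω is usually estimated (feasible GLS), and the efficiency and normality results then hold only asymptotically.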

Major assumptions for generalized least squares regression analysis

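The classical OLS model assumes that the errors are independent and identically distributed. In particular, it assumes that:

  • The error variances are homoscedastic (equal)
  • Errors are uncorrelated
  • Errors are normally distributed (needed mainly for exact small-sample tests)

GLS relaxes the first two assumptions. It allows the errors to have unequal variances and to be correlated, provided their covariance structure is known or can be estimated. When the errors are in fact homoscedastic and uncorrelated, the OLS and GLS estimators are the same. Thus, the difference between OLS and GLS lies in the assumptions about the error term of the model. There are two different perspectives from which one can understand the GLS estimator: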

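  • A generalization of OLS that minimizes a weighted sum of squared residuals, with weights given by the inverse of the error covariance matrix.
  • A transformation of the model equation into a new model whose errors are uncorrelated and have equal variances (that is, are homoskedastic), so that OLS can be applied to the transformed model.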
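Both perspectives can be checked numerically. What follows is a minimal pure-Python sketch, not the article's procedure. It assumes a diagonal Ω (uncorrelated errors whose variances are proportional to hypothetical values `omega`) and uses made-up illustrative data, and all helper names are ours. The sketch computes GLS directly from the weighted normal equations, then computes it again by rescaling each row by 1/√ω_i and running plain OLS. It also shows that equal variances give back the OLS estimate.

```python
import math

def solve(a, b):
    """Solve a @ x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weighted_ls(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 by solving X'WX beta = X'Wy."""
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(xtwx, xtwy)

def gls_diagonal(X, y, omega):
    """Perspective 1: GLS with diagonal Omega, i.e. weights 1 / omega_i."""
    return weighted_ls(X, y, [1.0 / v for v in omega])

def gls_by_transformation(X, y, omega):
    """Perspective 2: rescale each row by 1/sqrt(omega_i), then run plain OLS."""
    scale = [1.0 / math.sqrt(v) for v in omega]
    Xs = [[s * xij for xij in row] for s, row in zip(scale, X)]
    ys = [s * yi for s, yi in zip(scale, y)]
    return weighted_ls(Xs, ys, [1.0] * len(ys))

# Hypothetical illustrative data: intercept column plus one regressor.
X = [[1.0, x] for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.9]
omega = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0]  # assumed relative error variances

beta_direct = gls_diagonal(X, y, omega)
beta_transformed = gls_by_transformation(X, y, omega)
beta_ols = weighted_ls(X, y, [1.0] * len(y))
beta_equal_var = gls_diagonal(X, y, [3.0] * len(y))  # equal variances: GLS reduces to OLS
print(beta_direct, beta_transformed)
print(beta_ols, beta_equal_var)
```

Both routes solve the same normal equations, XᵀΩ⁻¹Xβ = XᵀΩ⁻¹y, so they agree up to rounding. A full (non-diagonal) Ω would need the rows to be transformed by a matrix P satisfying PᵀP = Ω⁻¹ (for example, one obtained from a Cholesky factorization) instead of by simple rescaling.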

Example of generalized least squares test

This section explains the process of applying GLS with the help of a case study. The sample dataset contains data on 30 students. The aim is to review the impact of self-efficiency and ability (independent variables) on achievement (dependent variable). For this case study, an ordinary (OLS) linear regression is performed first, and its results are compared with those of a weighted least squares regression, which is a form of GLS.

Step 1: Linear regression

Table 1: Simple linear regression of case study


Since the dependent variable is continuous, it is important to confirm that the regression residuals follow a normal distribution. The distribution of the residuals for the dependent variable (achievement) is approximately normal, with skewness -0.18 and kurtosis 1.95. As the table above shows, a linear regression was performed to check the relationship of achievement with self-efficiency and ability. The parameter estimate for self-efficiency was 0.003, with a p value of 0.989. For the other independent variable, ability, the parameter estimate was -0.047, with a p value of 0.823. Neither independent variable is statistically significant, because both p values are greater than 0.05.
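The skewness and kurtosis used in this normality check can be computed from the residuals with a minimal stdlib sketch like the one below. The function name and the toy residuals are hypothetical. Software packages often report bias-adjusted values, and they often report excess kurtosis (0 for a normal distribution) rather than Pearson kurtosis (3 for a normal distribution). The article does not state which convention its kurtosis of 1.95 uses, so compare like with like.

```python
def residual_shape(residuals):
    """Moment-based skewness and kurtosis of a list of regression residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5  # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2    # Pearson kurtosis: 3 for a normal distribution
    return skewness, kurtosis, kurtosis - 3.0  # last value: excess kurtosis (0 for normal)

# Hypothetical residuals, for illustration only.
skew, kurt, excess = residual_shape([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt, excess)
```

In a real analysis, the input would be the residuals of the fitted Step 1 model rather than this toy list.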

The interpretation of coefficients of the independent variables is as follows:

  • The independent variable ‘self-efficiency’ is positively related to the dependent variable ‘achievement’, whereas the other independent variable, ‘ability’, is negatively related to it. Neither relationship is statistically significant.
  • The small estimates and large p values suggest that a sample of 30 students may have been too small to detect the true relationship.
  • For every one-unit rise in self-efficiency, achievement is predicted to increase by 0.003 units, keeping all other factors the same.
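The coefficient interpretation follows from the fitted Step 1 equation. The article does not report the intercept b_0, so it is left symbolic. A one-unit increase in self-efficiency, with ability held fixed, changes predicted achievement by the coefficient itself:

```latex
\begin{aligned}
\widehat{\text{achievement}} &= b_0 + 0.003\,\text{self-efficiency} - 0.047\,\text{ability} \\
\Delta\widehat{\text{achievement}} &= 0.003 \times \Delta\,\text{self-efficiency} = 0.003 \times 1 = 0.003
\quad \text{(ability held fixed)}
\end{aligned}
```

The same arithmetic gives a change of -0.047 units in predicted achievement per one-unit increase in ability, with self-efficiency held fixed.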

Step 2: Weighted least squares regression

Table 2: Weighted least squares regression of generalized least squares case study


Weighted least squares (WLS) is the special case of GLS in which the errors are uncorrelated but have unequal variances, so each observation is weighted by the inverse of its error variance. After performing the weighted analysis, self-efficiency was found to have the larger influence on achievement, with a beta coefficient of 0.045 and a p value of 0.021, so this coefficient is statistically significant. Ability had a smaller influence on achievement, with a beta coefficient of 0.014 and a p value of 0.046. Both p values are below 0.05, so, unlike in the simple regression, both coefficients are statistically significant. This suggests that accounting for unequal error variances improves on the simple regression for these data. Therefore, there is a statistically significant relationship between the dependent variable ‘achievement’ and the independent variables ‘self-efficiency’ and ‘ability’.
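The article does not say how its WLS weights were chosen. One common approach is two-step feasible WLS, shown here as an assumption rather than as the author's procedure. First, fit OLS. Next, model the log of the squared residuals to estimate each observation's error variance. Finally, refit with weights equal to the inverse of those estimated variances. The exponential variance model, the helper names and the simulated data below are all hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_ls(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 by solving X'WX beta = X'Wy."""
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(xtwx, xtwy)

def predict(X, beta):
    return [sum(b * xij for b, xij in zip(beta, row)) for row in X]

def feasible_wls(X, y):
    """Two-step feasible WLS, assuming Var(e_i) = exp(x_i . gamma) (a hypothetical variance model)."""
    n = len(y)
    beta_ols = weighted_ls(X, y, [1.0] * n)  # step 1: ordinary least squares
    resid = [yi - fi for yi, fi in zip(y, predict(X, beta_ols))]
    # Step 2: regress log squared residuals on X; the floor avoids log(0) for exact fits.
    # A constant shift in gamma only rescales all weights, which leaves beta_wls unchanged.
    gamma = weighted_ls(X, [math.log(max(r * r, 1e-12)) for r in resid], [1.0] * n)
    var_hat = [math.exp(v) for v in predict(X, gamma)]
    # Step 3: weight each observation by its inverse estimated variance and refit.
    beta_wls = weighted_ls(X, y, [1.0 / v for v in var_hat])
    return beta_ols, beta_wls

# Hypothetical simulated data whose error spread grows with x (heteroskedastic errors).
rng = random.Random(7)
xs = [rng.uniform(1.0, 10.0) for _ in range(100)]
X = [[1.0, xi] for xi in xs]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.3 * xi) for xi in xs]
beta_ols, beta_wls = feasible_wls(X, y)
print("OLS:", beta_ols, "feasible WLS:", beta_wls)
```

Modelling log squared residuals keeps every estimated variance positive. Other choices, such as weights based on a variable known to drive the error variance, are equally valid when there is a reason to prefer them.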

Applications of generalized least squares

  • GLS is useful in the regionalization of hydrologic data.
  • GLS can account for autocorrelated errors through an appropriate choice of weighting matrix.
  • It is one of the best methods for estimating regression models with autocorrelated disturbances once a test for serial correlation has detected them (serial correlation and autocorrelation are the same thing).
  • Maximum likelihood can also be used to estimate regression models with autocorrelated disturbances.
  • GLS is used extensively across various domains. In hydrology, for example, it is used to estimate the parameters of regional regression models of flood quantiles.
  • GLS is widely used for market response models, in econometrics and in time series analysis.
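The autocorrelation points above can be illustrated with a minimal pure-Python sketch of one step of the Cochrane-Orcutt procedure, a feasible GLS method for AR(1) errors. The sketch also computes the Durbin-Watson statistic, a common test statistic for first-order serial correlation. This is our illustration, not a procedure from the article. The single-regressor setup, the function names and the simulated data are assumptions.

```python
import random

def ols(X, y):
    """OLS for a two-column design matrix, solving the 2x2 normal equations by Cramer's rule."""
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    return [(t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]

def residuals(X, y, beta):
    return [yi - (beta[0] * r[0] + beta[1] * r[1]) for r, yi in zip(X, y)]

def durbin_watson(e):
    """Ranges from 0 to 4; near 2 means little first-order autocorrelation, well below 2 means positive."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et * et for et in e)

def cochrane_orcutt_step(x, y):
    """One Cochrane-Orcutt iteration: OLS, estimate rho from residuals, quasi-difference, refit."""
    X = [[1.0, xi] for xi in x]
    beta_ols = ols(X, y)
    e = residuals(X, y, beta_ols)
    rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
           / sum(e[t - 1] ** 2 for t in range(1, len(e))))
    # Quasi-differenced model: y_t - rho*y_{t-1} = a(1 - rho) + b(x_t - rho*x_{t-1}) + u_t
    Xs = [[1.0 - rho, x[t] - rho * x[t - 1]] for t in range(1, len(x))]
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    beta_co = ols(Xs, ys)  # estimates (a, b) directly because the constant column is 1 - rho
    return beta_ols, durbin_watson(e), rho, beta_co

# Hypothetical simulated data: y = 1 + 2x + e, with AR(1) errors e_t = 0.7 e_{t-1} + u_t.
rng = random.Random(42)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = []
e_prev = 0.0
for xi in x:
    e_prev = 0.7 * e_prev + rng.gauss(0.0, 1.0)
    y.append(1.0 + 2.0 * xi + e_prev)

beta_ols, dw, rho, beta_co = cochrane_orcutt_step(x, y)
print("OLS:", beta_ols, "DW:", round(dw, 3), "rho:", round(rho, 3), "Cochrane-Orcutt:", beta_co)
```

Cochrane-Orcutt drops the first observation. The Prais-Winsten variant keeps it by scaling it by √(1 − ρ²). In practice, the step is usually repeated until the estimate of ρ converges.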

Many software packages support generalized least squares, including R, MATLAB, SAS, SPSS and Stata.

Priya Chetty

Partner at Project Guru
Priya holds a master's degree in business administration with majors in marketing and finance. She is fluent in data modelling, time series analysis, various regression models, forecasting and data interpretation. She has assisted data scientists, corporates and scholars in the fields of finance, banking, economics and marketing.
