# How to conduct a generalized least squares test?

In statistics, **Generalised Least Squares (GLS)** is one of the most popular methods for estimating the unknown coefficients of a linear regression model when the residuals are correlated with each other or have unequal variances. The *Ordinary Least Squares (OLS)* method estimates the parameters of a linear regression model by minimizing the sum of the squared differences between the observed responses in the dataset and those predicted by a linear function. The main advantage of using *OLS* regression for estimating parameters is that it is easy to use. However, *OLS* gives reliable results only if its assumptions hold, and it is sensitive to missing values and major outliers in the dataset. Moreover, the *OLS* regression model does not take into account unequal error variance, that is, ‘heteroskedastic errors’. With heteroskedastic errors, the *OLS* coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, so significance tests become unreliable.

Therefore, generalized least squares is useful for tackling heteroskedasticity and correlated errors in data. When the covariance structure of the errors is known, it produces estimators that are ‘Best Linear Unbiased Estimates’ (BLUE). Thus, the **GLS** estimator is unbiased, consistent, efficient and asymptotically normal.

## Major assumptions for generalized least squares regression analysis

Under the Gauss-Markov theorem, the *OLS* estimators are BLUE only when the errors are uncorrelated and have equal variance (homoskedasticity). Serial correlation or heteroskedasticity violates these assumptions, so the *OLS* estimators are no longer efficient and any assessment of impact or relationships becomes unreliable. **GLS** is an improved model designed to deal with these violations, and it has the following assumptions:

- The error variances are heteroskedastic
- Errors are correlated
- Errors are normally distributed
- Absence of multicollinearity

When the first two conditions do not apply, that is, when the errors are homoskedastic and uncorrelated, the *OLS* estimators and the **GLS** estimators are the same. Thus, the difference between *OLS* and **GLS** lies in the assumptions about the error term of the model. The **GLS** estimator can be understood from different perspectives:

- A generalization of *OLS*
- Transforming the model equation to a new model whose errors are uncorrelated and have equal variances, that is, homoskedastic.
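
The two perspectives can be checked numerically. The following is a minimal sketch, assuming synthetic data and an error covariance matrix `omega` that is known in advance (in practice it usually has to be estimated). All names and numbers are illustrative and are not taken from the case study below. It computes the GLS estimate once with the closed-form expression and once by transforming the model and running plain OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration, not the article's dataset).
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])

# A known error covariance matrix: unequal variances plus correlation
# between neighbouring observations (correlation 0.6 ** |i - j|).
sd = np.linspace(0.5, 2.0, n)
idx = np.arange(n)
corr = 0.6 ** np.abs(idx[:, None] - idx[None, :])
omega = corr * np.outer(sd, sd)

# Draw correlated, heteroskedastic errors e = L z, where omega = L L^T.
L = np.linalg.cholesky(omega)
y = X @ beta_true + L @ rng.normal(size=n)

# Perspective 1: generalization of OLS,
# beta = (X' omega^-1 X)^-1 X' omega^-1 y.
omega_inv_X = np.linalg.solve(omega, X)
beta_gls = np.linalg.solve(X.T @ omega_inv_X, omega_inv_X.T @ y)

# Perspective 2: transform the model with L^-1 so that the new errors are
# uncorrelated with equal variance, then run ordinary least squares.
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)
beta_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print("direct GLS:     ", beta_gls)
print("transformed OLS:", beta_transformed)
```

Both routes give the same coefficients up to floating-point error, which is why GLS can be described either as a generalization of OLS or as OLS on a transformed model.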

## Example of generalized least squares test

This section explains the process of applying **GLS** with the use of a case study. The sample dataset contains data from 30 students. The aim is to review the impact of self-efficiency and ability (independent variables) on achievement (dependent variable). First, a simple linear regression is performed, and its results are then compared with those of the generalized least squares test.

### Step 1: Linear regression

Since the dependent variable is continuous in nature, it is important to check whether the residuals follow a normal distribution. The distribution of the residuals of the dependent variable (achievement) is approximately normal, with skewness -0.18 and kurtosis 1.95. As the table above shows, linear regression was performed to check the relationship between achievement and the two independent variables, self-efficiency and ability. For self-efficiency, the parameter estimate was 0.003 with a p-value of 0.989. For the other independent variable, ability, the parameter estimate was -0.047 with a p-value of 0.823. This shows that neither independent variable is statistically significant, as both p-values are greater than 0.05.
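
The step above can be sketched in code. This is a minimal illustration only: the 30-student dataset is not reproduced here, so the example generates a hypothetical stand-in with the same variable names, and its output will not match the figures quoted above. It fits the linear regression, then computes standard errors, t statistics, and the skewness and kurtosis of the residuals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the 30-student dataset (assumption: the real
# data are unavailable, so these values are randomly generated).
n = 30
self_efficiency = rng.normal(50, 10, size=n)
ability = rng.normal(100, 15, size=n)
achievement = rng.normal(60, 8, size=n)

X = np.column_stack([np.ones(n), self_efficiency, ability])
y = achievement

# OLS estimate: minimise the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Standard errors and t statistics under the classical OLS assumptions.
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

# Moment-based skewness and (non-excess) kurtosis of the residuals.
# A normal distribution has skewness 0 and kurtosis 3.
z = (resid - resid.mean()) / resid.std()
skewness = np.mean(z ** 3)
kurtosis = np.mean(z ** 4)

for name, b, s, t in zip(["intercept", "self_efficiency", "ability"], beta, se, t_stats):
    print(f"{name:16s} estimate={b:8.3f}  se={s:7.3f}  t={t:7.3f}")
print(f"residual skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
```

A p-value for each coefficient would come from comparing its t statistic with a t distribution on `dof` degrees of freedom, which statistical packages report directly.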

The interpretation of coefficients of the independent variables is as follows:

- The independent variable ‘self-efficiency’ is positively related to the dependent variable ‘achievement’, while the other independent variable, ‘ability’, is negatively related to it.
- The parameter estimates and p-values suggest that the sample size was inadequate to demonstrate the true relationship.
- Furthermore, for every unit rise in self-efficiency, achievement increases by 0.003 units, keeping all other factors the same.

### Step 2: Weighted least squares regression

After performing the weighted analysis, self-efficiency was found to influence achievement more, with a beta coefficient of 0.045 and a p-value of 0.021. This shows that the regression coefficient is statistically significant. Ability influenced achievement less, with a beta coefficient of 0.014 and a p-value of 0.046. Both coefficients are statistically significant at the 0.05 level, which suggests that the weighted model, a special case of GLS, describes the data better than the simple regression done previously. Therefore, there is a statistically significant relationship between the dependent variable ‘achievement’ and the independent variables ‘self-efficiency’ and ‘ability’.
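
Weighted least squares is the special case of GLS in which the errors are uncorrelated but have unequal variances, so each observation is weighted by the inverse of its error variance. The sketch below uses synthetic data, not the student dataset, and assumes the error variances are known because the data are simulated. It shows the weighted estimate and compares its standard errors with those of OLS on the same data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic heteroskedastic data (illustrative assumption).
n = 200
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * x                                   # error spread grows with x
y = 3.0 + 1.5 * x + rng.normal(size=n) * sd

# Weights equal to 1 / variance. Known here only because the data are
# simulated; with real data the variances have to be estimated.
w = 1.0 / sd ** 2

# WLS via the weighted normal equations: beta = (X' W X)^-1 X' W y.
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Equivalent route: scale every row by sqrt(w) and run OLS.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# OLS on the same data, for comparison.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# True sampling standard errors of both estimators under this error model.
se_wls = np.sqrt(np.diag(np.linalg.inv(XtW @ X)))
XtX_inv = np.linalg.inv(X.T @ X)
cov_ols = XtX_inv @ ((X.T * sd ** 2) @ X) @ XtX_inv
se_ols = np.sqrt(np.diag(cov_ols))

print("WLS:", beta_wls, "se:", se_wls)
print("OLS:", beta_ols, "se:", se_ols)
```

The weighted standard errors are never larger than the OLS ones when the weights match the true variances, and smaller standard errors are what can turn an insignificant coefficient into a significant one.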

## Application of generalized least squares

- The **GLS** model is useful in the regionalisation of hydrologic data.
- **GLS** is also useful in reducing autocorrelation by choosing an appropriate weighting matrix.
- It is one of the best methods to estimate regression models with autocorrelated disturbances and to test for serial correlation (serial correlation and autocorrelation are the same thing).
- One can also use the maximum likelihood technique to estimate regression models with autocorrelated disturbances.
- The **GLS** procedure finds extensive use across various domains. In hydrology, the goal of the **GLS** method is to estimate the parameters of regional regression models of flood quantiles. **GLS** is also widely used in market response modelling, econometrics and time series analysis.
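
To illustrate how a weighting matrix can reduce autocorrelation, here is a rough two-step ("feasible") GLS sketch. It assumes a synthetic time series whose disturbances follow a first-order autoregressive, AR(1), process. The autocorrelation parameter is estimated from the OLS residuals, which is one simple choice among several, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic regression with AR(1) disturbances (illustrative assumption).
n, rho_true = 500, 0.7
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
innov = rng.normal(size=n)
e = np.zeros(n)
e[0] = innov[0] / np.sqrt(1 - rho_true ** 2)   # stationary starting value
for t in range(1, n):
    e[t] = rho_true * e[t - 1] + innov[t]
y = 2.0 + 0.5 * x + e

# Step 1: OLS, then estimate rho from the lag-1 autocorrelation of residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_ols
rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])

# Step 2: build the AR(1) correlation matrix, omega[i, j] = rho_hat ** |i - j|,
# and use it as the GLS weighting matrix.
idx = np.arange(n)
omega = rho_hat ** np.abs(idx[:, None] - idx[None, :])
omega_inv_X = np.linalg.solve(omega, X)
beta_fgls = np.linalg.solve(X.T @ omega_inv_X, omega_inv_X.T @ y)

# Check: residuals of the transformed model should show little autocorrelation.
L = np.linalg.cholesky(omega)
u = np.linalg.solve(L, y - X @ beta_fgls)
rho_after = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])

print("estimated rho:", rho_hat)
print("FGLS coefficients:", beta_fgls)
print("lag-1 autocorrelation after transformation:", rho_after)
```

The lag-1 autocorrelation of the transformed residuals should be much closer to zero than that of the OLS residuals, which is the sense in which the weighting matrix removes the serial correlation.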

Several software packages support the generalized least squares test, including R, MATLAB, SAS, SPSS and Stata.
