How to use an instrumental variable?

An instrumental variable is a third variable used to estimate causal relationships in regression analysis when an endogenous regressor is present. Instrumental variables are useful when an independent variable in the regression model is correlated with the model's error term. A major complication in econometrics is that endogenous regressors make ordinary least squares (OLS) parameter estimates inconsistent. The instrumental variables estimator provides a way to obtain consistent parameter estimates nonetheless. Instrumental variables methods are commonly used to address the following problems in OLS regression:

  • Omitted variable bias (occurs when a relevant variable is left out of the model and is correlated with an included regressor).
  • Measurement error (the difference between a measured quantity and its true value).
  • Simultaneity or reverse causality (refers either to a direction of cause-and-effect contrary to a common presumption or to a two-way causal relationship).

A variable must satisfy two major assumptions to qualify as an instrumental variable. First, the instrument must not be correlated with the error term (exogeneity). Second, the instrument must be highly correlated with the endogenous variable that it stands in for (relevance).
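
The two assumptions can be written compactly for a model with one endogenous regressor. The symbols below (y for the outcome, x for the endogenous regressor, z for the instrument, u for the error term) are notation introduced here for illustration and are not taken from the analysis that follows.

```latex
y = \beta_0 + \beta_1 x + u, \qquad \operatorname{Cov}(x, u) \neq 0 \quad \text{(endogeneity)}

\text{Exogeneity: } \operatorname{Cov}(z, u) = 0, \qquad
\text{Relevance: } \operatorname{Cov}(z, x) \neq 0

\operatorname{Cov}(z, y) = \beta_1 \operatorname{Cov}(z, x) + \operatorname{Cov}(z, u)
                         = \beta_1 \operatorname{Cov}(z, x)
\;\Longrightarrow\;
\beta_1 = \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
```

The last line shows why both conditions matter: exogeneity removes the Cov(z, u) term, and relevance keeps the denominator away from zero.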

Example of instrumental variable

This section presents an example to explain the application of an instrumental variable. Suppose the aim is to estimate the change in market demand caused by a change in price. The quantity demanded clearly depends on price, but price is not exogenous, because prices in turn depend partly on market demand. So an instrumental variable is needed to estimate the model correctly. A suitable instrument for price is a variable that affects price but does not directly affect demand.

An obvious candidate is a variable that shifts supply, because supply is correlated with price but does not directly affect demand. For agricultural products, for instance, ‘favorable growing conditions’ can serve as the instrumental variable. The choice is appropriate because favorable growing conditions do not directly affect demand, yet they do affect price, as the standard economic model of supply and demand predicts.
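
The supply-and-demand argument can be checked with a small simulation. The sketch below is only an illustration: the demand and supply equations, all coefficients and the function names are assumptions made up for this example, not data from the article. It compares the OLS slope of quantity on price with the instrumental variable slope that uses growing conditions as the instrument.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_market(n=20000, demand_slope=-1.0, seed=0):
    """Simulate equilibrium prices and quantities (all numbers are made up).

    Demand: q = 10 + demand_slope * p + u_d
    Supply: q = 2 + p + z + u_s, where z is 'favorable growing conditions'.
    """
    rng = random.Random(seed)
    growing, price, quantity = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)    # supply shifter: the instrument
        u_d = rng.gauss(0, 1)  # demand shock, unobserved
        u_s = rng.gauss(0, 1)  # supply shock, unobserved
        # Price that equates demand and supply. It depends on u_d,
        # which is why price is endogenous in the demand equation.
        p = (8 - z + u_d - u_s) / (1 - demand_slope)
        q = 10 + demand_slope * p + u_d
        growing.append(z)
        price.append(p)
        quantity.append(q)
    return growing, price, quantity


def estimate_demand_slope(growing, price, quantity):
    """Return (OLS slope, IV slope) for the effect of price on quantity."""
    ols = covariance(price, quantity) / covariance(price, price)
    iv = covariance(growing, quantity) / covariance(growing, price)
    return ols, iv


growing, price, quantity = simulate_market()
ols_slope, iv_slope = estimate_demand_slope(growing, price, quantity)
print(f"true slope: -1.00, OLS: {ols_slope:.2f}, IV: {iv_slope:.2f}")
```

Under these assumed parameters the OLS slope is pulled toward zero (roughly -0.33 in large samples) because price moves with the demand shock, while the IV slope stays close to the true value of -1.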

Case study on instrumental variables

This section explains the application of instrumental variables through a sample dataset. The two variables of interest are ‘wage’ and ‘educational level’. The aim is to determine the impact of educational level on the wage. However, other factors such as skills and IQ level are highly correlated with education and also affect the wage. Because these factors end up in the error term, ‘education’ becomes an endogenous variable in this regression.

The next task is to find an instrumental variable that is highly correlated with the candidate's education level but is not directly related to the candidate's wage. One such instrument is the education level of the parents. Parents' education is correlated with the child's education, because highly educated parents tend to encourage their children to pursue higher education. At the same time, the parents’ education level is assumed not to affect the child's wage directly. The section below presents the regression results using parents’ education level as the instrument, obtained with STATA software.

Step 1

The first step is to conduct instrumental variable regression.

Statistics > Endogenous covariates > Single-equation instrumental-variables regression
Table 1: Results for instrumental variable regression
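
In Stata the corresponding command is ivregress 2sls. To make the mechanics visible, the sketch below carries out the two stages by hand on simulated data. The data-generating process, the coefficients and the function names are assumptions invented for this illustration; they are not the article's dataset and will not reproduce Table 1.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with a constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_wages(n=10000, seed=0):
    """Made-up data: unobserved ability drives both education and wage."""
    rng = random.Random(seed)
    parent_edu, edu, log_wage = [], [], []
    for _ in range(n):
        z = rng.gauss(12, 2)       # parents' years of education (instrument)
        ability = rng.gauss(0, 1)  # unobserved skills / IQ
        x = 6 + 0.5 * z + ability + rng.gauss(0, 1)
        y = 1 + 0.10 * x + 0.3 * ability + rng.gauss(0, 0.5)
        parent_edu.append(z)
        edu.append(x)
        log_wage.append(y)
    return parent_edu, edu, log_wage


def two_stage_least_squares(instrument, endogenous, outcome):
    """Manual 2SLS with one instrument; returns the second-stage slope."""
    # Stage 1: predict the endogenous regressor from the instrument.
    a0, a1 = simple_ols(instrument, endogenous)
    fitted = [a0 + a1 * z for z in instrument]
    # Stage 2: regress the outcome on the fitted values.
    _, slope = simple_ols(fitted, outcome)
    return slope


parent_edu, edu, log_wage = simulate_wages()
_, ols_slope = simple_ols(edu, log_wage)
tsls_slope = two_stage_least_squares(parent_edu, edu, log_wage)
print(f"true effect: 0.10, OLS: {ols_slope:.3f}, 2SLS: {tsls_slope:.3f}")
```

In this simulation OLS overstates the return to education (about 0.20 in large samples) because ability is omitted, while 2SLS recovers a value near 0.10. Running the second stage by hand gives the right coefficient but not the right standard errors, which is one reason to rely on a dedicated routine for inference.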

Step 2

After the regression analysis, the next step is to check the endogeneity of the variable in the model. The table below shows the results.

Table 2: Test of endogeneity for education level

Here the null hypothesis is that the variable is exogenous, that is, education is not correlated with the error term in the regression model. Both the Durbin score and the Wu-Hausman test are significant, so the null hypothesis is rejected. In other words, education is endogenous and instrumental variable estimation is warranted.
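
The idea behind these tests can be illustrated with a regression-based (control function) check: save the first-stage residuals, add them to the wage equation, and test whether their coefficient is zero. The sketch below is a simplified illustration on simulated data, not a reproduction of Stata's Durbin or Wu-Hausman output; the data-generating process and all function names are assumptions made for this example.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(columns, y):
    """Regress y on a constant plus columns; return (coefs, std errors, residuals)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, se, resid


def simulate_wages(n=5000, seed=0):
    """Made-up data: unobserved ability makes education endogenous."""
    rng = random.Random(seed)
    parent_edu, edu, log_wage = [], [], []
    for _ in range(n):
        z = rng.gauss(12, 2)
        ability = rng.gauss(0, 1)
        x = 6 + 0.5 * z + ability + rng.gauss(0, 1)
        y = 1 + 0.10 * x + 0.3 * ability + rng.gauss(0, 0.5)
        parent_edu.append(z)
        edu.append(x)
        log_wage.append(y)
    return parent_edu, edu, log_wage


def endogeneity_check(instrument, endogenous, outcome):
    """Return (t statistic on the first-stage residual, coefficient on the regressor)."""
    _, _, v_hat = ols([instrument], endogenous)      # first-stage residuals
    beta, se, _ = ols([endogenous, v_hat], outcome)  # augmented regression
    return beta[2] / se[2], beta[1]


parent_edu, edu, log_wage = simulate_wages()
t_stat, edu_coef = endogeneity_check(parent_edu, edu, log_wage)
print(f"t statistic on first-stage residual: {t_stat:.2f}")
print(f"coefficient on education in augmented regression: {edu_coef:.3f}")
```

A t statistic far from zero points to endogeneity, which matches the conclusion drawn from Table 2. As a side effect, the coefficient on education in the augmented regression coincides with the 2SLS estimate.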

Table 3: Results from first stage regression and the summary statistics

Table 3 tests the null hypothesis that the instrument is weak against the alternative that it is not weak. Since the first-stage R-squared is 0.75, parents’ education level explains a large share of the variation in education. The null hypothesis is therefore rejected, and the instrument is not considered weak.
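
Instrument strength is usually judged from the first-stage regression, where a commonly cited rule of thumb treats an F statistic below about 10 as a warning sign. The sketch below computes the first-stage R-squared and F statistic for a single instrument on simulated data. The data, the coefficient values and the function names are assumptions for illustration only and are unrelated to the 0.75 reported in Table 3.

```python
import random


def first_stage_strength(instrument, endogenous):
    """Return (R-squared, F statistic) for a first stage with one instrument."""
    n = len(instrument)
    mean_z = sum(instrument) / n
    mean_x = sum(endogenous) / n
    szx = sum((z - mean_z) * (x - mean_x) for z, x in zip(instrument, endogenous))
    szz = sum((z - mean_z) ** 2 for z in instrument)
    sxx = sum((x - mean_x) ** 2 for x in endogenous)
    r_squared = szx ** 2 / (szz * sxx)
    # With one instrument and a constant, F = R^2 / (1 - R^2) * (n - 2).
    f_stat = r_squared / (1 - r_squared) * (n - 2)
    return r_squared, f_stat


def simulate_first_stage(strength, n=1000, seed=0):
    """Made-up education data; `strength` is the effect of parents' education."""
    rng = random.Random(seed)
    parent_edu = [rng.gauss(12, 2) for _ in range(n)]
    edu = [6 + strength * z + rng.gauss(0, 1.4) for z in parent_edu]
    return parent_edu, edu


for label, strength in [("strong", 0.5), ("weak", 0.02)]:
    z, x = simulate_first_stage(strength)
    r2, f = first_stage_strength(z, x)
    print(f"{label} instrument: R-squared = {r2:.3f}, F = {f:.1f}")
```

With the assumed strong effect of 0.5 the F statistic is far above 10, whereas the nearly irrelevant instrument produces a small R-squared and a small F statistic.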

Table 4: Results from over-identification of the model

The first-stage results (Table 3) indicate that the instrument is relevant, so the next check (Table 4) is the test of over-identifying restrictions. The null hypothesis is that the instrument set is valid and the model is correctly specified. As Table 4 shows, the p-values of the Sargan and Basmann tests are far from significant. The null hypothesis therefore cannot be rejected, and there is no evidence against the validity of the instruments.
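
An over-identification test is only available when there are more instruments than endogenous regressors. The sketch below therefore assumes two hypothetical instruments, father's and mother's education, and computes the Sargan statistic as n times the R-squared from regressing the 2SLS residuals on the instruments. Everything here (the simulated data, the `direct_effect` parameter, the function names) is an assumption for illustration; the article does not state which instruments were used.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(columns, y):
    """Regress y on a constant plus columns; return (coefficients, residuals)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, resid


def simulate(direct_effect=0.0, n=5000, seed=0):
    """Made-up data; direct_effect != 0 makes mother's education an invalid instrument."""
    rng = random.Random(seed)
    father, mother, edu, log_wage = [], [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(12, 2), rng.gauss(12, 2)
        ability = rng.gauss(0, 1)
        x = 2 + 0.3 * z1 + 0.3 * z2 + ability + rng.gauss(0, 1)
        y = 1 + 0.10 * x + 0.3 * ability + direct_effect * z2 + rng.gauss(0, 0.5)
        father.append(z1)
        mother.append(z2)
        edu.append(x)
        log_wage.append(y)
    return [father, mother], edu, log_wage


def sargan_test(instruments, endogenous, outcome):
    """Sargan statistic and p-value for one endogenous regressor and two instruments."""
    n = len(outcome)
    _, v_hat = ols(instruments, endogenous)               # stage 1
    fitted = [x - v for x, v in zip(endogenous, v_hat)]
    (b0, b1), _ = ols([fitted], outcome)                  # stage 2
    # 2SLS residuals use the actual regressor, not the fitted one.
    u_hat = [y - b0 - b1 * x for x, y in zip(endogenous, outcome)]
    _, aux_resid = ols(instruments, u_hat)                # auxiliary regression
    mean_u = sum(u_hat) / n
    tss = sum((u - mean_u) ** 2 for u in u_hat)
    r_squared = 1 - sum(e * e for e in aux_resid) / tss
    stat = max(0.0, n * r_squared)
    # Chi-square with 1 degree of freedom (2 instruments - 1 regressor).
    return stat, math.erfc(math.sqrt(stat / 2))


for label, effect in [("valid instruments", 0.0), ("invalid instrument", 0.2)]:
    stat, p_value = sargan_test(*simulate(direct_effect=effect))
    print(f"{label}: Sargan = {stat:.2f}, p = {p_value:.4f}")
```

When one instrument affects the wage directly, the statistic becomes large and the p-value small; a large p-value, as reported in Table 4, gives no evidence against the instrument set.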

Application of instrumental variables

Some of the uses of instrumental variables are as follows:

  • Instrumental variables are widely used in outcomes research to generate real-world evidence on treatment effectiveness and organizational issues.
  • They are useful to control for confounding and measurement error in observational studies.
  • In economics, instrumental variables are useful in determining which factors influence demand without affecting cost and vice versa.
  • They can also be useful when evaluating a randomized trial.

Software supporting instrumental variables

Software packages that support instrumental variables estimation with multiple independent variables include R, SAS, MATLAB, STATA and SPSS.

Prateek Sharma

Analyst at Project Guru