How to use an instrumental variable?

By Prateek Sharma & Priya Chetty on May 4, 2018

An instrumental variable is a third variable used to estimate causal relationships in regression analysis when an endogenous variable is present. Instrumental variables are useful when an independent variable in the regression model is correlated with the model's error term. A major complication in econometrics is the possibility of inconsistent parameter estimates due to endogenous regressors. The instrumental variables estimator nonetheless provides a way to obtain consistent parameter estimates. Instrumental variables methods are commonly used to address the following problems, which occur in OLS regression:

  • Omitted variable bias (occurs when a relevant variable is left out of the model).
  • Measurement error (the difference between a measured quantity and its true value).
  • Simultaneity or reverse causality (refers either to a direction of cause-and-effect contrary to a common presumption or to a two-way causal relationship).

A variable must satisfy two major assumptions to serve as an instrumental variable. First, the instrumental variable should not correlate with the error term. Second, it should correlate highly with the endogenous variable for which it serves as an instrument.
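For readers who prefer symbols, the two conditions can be written as below. This is the standard textbook formulation for one endogenous regressor and one instrument, not something taken from the dataset used later in this article. The symbols y (outcome), x (endogenous variable), z (instrument) and u (error term) are introduced here purely for illustration.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + u, \qquad \operatorname{Cov}(x, u) \neq 0
  && \text{(x is endogenous)} \\
\operatorname{Cov}(z, u) &= 0
  && \text{(the instrument is uncorrelated with the error term)} \\
\operatorname{Cov}(z, x) &\neq 0
  && \text{(the instrument is correlated with the endogenous variable)} \\
\hat{\beta}_1^{\mathrm{IV}} &= \frac{\widehat{\operatorname{Cov}}(z, y)}{\widehat{\operatorname{Cov}}(z, x)}
  && \text{(instrumental variables estimator)}
\end{align*}
```

The last line follows from taking the covariance of both sides of the first equation with z: the term involving u drops out because of the first condition, and the division is possible because of the second.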

Example of instrumental variable

This section presents an example to explain the application of an instrumental variable. In the sample dataset, the aim is to estimate the change in market demand due to a change in price. The quantity demanded clearly depends on price; however, price is not exogenous, because prices partly depend on market demand. So, one has to use an instrumental variable to estimate the model correctly. A suitable instrument for price is a variable that affects price but does not directly affect demand.

An obvious candidate is a variable that affects supply, because supply is correlated with price but does not directly affect demand. For agricultural products, for instance, ‘favorable growing conditions’ can be used as the instrumental variable. This choice of instrument is appropriate because favorable growing conditions do not directly affect demand; however, they do affect price, as the formal economic model of supply and demand implies.
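The supply and demand argument can be illustrated with a small simulation. The sketch below is a minimal illustration with made-up numbers: the demand and supply equations, their coefficients and the function names `cov` and `simulate_market` are all assumptions chosen for this example, not part of any real dataset. It compares the OLS slope of quantity on price with the instrumental variables slope that uses growing conditions as the instrument.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_market(n=20000, demand_slope=-1.0, seed=0):
    """Simulate equilibrium prices and quantities (all numbers are hypothetical)."""
    rng = random.Random(seed)
    growing, price, quantity = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # favorable growing conditions: shift supply only
        u = rng.gauss(0, 1)  # unobserved demand shock
        v = rng.gauss(0, 1)  # unobserved supply shock
        # Demand: q = 10 + demand_slope * p + u
        # Supply: q = 2 + p + z + v
        # Setting demand equal to supply and solving for the price:
        p = (8 + u - z - v) / (1 - demand_slope)
        q = 10 + demand_slope * p + u
        growing.append(z)
        price.append(p)
        quantity.append(q)
    return growing, price, quantity


growing, price, quantity = simulate_market()

# OLS slope of quantity on price: distorted because price depends on the demand shock.
ols_slope = cov(price, quantity) / cov(price, price)

# IV slope: uses only the price variation that is driven by growing conditions.
iv_slope = cov(growing, quantity) / cov(growing, price)

print(f"True demand slope: -1.000")
print(f"OLS slope:         {ols_slope:.3f}")
print(f"IV slope:          {iv_slope:.3f}")
```

In this setup the demand shock raises both price and quantity, so the OLS slope is pulled toward zero, while the IV slope should land close to the true value of the demand slope.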

Case study on instrumental variables

This section explains the application of instrumental variables through a sample dataset. The dataset contains two variables: ‘wage’ and ‘educational level’. The aim is to determine the impact of educational level on the wage. However, other factors, such as skills and IQ level, are highly correlated with education and also affect the wage. So the variable ‘education’ becomes an endogenous variable in this regression.

The next step is to find an instrumental variable that is highly correlated with the education level of the candidate but is not directly related to the wage of the candidate. One such instrumental variable is the education level of the parents. It is correlated with the education level of the child, because highly educated parents also encourage their child to pursue higher education. Also, the parents’ education level does not directly affect the wage of the child. The section below presents the results of the regression using parents’ education level as the instrument, with the analysis carried out in STATA software.

Step 1

The first step is to conduct instrumental variable regression.

Statistics > Endogenous covariates > Single-equation instrumental-variables regression
Table 1: Results for instrumental variable regression
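For readers without access to STATA, the logic of the instrumental variable regression can be sketched as two ordinary regressions (two-stage least squares). The example below is a minimal sketch on simulated data, not a reproduction of Table 1: the data-generating numbers and the names `simple_ols`, `simulate_wages` and `two_stage_least_squares` are assumptions made for illustration. In STATA's command syntax the corresponding estimator is `ivregress 2sls`.

```python
import random


def simple_ols(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_wages(n=20000, true_return=0.5, seed=1):
    """Hypothetical data: unobserved ability raises both education and wage."""
    rng = random.Random(seed)
    parent_educ, educ, wage = [], [], []
    for _ in range(n):
        z = rng.gauss(12, 2)        # parents' education (the instrument)
        ability = rng.gauss(0, 1)   # unobserved skills / IQ
        x = 8 + 0.5 * z + ability + rng.gauss(0, 1)
        y = 1 + true_return * x + ability + rng.gauss(0, 1)
        parent_educ.append(z)
        educ.append(x)
        wage.append(y)
    return parent_educ, educ, wage


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z for one endogenous regressor x."""
    # Stage 1: regress the endogenous variable on the instrument.
    a0, a1 = simple_ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return simple_ols(x_hat, y)


parent_educ, educ, wage = simulate_wages()
_, ols_slope = simple_ols(educ, wage)
_, iv_slope = two_stage_least_squares(parent_educ, educ, wage)

print(f"True effect of education: 0.500")
print(f"OLS estimate:             {ols_slope:.3f}")
print(f"2SLS estimate:            {iv_slope:.3f}")
```

Running the two stages by hand gives the right coefficient, but the standard errors from the second regression would not be correct, which is one reason to rely on a dedicated routine in practice.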

Step 2

After the regression analysis, the next step is to check the endogeneity of the variable in the model. The table below shows the results.

Table 2: Test of endogeneity for education level

Here the null hypothesis is that the variables are exogenous, that is, the independent variables are not correlated with the error term in the regression model. However, both the Durbin score and the Wu-Hausman test are significant, so the null hypothesis can be rejected. In other words, education level is endogenous.
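The idea behind these endogeneity tests is to check whether the OLS and instrumental variable estimates differ by more than sampling noise would explain. The sketch below computes a simple Hausman-type contrast on simulated data. It is an illustration of the principle under assumed homoskedastic errors, not the exact Durbin or Wu-Hausman statistic that STATA reports (those are obtained there with `estat endogenous`); the data and the function names are hypothetical.

```python
import random

CHI2_1DF_CRITICAL_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom


def simulate_wages(n=20000, seed=2):
    """Hypothetical data: unobserved ability drives both education and wage."""
    rng = random.Random(seed)
    parent_educ, educ, wage = [], [], []
    for _ in range(n):
        z = rng.gauss(12, 2)
        ability = rng.gauss(0, 1)
        x = 8 + 0.5 * z + ability + rng.gauss(0, 1)
        y = 1 + 0.5 * x + ability + rng.gauss(0, 1)
        parent_educ.append(z)
        educ.append(x)
        wage.append(y)
    return parent_educ, educ, wage


def hausman_contrast(z, x, y):
    """Squared gap between IV and OLS slopes, scaled by its estimated variance."""
    n = len(y)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    b_ols = sxy / sxx
    b_iv = szy / szx
    a_iv = my - b_iv * mx

    # Error variance from the IV residuals, used for both variance formulas.
    resid = [yi - a_iv - b_iv * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)

    var_ols = s2 / sxx
    var_iv = s2 * szz / szx ** 2
    return (b_iv - b_ols) ** 2 / (var_iv - var_ols)


parent_educ, educ, wage = simulate_wages()
statistic = hausman_contrast(parent_educ, educ, wage)
reject_exogeneity = statistic > CHI2_1DF_CRITICAL_5PCT

print(f"Hausman-type statistic: {statistic:.1f}")
print(f"Reject exogeneity at 5%: {reject_exogeneity}")
```

Because the simulated education variable is built to be correlated with the unobserved ability term, the statistic should be far above the critical value, which mirrors the conclusion drawn from Table 2.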

Table 3: Results from first stage regression and the summary statistics

Table 3 tests the null hypothesis that the instrumental variable is weak against the alternative hypothesis that it is not weak. Since the first-stage R-squared is 0.75, the null hypothesis can be rejected, and one can conclude that the instrumental variable is not weak. Parents’ education level, used as the instrument, therefore explains a large share of the variation in education level.
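Instrument strength is usually judged from the first-stage regression of the endogenous variable on the instrument. The sketch below computes the first-stage R-squared and the corresponding F statistic for a single instrument on simulated data. The data are hypothetical and will not reproduce the 0.75 reported in Table 3. A widely used rule of thumb treats a first-stage F statistic above about 10 as a sign that the instrument is not weak; in STATA these statistics come from `estat firststage`.

```python
import random


def simulate_first_stage(n=20000, seed=4):
    """Hypothetical data: education depends on parents' education plus noise."""
    rng = random.Random(seed)
    parent_educ, educ = [], []
    for _ in range(n):
        z = rng.gauss(12, 2)
        x = 8 + 0.5 * z + rng.gauss(0, 1) + rng.gauss(0, 1)
        parent_educ.append(z)
        educ.append(x)
    return parent_educ, educ


def first_stage_strength(z, x):
    """R-squared and F statistic from regressing x on a single instrument z."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    # With one regressor, R-squared is the squared correlation.
    r_squared = szx ** 2 / (szz * sxx)
    # F test of the instrument's coefficient: 1 restriction, n - 2 residual df.
    f_stat = r_squared / (1 - r_squared) * (n - 2)
    return r_squared, f_stat


parent_educ, educ = simulate_first_stage()
r_squared, f_stat = first_stage_strength(parent_educ, educ)

print(f"First-stage R-squared: {r_squared:.3f}")
print(f"First-stage F:         {f_stat:.1f}")
print(f"Passes the F > 10 rule of thumb: {f_stat > 10}")
```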

Table 4: Results from over-identification of the model

The first-stage results (Table 3) show that the instrumental variable is relevant, so the next step is to test the over-identifying restrictions of the model. Here the null hypothesis is that the instrument set is valid and the model is correctly specified. As Table 4 shows, the p-values of the Sargan and Basmann tests are not significant. Therefore the null hypothesis cannot be rejected, and the instrument set can be treated as valid.
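An over-identification test is only available when there are more instruments than endogenous variables. The sketch below therefore assumes two separate instruments, father's and mother's education, which is a hypothetical setup chosen for illustration and may differ from the dataset behind Table 4. It computes the Sargan statistic as n times the R-squared from regressing the 2SLS residuals on the instruments, once with valid instruments and once where mother's education is made to affect the wage directly. All names and numbers are assumptions; STATA reports the Sargan and Basmann tests through `estat overid`.

```python
import random


def ols(columns, y):
    """Coefficients (intercept first) from regressing y on the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [v - factor * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fitted(coef, columns):
    """Fitted values for coefficients returned by ols()."""
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(columns[0]))]


def sargan_statistic(instruments, x, y):
    """n * R-squared from regressing the 2SLS residuals on the instruments."""
    n = len(y)
    x_hat = fitted(ols(instruments, x), instruments)      # first stage
    b0, b1 = ols([x_hat], y)                              # second stage
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # residuals use actual x
    resid_hat = fitted(ols(instruments, resid), instruments)
    mean_r = sum(resid) / n
    ss_total = sum((r - mean_r) ** 2 for r in resid)
    ss_explained = sum((f - mean_r) ** 2 for f in resid_hat)
    return n * ss_explained / ss_total


def simulate(n=5000, direct_effect=0.0, seed=3):
    """Hypothetical data; direct_effect != 0 makes mother's education invalid."""
    rng = random.Random(seed)
    father, mother, educ, wage = [], [], [], []
    for _ in range(n):
        f, m = rng.gauss(12, 2), rng.gauss(12, 2)
        ability = rng.gauss(0, 1)  # unobserved
        x = 4 + 0.3 * f + 0.3 * m + ability + rng.gauss(0, 1)
        y = 1 + 0.5 * x + direct_effect * m + ability + rng.gauss(0, 1)
        father.append(f)
        mother.append(m)
        educ.append(x)
        wage.append(y)
    return father, mother, educ, wage


father, mother, educ, wage = simulate(direct_effect=0.0)
sargan_valid = sargan_statistic([father, mother], educ, wage)

father, mother, educ, wage = simulate(direct_effect=0.5)
sargan_invalid = sargan_statistic([father, mother], educ, wage)

# Compare each value with 3.841, the 5% chi-square critical value for 1 df
# (2 instruments minus 1 endogenous variable).
print(f"Sargan statistic, valid instruments:  {sargan_valid:.2f}")
print(f"Sargan statistic, one invalid instrument: {sargan_invalid:.2f}")
```

A small statistic, as expected in the valid case, means the null hypothesis of valid instruments cannot be rejected, which is the situation described for Table 4. A large statistic signals that at least one instrument is correlated with the error term.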

Application of instrumental variables

Some of the uses of instrumental variables are as follows:

  • For real-world evidence regarding treatment effectiveness and organizational issues, instrumental variables are widely popular in the field of outcomes research.
  • They are useful to control for confounding and measurement error in observational studies.
  • In economics, instrumental variables are useful in determining which factors influence demand without affecting cost, and vice versa.
  • They can also be useful when evaluating a randomized trial.

Software packages that support instrumental variables applications with multiple independent variables include R, SAS, MATLAB, STATA and SPSS.
