Omitted Variable Bias

You’re not a historian, but most historians will tell you that they make very discrete judgment as to what facts to omit in order to make their book into some shape, some length that can be managed.

Oliver Stone

Omitted variable bias happens when a relevant variable is omitted from your regression model. Duh!

Such omissions happen for many reasons, for example because the data are unavailable or because the importance of the omitted variable was simply overlooked.

For instance, let’s say that the true model is:

y = β0 + β1x1 + β2x2 + u

But, knowingly or not, we estimate this misspecified model instead:

y = β0 + β1x1 + u

As a result, the x2 variable will be part of the error term, giving:

New error term: ν = β2x2 + u

Now, if x2 is correlated with x1 (as in the example below), then the new error term will be correlated with the regressor, violating the Gauss-Markov assumption which states that the error term should be uncorrelated with the regressors. Such correlation leads to biased and inconsistent estimates.

Simply put, because we are not regressing our dependent variable (y) on both of the relevant regressors (x1 and x2 in this case), we tend to over- or underestimate the effect of the included regressor on the dependent variable.
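The size and direction of this bias can be written out for the two-regressor case, following the Wooldridge chapter listed under Further reading. The notation is an addition to the text above: a tilde marks the estimate from the misspecified regression, a hat marks estimates from the full regression, and δ̃1 is the slope from a simple regression of x2 on x1. The expectation is conditional on the sample values of the regressors.

```latex
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\,\tilde{\delta}_1
\quad\Longrightarrow\quad
E(\tilde{\beta}_1) = \beta_1 + \beta_2\,\tilde{\delta}_1,
\qquad
\operatorname{Bias}(\tilde{\beta}_1) = \beta_2\,\tilde{\delta}_1
```

So the bias vanishes if x2 has no effect on y (β2 = 0) or if x1 and x2 are uncorrelated in the sample (δ̃1 = 0). Otherwise its sign is the sign of β2 times the sign of the correlation between x1 and x2.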

The most common example of omitted variable bias is the effect of schooling on wages. To analyze this effect, we should include a student’s ability in the regression, because ability may both make a student more successful than their peers during their education and lead to better jobs, and hence better wages, after graduation. So why would we exclude “ability” from the regression model? Mainly because, even though we know it belongs in the true model, we have no reasonable and trustworthy measure of it. Sometimes, then, you know that a specific regressor belongs in the model but simply cannot include it, because the data are unavailable or the regressor cannot be measured.

So if the actual model is:

Wages = β0 + β1·education + β2·ability + u

And we consider it to be:

Wages = β0 + β1·education + u

Then the estimated coefficient on the included regressor, β1, will absorb some of the effect of ability, resulting in biased estimates of β1. Will the bias be upward or downward? In this case, ability has a positive effect on wages (β2 > 0) and is plausibly positively correlated with education, so omitting it biases β1 upward: it takes some of the ‘credit’ that belongs to β2, the coefficient on our excluded regressor.
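A small simulation can make the wage example concrete. The sketch below is illustrative only: the data are synthetic, and the function name simulate_ovb, the parameter values (β1 = 0.5, β2 = 0.8) and the correlation rho = 0.6 between education and ability are all assumptions made for the example, not estimates from real wage data. It fits the model with and without ability and compares the two education coefficients.

```python
import numpy as np


def simulate_ovb(n=100_000, beta0=1.0, beta1=0.5, beta2=0.8, rho=0.6, seed=0):
    """Simulate wages and compare OLS with and without 'ability'.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Synthetic ability, and education constructed to be correlated with it
    # (both have unit variance, so their correlation is rho).
    ability = rng.normal(size=n)
    education = rho * ability + np.sqrt(1 - rho**2) * rng.normal(size=n)
    u = rng.normal(size=n)

    # The "true" model that generates wages.
    wages = beta0 + beta1 * education + beta2 * ability + u

    # Misspecified (short) regression: wages on a constant and education only.
    X_short = np.column_stack([np.ones(n), education])
    b_short, *_ = np.linalg.lstsq(X_short, wages, rcond=None)

    # Full (long) regression: ability is included.
    X_long = np.column_stack([np.ones(n), education, ability])
    b_long, *_ = np.linalg.lstsq(X_long, wages, rcond=None)

    # Slope from regressing the omitted variable on the included one.
    d, *_ = np.linalg.lstsq(X_short, ability, rcond=None)

    return {
        "short_beta1": float(b_short[1]),
        "long_beta1": float(b_long[1]),
        "long_beta2": float(b_long[2]),
        "delta1": float(d[1]),
    }


result = simulate_ovb()
print(result)
```

Under these assumptions the education coefficient from the short regression should land well above the true 0.5 (the population value is 0.5 + 0.8 × 0.6 = 0.98), while the full regression should recover something close to 0.5. The gap between the two equals the estimated ability coefficient times delta1, the slope of ability on education.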

Further reading

Introductory Econometrics – Jeffrey M. Wooldridge – Chapter 3.3 (used as reference here)

Endogeneity Source: Omitted Variables

Omitted Variable Bias in 3D

Applied Econometrics: When Can an Omitted Variable Invalidate a Regression

Confounding Variables Can Bias Your Results