# Omitted Variable Bias

You’re not a historian, but most historians will tell you that they make very discrete judgment as to what facts to omit in order to make their book into some shape, some length that can be managed.

Oliver Stone

Omitted variable bias happens when a variable is omitted from your regression model. Duh!

The reasons as to why such omissions may occur are numerous, such as data unavailability or just mistakenly neglecting the effect or the importance of the variable being omitted.

For instance, let’s say that the true model is:

y = β_{0} + β_{1}x_{1} + β_{2}x_{2} + u

But we are, deliberately or inadvertently, estimating this misspecified model instead:

y = β_{0} + β_{1}x_{1} + ν

As a result, the x_{2} variable becomes part of the error term, giving the new error term:

ν = β_{2}x_{2} + u

Now, if x_{2} is correlated with x_{1} (as in the example below) and β_{2} is not zero, then the error term will be correlated with the regressor, violating the Gauss-Markov assumption, which states that the error term should be uncorrelated with the regressors. Such correlation leads to biased and inconsistent estimates.

Simply put, because we are not regressing our dependent variable (y) on both of the relevant regressors (x_{1} and x_{2} in this case), we tend to over- or underestimate the effect of the included regressor on the dependent variable.
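The size and direction of that over- or underestimate can be worked out explicitly. The sketch below is the standard textbook argument and rests on one assumption: that x_{2} can be written as a linear function of x_{1} plus a remainder. The symbols δ_{0}, δ_{1} and r are introduced here for illustration and do not appear elsewhere in this article.

```latex
% Auxiliary relation between the omitted and the included regressor,
% where r is uncorrelated with x_1 by construction:
\begin{align*}
x_2 &= \delta_0 + \delta_1 x_1 + r,
\qquad \delta_1 = \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)} \\[4pt]
% Substitute into the true model y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u:
y &= (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1)\, x_1 + (\beta_2 r + u) \\[4pt]
% So the slope from regressing y on x_1 alone estimates:
\beta_1 + \beta_2 \delta_1
&\quad\Longrightarrow\quad \text{bias} = \beta_2 \delta_1
\end{align*}
```

The bias is zero only if β_{2} = 0 (x_{2} does not belong in the model) or δ_{1} = 0 (x_{1} and x_{2} are uncorrelated). Otherwise it is upward when β_{2} and δ_{1} have the same sign and downward when their signs differ.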

The most common example of omitted variable bias is the effect of schooling on wages. To analyze this effect, we should include students’ ability in the regression: ability may make a student more successful than their peers during their education, and it may also lead to better jobs, and hence better wages, after graduation. So why would we exclude “ability” from our regression model? Mainly because, even if we know that it belongs in the true model, we do not have a reasonable and trustworthy measure of it. Sometimes, then, you know that a specific regressor belongs in the model but simply cannot include it, because the data are unavailable or the regressor cannot be measured.

So if the actual model is:

Wages = β_{0} + β_{1}·education + β_{2}·ability + u

And we consider it to be:

Wages = β_{0} + β_{1}·education + ν

Then the coefficient on the included regressor, β_{1}, will absorb some of the effect of ability, resulting in biased estimates of β_{1}. Will the bias be upward or downward? In this case, ability has a positive effect on wages and is also positively correlated with education, so omitting it biases β_{1} upward: it takes some of the ‘credit’ that belongs to β_{2}, the coefficient on our excluded regressor.
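A small simulation can make the upward bias concrete. The following is a minimal sketch, not an analysis of real data: the coefficient values, the way education depends on ability, and the helper `ols` are all made up for illustration. It fits the wage equation with and without ability and compares the education coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical "true" coefficients, chosen only for illustration.
beta0, beta1, beta2 = 1.0, 0.5, 0.8

# Assumed data-generating process: education rises with ability,
# so the two regressors are positively correlated.
ability = rng.normal(size=n)
education = 12 + 2.0 * ability + rng.normal(size=n)
u = rng.normal(size=n)
wages = beta0 + beta1 * education + beta2 * ability + u


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


full = ols(wages, education, ability)   # correctly specified model
short = ols(wages, education)           # ability omitted

# Slope from regressing ability on education: Cov / Var.
delta1 = np.cov(education, ability)[0, 1] / np.var(education, ddof=1)

# The short-regression slope equals the full-model slope on education
# plus the full-model slope on ability times delta1.
implied_short_slope = full[1] + full[2] * delta1

print("education coefficient, ability included:", full[1])
print("education coefficient, ability omitted: ", short[1])
print("implied by full model and delta1:       ", implied_short_slope)
```

Under these assumed numbers, Var(education) = 5 and Cov(education, ability) = 2, so the population bias is 0.8 × 2/5 = 0.32 and the short regression should land near 0.82 rather than 0.5. Flipping the sign of either `beta2` or the `2.0` loading on ability would turn the upward bias into a downward one.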

### Further reading

Introductory econometrics – Jeffrey M. Wooldridge – Chapter 3.3 (used as reference here)

Endogeneity Source: Omitted Variables

Applied Econometrics: When Can an Omitted Variable Invalidate a Regression

Confounding Variables Can Bias Your Results