**********
**Example of IV Regression in Stata
**********
**Clear Stata's memory
clear all
**Generate 1000 random observations
**Note that there is a causal relationship between X and Y: a one-unit increase in X increases Y by 0.5 units.
**Set a seed so the results can be reproduced (the value 12345 is arbitrary)
set seed 12345
set obs 1000
gen z = rnormal(0,1)
gen u = rnormal(0,1)
gen x = rnormal(0,1)+.75*z + 2*u
gen y = rnormal(0,1) -4*u + .5*x
**Look at the scatter plot of Y against X. It makes the relationship look negative, and the correlation is negative too.
twoway scatter y x
correl x y
**If we do the regression, we find a negative relationship. We might mistakenly think X decreases Y.
reg y x
**The incorrect estimate is due to omitted variable bias. U is left out of the regression, and it affects both X and Y. If we include U, OLS recovers the true effect.
reg y x u
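**A rough check on the size of the bias, assuming the data-generating process above (Z, U and the noise terms all have variance 1).
**With U omitted, the OLS slope from reg y x should be near 0.5 + (-4)*Cov(X,U)/Var(X), where Cov(X,U) = 2 and Var(X) = 1 + .75^2 + 2^2.
**That works out to about -0.94. The scalar name ovb_slope is only illustrative.
scalar ovb_slope = .5 + (-4)*2/(1 + .75^2 + 2^2)
display "Expected OLS slope with U omitted: " ovb_slope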
**What if we don't know exactly what U is, though? Then what can we do? We can run an IV regression.
**Note that there is a causal relationship between Z and X, but Z affects Y only through X and is unrelated to U.
**Look at the scatter plot of X against Z. There is a positive correlation, but it is not particularly strong.
twoway scatter x z
correl x z
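**A sketch of two further checks on the instrument. The second is only possible here because the data are simulated and U is observed.
**First stage: regress X on Z. The slope should be near .75, and the F test on Z gives a sense of how strong the instrument is.
reg x z
test z
**Exogeneity: Z was generated independently of U, so their sample correlation should be close to zero.
correl z u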
**Now we can try an IV regression. Look how well we do at estimating the relationship between X and Y: the coefficient on X should be close to the true value of 0.5.
ivregress 2sls y (x=z)
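**With one instrument and one endogenous regressor, the 2SLS slope equals Cov(Z,Y)/Cov(Z,X).
**A minimal sketch that checks this by hand. The matrix name C is only illustrative.
**correlate is an r-class command, so the ivregress results in e() are left untouched.
quietly correlate y x z, covariance
matrix C = r(C)
display "Cov(Z,Y)/Cov(Z,X) = " C[3,1]/C[3,2]
display "ivregress slope    = " _b[x]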
**U is still omitted, though. The IV regression gives us the correct causal effect of X, but we would still need U to predict what Y will be for a given X.
**Compare our predicted Y with the actual Y, based on the IV regression.
predict ypred if e(sample)
twoway scatter y ypred
correl y ypred
**The prediction does a poor job because the observed X's are partly driven by U, and U also affects Y directly. Predicting Y from the observed X alone ignores this.
**To know what Y is, we need to know U. To know how X affects Y, we do not need to know U, as long as we have a valid instrumental variable.
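**For comparison, a sketch of the prediction when U is available, using the OLS fit on both X and U.
**The variable name ypred_u is only illustrative. Here the correlation between actual and predicted Y should be strongly positive.
quietly reg y x u
predict ypred_u
twoway scatter y ypred_u
correl y ypred_u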