R Code
Mean | SD | Histogram | |
---|---|---|---|
Sales per Capita | 95.15 | 41.13 | ▂▅▅▇▄▁ |
Real Price | 3.40 | 1.64 | ▇▇▂▃▂▂▁ |
Nominal Price | 2.68 | 2.24 | ▇▄▁▂▂▁▁ |
We’re interested in cigarette prices and sales, so let’s focus our summaries on those two variables
Mean | SD | Histogram | |
---|---|---|---|
Sales per Capita | 95.15 | 41.13 | ▂▅▅▇▄▁ |
Real Price | 3.40 | 1.64 | ▇▇▂▃▂▂▁ |
Nominal Price | 2.68 | 2.24 | ▇▄▁▂▂▁▁ |
Instrumental Variables (IV) is a way to identify causal effects using variation in treatment particpation that is due to an exogenous variable that is only related to the outcome through treatment.
Two reasons to consider IV:
Either problem is sometimes loosely referred to as endogeneity
Consider simple regression equation: \[y = \beta x + \varepsilon (x),\] where \(\varepsilon(x)\) reflects the dependence between our observed variable and the error term.
Simple OLS will yield \[\frac{dy}{dx} = \beta + \frac{d\varepsilon}{dx} \neq \beta\]
The regression we want to do, \[y_{i} = \alpha + \delta D_{i} + \gamma A_{i} + \epsilon_{i},\] where \(D_{i}\) is treatment (think of schooling for now) and \(A_{i}\) is something like ability.
\(A_{i}\) is unobserved, so instead we run \[y_{i} = \alpha + \beta D_{i} + \epsilon_{i}\]
From this “short” regression, we don’t actually estimate \(\delta\). Instead, we get an estimate of \[\beta = \delta + \lambda_{ds}\gamma \neq \delta,\] where \(\lambda_{ds}\) is the coefficient of a regression of \(A_{i}\) on \(D_{i}\).
IV will recover the “long” regression without observing underlying ability
IF our IV satisfies all of the necessary assumptions.
We want to estimate \[E[Y_{i} | D_{i}=1] - E[Y_{i} | D_{i}=0]\]
With instrument \(Z_{i}\) that satisfies relevant assumptions, we can estimate this as \[E[Y_{i} | D_{i}=1] - E[Y_{i} | D_{i}=0] = \frac{E[Y_{i} | Z_{i}=1] - E[Y_{i} | Z_{i}=0]}{E[D_{i} | Z_{i}=1] - E[D_{i} | Z_{i}=0]}\]
In words, this is effect of the instrument on the outcome (“reduced form”) divided by the effect of the instrument on treatment (“first stage”)
Recall “long” regression: \(Y=\alpha + \delta S + \gamma A + \epsilon\).
\[\begin{align} COV(Y,Z) & = E[YZ] - E[Y] E[Z] \\ & = E[(\alpha + \delta S + \gamma A + \epsilon)\times Z] - E[\alpha + \delta S + \gamma A + \epsilon)]E[Z] \\ & = \alpha E[Z] + \delta E[SZ] + \gamma E[AZ] + E[\epsilon Z] \\ & \hspace{.2in} - \alpha E[Z] - \delta E[S]E[Z] - \gamma E[A] E[Z] - E[\epsilon]E[Z] \\ & = \delta (E[SZ] - E[S] E[Z]) + \gamma (E[AZ] - E[A] E[Z]) \\ & \hspace{.2in} + E[\epsilon Z] - E[\epsilon] E[Z] \\ & = \delta C(S,Z) + \gamma C(A,Z) + C(\epsilon, Z) \end{align}\]
Working from \(COV(Y,Z) = \delta COV(S,Z) + \gamma COV(A,Z) + COV(\epsilon,Z)\), we find
\[\delta = \frac{COV(Y,Z)}{COV(S,Z)}\]
if \(COV(A,Z)=COV(\epsilon, Z)=0\)
Easy to think of in terms of randomized controlled trial…
Measure | Offered Seat | Not Offered Seat | Difference |
---|---|---|---|
Score | -0.003 | -0.358 | 0.355 |
% Enrolled | 0.787 | 0.046 | 0.741 |
Effect | 0.48 |
Angrist et al., 2012. “Who Benefits from KIPP?” Journal of Policy Analysis and Management.
Think of IV as two-steps:
Interested in estimating \(\delta\) from \(y_{i} = \alpha + \beta x_{i} + \delta D_{i} + \varepsilon_{i}\), but \(D_{i}\) is endogenous (no pure “selection on observables”).
Step 1: With instrument \(Z_{i}\), we can regress \(D_{i}\) on \(Z_{i}\) and \(x_{i}\): \[D_{i} = \lambda + \theta Z_{i} + \kappa x_{i} + \nu,\] and form prediction \(\hat{D}_{i}\).
Step 2: Regress \(y_{i}\) on \(x_{i}\) and \(\hat{D}_{i}\): \[y_{i} = \alpha + \beta x_{i} + \delta \hat{D}_{i} + \xi_{i}\]
Recall \(\hat{\theta}=\frac{C(Z,S)}{V(Z)}\), or \(\hat{\theta}V(Z) = C(Y,Z)\). Then:
\[\begin{align} \hat{\delta} & = \frac{COV(Y,Z)}{COV(S,Z)} \\ & = \frac{\hat{\theta}C(Y,Z)}{\hat{\theta}C(S,Z)} = \frac{\hat{\theta}C(Y,Z)}{\hat{\theta}^{2}V(Z)} \\ & = \frac{C(\hat{\theta}Z,Y)}{V(\hat{\theta}Z)} = \frac{C(\hat{S},Y)}{V(\hat{S})} \end{align}\]
But in practice, DON’T do this in two steps. Why?
Because standard errors are wrong…not accounting for noise in prediction, \(\hat{D}_{i}\). The appropriate fix is built into most modern stats programs.
We’ll talk about this next class!