Find covariates \(X_{i}\) such that the following assumptions are plausible:
Then we can use \(X_{i}\) to group observations, using the mean outcome among untreated observations as the predicted counterfactual for the treated, and vice versa.
\(E[Y_{0}|D,X]=E[Y_{0}|X]\) and \(E[Y_{1}|D,X]=E[Y_{1}|X]\)
In words: there is nothing unobserved that determines treatment selection and also affects your outcome of interest.
Someone of each type must be in both the treated and untreated groups:
\[0 < \text{Pr}(D=1|X) <1\]
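Common support can be checked directly; a minimal sketch for a discrete covariate, assuming a hypothetical data frame `dat` with covariate `X` and treatment indicator `D`:

```r
# Every value of X should appear in both treatment groups
with(dat, table(X, D))                           # look for zero cells in either column
prop.table(with(dat, table(X, D)), margin = 1)   # empirical Pr(D = 1 | X = x) by row
```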
With selection on observables and common support:
Estimate the average treatment effect within each group defined by \(X\), then take a weighted average across groups:
\[ATE=\sum_{x} \text{Pr}(X=x) \left(E[Y | X=x, D=1] - E[Y | X=x, D=0]\right)\]
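With a discrete covariate, this estimator takes only a few lines of R; a sketch assuming a hypothetical data frame `dat` with outcome `Y`, treatment `D`, and covariate `X`:

```r
library(dplyr)

# Group-specific differences in means, weighted by Pr(X = x)
ate_by_x <- dat %>%
  group_by(X) %>%
  summarize(diff = mean(Y[D == 1]) - mean(Y[D == 0]),  # E[Y|X=x,D=1] - E[Y|X=x,D=0]
            w    = n() / nrow(dat))                    # Pr(X = x)
ate <- sum(ate_by_x$diff * ate_by_x$w)                 # weighted average over groups
```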
With many covariates, some groups will contain only treated or only untreated observations, so the group-specific differences cannot be computed. This is the curse of dimensionality.
For each observation \(i\), find the \(m\) “nearest” neighbors from the opposite treatment group, \(J_{m}(i)\).
Impute \(\hat{Y}_{0i}\) and \(\hat{Y}_{1i}\) for each observation: \[\hat{Y}_{0i} = \begin{cases} Y_{i} & \text{if } D_{i}=0 \\ \frac{1}{m} \sum_{j \in J_{m}(i)} Y_{j} & \text{if } D_{i}=1 \end{cases}\] \[\hat{Y}_{1i} = \begin{cases} Y_{i} & \text{if } D_{i}=1 \\ \frac{1}{m} \sum_{j \in J_{m}(i)} Y_{j} & \text{if } D_{i}=0 \end{cases}\]
Form “matched” ATE: \(\hat{\delta}^{\text{match}} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{Y}_{1i} - \hat{Y}_{0i} \right)\)
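For intuition, here is a minimal hand-rolled version with a single covariate and \(m=1\), assuming hypothetical vectors `y`, `d`, and `x` (in practice, use Matching::Match as below):

```r
# Impute each observation's missing potential outcome with its
# nearest opposite-group neighbor on x
n  <- length(y)
y1 <- ifelse(d == 1, y, NA)
y0 <- ifelse(d == 0, y, NA)
for (i in 1:n) {
  opp <- which(d != d[i])                    # candidates in the other group
  j   <- opp[which.min(abs(x[opp] - x[i]))]  # nearest neighbor, J_1(i)
  if (d[i] == 1) y0[i] <- y[j] else y1[i] <- y[j]
}
delta_match <- mean(y1 - y0)                 # matched ATE
```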
Euclidean distance: \(\sum_{k=1}^{K} (X_{ik} - X_{jk})^{2}\)
Scaled Euclidean distance: \(\sum_{k=1}^{K} \frac{1}{\sigma_{X_{k}}^{2}} (X_{ik} - X_{jk})^{2}\)
Mahalanobis distance: \((X_{i} - X_{j})' \Sigma_{X}^{-1} (X_{i} - X_{j})\)
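The Matching package selects among these via its Weight argument (1 for inverse-variance scaling, 2 for Mahalanobis); computing them directly for two rows of a hypothetical covariate matrix `X`:

```r
# Squared distances between observations i and j, matching the formulas above
X <- matrix(rnorm(200), ncol = 2)   # hypothetical covariate matrix
i <- 1; j <- 2                      # any two observations
d_euclid <- sum((X[i, ] - X[j, ])^2)
d_scaled <- sum((X[i, ] - X[j, ])^2 / apply(X, 2, var))
d_mahal  <- mahalanobis(X[i, ], center = X[j, ], cov = cov(X))
```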
Estimate the propensity score \(\hat{\pi}(X_{i}) = \text{Pr}(D_{i}=1|X_{i})\)
Weight by the inverse of the propensity score: \[\hat{\mu}_{1} = \frac{ \sum_{i=1}^{N} \frac{Y_{i} D_{i}}{\hat{\pi}(X_{i})} }{ \sum_{i=1}^{N} \frac{D_{i}}{\hat{\pi}(X_{i})} } \quad \text{and} \quad \hat{\mu}_{0} = \frac{ \sum_{i=1}^{N} \frac{Y_{i} (1-D_{i})}{1-\hat{\pi}(X_{i})} }{ \sum_{i=1}^{N} \frac{1-D_{i}}{1-\hat{\pi}(X_{i})} }\]
Form “inverse-propensity weighted” ATE: \[\hat{\delta}^{IPW} = \hat{\mu}_{1} - \hat{\mu}_{0}\]
ps <- glm(D ~ X, family = binomial, data = dat)   # dat: data frame holding D and X (name illustrative)
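Given the fitted model, the weighted means above follow directly; a sketch continuing from `ps` with the same hypothetical `dat`:

```r
pi_hat <- predict(ps, type = "response")  # estimated propensity scores, pi-hat(X_i)
D <- dat$D; Y <- dat$Y
mu1 <- sum(Y * D / pi_hat) / sum(D / pi_hat)
mu0 <- sum(Y * (1 - D) / (1 - pi_hat)) / sum((1 - D) / (1 - pi_hat))
delta_ipw <- mu1 - mu0                    # IPW estimate of the ATE
```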
Or estimate in one step, \[Y_{i} = \delta D_{i} + \beta X_{i} + D_{i} \times \left(X_{i} - \bar{X}\right) \gamma + \varepsilon_{i}\]
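In R, this one-step regression is a single lm() call; a sketch using the same hypothetical `dat`, `Y`, `D`, and `X` names:

```r
# One-step regression estimator: with the interaction demeaned,
# the coefficient on D estimates the ATE
dat$Xc <- dat$X - mean(dat$X)             # demeaned covariate
reg <- lm(Y ~ D + X + D:Xc, data = dat)
delta_reg <- coef(reg)["D"]
```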
Now let’s do some matching, re-weighting, and regression with simulated data:
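The examples below assume a simulated data frame select.dat with covariate x, treatment w, and outcome y (plus alternative versions w_alt and y_alt used in the final example). The actual DGP isn't reproduced here, but a stand-in of the same shape might look like:

```r
set.seed(123)                                  # illustrative DGP, not the original one
n <- 5000
x <- rnorm(n)
w <- as.numeric(runif(n) < plogis(-0.5 + x))   # treatment selected on x
y <- 5 * w + 2 * x + rnorm(n)                  # outcome affected by both w and x
select.dat <- data.frame(x, w, y)
# w_alt and y_alt (final example) would come from an alternative DGP, omitted here
```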
nn.est1 <- Matching::Match(Y=select.dat$y,
                           Tr=select.dat$w,
                           X=select.dat$x,
                           M=1,            # one match per observation
                           Weight=1,       # inverse-variance scaling
                           estimand="ATE")
summary(nn.est1)
Estimate... 5.3168
AI SE...... 0.64953
T-stat..... 8.1857
p.val...... 2.2204e-16
Original number of observations.............. 5000
Original number of treated obs............... 1733
Matched number of observations............... 5000
Matched number of observations (unweighted). 5032
nn.est2 <- Matching::Match(Y=select.dat$y,
                           Tr=select.dat$w,
                           X=select.dat$x,
                           M=1,
                           Weight=2, #<<
                           estimand="ATE")
summary(nn.est2)
Estimate... 5.3168
AI SE...... 0.64953
T-stat..... 8.1857
p.val...... 2.2204e-16
Original number of observations.............. 5000
Original number of treated obs............... 1733
Matched number of observations............... 5000
Matched number of observations (unweighted). 5032

With a single matching covariate, inverse-variance scaling and the Mahalanobis metric coincide (both divide by the variance of \(x\)), so Weight=1 and Weight=2 give identical results here.
NN Matching
nn.est3 <- Matching::Match(Y=select.dat$y_alt,   # alternative outcome
                           Tr=select.dat$w_alt,  # alternative treatment
                           X=select.dat$x,
                           M=1,
                           Weight=2,
                           estimand="ATE")
summary(nn.est3)
Estimate... 7.6277
AI SE...... 0.051843
T-stat..... 147.13
p.val...... < 2.22e-16
Original number of observations.............. 5000
Original number of treated obs............... 2788
Matched number of observations............... 5000
Matched number of observations (unweighted). 22610