Find covariates \(X_{i}\) such that the following assumptions are plausible:
Selection on observables: \[Y_{0i}, Y_{1i} \perp\!\!\!\perp D_{i} | X_{i}\]
Common support: \[0 < \text{Pr}(D_{i}=1|X_{i}) < 1\]
Then we can use \(X_{i}\) to group observations, taking expected outcomes from the control group as the predicted counterfactuals for the treated, and vice versa.
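To make the grouping logic explicit, here is a short derivation of why the observed treated-control difference within a covariate cell recovers the conditional average treatment effect. The observed outcome \(Y_{i}\) is notation introduced here for illustration; it equals \(Y_{1i}\) for treated units and \(Y_{0i}\) for untreated units.

\[E[Y_{1i} - Y_{0i} | X_{i}] = E[Y_{1i} | D_{i}=1, X_{i}] - E[Y_{0i} | D_{i}=0, X_{i}] = E[Y_{i} | D_{i}=1, X_{i}] - E[Y_{i} | D_{i}=0, X_{i}]\]

The first equality uses selection on observables; common support ensures that both conditional means on the right are defined for every value of \(X_{i}\).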
Assumption 1: Selection on Observables
\(E[Y_{1}|D,X]=E[Y_{1}|X]\)
In words: nothing outside of \(X\) both determines treatment selection and affects your outcome of interest.
Assumption 2: Common Support
For each type (each value of \(X\)), there must be individuals in both the treated and untreated groups.
\[0 < \text{Pr}(D=1|X) <1\]
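A minimal sketch of how common support might be checked with a discrete covariate: compute the empirical share treated within each type and flag any type where that share is exactly 0 or 1. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def treated_share_by_type(d, x):
    """Return {type: share treated} for binary treatment d and discrete covariate x."""
    counts = defaultdict(lambda: [0, 0])  # type -> [n_treated, n_total]
    for d_i, x_i in zip(d, x):
        counts[x_i][0] += d_i
        counts[x_i][1] += 1
    return {k: n_treated / n_total for k, (n_treated, n_total) in counts.items()}

def off_support(d, x):
    """Types whose empirical treated share is exactly 0 or 1."""
    shares = treated_share_by_type(d, x)
    return sorted(k for k, p in shares.items() if p in (0.0, 1.0))

# Hypothetical data: "college" has no untreated units, so it fails the check.
d = [1, 1, 0, 0, 0, 1, 1]
x = ["hs", "hs", "hs", "hs", "hs", "college", "college"]

shares = treated_share_by_type(d, x)
violations = off_support(d, x)
print(shares)
print(violations)
```

With continuous covariates the same idea is usually applied to an estimated propensity score rather than to raw cells.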
Causal inference with observational data
With selection on observables and common support, several estimators are available:
Subclassification
Matching estimators
Reweighting estimators
Regression estimators
Subclassification
Estimate the average treatment effect within each group, then take a weighted average over those groups.
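One common way to write the subclassification estimator is shown below. The notation is an assumption introduced here: \(K\) groups, \(N^{k}\) observations in group \(k\), \(N\) observations in total, and \(\bar{Y}^{1,k}\) and \(\bar{Y}^{0,k}\) the mean outcomes of treated and untreated units in group \(k\).

\[\hat{\delta}_{ATE} = \sum_{k=1}^{K} \left(\bar{Y}^{1,k} - \bar{Y}^{0,k}\right) \frac{N^{k}}{N}\]

Replacing the weights \(N^{k}/N\) with each group's share of treated units gives the corresponding estimate of the effect among the treated.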
Let’s assume we are evaluating the effect of a treatment on an outcome variable (e.g., a job training program on income). We match on education level, considering the “High School Only” group. We will use all available matches within the stratum.
| ID | D | Income |
|----|---|--------|
| 1  | 1 | 35,000 |
| 2  | 1 | 37,000 |
| 3  | 0 | 30,000 |
| 4  | 0 | 32,000 |
| 5  | 0 | 31,000 |
Step 1: Identify Matches
All treated individuals (IDs 1 and 2) are matched to all control individuals (IDs 3, 4, and 5).
The number of matches per treated unit is \(m = 3\).
Step 2: Impute Counterfactuals
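For each treated individual, impute the counterfactual control outcome as the average income of the matched controls: \[\hat{Y}_{0i} = \frac{30{,}000 + 32{,}000 + 31{,}000}{3} = 31{,}000\]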
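The imputation step for the "High School Only" stratum can be sketched as follows, using the incomes from the table above. The variable names are illustrative; the final averaging step is the usual way to summarize unit-level effects among the treated.

```python
# Incomes from the "High School Only" stratum in the table above.
treated = {1: 35_000, 2: 37_000}
controls = {3: 30_000, 4: 32_000, 5: 31_000}

# Every treated unit is matched to all m = 3 controls, so each one
# receives the same imputed counterfactual: the control-group mean.
y0_hat = sum(controls.values()) / len(controls)

# Unit-level effects: observed treated outcome minus imputed control outcome.
effects = {i: y1 - y0_hat for i, y1 in treated.items()}

# Average effect among the treated in this stratum.
att = sum(effects.values()) / len(effects)

print(y0_hat, effects, att)
```

In this stratum the imputed control income is 31,000, the unit-level effects are 4,000 and 6,000, and their average is 5,000.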
But are the observations really the same within each group? “Matching discrepancies” (covariate differences between a unit and its matches) can introduce bias in the estimates.
“Bias correction” subtracts \[\hat{\mu}(x_{i}) - \hat{\mu}(x_{j(i)})\] from the difference between observed \(Y_{1i}\) and imputed \(Y_{0i}\); this term is the difference in fitted values from a regression of \(y\) on \(x\).
\(\hat{\mu}(x_{i})\) is the predicted outcome from a regression of \(Y\) on \(X\).
\(x_{i}\) is the covariate vector for a treated unit.
\(x_{j(i)}\) is the covariate vector for its matched control.
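A minimal sketch of the bias-corrected matching idea with one covariate and one nearest-neighbor match per treated unit. It assumes \(\hat{\mu}\) is a simple linear regression fitted on the control units only, which is one common choice and not necessarily the one used to produce any output in these notes. The function names `ols_fit` and `att_matching` and the toy data are hypothetical.

```python
def ols_fit(x, y):
    """One-covariate OLS; returns a function mu_hat(v) giving fitted values."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda v: intercept + slope * v

def att_matching(x_t, y_t, x_c, y_c, bias_correct=True):
    """Nearest-neighbor matching estimate of the effect among the treated."""
    mu_hat = ols_fit(x_c, y_c)  # assumption: regression fitted on controls only
    effects = []
    for x_i, y_i in zip(x_t, y_t):
        # j(i): index of the control whose covariate is closest to x_i.
        j = min(range(len(x_c)), key=lambda k: abs(x_c[k] - x_i))
        effect = y_i - y_c[j]  # observed Y_1i minus imputed Y_0i
        if bias_correct:
            # Remove the part of the gap explained by the covariate mismatch.
            effect -= mu_hat(x_i) - mu_hat(x_c[j])
        effects.append(effect)
    return sum(effects) / len(effects)

# Toy data: control outcomes follow y = 2x; treated outcomes are 2x + 5,
# so the true effect is 5, but no treated unit has an exact covariate match.
x_c, y_c = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
x_t, y_t = [1.4, 3.4], [7.8, 11.8]

raw = att_matching(x_t, y_t, x_c, y_c, bias_correct=False)
corrected = att_matching(x_t, y_t, x_c, y_c, bias_correct=True)
print(raw, corrected)
```

In this toy example the uncorrected estimate is about 5.8 because each treated unit is matched to a control with a lower covariate value; the correction removes that discrepancy and recovers roughly 5.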
Estimate... 4.7816
AI SE...... 0.61462
T-stat..... 7.7798
p.val...... 7.3275e-15
Original number of observations.............. 5000
Original number of treated obs............... 1774
Matched number of observations............... 5000
Matched number of observations (unweighted). 5007
Estimate... 7.567
AI SE...... 0.053401
T-stat..... 141.7
p.val...... < 2.22e-16
Original number of observations.............. 5000
Original number of treated obs............... 2813
Matched number of observations............... 5000
Matched number of observations (unweighted). 22388