Inference: Typically cluster at the unit level to allow for correlation over time within units, but cluster-robust standard errors can be unreliable with a small number of treated or control groups. Alternatives in that case:
Conley-Taber CIs
Wild cluster bootstrap
Randomization inference
“Extra” things like propensity score weighting and doubly robust estimation
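As a rough illustration of randomization inference with very few treated units, the sketch below re-assigns treatment to every possible pair of units and compares the observed DD estimate to the resulting placebo distribution. The data are simulated, and every name and number (`n_units`, `n_treated`, `dd_estimate`, the effect size) is an assumption made for illustration.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 12 units, only 2 treated, one pre and one post period.
n_units, n_treated = 12, 2
treated = np.zeros(n_units, dtype=bool)
treated[:n_treated] = True

# Simulated pre-to-post change in outcomes, with an assumed effect of 2.
change = rng.normal(0.0, 1.0, n_units) + 2.0 * treated


def dd_estimate(change, mask):
    """Mean change among 'treated' units minus mean change among the rest."""
    return change[mask].mean() - change[~mask].mean()


observed = dd_estimate(change, treated)

# Re-assign treatment to every possible set of 2 units and re-estimate.
placebo = []
for idx in combinations(range(n_units), n_treated):
    mask = np.zeros(n_units, dtype=bool)
    mask[list(idx)] = True
    placebo.append(dd_estimate(change, mask))
placebo = np.array(placebo)

# Share of assignments with an estimate at least as extreme as the observed one.
p_value = np.mean(np.abs(placebo) >= np.abs(observed) - 1e-12)
print(f"DD estimate: {observed:.3f}, randomization p-value: {p_value:.3f}")
```

With 2 treated units out of 12 there are only 66 possible assignments, so the smallest attainable p-value is 1/66. That floor is one reason this approach has limited power when treated groups are very few.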
Two-way fixed effects (TWFE)
DD and TWFE?
Just a shorthand for a common regression specification
Fixed effects for each unit and each time period, \(\gamma_{i}\) and \(\gamma_{t}\)
TWFE and 2x2 DD identical with homogeneous effects and common treatment timing
Otherwise, TWFE is generally biased and inconsistent for the ATT
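The equivalence under common timing can be checked numerically. The sketch below simulates a balanced panel in which half the units are treated from the same period onward, then compares the TWFE coefficient (estimated with explicit unit and time dummies) to the 2x2 difference in means. The data-generating process and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical balanced panel: 40 units, 6 periods, treatment starts in period 3.
n_units, n_periods, first_post = 40, 6, 3
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treat = unit < n_units // 2          # first half of units are ever-treated
post = time >= first_post
d = (treat & post).astype(float)     # D_it

# Outcome with unit effects, a common time trend, and an assumed effect of 2.
y = rng.normal(size=n_units)[unit] + 0.5 * time + 2.0 * d + rng.normal(size=unit.size)

# TWFE by OLS: D plus a full set of unit dummies and time dummies (one dropped).
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([d, unit_dummies, time_dummies])
delta_twfe = np.linalg.lstsq(X, y, rcond=None)[0][0]

# 2x2 DD from the four group-by-period means.
dd_2x2 = (y[treat & post].mean() - y[treat & ~post].mean()) - (
    y[~treat & post].mean() - y[~treat & ~post].mean()
)

print(f"TWFE: {delta_twfe:.6f}   2x2 DD: {dd_2x2:.6f}")
```

The two numbers agree up to floating-point error. This is an algebraic identity in a balanced panel with common timing, so it does not depend on the simulated noise.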
Consider the standard TWFE specification with a single treatment coefficient, \[y_{it} = \alpha + \delta D_{it} + \gamma_{i} + \gamma_{t} + \varepsilon_{it}.\] We can decompose \(\hat{\delta}_{twfe}\), the OLS estimate of \(\delta\), into three terms:
\[\hat{\delta}_{twfe} = \text{VW ATT} + \text{VW PT} - \Delta \text{ATT}\]
VW ATT: a variance-weighted ATT
VW PT: a variance-weighted violation of parallel trends
\(\Delta\)ATT: bias from treatment effects that change over time
Intuition
Problems come from heterogeneous effects and staggered treatment timing
The TWFE OLS estimate is a weighted average of all possible 2x2 DD comparisons
Weights are a function of subsample size, the relative sizes of the treated and control groups, and the timing of treatment
Units treated near the middle of the sample period receive larger weights
Best case: Variance-weighted ATT
Already-treated units act as controls for later-treated units, so differential timing can introduce bias whenever treatment effects change over time
Heterogeneity and differential timing introduce “contamination” via negative weights on some of the underlying 2x2 DDs
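To make the intuition concrete, the sketch below builds a noise-free staggered design with two cohorts and no never-treated units, where the effect grows by one unit in each period after adoption. Parallel trends hold by construction, so any gap between the TWFE coefficient and the true ATT comes from the weighting alone. The cohort dates, effect path, and variable names are all assumptions for illustration.

```python
import numpy as np

# Hypothetical design: 10 periods, an early cohort (treated from period 3)
# and a late cohort (treated from period 7), 10 units each.
n_periods = 10
cohort = np.repeat([3, 7], 10)
n_units = cohort.size
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

g = cohort[unit]
d = (time >= g).astype(float)

# Assumed dynamic effect: 1 in the adoption period, then +1 per period.
effect = d * (time - g + 1)

# Noise-free outcome: unit effects + common trend + effect.
unit_fe = np.linspace(-1.0, 1.0, n_units)[unit]
y = unit_fe + 0.5 * time + effect

# TWFE by OLS with explicit unit and time dummies.
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([d, unit_dummies, time_dummies])
delta_twfe = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Benchmark: average effect over all treated unit-periods.
true_att = effect[d == 1].mean()

print(f"TWFE coefficient: {delta_twfe:.3f}   true ATT: {true_att:.3f}")
```

In this design the TWFE coefficient comes out well below the average effect on treated observations. Once the late cohort adopts, the early cohort serves as its control, and the early cohort's still-growing effect is differenced out as if it were a trend.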
Does it really matter?
Definitely! But how much?
Large treatment effects for early-treated units could even reverse the sign of the final estimate
Several “modern” approaches to address these issues
New DD: Callaway & Sant’Anna (Group-Time ATT)
Estimates group-by-time ATT, then aggregates into overall effects
Handles staggered adoption with flexible aggregation and clear identification targets
Separates identification from weighting choices explicitly
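A stripped-down version of the group-time idea can be written with simple differences in means. The sketch below uses never-treated units as the comparison group and the period just before adoption as the base period, with no covariates and no inference. It is not the authors' software: the cohort coding (`-1` for never treated), the helper `att_gt`, and the simulated data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: cohorts first treated in period 3 or 7, plus a
# never-treated group coded as -1; 10 units per group, 10 periods.
n_periods = 10
cohort_of_unit = np.repeat([3, 7, -1], 10)
n_units = cohort_of_unit.size
t = np.arange(n_periods)

# Noise-free outcomes in wide format (units x periods), so parallel trends
# hold exactly; the assumed effect grows by 1 per period since adoption.
alpha = rng.normal(size=n_units)[:, None]
gamma = 0.5 * t[None, :]
exposure = t[None, :] - cohort_of_unit[:, None] + 1
treated = (cohort_of_unit[:, None] >= 0) & (exposure > 0)
y = alpha + gamma + np.where(treated, exposure, 0.0)

never = cohort_of_unit == -1


def att_gt(g, period):
    """Change since period g-1 for cohort g, minus the same change for never-treated."""
    base = g - 1
    in_g = cohort_of_unit == g
    return (y[in_g, period] - y[in_g, base]).mean() - (
        y[never, period] - y[never, base]
    ).mean()


# All post-treatment group-time cells.
att = {(g, p): att_gt(g, p) for g in (3, 7) for p in range(g, n_periods)}

# One possible aggregation: weight each cell by its cohort size.
weights = {key: (cohort_of_unit == key[0]).sum() for key in att}
overall = sum(att[k] * weights[k] for k in att) / sum(weights.values())

print(f"ATT(3,3) = {att[(3, 3)]:.2f}, ATT(7,9) = {att[(7, 9)]:.2f}, overall = {overall:.2f}")
```

Each cell is a clean 2x2 comparison that never uses already-treated units as controls. The choice of aggregation weights is then a separate, explicit step.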
New DD: Sun & Abraham (Interaction-Weighted Event Study)
Corrects TWFE event-study bias under staggered adoption
Uses cohort-by-time interactions and aggregates appropriately
Clean interpretation for dynamic effects relative to treatment timing
New DD: Borusyak, Jaravel, Spiess (Imputation Estimator)
Imputes untreated potential outcomes from a model (e.g., unit and time fixed effects) fit on untreated observations only
Estimates treatment effects by comparing observed outcomes to imputed counterfactuals
Transparent decomposition and robust to staggered timing
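The imputation logic can be sketched in a few lines: fit unit and time fixed effects on untreated observations only, predict the untreated outcome for every treated observation, and average the gaps. The version below is a minimal illustration on simulated data, without covariates or standard errors. The cohort coding and all variable names are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical design: cohorts treated from period 3 or 7, plus never-treated (-1).
n_periods = 10
cohort = np.repeat([3, 7, -1], 10)
n_units = cohort.size
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

g = cohort[unit]
d = (g >= 0) & (time >= g)                   # treated unit-periods
effect = np.where(d, time - g + 1, 0.0)      # assumed growing effect
y = (
    rng.normal(size=n_units)[unit]           # unit effects
    + 0.5 * time                             # common trend
    + effect
    + rng.normal(0.0, 0.1, unit.size)        # small idiosyncratic noise
)

# Unit and time dummies (period 0 dropped) for every observation.
X = np.column_stack(
    [
        (unit[:, None] == np.arange(n_units)).astype(float),
        (time[:, None] == np.arange(1, n_periods)).astype(float),
    ]
)

# Step 1: estimate the fixed effects using untreated observations only.
beta = np.linalg.lstsq(X[~d], y[~d], rcond=None)[0]

# Step 2: impute Y(0) for all observations, including treated ones.
y0_hat = X @ beta

# Step 3: observation-level effects and their average over treated observations.
tau_hat = y[d] - y0_hat[d]
att_hat = tau_hat.mean()
true_att = effect[d].mean()

print(f"imputation ATT: {att_hat:.3f}   true ATT: {true_att:.3f}")
```

Because the fixed effects are never estimated from treated observations, growing effects in the early cohort cannot contaminate the counterfactual for the late cohort. The approach does require that every unit and every period appear among the untreated observations.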
New DD: Gardner (Two-Stage DID)
Stage 1 removes unit and time effects using untreated data
Stage 2 estimates treatment effects on residualized outcomes
Useful when treatment timing is staggered and effects are heterogeneous
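The two stages can be sketched as follows, here with event-time dummies in the second stage so that the output is a dynamic effect path. The data are simulated without noise, and the cohort coding, effect path, and variable names are assumptions for illustration. A naive second-stage regression also ignores first-stage estimation error, so its standard errors would need adjusting.

```python
import numpy as np

# Hypothetical design: cohorts treated from period 3 or 7, plus never-treated (-1).
n_periods = 10
cohort = np.repeat([3, 7, -1], 10)
n_units = cohort.size
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

g = cohort[unit]
d = (g >= 0) & (time >= g)
exposure = time - g + 1                          # periods since adoption (1 = first)
effect = np.where(d, exposure, 0.0)              # assumed effect equals exposure
y = np.linspace(-1.0, 1.0, n_units)[unit] + 0.5 * time + effect

# Stage 1: unit and time effects estimated on untreated observations only.
fe = np.column_stack(
    [
        (unit[:, None] == np.arange(n_units)).astype(float),
        (time[:, None] == np.arange(1, n_periods)).astype(float),
    ]
)
beta = np.linalg.lstsq(fe[~d], y[~d], rcond=None)[0]
y_resid = y - fe @ beta                          # residualized outcome, all rows

# Stage 2: regress residualized outcomes on event-time dummies (treated rows only
# switch a dummy on; untreated rows are all zeros).
max_exposure = int(exposure[d].max())
Z = ((exposure[:, None] == np.arange(1, max_exposure + 1)) & d[:, None]).astype(float)
dynamic_effects = np.linalg.lstsq(Z, y_resid, rcond=None)[0]

for k, est in enumerate(dynamic_effects, start=1):
    print(f"exposure {k}: {est:.3f}")
```

Replacing the event-time dummies with the single indicator \(D_{it}\) in stage 2 would instead give one overall coefficient, equal to the average residualized outcome among treated observations.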