Difference prices to remove time-invariant hospital characteristics
Can then estimate a cross-sectional IV model where cumulative HRRP penalties are instrumented using pre-period Medicare volume
Sometimes referred to as IV with long-differenced outcomes
Why Price Differences (instead of levels)?
Assume a simple two-period model for hospital \(i\) in years \(t \in \{2011, 2014\}\): \[p_{it} = \beta\,\text{pen}_{i,t-2} + \alpha_i + u_{it},\] where \(\alpha_i\) is a time-invariant hospital fixed effect (e.g., hospital 1 is just different from hospital 2 in some time-invariant way).
Take the long difference (2014 minus 2011): \(\Delta p_i \equiv p_{i,2014}-p_{i,2011}.\)
Substitute the model into both terms and difference: \[\Delta p_i= \beta(\text{pen}_{i,2012}-\text{pen}_{i,2009}) + (\alpha_i-\alpha_i) + (u_{i,2014}-u_{i,2011}).\]
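Because \(\alpha_i\) does not change over time, it cancels: \[(\alpha_i-\alpha_i)=0 \quad \Rightarrow \quad \Delta p_i = \beta\,\Delta \text{pen}_i + \Delta u_i,\] where \(\Delta \text{pen}_i \equiv \text{pen}_{i,2012}-\text{pen}_{i,2009}\) and \(\Delta u_i \equiv u_{i,2014}-u_{i,2011}\).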
Differencing removes any time-invariant hospital-level confounder captured by \(\alpha_i\).
Time-varying hospital shocks remain in \(\Delta u_i\).
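As a quick check on the algebra above, here is a minimal simulation sketch. The data are entirely simulated (the sample size, the true \(\beta = 2\), and the 0.8 loading of the penalty on \(\alpha_i\) are illustrative assumptions, not values from the hospital data). It compares pooled OLS in levels, which picks up \(\alpha_i\), with OLS on the long difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 2.0  # illustrative values, not estimates from the course data

alpha = rng.normal(size=n)  # time-invariant hospital effect

# Penalties in both periods load on alpha, which biases OLS in levels
pen_2009 = 0.8 * alpha + rng.normal(size=n)
pen_2012 = 0.8 * alpha + rng.normal(size=n)

# Prices follow p_it = beta * pen_{i,t-2} + alpha_i + u_it
p_2011 = beta * pen_2009 + alpha + rng.normal(size=n)
p_2014 = beta * pen_2012 + alpha + rng.normal(size=n)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


# Pooled levels regression: alpha_i sits in the error term
beta_levels = ols_slope(
    np.concatenate([pen_2009, pen_2012]),
    np.concatenate([p_2011, p_2014]),
)

# Long-difference regression: alpha_i cancels
beta_diff = ols_slope(pen_2012 - pen_2009, p_2014 - p_2011)

print(f"levels: {beta_levels:.3f}, long difference: {beta_diff:.3f}")
```

In this setup the levels estimate should sit well above the true \(\beta\), while the differenced estimate should be close to it. Any time-varying confounder would still contaminate both.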
Naive OLS Estimate
Before moving to IV, let’s look first at a simple OLS estimate, ignoring any potential endogeneity problems
First-stage coefficient is positive, with F-stat of nearly 290
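For reference, with a single instrument the first-stage F-statistic is just the squared t-statistic on the instrument. The sketch below shows that calculation on simulated data; `z` and `x` are hypothetical stand-ins for `avg_mcare` and `hrrp_penalty`, and the 0.5 first-stage coefficient is an arbitrary assumption, so the resulting F will not match the value reported above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

z = rng.normal(size=n)            # stand-in for the instrument (avg_mcare)
x = 0.5 * z + rng.normal(size=n)  # stand-in for the endogenous regressor (hrrp_penalty)

# First stage: regress x on a constant and z
Z = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(Z, x, rcond=None)

# Classical (homoskedastic) variance of the first-stage coefficients
resid = x - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])
cov = sigma2 * np.linalg.inv(Z.T @ Z)

t_stat = coef[1] / np.sqrt(cov[1, 1])
first_stage_F = float(t_stat**2)  # F = t^2 with one excluded instrument

print(f"first-stage coefficient: {coef[1]:.3f}, F: {first_stage_F:.1f}")
```

A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument.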
IV Results
Now time to implement the “official” IV estimator
Note: \(R^2\) is no longer a relevant measure of model fit for IV, as the standard variance decomposition no longer holds
Note also: I’m not worried here about statistical inference (standard errors are likely wrong; they would be larger under heteroskedasticity and smaller in a “richer” specification)
import pandas as pd
import statsmodels.formula.api as smf
from linearmodels.iv import IV2SLS

# --- OLS ---
ols = smf.ols("price_change ~ hrrp_penalty", data=hcris_final).fit()

# --- IV / 2SLS (linearmodels is the closest analogue to fixest IV) ---
iv = IV2SLS.from_formula(
    "price_change ~ 1 + [hrrp_penalty ~ avg_mcare]", data=hcris_final
).fit(cov_type="robust")

print(ols.summary())
print(iv.summary)

# --- modelsummary-like table (coef, SE, N only) ---
out = pd.DataFrame(
    {
        "OLS": {
            "HRRP Penalty ($1000s)": ols.params["hrrp_penalty"],
            "Std. Error": ols.bse["hrrp_penalty"],
            "N": int(ols.nobs),
        },
        "IV": {
            "HRRP Penalty ($1000s)": iv.params["hrrp_penalty"],
            "Std. Error": iv.std_errors["hrrp_penalty"],
            "N": int(iv.nobs),
        },
    }
)
print(out)
|                       | OLS      | IV       |
|-----------------------|----------|----------|
| HRRP Penalty ($1000s) | 4.562    | 119.026  |
|                       | (15.707) | (50.580) |
| Num.Obs.              | 2514     | 2514     |
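To see what the IV estimator does mechanically, the sketch below runs 2SLS by hand on simulated data and compares it with the ratio of the reduced form to the first stage. Everything here is an illustrative assumption (one instrument `z`, an unobserved confounder `v`, true \(\beta = 1.5\)); none of it uses the hospital data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 5000, 1.5  # illustrative values

z = rng.normal(size=n)                       # instrument
v = rng.normal(size=n)                       # unobserved confounder
x = 0.6 * z + v + rng.normal(size=n)         # endogenous regressor
y = beta * x + 2.0 * v + rng.normal(size=n)  # outcome; v sits in the error


def slope(a, b):
    """Slope from a bivariate OLS regression of b on a (with intercept)."""
    a_c = a - a.mean()
    return float(a_c @ (b - b.mean()) / (a_c @ a_c))


# Naive OLS: biased because x is correlated with v
beta_ols = slope(x, y)

# Stage 1: predict x from z; Stage 2: regress y on the prediction
pi_hat = slope(z, x)
x_hat = x.mean() + pi_hat * (z - z.mean())
beta_2sls = slope(x_hat, y)

# Equivalent ratio form: reduced form divided by first stage
beta_ratio = slope(z, y) / pi_hat

print(f"OLS: {beta_ols:.3f}, 2SLS: {beta_2sls:.3f}, ratio: {beta_ratio:.3f}")
```

With a single instrument, the two-step and ratio versions coincide exactly. The second-stage standard errors from a manual two-step are not valid, which is one reason to use a dedicated IV routine in practice.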
Some other IV issues
IV estimators are consistent but biased in finite samples, so performance in small samples is questionable (especially with weak instruments).
IV estimators provide an estimate of a Local Average Treatment Effect (LATE), which is only the same as the ATT under strong conditions or assumptions.
What about lots of instruments? The finite-sample problem becomes more severe with many instruments, and we may try alternative estimators such as jackknife IV (JIVE).
The National Bureau of Economic Research (NBER) has a great resource here for understanding instruments in practice.
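The finite-sample and many-instrument points above can be illustrated with a small Monte Carlo sketch. The design is an assumption chosen for illustration: 200 observations, one relevant instrument, 29 irrelevant ones, and 500 replications. It does not implement JIVE; it only shows how 2SLS drifts toward OLS as irrelevant instruments are added.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, reps = 200, 1.0, 500  # illustrative design


def tsls(Z, x, y):
    """2SLS slope for one endogenous regressor (no intercept; data are mean zero)."""
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)  # first stage
    x_hat = Z @ gamma
    return float(x_hat @ y / (x_hat @ x))


est_one, est_many, est_ols = [], [], []
for _ in range(reps):
    Z = rng.normal(size=(n, 30))            # only the first column is relevant
    v = rng.normal(size=n)
    x = 0.3 * Z[:, 0] + v                   # endogenous regressor
    u = 0.8 * v + 0.6 * rng.normal(size=n)  # error correlated with x through v
    y = beta * x + u

    est_one.append(tsls(Z[:, :1], x, y))    # just-identified: one instrument
    est_many.append(tsls(Z, x, y))          # over-identified: 30 instruments
    est_ols.append(float(x @ y / (x @ x)))  # OLS benchmark

median_one = float(np.median(est_one))
median_many = float(np.median(est_many))
median_ols = float(np.median(est_ols))

print(f"true beta: {beta}")
print(f"median 2SLS, 1 instrument:   {median_one:.3f}")
print(f"median 2SLS, 30 instruments: {median_many:.3f}")
print(f"median OLS:                  {median_ols:.3f}")
```

Medians are reported rather than means because the just-identified IV estimator has heavy tails in small samples.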
Quick IV Review
When do we consider IV as a potential identification strategy?
What are the main IV assumptions (and what do they mean)?