Instrumental Variables: Part II

Naive estimate

Prices and sales are clearly strongly related. For example, a simple OLS regression of log sales on log price gives:

R Code

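# Packages: tidyverse (read_rds, mutate, ggplot, tribble), here, fixest (feols), modelsummary, tinytable (group_tt)
library(tidyverse)
library(here)
library(fixest)
library(modelsummary)
library(tinytable)
# Load the data, then build logs and real tax variables (218/index rescales to 2010 dollars)
cig.data <- read_rds(here("data/TaxBurden_Data.rds"))
cig.data <- cig.data %>% mutate(ln_sales=log(sales_per_capita),
                                ln_price_cpi=log(price_cpi),
                                ln_price=log(cost_per_pack),
                                tax_cpi=tax_state*(218/index),
                                total_tax_cpi=tax_dollar*(218/index),
                                ln_total_tax=log(total_tax_cpi),
                                ln_state_tax=log(tax_cpi))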
ols <- lm(ln_sales ~ ln_price, data=cig.data)
summary(ols)


Call:
lm(formula = ln_sales ~ ln_price, data = cig.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.23899 -0.17057  0.02239  0.18605  1.13866 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.689838   0.007209  650.55   <2e-16 ***
ln_price    -0.420307   0.006464  -65.02   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3073 on 2497 degrees of freedom
Multiple R-squared:  0.6287,    Adjusted R-squared:  0.6285 
F-statistic:  4228 on 1 and 2497 DF,  p-value: < 2.2e-16

Is this causal?

But is that the true demand curve?
Aren’t other things changing that tend to reduce cigarette sales?
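
One way to formalize this concern, as a sketch: suppose the true demand equation includes an unobserved factor w that also moves with price (w and the notation below are hypothetical, e.g. changing attitudes toward smoking), and that the remaining error is uncorrelated with price. Then the OLS slope from regressing log sales on log price alone converges to the true elasticity plus a bias term:

```latex
\ln Q = \alpha + \beta \ln P + \gamma w + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{OLS}
  = \beta + \gamma\,\frac{\operatorname{Cov}(\ln P,\, w)}{\operatorname{Var}(\ln P)}
```

The sign of the bias depends on the sign of \gamma and on how w co-moves with price, so OLS recovers the demand elasticity only if that covariance is zero.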

Tax as an IV

R Code

cig.data %>% 
  ggplot(aes(x=Year,y=total_tax_cpi)) + 
  stat_summary(fun="mean",geom="line") +
  labs(
    x="Year",
    y="Tax per Pack ($)",
    title="Cigarette Taxes in 2010 Real Dollars"
  ) + theme_bw() +
  scale_x_continuous(breaks=seq(1970, 2020, 5))

IV Results

R Code

ivs <- feols(ln_sales ~ 1 | ln_price ~ ln_total_tax, 
             data=cig.data)
summary(ivs)

TSLS estimation - Dep. Var.: ln_sales
                  Endo.    : ln_price
                  Instr.   : ln_total_tax
Second stage: Dep. Var.: ln_sales
Observations: 2,499
Standard-errors: IID 
              Estimate Std. Error  t value  Pr(>|t|)    
(Intercept)   4.802865   0.009589 500.8559 < 2.2e-16 ***
fit_ln_price -0.614292   0.010928 -56.2132 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.358298   Adj. R2: 0.494569
F-test (1st stage), ln_price: stat = 2,269.1, p < 2.2e-16, on 1 and 2,497 DoF.
                  Wu-Hausman: stat = 1,216.9, p < 2.2e-16, on 1 and 2,496 DoF.

Two-stage equivalence

R Code

step1 <- lm(ln_price ~ ln_total_tax, data=cig.data)
pricehat <- predict(step1)
step2 <- lm(ln_sales ~ pricehat, data=cig.data)
summary(step2)


Call:
lm(formula = ln_sales ~ pricehat, data = cig.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.00723 -0.18338  0.00263  0.18885  1.24648 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.802865   0.008102  592.82   <2e-16 ***
pricehat    -0.614292   0.009233  -66.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3028 on 2497 degrees of freedom
Multiple R-squared:  0.6394,    Adjusted R-squared:  0.6392 
F-statistic:  4427 on 1 and 2497 DF,  p-value: < 2.2e-16
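
Note that the two-step point estimate matches feols exactly (-0.614292), but the standard error does not (0.009233 here versus 0.010928 from feols). The manual second stage computes residuals using the fitted price rather than the actual price. The following is a minimal sketch on simulated data of how the two relate. All variable names and parameter values are hypothetical and not taken from the cigarette data, and it assumes a single instrument with IID errors.

```r
# Simulated data: z is the instrument, u an unobserved confounder (hypothetical values)
set.seed(1)
n <- 5000
z <- rnorm(n)                            # instrument (think: log tax)
u <- rnorm(n)                            # unobserved demand shifter
x <- 0.7 * z + 0.5 * u + rnorm(n)        # endogenous regressor (think: log price)
y <- 4.8 - 1.0 * x - 0.8 * u + rnorm(n)  # outcome, true slope is -1
dat <- data.frame(y, x, z)

# OLS ignores u, so its slope is pulled away from -1
b_ols <- coef(lm(y ~ x, data = dat))["x"]

# Manual two-stage least squares
step1 <- lm(x ~ z, data = dat)
dat$xhat <- fitted(step1)
step2 <- lm(y ~ xhat, data = dat)
b_2sls <- coef(step2)["xhat"]

# With one instrument, the same slope is a ratio of covariances
b_ratio <- cov(dat$z, dat$y) / cov(dat$z, dat$x)

# The naive second-stage SE is based on residuals y - a - b * xhat
se_naive <- summary(step2)$coefficients["xhat", "Std. Error"]

# Corrected SE: rebuild residuals with the actual x, then rescale
res_iv <- dat$y - coef(step2)[1] - b_2sls * dat$x
sigma_iv <- sqrt(sum(res_iv^2) / (n - 2))
se_iv <- se_naive * sigma_iv / summary(step2)$sigma

print(c(ols = unname(b_ols), tsls = unname(b_2sls), ratio = b_ratio))
print(c(se_naive = se_naive, se_iv = se_iv))
```

The same rescaling roughly reproduces the feols standard error in the output above: 0.009233 x (0.3583 / 0.3028) is about 0.0109. In practice it is simpler to let feols handle this.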

Different specifications

R Code

ols1 <- lm(ln_sales ~ ln_price_cpi, data=cig.data)
ols2 <- lm(ln_sales ~ ln_price_cpi + factor(state), data=cig.data)
ols3 <- lm(ln_sales ~ ln_price_cpi + factor(state) + factor(Year), data=cig.data)

ivs1 <- feols(ln_sales ~ 1 | ln_price_cpi ~ ln_total_tax, data=cig.data)
ivs2 <- feols(ln_sales ~ 1 | state | ln_price_cpi ~ ln_total_tax, data=cig.data)
ivs3 <- feols(ln_sales ~ 1 | state + Year | ln_price_cpi ~ ln_total_tax, data=cig.data)

rows <- tribble(~term, ~ m1, ~ m2, ~ m3 , ~ m4, ~ m5, ~ m6 ,
                'State FE', "No", "Yes", "Yes", "No", "Yes", "Yes",
                'Year FE', "No", "No", "Yes", "No", "No", "Yes")
attr(rows, 'position')  <- c(3,4)

modelsummary(list(ols1, ols2, ols3, ivs1, ivs2, ivs3),
          keep=c("ln_price_cpi"),
          coef_map=c("ln_price_cpi"="Log Real Price", 
                    "fit_ln_price_cpi"="Log Real Price"),
          gof_map=c("nobs", "r.squared"),
          add_rows=rows) %>%
          group_tt(j=list(" "=1, "OLS"=2:4, "IV"=5:7))

	OLS			IV
	(1)	(2)	(3)	(4)	(5)	(6)
Log Real Price	-0.953	-0.921	-1.213	-1.043	-0.993	-1.406
	(0.012)	(0.008)	(0.034)	(0.014)	(0.038)	(0.167)
State FE	No	Yes	Yes	No	Yes	Yes
Year FE	No	No	Yes	No	No	Yes
Num.Obs.	2499	2499	2499	2499	2499	2499
R2	0.723	0.887	0.925	0.717	0.883	0.924

Test the IV

R Code

first1 <- feols(ln_price_cpi ~ ln_total_tax, data=cig.data)
first2 <- feols(ln_price_cpi ~ ln_total_tax | state, data=cig.data)
first3 <- feols(ln_price_cpi ~ ln_total_tax | state + Year, data=cig.data)

rf1 <- feols(ln_sales ~ ln_total_tax, data=cig.data)
rf2 <- feols(ln_sales ~ ln_total_tax | state, data=cig.data)
rf3 <- feols(ln_sales ~ ln_total_tax | state + Year, data=cig.data)

panels <- list(
  "First Stage: Price ~ Tax" = list(first1, first2, first3),
  "Reduced Form: Quantity ~ Tax" = list(rf1, rf2, rf3)
)

rows <- tribble(~term, ~ m1, ~ m2, ~ m3 ,
                'State FE', "No", "Yes", "Yes",
                'Year FE', "No", "No", "Yes")

modelsummary(panels,
          keep=c("ln_total_tax"),
          shape="rbind",
          coef_map=c("ln_total_tax"="Log Total Real Tax"),
          gof_map=c("nobs", "r.squared"),
          add_rows=rows)

	(1)	(2)	(3)
First Stage: Price ~ Tax
Log Total Real Tax	0.663	0.735	0.372
	(0.008)	(0.010)	(0.014)
Num.Obs.	2499	2499	2499
R2	0.738	0.776	0.992
Reduced Form: Quantity ~ Tax
Log Total Real Tax	-0.692	-0.730	-0.523
	(0.010)	(0.030)	(0.064)
Num.Obs.	2499	2499	2499
R2	0.639	0.814	0.925
State FE	No	Yes	Yes
Year FE	No	No	Yes
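
With a single instrument, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient. As a quick check using the rounded coefficients reported above (so small rounding differences are expected):

```latex
\hat{\beta}_{IV} = \frac{\hat{\pi}_{\text{reduced form}}}{\hat{\pi}_{\text{first stage}}}:
\qquad
\frac{-0.692}{0.663} \approx -1.04, \quad
\frac{-0.730}{0.735} \approx -0.99, \quad
\frac{-0.523}{0.372} \approx -1.41
```

These line up with the IV estimates of -1.043, -0.993, and -1.406 reported under "Different specifications".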

Summary

Most elasticity estimates are around -1
Estimates are larger in magnitude when year fixed effects are included (about -1.2 for OLS and -1.4 for IV)
Perhaps not too outlandish given more recent evidence: NBER Working Paper.

Some other IV issues

IV estimators are consistent but biased in finite samples, and they can perform poorly when the sample is small or the instrument is weak.
IV estimators provide an estimate of a Local Average Treatment Effect (LATE), which equals the ATT only under strong assumptions.
What about lots of instruments? The finite-sample bias gets worse as instruments are added, and alternatives such as the jackknife IV estimator (JIVE) may help.
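
To make the LATE point concrete, consider the textbook special case of a binary instrument Z and a binary treatment D. This is an assumption for illustration only, since the cigarette application uses continuous variables. Writing D_1 and D_0 for potential treatment status when Z = 1 and Z = 0, and assuming instrument independence, exclusion, relevance, and monotonicity, the IV (Wald) estimand is the average effect among compliers:

```latex
\beta_{IV}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
  = E[\,Y_1 - Y_0 \mid D_1 > D_0\,]
```

The effect is "local" because units whose treatment status does not respond to the instrument contribute nothing to either the numerator or the denominator.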

The National Bureau of Economic Research (NBER) has a great resource here for understanding instruments in practice.

Quick IV Review

When do we consider IV as a potential identification strategy?
What are the main IV assumptions (and what do they mean)?
How do we test those assumptions?