Instrumental Variables: Part II

Ian McCarthy | Emory University

Naive estimate

There is clearly a strong relationship between prices and sales. For example, a simple OLS regression of log sales on log price gives:

R Code
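library(tidyverse)  # read_rds(), mutate(), %>%, ggplot()
library(here)       # here() for project-relative paths

cig.data <- read_rds(here("data/TaxBurden_Data.rds"))
# Log transforms; taxes are rescaled by 218/index to real dollars
cig.data <- cig.data %>% mutate(ln_sales=log(sales_per_capita),
                                ln_price_cpi=log(price_cpi),
                                ln_price=log(cost_per_pack),
                                tax_cpi=tax_state*(218/index),
                                total_tax_cpi=tax_dollar*(218/index),
                                ln_total_tax=log(total_tax_cpi),
                                ln_state_tax=log(tax_cpi))
# Naive OLS of log sales on log price
ols <- lm(ln_sales ~ ln_price, data=cig.data)
summary(ols)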

Call:
lm(formula = ln_sales ~ ln_price, data = cig.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.23899 -0.17057  0.02239  0.18605  1.13866 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.689838   0.007209  650.55   <2e-16 ***
ln_price    -0.420307   0.006464  -65.02   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3073 on 2497 degrees of freedom
Multiple R-squared:  0.6287,    Adjusted R-squared:  0.6285 
F-statistic:  4228 on 1 and 2497 DF,  p-value: < 2.2e-16

Is this causal?

  • But is that the true demand curve?
  • Aren’t other things changing that tend to reduce cigarette sales?
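To see why the OLS slope need not trace out the demand curve, here is a minimal simulated sketch (not the cigarette data). It assumes a hypothetical market in which an unobserved demand shock `u` moves both price and quantity; every variable name and parameter value is made up for illustration.

```r
set.seed(42)
n    <- 5000
beta <- -1                        # assumed true demand elasticity
u    <- rnorm(n)                  # unobserved demand shock
cost <- rnorm(n)                  # supply-side cost shifter, independent of u

# Equilibrium price responds to costs and to the demand shock
ln_price <- 0.5 * cost + 0.5 * u + rnorm(n)
ln_sales <- 2 + beta * ln_price + u

# OLS slope is pulled away from beta because cov(ln_price, u) != 0
b_ols <- unname(coef(lm(ln_sales ~ ln_price))["ln_price"])
c(true = beta, ols = b_ols)
```

In this setup the OLS slope is attenuated toward zero relative to the assumed elasticity, because high-demand periods have both higher prices and higher sales.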

Tax as an IV

R Code
cig.data %>% 
  ggplot(aes(x=Year,y=total_tax_cpi)) + 
  stat_summary(fun="mean",geom="line") +
  labs(
    x="Year",
    y="Tax per Pack ($)",
    title="Cigarette Taxes in 2010 Real Dollars"
  ) + theme_bw() +
  scale_x_continuous(breaks=seq(1970, 2020, 5))

IV Results

R Code
library(fixest)  # feols()
ivs <- feols(ln_sales ~ 1 | ln_price ~ ln_total_tax, 
             data=cig.data)
summary(ivs)
TSLS estimation, Dep. Var.: ln_sales, Endo.: ln_price, Instr.: ln_total_tax
Second stage: Dep. Var.: ln_sales
Observations: 2,499 
Standard-errors: IID 
              Estimate Std. Error  t value  Pr(>|t|)    
(Intercept)   4.802865   0.009589 500.8559 < 2.2e-16 ***
fit_ln_price -0.614292   0.010928 -56.2132 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.358298   Adj. R2: 0.494569
F-test (1st stage), ln_price: stat = 2,269.1, p < 2.2e-16, on 1 and 2,497 DoF.
                  Wu-Hausman: stat = 1,216.9, p < 2.2e-16, on 1 and 2,496 DoF.

Two-stage equivalence

R Code
step1 <- lm(ln_price ~ ln_total_tax, data=cig.data)
pricehat <- predict(step1)
step2 <- lm(ln_sales ~ pricehat, data=cig.data)
summary(step2)

Call:
lm(formula = ln_sales ~ pricehat, data = cig.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.00723 -0.18338  0.00263  0.18885  1.24648 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.802865   0.008102  592.82   <2e-16 ***
pricehat    -0.614292   0.009233  -66.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3028 on 2497 degrees of freedom
Multiple R-squared:  0.6394,    Adjusted R-squared:  0.6392 
F-statistic:  4427 on 1 and 2497 DF,  p-value: < 2.2e-16
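The manual two-step slope matches the `feols` estimate (-0.614292 in both outputs), but the standard errors differ (0.009233 versus 0.010928). The second-stage `lm` builds its residuals from the predicted price rather than the observed price, so its reported standard errors should not be used. A minimal sketch of the correction on simulated data follows, where `z`, `x`, and `y` are hypothetical stand-ins for the tax, price, and sales variables:

```r
set.seed(1)
n <- 2000
z <- rnorm(n)                         # instrument
u <- rnorm(n)                         # unobserved shock
x <- 0.7 * z + 0.5 * u + rnorm(n)     # endogenous regressor
y <- 1 - 0.6 * x + u                  # outcome; assumed true slope is -0.6

step1 <- lm(x ~ z)
xhat  <- fitted(step1)
step2 <- lm(y ~ xhat)

b      <- coef(step2)                 # 2SLS point estimates
se_raw <- summary(step2)$coefficients["xhat", "Std. Error"]

# Correct residuals use the observed x, not xhat
res_iv    <- y - b[1] - b[2] * x
sigma_iv  <- sqrt(sum(res_iv^2) / (n - 2))
sigma_raw <- summary(step2)$sigma
se_iv     <- se_raw * sigma_iv / sigma_raw   # rescaled (IID) 2SLS standard error

wald <- cov(z, y) / cov(z, x)         # same slope from the covariance ratio
c(slope = unname(b[2]), wald = wald, se_raw = se_raw, se_iv = se_iv)
```

The rescaling works because the IID variance formula only differs between the two approaches in the residual variance term. Robust or clustered standard errors would need more than this rescaling.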

Different specifications

R Code
library(modelsummary)  # modelsummary()
library(kableExtra)    # add_header_above()
ols1 <- lm(ln_sales ~ ln_price_cpi, data=cig.data)
ols2 <- lm(ln_sales ~ ln_price_cpi + factor(state), data=cig.data)
ols3 <- lm(ln_sales ~ ln_price_cpi + factor(state) + factor(Year), data=cig.data)

ivs1 <- feols(ln_sales ~ 1 | ln_price_cpi ~ ln_total_tax, data=cig.data)
ivs2 <- feols(ln_sales ~ 1 | state | ln_price_cpi ~ ln_total_tax, data=cig.data)
ivs3 <- feols(ln_sales ~ 1 | state + Year | ln_price_cpi ~ ln_total_tax, data=cig.data)

rows <- tribble(~term, ~ m1, ~ m2, ~ m3 , ~ m4, ~ m5, ~ m6 ,
                'State FE', "No", "Yes", "Yes", "No", "Yes", "Yes",
                'Year FE', "No", "No", "Yes", "No", "No", "Yes")
attr(rows, 'position')  <- c(3,4)

modelsummary(list(ols1, ols2, ols3, ivs1, ivs2, ivs3),
          keep=c("ln_price_cpi"),
          coef_map=c("ln_price_cpi"="Log Real Price", 
                    "fit_ln_price_cpi"="Log Real Price"),
          gof_map=c("nobs", "r.squared"),
          add_rows=rows) %>%
          add_header_above(c("","OLS"=3,"IV"=3))
                          OLS                         IV
                  (1)      (2)      (3)      (4)      (5)      (6)
Log Real Price   −0.953   −0.921   −1.213   −1.043   −0.993   −1.406
                 (0.012)  (0.008)  (0.034)  (0.014)  (0.038)  (0.167)
State FE          No       Yes      Yes      No       Yes      Yes
Year FE           No       No       Yes      No       No       Yes
Num.Obs.          2499     2499     2499     2499     2499     2499
R2                0.723    0.887    0.925    0.717    0.883    0.924

Test the IV

R Code
first1 <- feols(ln_price_cpi ~ ln_total_tax, data=cig.data)
first2 <- feols(ln_price_cpi ~ ln_total_tax | state, data=cig.data)
first3 <- feols(ln_price_cpi ~ ln_total_tax | state + Year, data=cig.data)

rf1 <- feols(ln_sales ~ ln_total_tax, data=cig.data)
rf2 <- feols(ln_sales ~ ln_total_tax | state, data=cig.data)
rf3 <- feols(ln_sales ~ ln_total_tax | state + Year, data=cig.data)

panels <- list(
  "First Stage: Price ~ Tax" = list(first1, first2, first3),
  "Reduced Form: Quantity ~ Tax" = list(rf1, rf2, rf3)
)

rows <- tribble(~term, ~ m1, ~ m2, ~ m3 ,
                'State FE', "No", "Yes", "Yes",
                'Year FE', "No", "No", "Yes")

modelsummary(panels,
          keep=c("ln_total_tax"),
          shape="rbind",
          coef_map=c("ln_total_tax"="Log Total Real Tax"),
          gof_map=c("nobs", "r.squared"),
          add_rows=rows)
                      (1)      (2)      (3)
First Stage: Price ~ Tax
Log Total Real Tax    0.663    0.735    0.372
                     (0.008)  (0.010)  (0.014)
Num.Obs.              2499     2499     2499
R2                    0.738    0.776    0.992
Reduced Form: Quantity ~ Tax
Log Total Real Tax   -0.692   -0.730   -0.523
                     (0.010)  (0.030)  (0.064)
Num.Obs.              2499     2499     2499
R2                    0.639    0.814    0.925
State FE              No       Yes      Yes
Year FE               No       No       Yes
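With a single instrument, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient. As a quick check, the sketch below plugs in the rounded coefficients reported here and compares the ratios with IV columns (4) to (6) from the "Different specifications" table. The vector names are just labels chosen for this illustration, and small discrepancies reflect rounding in the displayed values.

```r
# Rounded coefficients copied from the slides
first_stage  <- c(none = 0.663,  state = 0.735,  state_year = 0.372)
reduced_form <- c(none = -0.692, state = -0.730, state_year = -0.523)
iv_reported  <- c(none = -1.043, state = -0.993, state_year = -1.406)

# Wald ratio: reduced form divided by first stage
wald <- reduced_form / first_stage
round(cbind(wald, iv_reported, diff = wald - iv_reported), 3)
```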

Summary

  1. Most elasticity estimates are around -1
  2. Estimates are larger in magnitude when including year fixed effects (-1.213 for OLS and -1.406 for IV)
  3. Perhaps not too outlandish given more recent evidence: NBER Working Paper.

Some other IV issues

  1. IV estimators are consistent but biased in finite samples, so performance in small samples is questionable.
  2. IV estimators provide an estimate of a Local Average Treatment Effect (LATE), which is only the same as the ATT under strong conditions or assumptions.
  3. What about lots of instruments? The finite-sample bias gets worse with many instruments, and we may try alternative estimators such as the jackknife IV estimator (JIVE).
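The finite-sample concern with many instruments can be illustrated with a small Monte Carlo. This is a sketch under made-up assumptions (20 instruments that are each only weakly related to `x`, 200 observations, a true effect of 1), not a statement about the cigarette data, and it does not implement JIVE.

```r
set.seed(123)
n_rep <- 500; n <- 200; k <- 20
beta  <- 1                                # assumed true effect
est   <- matrix(NA_real_, n_rep, 2, dimnames = list(NULL, c("ols", "tsls")))

for (r in seq_len(n_rep)) {
  Z <- matrix(rnorm(n * k), n, k)         # k instruments, each weakly relevant
  u <- rnorm(n)                           # structural error
  v <- 0.8 * u + 0.6 * rnorm(n)           # first-stage error, correlated with u
  x <- drop(Z %*% rep(0.05, k)) + v       # endogenous regressor
  y <- beta * x + u
  xhat <- fitted(lm(x ~ Z))               # first stage using all k instruments
  est[r, "ols"]  <- coef(lm(y ~ x))[2]
  est[r, "tsls"] <- coef(lm(y ~ xhat))[2]
}

# Median estimates across replications, to compare with beta = 1
apply(est, 2, median)
```

In this design the 2SLS estimates tend to sit between the true value and the OLS estimate: with many weak instruments, the first stage overfits and carries some of the endogenous variation into the fitted values.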

The National Bureau of Economic Research (NBER) has a great resource here for understanding instruments in practice.

Quick IV Review

  1. When do we consider IV as a potential identification strategy?
  2. What are the main IV assumptions (and what do they mean)?
  3. How do we test for those assumptions?