Module 4: Difference-in-Differences and Effects of Medicaid Expansion

class: center, middle, inverse, title-slide

.title[
# Module 4: Difference-in-Differences and Effects of Medicaid Expansion
]
.subtitle[
## Part 2: Basics of Fixed Effects and Panel Data
]
.author[
### Ian McCarthy | Emory University
]
.date[
### Econ 470 & HLTH 470
]

---

class: inverse, center, middle
name: panel

<style type="text/css">
.remark-slide-content {
    font-size: 30px;
    padding: 1em 2em 1em 2em;    
}
.remark-code {
  font-size: 15px;
}
.remark-inline-code { 
    font-size: 20px;
}
</style>

# Understanding Panel Data

---
# Nature of the Data

- Repeated observations of the same units over time (balanced vs unbalanced)
- Identification due to variation **within unit**

--
**Notation**
- Outcome `$y_{it}$` for unit `$i=1,...,N$` in period `$t=1,...,T$`
- Treatment status `$D_{it}$`
- Regression model, <br>
`$y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}$` for `$t=1,...,T$` and `$i=1,...,N$`

---
# Benefits of Panel Data

- *May* overcome certain forms of omitted variable bias
- Allows for an unobserved but time-invariant factor, `$\gamma_{i}$`, that affects both treatment and outcomes

--
**Still assumes**
- No time-varying confounders 
- Past outcomes do not directly affect current outcomes
- Past outcomes do not affect treatment (reverse causality)

---
# Some textbook settings

- Unobserved "ability" when studying schooling and wages
- Unobserved "quality" when studying physicians or hospitals

---
class: inverse, center, middle
name: panelreg

# Panel Data and Regression

---
# Fixed effects and regression

`$y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}$` for `$t=1,...,T$` and `$i=1,...,N$`

--
- Allows correlation between `$\gamma_{i}$` and `$D_{it}$`
- Physically estimate `$\gamma_{i}$` in some cases via a set of unit dummy variables
- More generally, "remove" `$\gamma_{i}$` via:
  - "within" estimator
  - first-difference estimator
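
---
# Dummies vs. demeaning: a quick check

The two routes above should give the same `$\delta$`. Below is a minimal sketch on simulated data, not the course data. All variable names and parameter values are hypothetical, and `$\gamma_{t}$` is omitted to keep it short.

```r
# Simulated panel: treatment D is correlated with the unit effect gamma_i
set.seed(470)
n_units <- 50; n_periods <- 6
dat <- data.frame(id = rep(1:n_units, each = n_periods),
                  t  = rep(1:n_periods, times = n_units))
gamma_i <- rnorm(n_units)                                   # unobserved unit effect
dat$D <- as.numeric(gamma_i[dat$id] + rnorm(nrow(dat)) > 0) # treatment depends on gamma_i
dat$y <- 2 * dat$D + gamma_i[dat$id] + rnorm(nrow(dat))     # true delta = 2

# (1) Dummy-variable approach: one indicator per unit
lsdv <- lm(y ~ D + factor(id), data = dat)

# (2) Within approach: subtract unit means, then OLS without a constant
dat$y_dm <- dat$y - ave(dat$y, dat$id)
dat$D_dm <- dat$D - ave(dat$D, dat$id)
within <- lm(y_dm ~ 0 + D_dm, data = dat)

# The two point estimates of delta agree up to numerical precision
c(lsdv = unname(coef(lsdv)["D"]), within = unname(coef(within)["D_dm"]))
```

Only the point estimates are compared here; the standard errors from `lm` on demeaned data do not account for the estimated unit means.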
  
---
# Within Estimator
`$y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}$` for `$t=1,...,T$` and `$i=1,...,N$`

--
- Most common approach (default in most statistical software)
- Equivalent to demeaned model,<br>

`$$y_{it} - \bar{y}_{i} = \delta (D_{it} - \bar{D}_{i}) + (\gamma_{i} - \bar{\gamma}_{i}) + (\gamma_{t} - \bar{\gamma}) + (\epsilon_{it} - \bar{\epsilon}_{i})$$`

- Bars denote averages over time, e.g., `$\bar{y}_{i} = \frac{1}{T}\sum_{t} y_{it}$` and `$\bar{\gamma} = \frac{1}{T}\sum_{t} \gamma_{t}$`
- `$\gamma_{i} - \bar{\gamma}_{i} = 0$` since `$\gamma_{i}$` is time-invariant
- Requires *strict exogeneity* assumption (`$\epsilon_{it}$` is uncorrelated with `$D_{is}$` for all periods `$s$`, not just `$s=t$`)

---
# First-difference
`$y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}$` for `$t=1,...,T$` and `$i=1,...,N$`

--
- Instead of subtracting the mean, subtract the prior period values<br>
`$y_{it} - y_{i,t-1} = \delta(D_{it} - D_{i,t-1}) + (\gamma_{i} - \gamma_{i}) + (\gamma_{t} - \gamma_{t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})$`
- Requires exogeneity of `$\epsilon_{it}$` and `$D_{it}$` only for time `$t$` and `$t-1$` (weaker assumption than within estimator)
- Sometimes useful to estimate both FE and FD just as a check
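
---
# FE vs. FD: a quick check with two periods

With exactly two periods, the within and first-difference estimators are numerically identical, which makes a useful sanity check. This is a sketch on simulated data with hypothetical names and values, and it leaves out `$\gamma_{t}$` and the constant.

```r
# Simulated two-period panel, rows sorted by id then t
set.seed(470)
n_units <- 100; n_periods <- 2
dat <- data.frame(id = rep(1:n_units, each = n_periods),
                  t  = rep(1:n_periods, times = n_units))
gamma_i <- rnorm(n_units)
dat$D <- gamma_i[dat$id] + rnorm(nrow(dat))             # continuous "treatment"
dat$y <- 2 * dat$D + gamma_i[dat$id] + rnorm(nrow(dat)) # true delta = 2

# Within (FE): demean y and D by unit
fe <- lm(I(y - ave(y, id)) ~ 0 + I(D - ave(D, id)), data = dat)

# First difference: consecutive-row differences, keeping only pairs from the same unit
same_unit <- diff(dat$id) == 0
dy <- diff(dat$y)[same_unit]
dD <- diff(dat$D)[same_unit]
fd <- lm(dy ~ 0 + dD)

c(FE = unname(coef(fe)), FD = unname(coef(fd)))
```

Raising `n_periods` above 2 generally makes the two estimates diverge in finite samples, even when both are consistent.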

---
# Keep in mind...

- Discussion only applies to linear case or very specific nonlinear models
- Fixed effects at lower "levels" accommodate fixed effects at higher levels (e.g., FEs for hospital combine to form FEs for zip code, etc.)
- Fixed effects can't solve reverse causality
- Fixed effects don't address unobserved, time-varying confounders
- Can't estimate effects of time-invariant variables
- May "absorb" a lot of the variation for variables that don't change much over time
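
---
# Time-invariant variables: a quick check

To see why effects of time-invariant variables can't be estimated, here is a small sketch on simulated data. The variable `z` and all values are hypothetical. Once unit dummies are included, `z` has no within-unit variation left.

```r
# Simulated panel with a time-invariant characteristic z
set.seed(470)
n_units <- 30; n_periods <- 4
dat <- data.frame(id = rep(1:n_units, each = n_periods))
dat$z <- rep(rnorm(n_units), each = n_periods)   # constant within unit
dat$D <- rnorm(nrow(dat))                        # varies within unit
dat$y <- 1 * dat$D + 3 * dat$z + rnorm(nrow(dat))

# z is perfectly collinear with the unit dummies, so lm() returns NA for it
fit <- lm(y ~ D + factor(id) + z, data = dat)
coef(fit)[c("D", "z")]

# Equivalent view: demeaning z within unit leaves nothing but zeros
z_dm <- dat$z - ave(dat$z, dat$id)
max(abs(z_dm))
```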

---
class: inverse, center, middle
name: irl

# Panel Data and Fixed Effects IRL

---
# Within Estimator (Default) in practice

.pull-left[
**Stata**<br>

```stata
ssc install causaldata
causaldata gapminder.dta, use clear download
gen lgdp_pc=log(gdppercap)
tsset country year
xtreg lifeExp lgdp_pc, fe
```
]

.pull-right[
**R**<br>

```r
library(dplyr)      # for %>% and mutate()
library(fixest)
library(causaldata)
reg.dat <- causaldata::gapminder %>%
  mutate(lgdp_pc=log(gdpPercap))
feols(lifeExp~lgdp_pc | country, data=reg.dat)
```
]

---
# Within Estimator (Default) in practice

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Default FE </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Log GDP per Capita </td>
   <td style="text-align:center;"> 9.769 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.702) </td>
  </tr>
</tbody>
</table>

---
# Within Estimator (Manually Demean) in practice

.pull-left[
**Stata**<br>

```stata
causaldata gapminder.dta, use clear download
gen lgdp_pc=log(gdppercap)
* demean each variable within country
foreach x of varlist lifeExp lgdp_pc {
  bysort country: egen mean_`x'=mean(`x')
  gen demean_`x'=`x'-mean_`x'
}
reg demean_lifeExp demean_lgdp_pc
```
]

.pull-right[
**R**<br>

```r
library(dplyr)      # for %>%, group_by(), and mutate()
library(causaldata)
reg.dat <- causaldata::gapminder %>%
  mutate(lgdp_pc=log(gdpPercap)) %>%
  group_by(country) %>%
  mutate(demean_lifeexp=lifeExp - mean(lifeExp, na.rm=TRUE),
         demean_gdp=lgdp_pc - mean(lgdp_pc, na.rm=TRUE)) %>%
  ungroup()
lm(demean_lifeexp~ 0 + demean_gdp, data=reg.dat)
```
]

---
# Within Estimator (Manually Demean) in practice
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Default FE </th>
   <th style="text-align:center;"> Manual FE </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Log GDP per Capita </td>
   <td style="text-align:center;"> 9.769 </td>
   <td style="text-align:center;"> 9.769 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.702) </td>
   <td style="text-align:center;"> (0.284) </td>
  </tr>
</tbody>
</table>

**Note:** `feols` clusters standard errors at the level of the fixed effect by default; `lm` reports unclustered standard errors unless we adjust them ourselves
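
---
# Clustering by hand: a quick sketch

The standard errors differ because `lm` treats every observation as independent. Below is a rough sketch of a cluster-robust standard error computed by hand on simulated data. All names and values are hypothetical, and no small-sample correction is applied, so it will not exactly match any particular package's output.

```r
# Simulated panel with serially correlated regressor and error within unit
set.seed(470)
n_units <- 100; n_periods <- 8
id <- rep(1:n_units, each = n_periods)
x <- ave(rnorm(length(id)), id, FUN = cumsum)   # random walk within unit
e <- ave(rnorm(length(id)), id, FUN = cumsum)   # random walk within unit
y <- 1 * x + e

# Within transformation, then OLS without a constant
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
fit <- lm(y_dm ~ 0 + x_dm)

# Classical standard error as reported by lm()
se_iid <- summary(fit)$coefficients["x_dm", "Std. Error"]

# Cluster-robust sandwich for a single regressor:
# sum x*residual within each unit, square, add up, and scale by (sum of x^2)^2
score_g <- tapply(x_dm * resid(fit), id, sum)
se_cl <- sqrt(sum(score_g^2)) / sum(x_dm^2)

c(iid = se_iid, clustered = se_cl)
```

In applied work a tested routine is preferable to this hand calculation; the point is only to show which pieces enter the clustered variance.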

---
# First differencing (default) in practice

.pull-left[
**Stata**<br>

```stata
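causaldata gapminder.dta, use clear download
gen lgdp_pc=log(gdppercap)
* d. needs the panel declared; gapminder years are 5 apart
* (encode country first if it is stored as a string)
xtset country year, delta(5)
reg d.lifeExp d.lgdp_pc, noconstant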
```
]

.pull-right[
**R**<br>

```r
library(dplyr)      # for %>% and mutate()
library(plm)
reg.dat <- causaldata::gapminder %>%
  mutate(lgdp_pc=log(gdpPercap))

# index= identifies the unit and time variables for differencing
plm(lifeExp ~ 0 + lgdp_pc, model="fd", index=c("country","year"), data=reg.dat)
```
]

---
# First differencing (default) in practice

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Default FE </th>
   <th style="text-align:center;"> Manual FE </th>
   <th style="text-align:center;"> Default FD </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Log GDP per Capita </td>
   <td style="text-align:center;"> 9.769 </td>
   <td style="text-align:center;"> 9.769 </td>
   <td style="text-align:center;"> 5.290 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.702) </td>
   <td style="text-align:center;"> (0.284) </td>
   <td style="text-align:center;"> (0.291) </td>
  </tr>
</tbody>
</table>

---
# First differencing (manual) in practice

.pull-left[
**Stata**<br>

```stata
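causaldata gapminder.dta, use clear download
gen lgdp_pc=log(gdppercap)
* difference each variable by hand within country
bysort country (year): gen fd_lifeExp=lifeExp-lifeExp[_n-1]
bysort country (year): gen fd_lgdp_pc=lgdp_pc-lgdp_pc[_n-1]
reg fd_lifeExp fd_lgdp_pc, noconstant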
```
]

.pull-right[
**R**<br>

```r
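library(dplyr)      # for %>%, mutate(), and lag()
reg.dat <- causaldata::gapminder %>%
  mutate(lgdp_pc=log(gdpPercap)) %>%
  arrange(country, year) %>%
  group_by(country) %>%
  # dplyr::lag shifts within country; stats::lag would not
  mutate(fd_lifeexp=lifeExp - dplyr::lag(lifeExp),
         fd_lgdp_pc=lgdp_pc - dplyr::lag(lgdp_pc)) %>%
  ungroup() %>%
  na.omit()

lm(fd_lifeexp~ 0 + fd_lgdp_pc, data=reg.dat)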
```
]

---
# First differencing (manual) in practice

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Default FE </th>
   <th style="text-align:center;"> Manual FE </th>
   <th style="text-align:center;"> Default FD </th>
   <th style="text-align:center;"> Manual FD </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Log GDP per Capita </td>
   <td style="text-align:center;"> 9.769 </td>
   <td style="text-align:center;"> 9.769 </td>
   <td style="text-align:center;"> 5.290 </td>
   <td style="text-align:center;"> 5.290 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.702) </td>
   <td style="text-align:center;"> (0.284) </td>
   <td style="text-align:center;"> (0.291) </td>
   <td style="text-align:center;"> (0.291) </td>
  </tr>
</tbody>
</table>

---
# FE and FD with same time period

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Default FE </th>
   <th style="text-align:center;"> Default FD </th>
   <th style="text-align:center;"> Manual FD </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Log GDP per Capita </td>
   <td style="text-align:center;"> 8.929 </td>
   <td style="text-align:center;"> 5.290 </td>
   <td style="text-align:center;"> 5.290 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.741) </td>
   <td style="text-align:center;"> (0.291) </td>
   <td style="text-align:center;"> (0.291) </td>
  </tr>
</tbody>
</table>

Don't want to read too much into the gap between FE and FD, but...
- Strong serial correlation is almost certainly present in this case
- Misspecified model