Module 0: Getting Started

class: center, middle, inverse, title-slide

.title[
# Module 0: Getting Started
]
.subtitle[
## Part 3: Basics of Data Management in R
]
.author[
### Ian McCarthy | Emory University
]
.date[
### Econ 470 & HLTH 470
]

---

class: inverse, center, middle
name: real_data

<style type="text/css">
.remark-slide-content {
    font-size: 30px;
    padding: 1em 2em 1em 2em;    
}
.remark-code, .remark-inline-code { 
    font-size: 20px;
}
</style>

# Real World

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---
# Practice data versus the real world

<br>
.center[
  ![:scale 800px](https://media.giphy.com/media/3oz8xGme7vEndhrsly/giphy.gif)
]

---
# Advice 1: Be patient and careful in your coding

.center[
  ![](https://media.giphy.com/media/fsQJBs0rtxpYoN4m0J/giphy.gif)
]

---
# Advice 2: Comment, comment, comment

You don't want to end up like this guy...

.center[
  ![:scale 600px](https://media.giphy.com/media/S7u66urzxc2J2/giphy.gif)
]

---
class: inverse, center, middle
name: ma_data

# Medicare Advantage

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---
# Medicare Advantage

Let's work with the [Medicare Advantage GitHub repository](https://github.com/imccart/Medicare-Advantage)

---
# Access the data

First step is to download the raw data that we'll be using:
 - [Monthly Enrollment](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County.html)
 - [Plan Characteristics](https://www.cms.gov/Medicare/Prescription-Drug-Coverage/PrescriptionDrugCovGenIn/index)
 - [Service Areas](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/MA-Contract-Service-Area-by-State-County.html)

--

Lots more out there, but this is enough for now.

---
# 1) Contract/enrollment info

```r
for (y in 2007:2015) {
  ## Basic contract/plan information
  ma.path=paste0(path.data.ma,"/monthly-ma-and-pdp-enrollment-by-cpsc/CPSC_Contract_Info_",y,"_01.csv")
  contract.info=read_csv(ma.path,
                         skip=1,
                         col_names = c("contractid","planid","org_type","plan_type",
                                       "partd","snp","eghp","org_name","org_marketing_name",
                                       "plan_name","parent_org","contract_date"),
                         col_types = cols(
                           contractid = col_character(),
                           planid = col_double(),
                             ...
                           ))
```

???
This imports the contract information (name, insurer, plan identifiers) into an object `contract.info` from the filepath in `ma.path`.

---
# 1) Contract/enrollment info

```r
    ## Clean the contract level data
  contract.info = contract.info %>%
    group_by(contractid, planid) %>%
    mutate(id_count=row_number())

contract.info = contract.info %>%
    group_by(contractid, planid) %>%
    mutate(id_count=row_number())
```

???
This reduces the data to unique contract and plan IDs. Note the `row_number()` to assign a running count for each pair.

---
# 1) Contract/enrollment info

```r
    ## Enrollments per plan
  enroll.info=read_csv(paste0("data/input/monthly-ma-and-pdp-enrollment-by-cpsc/CPSC_Enrollment_Info_",y,"_01.csv"),
                       skip=1,
                       col_names = c("contractid","planid","ssa","fips","state","county","enrollment"),
                       col_types = cols(
                       contractid = col_character(),
                       planid = col_double(),
                       ssa = col_double(),
                       fips = col_double(),
                       state = col_character(),
                       county = col_character(),
                       enrollment = col_double()
                       ),na="*")
```

???
This imports the enrollment data (count of enrollees) into an object `enroll.info` from the filepath `ma.path`

---
# 1) Contract/enrollment info

```r
    ## Merge contract info with enrollment info
  plan.data = contract.info %>%
    left_join(enroll.info, by=c("contractid", "planid")) %>%
    mutate(year=y)
```

???
This merges the contract details with enrollments. Note that this is for each year.

---
# 1) Contract/enrollment info

```r
  ## Fill in missing fips codes (by state and county)
  plan.data = plan.data %>%
    group_by(state, county) %>%
    fill(fips)

## Fill in missing plan characteristics by contract and plan id
  plan.data = plan.data %>%
    group_by(contractid, planid) %>%
    fill(plan_type, partd, snp, eghp, plan_name)
  
  ## Fill in missing contract characteristics by contractid
  plan.data = plan.data %>%
    group_by(contractid) %>%
    fill(org_type,org_name,org_marketing_name,parent_org)
```

???
Some cleanup by filling in missing values using "nearby" values

---
# 1) Contract/enrollment info

```r
  ## Collapse from monthly data to yearly
  plan.year = plan.data %>%
    group_by(contractid, planid, fips) %>%
    arrange(contractid, planid, fips) %>%
    rename(avg_enrollment=enrollment)
  
  write_rds(plan.year,paste0("data/output/ma_data_",y,".rds"))
```

???
Summarize monthly data to the yearly level, and save data object as `ma_data_` (year)

---
# 1) Contract/enrollment info

```r
full.ma.data <- read_rds("data/output/ma_data_2007.rds")
for (y in 2008:2015) {
  full.ma.data <- rbind(full.ma.data,read_rds(paste0("data/output/ma_data_",y,".rds")))
}
```

???

Appends yearly data into one large file, `full.ma.data`

---
# All together now

Now let's do this together...
1. Initialize repository in GitHub
2. Clone to local computer
3. Copy data from OneDrive
4. Follow some practice code, available [here](00-test-ma-code.R)