class: center, middle, inverse, title-slide .title[ # Module 0: Getting Started ] .subtitle[ ## Part 3: Basics of Data Management in R ] .author[ ### Ian McCarthy | Emory University ] .date[ ### Econ 470 & HLTH 470 ] --- class: inverse, center, middle name: real_data <!-- Adjust some CSS code for font size, maintain R code font size --> <style type="text/css"> .remark-slide-content { font-size: 30px; padding: 1em 2em 1em 2em; } .remark-code, .remark-inline-code { font-size: 20px; } </style> <!-- Set R options for how code chunks are displayed and load packages --> # Real World <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Practice data versus the real world <br> .center[ ![:scale 800px](https://media.giphy.com/media/3oz8xGme7vEndhrsly/giphy.gif) ] --- # Advice 1: Be patient and careful in your coding .center[ ![](https://media.giphy.com/media/fsQJBs0rtxpYoN4m0J/giphy.gif) ] --- # Advice 2: Comment, comment, comment You don't want to end up like this guy... .center[ ![:scale 600px](https://media.giphy.com/media/S7u66urzxc2J2/giphy.gif) ] <!-- New Section --> --- class: inverse, center, middle name: ma_data # Medicare Advantage <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Medicare Advantage Let's work with the [Medicare Advantage GitHub repository](https://github.com/imccart/Medicare-Advantage) --- # Access the data First step is to download the raw data that we'll be using: - [Monthly Enrollment](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County.html) - [Plan Characteristics](https://www.cms.gov/Medicare/Prescription-Drug-Coverage/PrescriptionDrugCovGenIn/index) - [Service Areas](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/MA-Contract-Service-Area-by-State-County.html) -- Lots more out there, but this is enough for now. --- # 1) Contract/enrollment info ```r for (y in 2007:2015) { ## Basic contract/plan information ma.path=paste0(path.data.ma,"/monthly-ma-and-pdp-enrollment-by-cpsc/CPSC_Contract_Info_",y,"_01.csv") contract.info=read_csv(ma.path, skip=1, col_names = c("contractid","planid","org_type","plan_type", "partd","snp","eghp","org_name","org_marketing_name", "plan_name","parent_org","contract_date"), col_types = cols( contractid = col_character(), planid = col_double(), ... )) ``` ??? This imports the contract information (name, insurer, plan identifiers) into an object `contract.info` from the filepath in `ma.path`. --- # 1) Contract/enrollment info ```r ## Clean the contract level data contract.info = contract.info %>% group_by(contractid, planid) %>% mutate(id_count=row_number()) contract.info = contract.info %>% group_by(contractid, planid) %>% mutate(id_count=row_number()) ``` ??? This reduces the data to unique contract and plan IDs. Note the `row_number()` to assign a running count for each pair. --- # 1) Contract/enrollment info ```r ## Enrollments per plan enroll.info=read_csv(paste0("data/input/monthly-ma-and-pdp-enrollment-by-cpsc/CPSC_Enrollment_Info_",y,"_01.csv"), skip=1, col_names = c("contractid","planid","ssa","fips","state","county","enrollment"), col_types = cols( contractid = col_character(), planid = col_double(), ssa = col_double(), fips = col_double(), state = col_character(), county = col_character(), enrollment = col_double() ),na="*") ``` ??? This imports the enrollment data (count of enrollees) into an object `enroll.info` from the filepath `ma.path` --- # 1) Contract/enrollment info ```r ## Merge contract info with enrollment info plan.data = contract.info %>% left_join(enroll.info, by=c("contractid", "planid")) %>% mutate(year=y) ``` ??? This merges the contract details with enrollments. Note that this is for each year. --- # 1) Contract/enrollment info ```r ## Fill in missing fips codes (by state and county) plan.data = plan.data %>% group_by(state, county) %>% fill(fips) ## Fill in missing plan characteristics by contract and plan id plan.data = plan.data %>% group_by(contractid, planid) %>% fill(plan_type, partd, snp, eghp, plan_name) ## Fill in missing contract characteristics by contractid plan.data = plan.data %>% group_by(contractid) %>% fill(org_type,org_name,org_marketing_name,parent_org) ``` ??? Some cleanup by filling in missing values using "nearby" values --- # 1) Contract/enrollment info ```r ## Collapse from monthly data to yearly plan.year = plan.data %>% group_by(contractid, planid, fips) %>% arrange(contractid, planid, fips) %>% rename(avg_enrollment=enrollment) write_rds(plan.year,paste0("data/output/ma_data_",y,".rds")) ``` ??? Summarize monthly data to the yearly level, and save data object as `ma_data_` (year) --- # 1) Contract/enrollment info ```r full.ma.data <- read_rds("data/output/ma_data_2007.rds") for (y in 2008:2015) { full.ma.data <- rbind(full.ma.data,read_rds(paste0("data/output/ma_data_",y,".rds"))) } ``` ??? Appends yearly data into one large file, `full.ma.data` --- # All together now Now let's do this together... 1. Initialize repository in GitHub 2. Clone to local computer 3. Copy data from OneDrive 4. Follow some practice code, available [here](00-test-ma-code.R)