Slides

Examples

Example 1: creating a new project

In this exercise we will create a new project that will be used throughout the rest of this course. I highly encourage you to follow along with me.

  1. Use projectr::proj_start() to initiate a new project called 20240722_sismid_repro. Set up a directory in a separate location using the argument data_dir. I typically store my data on OneDrive, but the idea is to put it in a secure location.

Run the following code in R:


# Setting up a folder WITH a symbolic link to the data subfolder
projectr::proj_start(proj_dir = "~/icloud/Documents/projects/2024/20240722_sismid_repro", 
                     data_dir = "~/onedrive/Data/2024/20240722_sismid_repro") 

Next, open the new project by double-clicking the .Rproj icon. This will open a new RStudio session with the working directory set to that project folder. It will also set the working directory of the terminal tab to that folder.

  2. Download the data download script and save it in the source folder of your new project directory as 01_data_download.R.

It would be fine to copy/paste the code for 01_data_download.R and save it via point and click, but you can also do this using tools from the command line module by entering the following into the terminal:

echo '# install.packages("RSocrata")
library("RSocrata")
library(tidyverse)

# download longitudinal Covid WW concentration data from API
covid <- read.socrata(
  "https://data.cdc.gov/resource/g653-rqe2.json",
  app_token = Sys.getenv("TOKEN"),
  email     = Sys.getenv("EMAIL"),
  password  = Sys.getenv("PASSWORD")
) %>%
  mutate(date_downloaded = Sys.Date())


# download cross-sectional Covid WW concentration data from API, which will be used to get county names
counties <- read.socrata(
  "https://data.cdc.gov/resource/2ew6-ywp6.json",
  app_token = Sys.getenv("TOKEN"),
  email     = Sys.getenv("EMAIL"),
  password  = Sys.getenv("PASSWORD")
)


## save intermediate data object and date data was accessed
save(covid, counties, file = here::here("data", "raw.Rdata"))' > source/01_data_download.R

Check that you have saved this script in the source folder using the terminal tab:

ls source
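
The single-quoted echo approach works only as long as the script itself contains no single quotes. One possible alternative is a here-document with a quoted delimiter, which passes the text through verbatim. The sketch below is illustrative only: it writes a short stand-in script (demo_script.R, a hypothetical name) into a scratch directory rather than into your real project.

```shell
# Work in a throwaway directory so nothing in the real project is touched
scratch="$(mktemp -d)"
mkdir -p "$scratch/source"

# Quoting the delimiter ('EOF') stops the shell from interpreting $, backticks, or quotes
cat > "$scratch/source/demo_script.R" <<'EOF'
# single quotes like 'this' and "double quotes" are both safe here
token <- Sys.getenv("TOKEN")
message('token length: ', nchar(token))
EOF

# Confirm the file was written
ls "$scratch/source"
```

In the project itself you would redirect to source/01_data_download.R instead of the scratch path.
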

  3. Make an R Markdown document that knits to HTML called final_report.Rmd and put it in the analysis folder of your project directory.

We will come back to the final_report.Rmd document later in the lecture.

Example 2: adding .Renviron variables

  1. Add a .Renviron file to the root directory of your 20240722_sismid_repro project folder.

Open the terminal. I will use the terminal tab of the 20240722_sismid_repro project. First, check that your working directory is 20240722_sismid_repro.

pwd

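If you are not in the correct directory, use the cd command to move into your 20240722_sismid_repro project folder. Next, create and open the .Renviron file. (The open command is macOS-specific; on other systems, open the file in any text editor.)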

touch .Renviron
open .Renviron

  2. Edit the .Renviron file by defining variables called TOKEN, EMAIL, and PASSWORD that contain your app token and the email and password used to obtain it.

Your edited .Renviron file should look something like this:

PASSWORD="password333"
EMAIL="julia.wrobel@emory.edu"
TOKEN="K1MVpmwDDeefZ0vPuYk2wRN"
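
Because .Renviron holds credentials, it can be worth restricting who is able to read it. The following is only a sketch: it writes placeholder values (not real credentials) to a scratch directory, then makes the file readable by its owner alone and lists the variable names without printing the secrets.

```shell
scratch="$(mktemp -d)"
renviron="$scratch/.Renviron"

# Placeholder values for illustration; substitute your own
cat > "$renviron" <<'EOF'
PASSWORD="your-password"
EMAIL="you@example.com"
TOKEN="your-app-token"
EOF

# Owner can read and write; group and others get no access
chmod 600 "$renviron"

# List the variable names only, so the values are not echoed to the screen
cut -d= -f1 "$renviron"
```

In practice you would run the chmod line against the .Renviron in your project root.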

Close your R session and then reopen it by double-clicking the .Rproj icon in 20240722_sismid_repro. The environment variables defined in .Renviron should now be available.

You can check by typing the following into the R console:

Sys.getenv("EMAIL")
## [1] "julia.wrobel@emory.edu"

This should print out your email address.

  3. Add the .Renviron file to your .gitignore file.

This is not necessary for accessing the environment variables you just defined, but will be important for later in the course when we start using git and GitHub. You do not want to accidentally put your .Renviron on GitHub, and by adding the file to .gitignore we can avoid this mistake.

To do this, go to your terminal and execute the following two lines of code:

echo "" >> .gitignore # ensures you are writing on a new line
echo ".Renviron" >> .gitignore

Alternatively, you can just open the .gitignore file and add .Renviron on a new line. Sometimes I find this to be simpler.
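
If you run the echo commands more than once, .gitignore ends up with duplicate entries. A minimal sketch of a repeatable version is below; add_ignore is a hypothetical helper name, and the demo runs against a scratch copy of .gitignore rather than your real one.

```shell
scratch="$(mktemp -d)"
gitignore="$scratch/.gitignore"
printf '.Rhistory' > "$gitignore"   # existing file with no trailing newline

# Hypothetical helper: append a pattern only if it is not already listed
add_ignore() {
  # $1 = pattern, $2 = path to .gitignore
  # -x matches whole lines, -F treats the pattern as a literal string
  if ! grep -qxF "$1" "$2" 2>/dev/null; then
    # start on a fresh line if the file does not already end with a newline
    if [ -s "$2" ] && [ -n "$(tail -c 1 "$2")" ]; then
      echo "" >> "$2"
    fi
    echo "$1" >> "$2"
  fi
}

add_ignore ".Renviron" "$gitignore"
add_ignore ".Renviron" "$gitignore"   # second call changes nothing
cat "$gitignore"
```

In your project you would call add_ignore ".Renviron" .gitignore from the project root.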

  4. Check that the code in your 01_data_download.R file will run.

Example 3

  1. Using the projectr template, make an Rmarkdown document called exploratory_analysis.Rmd and put it in the analysis folder of your project directory. Load and explore the data. Take notes on what you learn. Add in brief descriptions of the key variables.

It might be helpful to reference the page about the data.

  2. Download the data cleaning script and save it in the source folder of your new project directory as 02_data_cleaning.R.

echo '# set up a variable to define which state you want to analyze
state = "ga"

# download raw data
source(here::here("source", "01_data_download.R"))

# grab only observations from the specified state
covid = covid %>%
  filter(grepl(state, key_plot_id))


# only include columns from counties dataset we are interested in
counties = counties %>%
  filter(grepl(state, key_plot_id)) %>%
  select(key_plot_id, wwtp_id, county = county_names,
         county_fips, population_served) %>%
  distinct()

# merge covid data with the county label information
# convert variables from character to numeric
# give the concentration variable a more intuitive name
covid = left_join(covid, counties, by = "key_plot_id") %>%
  select(-key_plot_id) %>%
  mutate(pcr_conc_lin = as.numeric(pcr_conc_lin),
         population_served = as.numeric(population_served)) %>%
  rename(concentration = pcr_conc_lin)


## save intermediate data object and date data was accessed
save(covid, file = here::here("data", "clean.Rdata"))
' > source/02_data_cleaning.R

  3. Download the data analysis script and save it in the source folder of your new project directory as 03_data_analysis.R.

  4. Download the data visualization script and save it in the source folder of your new project directory as 04_data_visualization.R.

  5. Open your final_report.Rmd document and source each of the scripts. Add comments to explain the document.

Breakout exercises

Exercise 1

In your breakout group, turn your final_report.Rmd document into a parameterized report with the following parameters:

  • a state parameter that takes a character string giving the abbreviation of a US state (e.g. “ga”)
  • a parameter called download_raw_data that takes logical values (TRUE/FALSE). If set to TRUE, the raw data will be downloaded from the CDC API. If set to FALSE, the other steps of the analysis will proceed without downloading the raw data.

The YAML header will look like this:

---
title: "Analysis of SARS-CoV-2 concentration in wastewater"
author: "Julia Wrobel"
date: "2024-07-24"
output:
  html_document:
    code_folding: hide
    toc: true
    toc_float: true
params:
  state: "ga"
  download_raw_data: FALSE
hitheme: tomorrow
highlighter: highlight.js
---

The default state is Georgia.

To be able to filter by state, 02_data_cleaning.R needs access to the value of params$state. Replace the hard-coded state = "ga" line at the top of 02_data_cleaning.R with the following line of code, so that the parameter is not overwritten.

state = params$state

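We use the download_raw_data parameter in the following code, which lets the user toggle whether or not to download the raw data. Note that the version of 02_data_cleaning.R shown earlier sources 01_data_download.R itself, so for the toggle to take effect that line should be replaced with load(here::here("data", "raw.Rdata")).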

# download the raw data if requested, or if no local copy exists yet
if (params$download_raw_data || !file.exists(here::here("data", "raw.Rdata"))) {
  source(here::here("source", "01_data_download.R"))
}

source(here::here("source", "02_data_cleaning.R"))

Finally, we can run the parameterized report by knitting the document, or by running the following code in the R console:

rmarkdown::render(here::here("analysis","final_report.Rmd"), params = list(
  state = "nj",
  download_raw_data = TRUE
))

A copy of the entire 20240722_sismid_repro folder with parameterized final report is available for download here.

Exercise 2

Write a for loop in R that will run your final_report.Rmd for all US states and output a different html document for each state. The loop should store the documents in your “results” folder.

One possible solution, shown here for five states, is below:

states = c("ga", "co", "ny", "pa", "nj")
for(s in states){
  filename = paste0("Report-", s, ".html")
  
  rmarkdown::render(
    here::here("analysis","final_report.Rmd"), 
    params = list(
      state = s,
      download_raw_data = FALSE
    ),
    output_file = here::here("results", filename)
  )
}
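
After the loop finishes, it can be useful to confirm from the terminal that every expected report was actually written. The sketch below assumes the Report-STATE.html naming used in the loop; check_reports is a hypothetical helper, and the demo uses a scratch folder in which one report is deliberately absent.

```shell
# Hypothetical helper: print the name of every expected report that is absent
check_reports() {
  results_dir="$1"
  shift
  for s in "$@"; do
    if [ ! -f "$results_dir/Report-$s.html" ]; then
      echo "missing: Report-$s.html"
    fi
  done
}

# Demo in a scratch folder where only two of three reports exist
scratch="$(mktemp -d)"
touch "$scratch/Report-ga.html" "$scratch/Report-co.html"
check_reports "$scratch" ga co ny
```

In the project you would run check_reports results ga co ny pa nj from the project root; no output means every report is present.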

Exercise 3

  1. Use projectr::proj_start() to initiate a new project called 20240722_YOURNAME_sismid. Use the argument data_dir = NULL. Where is your data being stored?
  2. Using the terminal, set up a symbolic link at 20240722_YOURNAME_sismid/data that points to the location where you will store data for this course.

Set up the project directory structure.

## set up a new project
# devtools::install_github("julia-wrobel/projectr")
library(projectr)


# Setting up a folder without a symbolic link to the data subfolder
projectr::proj_start(proj_dir = "~/icloud/Documents/projects/2024/202407_wrobel_sismid", 
                     data_dir = NULL) 

Set a symbolic link by opening a Terminal window and executing code like the following. Note that I am storing my data in a folder on OneDrive and you may want to store yours elsewhere.

# leave ~ unquoted so the shell expands it to your home directory
ln -s ~/onedrive/Data/2024/202407_wrobel_sismid ~/icloud/Documents/projects/2024/202407_wrobel_sismid/data
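
To see how the link behaves before pointing it at real data, you can try it in a scratch directory first. This is a sketch under assumed folder names (demo_project and the scratch paths are stand-ins, not the course folders): it creates the link, writes a file through it, and shows that the file lands in the separate data location.

```shell
scratch="$(mktemp -d)"
data_store="$scratch/onedrive/Data/demo_project"   # stands in for the secure data location
project="$scratch/projects/demo_project"           # stands in for the project folder
mkdir -p "$data_store" "$project"

# ln -s TARGET LINK_NAME: the link lives inside the project and points at the data store
ln -s "$data_store" "$project/data"

# A file written through the link ends up in the data store
echo "id,value" > "$project/data/example.csv"

ls -l "$project"     # the listing shows data as a link to the data store
ls "$data_store"     # example.csv is stored here, outside the project folder
```

The order of the arguments is the part that is easy to get backwards: the existing data location comes first, the name of the new link second.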

Additional readings