Part 1: Introduction to R/Rstudio/Vectors/For Loops

Materials from class on Wednesday, January 5, 2022

Install R/Rstudio

Before class, please install R and Rstudio. If it has been a while since you installed R, please re-install R to update to the most recent version (warning: you may lose all your installed packages and will have to reinstall them).

Installation instructions can be found here.

Please also download the “part1” folder in this course materials link. Unzip the folder if needed. Open the Rstudio project by double clicking on the .Rproj file (“Rstudio project file”). Run the 00-install-packages.R script to install necessary packages. A video on how to do this can be found here.

Class files

R Project files

Before each class, I will update this folder link with the appropriate “part” folder. Please download the part1 folder. Unzip this folder and open in Rstudio by double clicking on the .Rproj file. This folder will have the files for this part and the assignment.

Slides

Open the class introduction slides in a separate window: https://sph-r-programming-2022.netlify.app/01-introduction_slides#1

We will also cover a few of these slides on for loops: https://sph-r-programming-2022.netlify.app/slides/02-for_loops#1

Class Video

View last year’s class here.

Post-Class

Please fill out the following survey and we will discuss the results during the next lecture. All responses will be anonymous.

  • Clearest Point: What was the most clear part of the lecture?
  • Muddiest Point: What was the most unclear part of the lecture to you?
  • Anything Else: Is there something you’d like me to know?

https://forms.gle/4tVx1mL7SzQx7MCu5

Pacing

I won’t always talk about the pacing feedback but in the beginning I think it’s useful: Mean 2.94, median 3. So on average the pace seemed to be on target, most people chose 3, though I was trying to balance going slowly at first and also adding in a couple challenging topics. So depending on your experience with R this might have gone ok or not. Please ask questions in the chat (or unmute to ask) during class for anything that’s unclear!

Muddiest Points

Remember, all of this is anonymous.

loops - but i know we are going over it more later

For Loop, however sitting down with the .Rmd tonight helped a lot.

The for() loops. Mostly in regards to how to apply it to numerical data.

Yes! Most comments were on loops. With 30 minutes in class I weighed the option of just ending early and starting this when we were fresh, or with doing a quick intro and finishing more in depth later (and maybe chose the wrong option ;)). Don’t worry if you’re still confused because I didn’t explain things very thoroughly! Class 2 we will go over it all again.

Loops- not technically, more why we would want to do this in reality

Loops in the very simplistic way we learned them are less useful in practice due to the vectorization of most functions in R, but we will later see how they can be useful with data operations. Also understanding them gives us the building block knowledge to understand more complex iteration tasks we will cover in the future (purrr::map() for example, possibly lapply() and apply() if we have time).

loops without indexing - Does R always end the loop once the end of the vector referenced is reached? Are there ever situations where R gets stuck in a continuous loop?

Great question, I remember learning about for loops in other programming classes and dealing with situations where we got stuck in an infinite loop but I honestly can’t think of an example where I’ve had this happen in real life with R. This can definitely happen, but in real data scenarios I don’t think it’s common because you usually have a finite vector or data frame. More often, R will exit the for loop early due to some error. I will try to show an example of this in class. (I imagine a while() loop could get stuck, but I never ever use that function.) If you ever get stuck with something happening in R, you can press ESC over and over or click top bar “Session” -> “Terminate R” or “Restart R”.

Only one point I was still a bit confused about was for loop, when we make an empty vector to save our work, such as, the example we used in class, all_my_pets, why do we need to add the new variable mypet to the end of the vector all_my_pets instead of only add mypet to all_my_pets by “all_my_pets <- c(mypet)”?

Ah yes, this is a tricky part and I’ll go over this again in class 2. This is the code. I’ve added print(all_my_pets) so you can see how it grows with each iteration:

pets <- c("dog", "cat", "mouse")
# make an empty vector
all_my_pets <- c()

for(pet in pets){
  mypet <- paste("jessica's", pet)
  
  # add the new variable mypet to the end of the vector all_my_pets
  all_my_pets <- c(all_my_pets, mypet)
  
  # this prints out what the object all_my_pets is for each iteration
  print(all_my_pets)
}
## [1] "jessica's dog"
## [1] "jessica's dog" "jessica's cat"
## [1] "jessica's dog"   "jessica's cat"   "jessica's mouse"

However, if we had just done all_my_pets <- c(mypet):

pets <- c("dog", "cat", "mouse")
# make an empty vector
all_my_pets <- c()

for(pet in pets){
  mypet <- paste("jessica's", pet)
  
  # add the new variable mypet to the end of the vector all_my_pets
  all_my_pets <- c(mypet)
  
  # this prints out what the object all_my_pets is for each iteration
  print(all_my_pets)
}
## [1] "jessica's dog"
## [1] "jessica's cat"
## [1] "jessica's mouse"

We just replace all_my_pets with mypet each time! It has no memory of what happened before, because we have overwritten that object, essentially. We are not adding mypet to the vector all_my_pets. If we call all_my_pets now, it just shows us the last value of pets, and it’s a vector of length 1:

all_my_pets
## [1] "jessica's mouse"

I’ll go over this in class too. Note that these are identical, the c() around mypet does nothing:

c(mypet)
## [1] "jessica's mouse"
mypet
## [1] "jessica's mouse"

I think the idea of how and why it is so important to open/save our files in a certain way is unclear to me.

Hopefully this will become more clear a little bit each week. We haven’t gotten enough into data management for it to be clear enough. I’ll spend a little bit of time talking about projects and files again this class, and probably again in future classes. This is a thing that takes practice. Main points:

  • Use Rstudio projects to keep everything in the same folder and so you can use file paths relative to the home directory (we will talk about this in class 2)
  • Don’t save your R environment upon exiting Rstudio (I had you unclick this box in the global options so this is done automatically) so that you don’t depend on saved objects that may or may not be there. Save your output in files (we will talk about saving output later).

Are there some excellent projects last year recommending to learn?

I think you are asking about the midterm projects from last year? They are all excellent! I just picked a couple to highlight things I noticed after skimming them, but I recommend going through them all if you have the time:

  • Investigating Moore’s Law there is a lot of useful work in here with functions we will learn later like case_when() and some neat plots with ggplot2 that incorporate interesting regression results and the use of an interesting function get_regression_equation()!
  • Understanding C02 Emissions there are some complicated uses of summarize() and mutate() in here that are worth examining, and nice visual conclusions
  • MMR Vaccination Rates in Oregon - nice final summary here and a visualization of how the data can surprise us and change our approaches to data visualization and exploration.
  • The Popularity of Sci-Fi Films - beautiful and effective ggplots and nice summaries!

The explanation of what vectors are and all of the points of discussion.

I’ll try to go over this in pieces in the next class, hopefully this will help.

how to do the hw assignments and create the code

Definitely reach out to me if you have questions about generating the homework files, there has been some useful discussion in slack as well on questions about the code. The full set of code will be embedded in your .Rmd document. That is what you are writing/editing. Then, you click the knit button on the top to create an .html file which will “run” the code all at once, and create a combined file that has text and code output. This is the file you will submit on Sakai. Please reach out to me or Colin if you need help!

Clearest Points

Everything was clear, this was just the first class :)

general set up of R

How to change global settings

assigning objects

code for variable

I felt like I knew pretty much everything, but I liked the basic of for loops!

I appreciated getting clarification a lot of little things like NA, NULL, what are default settings for functions.

Yay!

What is going on in R! I set up two monitors so I can move along with you in R, and that is so helpful to watch you in R real time.

This is great! I do feel bad for folks who don’t have two monitors, as I know having too many windows up on one screen can make it hard to see what I’m doing. Feel free to log in twice with a tablet or something else if it can make it easier for you, or let me know how I can make it more accessible if you don’t have two screens!

Other messages

Remember, all of this is anonymous to me.

I think this class will be very useful for me. While I rated the pace a little slow, I think things will get more challenging so I’m not necessarily wishing you to speed up!

Great to know :)

I’ll definitely be reaching out for help while I go through the assignment, but excited to get started :)

I am going to likely request a lot of short one-on-one sessions with you at the beginning, just to gain greater clarity on basic things - since I am brand new to R.

Please do!! I don’t want to leave anyone who is new to R behind, so let’s all get to baseline together in the beginning.

Everything except for loops was review for those of us who were in 551 last quarter

Ah yes that is what I guessed. At first I read this as “Dr. Niederhausen taught us for loops” and I was super impressed she had time for that! But now I’m glad at least one part was new!

Please explain the details of the use of R when describing the tools so that we know how to use the tools to be able to perform the functions within R.

I’m a little confused about what you mean by “tools”, maybe send me an anonymous message again and I will try to do better! Please ask questions during class if anything isn’t clear, I’m happy to repeat or explain things differently in real time.

I wish this class was provided to undergrads or offered before the core BSTA courses I need for my degree

I know…we all wish this, actually. The unfortunate excuse is: it’s too hard to put into the fall term because of the EPI and BSTA sequences required for the MS and MPH take up a lot of space/credits, and we can’t require people to take courses before the fall term which they enroll. The only way I can think of is for this to be a 1 credit course in fall with very limited lessons/time commitment (though watching all the OCTRI-BERD R workshops and going through R bootcamp probably would teach the same amount of info), or for BSTA and EPI courses to have “sections” or “labs” with extra learning time (Though none of our classes seem to have this. All of my grad school classes did, and it was so helpful, bur requires large PhD programs for TA support). All of these options are difficult for the program right now, but I will keep bringing it up! We all will keep trying to make this better, and I’m sorry it wasn’t better for your experience!

Additional Info

Projects in RStudio Desktop

See this short video about creating projects in Rstudio desktop if it’s a new concept to you:

Slack Intro

Slack invite link is on Sakai, and will be emailed before class.

Zoom Intro

We will be using zoom. Here’s a short video on how to use zoom: