forcats::fct_collapse()

Function of the Week: fct_collapse()




fct_collapse()

  • In this document, I will introduce the fct_collapse() function and show what it’s for.
# Load the tidyverse (forcats is attached with it)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Example data set
library(palmerpenguins)


What is it for?

  • This function collapses the levels of a factor (or the values of a character vector) into fewer, manually defined groups. For example, I changed the factor variable “island” in the penguins data set from 3 levels to 2 by combining the islands “Dream” and “Torgersen” into “Not_Biscoe” below:
# 3 Factor Levels
table(penguins$island)
## 
##    Biscoe     Dream Torgersen 
##       168       124        52
# 2 Factor Levels
penguins_refactored <- fct_collapse(penguins$island,
                                    Biscoe = "Biscoe",
                                    Not_Biscoe = c("Torgersen", "Dream"),
                                    other_level = "Missing") # Any level not listed above would be collapsed
                                                             # into "Missing"; here every level is listed
table(penguins_refactored)
## penguins_refactored
##     Biscoe Not_Biscoe 
##        168        176
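
The call above returns a bare factor vector rather than a data frame. As a minimal sketch, the same collapse can be done inside mutate() so that the new factor stays alongside the other columns. The names penguins_2lvls and island_2lvls are made up for this illustration.

```r
library(dplyr)
library(forcats)
library(palmerpenguins)

# Collapse island inside mutate() so the result stays in the data frame.
# penguins_2lvls and island_2lvls are illustrative names, not from the original.
penguins_2lvls <- penguins %>%
  mutate(island_2lvls = fct_collapse(island,
                                     Biscoe = "Biscoe",
                                     Not_Biscoe = c("Torgersen", "Dream")))

# The new column is still a factor, now with 2 levels
levels(penguins_2lvls$island_2lvls)
table(penguins_2lvls$island_2lvls)
```

The original island column is left untouched, which makes it easy to cross-check the two columns against each other.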


  • As a second example, I collapsed the variable for number of cylinders (“cyl”) in the mtcars data set from 3 levels to 2, after first converting it to a factor:
CARS <- mtcars
CARS$cyl <- as.factor(CARS$cyl)

# 3 Factor Levels
table(CARS$cyl)
## 
##  4  6  8 
## 11  7 14
# 2 Factor Levels
CARS_refactored <- fct_collapse(CARS$cyl,
                                Four = "4",
                                More_than_Four = c("6", "8"),
                                other_level = "Missing")

table(CARS_refactored)
## CARS_refactored
##           Four More_than_Four 
##             11             21
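
In both examples so far every level is named explicitly, so other_level never comes into play. Here is a small sketch of what it does when some levels are left out, assuming a forcats version that supports other_level (such as the 0.5.1 loaded above). The names cyl_factor and cyl_other are made up for this illustration.

```r
library(forcats)

# cyl_factor and cyl_other are illustrative names, not from the original.
cyl_factor <- factor(mtcars$cyl)

# Only "4" is named; the unlisted levels "6" and "8" fall into other_level.
cyl_other <- fct_collapse(cyl_factor,
                          Four = "4",
                          other_level = "More_than_Four")

table(cyl_other)
```

This gives the same two groups as the explicit version, without having to list every level that belongs in the catch-all group.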


Is it helpful?

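  • This function has its uses, but overall I think it is often easier to create a new variable with the desired levels using mutate() and either case_when() or ifelse(). Note that with character outputs these return a character vector rather than a factor, so the result needs to be wrapped in factor() if a factor is required. Below is an example of what I would do instead: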
penguins_refactored_2 <- penguins %>%
  mutate(island_2lvls = ifelse(island == "Biscoe",
                               "Biscoe",
                               "Not Biscoe"))

table(penguins_refactored_2$island_2lvls)
## 
##     Biscoe Not Biscoe 
##        168        176


  • Using the second example with the mtcars data set, I would do the following:
CARS_refactored_2 <- CARS %>%
  mutate(cyl_2lvls = ifelse(cyl == "4",
                            "Four",
                            "More than four"))

table(CARS_refactored_2$cyl_2lvls)
## 
##           Four More than four 
##             11             21
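
The text above mentions case_when() but only demonstrates ifelse(). A minimal sketch of the case_when() version for the cylinder example follows. It works on the numeric cyl column of mtcars directly and wraps the result in factor(), because case_when() with character outputs returns a character vector. CARS_refactored_3 is a made-up name for this illustration.

```r
library(dplyr)

# CARS_refactored_3 is an illustrative name, not from the original.
CARS_refactored_3 <- mtcars %>%
  mutate(cyl_2lvls = case_when(cyl == 4 ~ "Four",
                               TRUE     ~ "More than four"),
         # Convert the character result to a factor
         cyl_2lvls = factor(cyl_2lvls))

class(CARS_refactored_3$cyl_2lvls)
table(CARS_refactored_3$cyl_2lvls)
```

case_when() scales better than nested ifelse() calls when there are more than two target groups, since each group gets its own condition line.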