forcats::fct_collapse()
Function of the Week: fct_collapse()
Libby White
Last Updated 2022-01-24
fct_collapse()
- In this document, I will introduce the fct_collapse() function and show what it’s for.
# Load the tidyverse (this also attaches forcats)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Example data set (penguins)
library(palmerpenguins)
What is it for?
- This function collapses the levels of a factor (or character vector) into a smaller set of manually defined groups. For example, I reduced the factor variable “island” in the penguins data set from 3 levels to 2 by combining the islands “Dream” and “Torgersen” into “Not_Biscoe” below:
# 3 Factor Levels
table(penguins$island)
##
## Biscoe Dream Torgersen
## 168 124 52
# 2 Factor Levels
penguins_refactored <- fct_collapse(penguins$island,
                                    Biscoe = "Biscoe",
                                    Not_Biscoe = c("Torgersen", "Dream"),
                                    # Any level not named above would be
                                    # collapsed into "Missing"
                                    other_level = "Missing")
table(penguins_refactored)
## penguins_refactored
## Biscoe Not_Biscoe
## 168 176
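- Because every level of “island” is named in the call above, other_level has nothing to catch there. As a minimal sketch of what it does when a level is left out, the toy factor below (a made-up example, not part of the penguins data) names only three of its four levels:

```r
library(forcats)

# Hypothetical toy factor with four levels (illustration only)
fruit <- factor(c("apple", "banana", "cherry", "kale", "apple"))

# "kale" is not named in any group, so it falls into other_level
collapsed <- fct_collapse(fruit,
                          Fruit = c("apple", "banana", "cherry"),
                          other_level = "Missing")

levels(collapsed)
table(collapsed)
```

The unnamed level ends up in “Missing”, which is placed last among the levels. NA values are not levels, so they should stay NA rather than being moved into “Missing”.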
- As a second example, I re-factored the variable for number of cylinders (“cyl”) from the mtcars data set from 3 to 2 factor levels:
CARS <- mtcars
CARS$cyl <- as.factor(CARS$cyl)
# 3 Factor Levels
table(CARS$cyl)
##
## 4 6 8
## 11 7 14
# 2 Factor Levels
CARS_refactored <- fct_collapse(CARS$cyl,
                                Four = "4",
                                More_than_Four = c("6", "8"),
                                other_level = "Missing")
table(CARS_refactored)
## CARS_refactored
## Four More_than_Four
## 11 21
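- Both examples above store the collapsed factor as a stand-alone vector. As a sketch of one way to keep it alongside the rest of the data, fct_collapse() can also be called inside mutate(). The names cars_2lvls and cyl_2lvls are my own choices for illustration:

```r
library(dplyr)
library(forcats)

# Add the collapsed factor as a new column; the original cyl column is kept
cars_2lvls <- mtcars %>%
  mutate(cyl_2lvls = fct_collapse(factor(cyl),
                                  Four = "4",
                                  More_than_Four = c("6", "8")))

is.factor(cars_2lvls$cyl_2lvls)
table(cars_2lvls$cyl_2lvls)
```

The new column is still a factor, so it works directly with other forcats functions and with modeling or plotting code that expects factors.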
Is it helpful?
- This function has its uses, but overall I find it easier to create a new variable with the desired groups using mutate() with either case_when() or ifelse(). One difference to keep in mind is that ifelse() returns a character vector here rather than a factor. Below is an example of what I would do instead:
penguins_refactored_2 <- penguins %>%
  mutate(island_2lvls = ifelse(island == "Biscoe",
                               "Biscoe",
                               "Not Biscoe"))
table(penguins_refactored_2$island_2lvls)
##
## Biscoe Not Biscoe
## 168 176
- Using the second example with the mtcars data set, I would do the following:
CARS_refactored_2 <- CARS %>%
  mutate(cyl_2lvls = ifelse(cyl == "4",
                            "Four",
                            "More than four"))
table(CARS_refactored_2$cyl_2lvls)
##
## Four More than four
## 11 21
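- The examples above use ifelse(), so here is a rough sketch of the case_when() version as well. It also shows that ifelse() on its own gives a character vector, and wraps the case_when() result in factor() so the level order is explicit. The object names are my own, and this is only one possible way to write it:

```r
library(dplyr)

# ifelse() returns a character vector, not a factor
ifelse_result <- ifelse(mtcars$cyl == 4, "Four", "More than four")
class(ifelse_result)

# case_when() version, converted to a factor with an explicit level order
cars_case_when <- mtcars %>%
  mutate(
    cyl_2lvls = case_when(
      cyl == 4 ~ "Four",
      cyl %in% c(6, 8) ~ "More than four"
    ),
    cyl_2lvls = factor(cyl_2lvls, levels = c("Four", "More than four"))
  )

table(cars_case_when$cyl_2lvls)
```

With only two groups this is about as short as fct_collapse(). With many groups, each one needs its own condition here, whereas fct_collapse() only needs the list of old levels per new level.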