forcats::fct_infreq()

Function of the Week:

fct_infreq()

In this document, I will introduce the fct_infreq() function and show what it’s for.

# load the tidyverse
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(forcats) # already attached by tidyverse; loaded explicitly here
# example dataset
library(palmerpenguins)
data(penguins)

What is it for?

fct_infreq() reorders the levels of a factor by the number of observations in each level, most frequent first. Only the level order changes; the values themselves stay the same.

Arguments

  • f
    • A factor

  • ordered
    • A logical which determines the “ordered” status of the output factor. NA preserves the existing status of the factor.
f <- factor(c("red", "yellow", "yellow", "blue", "blue", "yellow", "red", "yellow", "blue", "red", "yellow", "rainbow", "yellow", "blue"))
summary(f) # levels start out alphabetical; by count, the order should be yellow, then blue, then red, then rainbow
##    blue rainbow     red  yellow 
##       4       1       3       6
change_f <- fct_infreq(f)
summary(change_f)
##  yellow    blue     red rainbow 
##       6       4       3       1
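
The ordered argument listed above can be sketched the same way. This is a minimal illustration with a made-up colour vector; the names colours and colours_ord are only for this example.

```r
library(forcats)

# Made-up colour vector for illustration: yellow x3, blue x2, red x1
colours <- factor(c("red", "yellow", "yellow", "blue", "yellow", "blue"))

# Default (ordered = NA) keeps the input's status, so a plain factor stays unordered
is.ordered(fct_infreq(colours))

# ordered = TRUE returns an ordered factor, ranked from most to least frequent
colours_ord <- fct_infreq(colours, ordered = TRUE)
is.ordered(colours_ord)
levels(colours_ord)
```

With ordered = TRUE the result can be compared with < and >, where the comparison follows the frequency ranking rather than the alphabet.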

Is it helpful?

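I don’t expect to reach for this often, but it is more useful than it first looks, because R does not order factor levels by frequency on its own. factor() sorts the levels alphabetically, as the colour example above shows (blue, rainbow, red, yellow before reordering). So fct_infreq() is mainly helpful when you want a table or plot to run from the most common level to the least common. When the alphabetical order already matches the frequency order, as with the penguin islands, the function changes nothing.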

Example:

summary(penguins$island)
##    Biscoe     Dream Torgersen 
##       168       124        52
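
The island levels above are in frequency order only because that order matches the alphabet. As a quick check, the sketch below uses the species column from the same penguins data, where the two orders differ. The name species_freq is only illustrative.

```r
library(forcats)
library(palmerpenguins)

# Default level order of species: alphabetical, not by frequency
levels(penguins$species)

# Counts per level, still shown in the default order
table(penguins$species)

# After fct_infreq(), the most common species comes first
species_freq <- fct_infreq(penguins$species)
levels(species_freq)
```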

It may be helpful if you have deliberately re-leveled a factor and then want the levels back in frequency order (fct_infreq() puts the most frequent level first).

library(janitor) # provides tabyl()
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
# make Torgersen the first level; the remaining levels keep their order
penguins$island_relevel <- relevel(penguins$island, "Torgersen")

penguins %>% 
  tabyl(island_relevel)
##  island_relevel   n   percent
##       Torgersen  52 0.1511628
##          Biscoe 168 0.4883721
##           Dream 124 0.3604651

Now restore the frequency order with fct_infreq().

penguins$change_f <- fct_infreq(penguins$island_relevel)

penguins %>% 
  tabyl(change_f)
##   change_f   n   percent
##     Biscoe 168 0.4883721
##      Dream 124 0.3604651
##  Torgersen  52 0.1511628
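
Frequency order also tends to matter in bar charts, because ggplot2 draws bars in level order. The sketch below is one possible use, not part of the original walkthrough. It assumes the species column of penguins, and penguins_plot and species_freq are illustrative names.

```r
library(ggplot2)
library(forcats)
library(palmerpenguins)

# Work on a copy so the original data frame is left untouched
penguins_plot <- penguins

# Bars follow the factor's level order, so reorder species by frequency first
penguins_plot$species_freq <- fct_infreq(penguins_plot$species)

# Bar chart with the most common species on the left
p <- ggplot(penguins_plot, aes(x = species_freq)) +
  geom_bar() +
  labs(x = "Species", y = "Number of penguins")
p
```

Without the fct_infreq() step the bars would appear in alphabetical order.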