dplyr::coalesce()

Function of the Week: dplyr::coalesce()

In this document, I will introduce the coalesce() function and show what it’s for.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

What is it for?

Given a set of vectors, coalesce() finds the first non-missing value at each position. Its arguments are the vectors to combine. It can be used to fill in missing data, either with a fixed value or with values taken from other vectors.
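As a rough illustration of "first non-missing value at each position", the two-vector case behaves like base R's ifelse(is.na(x), y, x). The vectors below are made-up examples, and the comparison is only a sketch of the idea, not a description of how dplyr implements it.

```r
library(dplyr)

# Made-up example vectors (not from the text above)
x <- c(1, NA, 3, NA)
y <- c(10, 20, NA, 40)

# Base R: keep x where it is present, otherwise take y
manual <- ifelse(is.na(x), y, x)

# dplyr: same idea, but accepts any number of vectors
combined <- coalesce(x, y)

identical(manual, combined)
```

The advantage of coalesce() over nested ifelse() calls shows up once there are three or more candidate vectors.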

First, it can use a single value to replace the missing data.

x <- c(1, 2, NA, NA, 1, 2, 3) # create example vector
class(x)
## [1] "numeric"
coalesce(x, 4) # apply coalesce
## [1] 1 2 4 4 1 2 3
x <- c(1, 2, "b", NA, NA, 1, 2)
class(x)
## [1] "character"
coalesce(x, "a")
## [1] "1" "2" "b" "a" "a" "1" "2"
df <- tibble(x = c(1, NA, 3),
             y = c(NA, 5, 6))
df
## # A tibble: 3 × 2
##       x     y
##   <dbl> <dbl>
## 1     1    NA
## 2    NA     5
## 3     3     6
class(df)
## [1] "tbl_df"     "tbl"        "data.frame"
coalesce(df$x, 2)
## [1] 1 2 3

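The replacement value must have a type compatible with the vector (a number for a numeric vector, a string for a character vector). Note also that coalesce() works on vectors, not on a whole data frame: to fill a column, pass the column itself, as with df$x here.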
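Since coalesce() takes vectors, one common way to use it on a tibble is inside mutate(), column by column. The following is a minimal sketch that reuses the small tibble from above; the fill values 2 and 0 are arbitrary choices for illustration.

```r
library(dplyr)

# Same small tibble as in the example above
df <- tibble(x = c(1, NA, 3),
             y = c(NA, 5, 6))

# Fill each column separately; the tibble itself is never passed to coalesce()
filled <- df %>%
  mutate(x = coalesce(x, 2),
         y = coalesce(y, 0))

filled
```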

Second, it can fill each NA in the first vector with the value at the same position in the following vectors, taking the first one that is not missing.

x <- c(1, 1, NA, 1, NA, NA)
y <- c(2, NA, 2, 2, 2, NA)
z <- c(3, NA, 3, NA, 3, 3)
coalesce(x, y, z)
## [1] 1 1 2 1 2 3

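Vectors of different lengths or incompatible types cannot be combined this way: every vector must have the same length as the first one (or length 1) and a compatible type.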
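The length and type restrictions can be checked with a small experiment. In this sketch, fails() is a hypothetical helper written only for this example (it is not part of dplyr); it reports whether a call raises an error.

```r
library(dplyr)

x <- c(1, NA, 3)

# Hypothetical helper: TRUE if evaluating expr raises an error, FALSE otherwise
fails <- function(expr) {
  tryCatch({
    expr   # forces the lazily evaluated argument inside tryCatch
    FALSE
  }, error = function(e) TRUE)
}

wrong_length <- fails(coalesce(x, c(1, 2)))  # length 2 against length 3
wrong_type   <- fails(coalesce(x, "a"))      # character fill for a numeric vector
same_type    <- fails(coalesce(x, 0))        # single numeric value

c(wrong_length = wrong_length, wrong_type = wrong_type, same_type = same_type)
```

The first two calls are expected to be rejected, while the single numeric value is accepted and recycled to every missing position.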

Third, it can combine the vectors stored in a list. The list is spliced into separate arguments with the !!! operator.

x <- list(c(1, NA, NA), c(NA, 4, NA), c(NA, 6, 7))
class(x)
## [1] "list"
coalesce(!!!x)
## [1] 1 4 7

Each vector in the list must have the same length.
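For readers unfamiliar with !!!, the same result can be reached with base R tools that pass the list elements as separate arguments. This is a sketch of two alternatives, assuming an unnamed list of equal-length vectors like the one above.

```r
library(dplyr)

x <- list(c(1, NA, NA), c(NA, 4, NA), c(NA, 6, 7))

# do.call() passes each list element as its own argument to coalesce()
via_do_call <- do.call(coalesce, x)

# Reduce() applies coalesce() pairwise, from left to right
via_reduce <- Reduce(coalesce, x)

identical(via_do_call, via_reduce)
```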

Is it helpful?

Yes. When dealing with missing data, coalesce() is very useful: it can fill NA with a default value and combine several vectors, or a list of vectors, position by position. However, it has strict requirements on the type and length of its inputs.

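If we only want to replace NA with a specific value, replace_na() from tidyr is more flexible than coalesce(). With the tidyr version loaded here (1.1.4), it can replace NA with a numeric or character value whatever the type of the vector; later tidyr releases instead require the replacement to be compatible with the vector's type. It can also replace NA in a data frame, or replace NULLs in a list column.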

x <- c(1, 2, NA, NA)
class(x)
## [1] "numeric"
replace_na(x, "a")
## [1] "1" "2" "a" "a"
df <- tibble(x = c(1, NA, NA, 3),
             y = c(NA, 5, 6, NA))
df
## # A tibble: 4 × 2
##       x     y
##   <dbl> <dbl>
## 1     1    NA
## 2    NA     5
## 3    NA     6
## 4     3    NA
df %>% replace_na(list(x = 0, y = 1))
## # A tibble: 4 × 2
##       x     y
##   <dbl> <dbl>
## 1     1     1
## 2     0     5
## 3     0     6
## 4     3     1
df <- tibble(z = list(1:5, NULL, 10:20))
df
## # A tibble: 3 × 1
##   z         
##   <list>    
## 1 <int [5]> 
## 2 <NULL>    
## 3 <int [11]>
dg <- df %>% replace_na(list(z = list(5)))
dg
## # A tibble: 3 × 1
##   z         
##   <list>    
## 1 <int [5]> 
## 2 <dbl [1]> 
## 3 <int [11]>

Conversely, if we want to convert an unwanted value to NA, we can use na_if().

y <- c(1,2,3)
na_if(y, 1)
## [1] NA  2  3
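Because na_if() and coalesce() work in opposite directions, they can be chained: first turn a placeholder into NA, then fill the NA with a default. The sketch below uses made-up readings in which -99 is assumed to be a code for "missing"; both the data and the default of 0 are hypothetical.

```r
library(dplyr)

# Made-up readings; -99 is assumed to mean "missing"
readings <- c(12, -99, 15, -99)

# Step 1: convert the placeholder to NA
with_na <- na_if(readings, -99)

# Step 2: fill the NA values with a default of 0
cleaned <- coalesce(with_na, 0)

cleaned
```

Keeping the two steps separate makes it easy to inspect how many values were actually missing before they are filled.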