dplyr::slice_min/max()

Function of the Week: slice_min() & slice_max()

slice_min() & slice_max()

In this document, I will introduce the slice_min() & slice_max() functions and show what they’re used for.

Loading my data

I always enjoy looking at the datasets FiveThirtyEight uses, because they display data so nicely. For this presentation, I chose to look at Club Soccer Predictions, specifically the English (Barclays) Premier League. To simplify this example, I grabbed a subset of the data that includes only the Premier League teams and their current ranking data.

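library(readxl)  # provides read_excel()
library(dplyr)   # provides slice_min() and slice_max()

# Read the first sheet of the FiveThirtyEight rankings workbook
spi_global_rankings <- read_excel("20220316_FiveThirtyEight_soccer-spi_data/spi_global_rankings.xlsx",
                                 sheet = 1,
                                 skip = 0,
                                 na = "NA")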

barclays_premier <- subset(spi_global_rankings, league == "Barclays Premier League")
print(barclays_premier)
## # A tibble: 20 × 7
##     rank prev_rank name                     league               off   def   spi
##    <dbl>     <dbl> <chr>                    <chr>              <dbl> <dbl> <dbl>
##  1     1         1 Manchester City          Barclays Premier …  2.9   0.2   93.7
##  2     3         3 Liverpool                Barclays Premier …  2.95  0.29  92.8
##  3     4         4 Chelsea                  Barclays Premier …  2.4   0.26  89.3
##  4    11        12 Arsenal                  Barclays Premier …  2.22  0.52  82.2
##  5    14        15 Tottenham Hotspur        Barclays Premier …  2.34  0.67  80.3
##  6    17        17 Manchester United        Barclays Premier …  2.19  0.67  78.3
##  7    22        28 Aston Villa              Barclays Premier …  2.03  0.64  76.7
##  8    30        29 West Ham United          Barclays Premier …  2.01  0.73  74.3
##  9    32        27 Brighton and Hove Albion Barclays Premier …  1.85  0.62  74.2
## 10    34        45 Wolverhampton            Barclays Premier …  1.72  0.55  73.8
## 11    38        41 Crystal Palace           Barclays Premier …  1.89  0.69  73.1
## 12    50        47 Leicester City           Barclays Premier …  2.07  0.96  69.8
## 13    55        48 Southampton              Barclays Premier …  1.94  0.89  69.2
## 14    61        66 Brentford                Barclays Premier …  1.78  0.85  67.1
## 15    65        68 Newcastle                Barclays Premier …  1.76  0.85  66.7
## 16    74        70 Burnley                  Barclays Premier …  1.62  0.83  64.4
## 17    78        74 Everton                  Barclays Premier …  1.73  0.95  63.8
## 18    91        78 Leeds United             Barclays Premier …  1.85  1.14  61.6
## 19   103       100 Watford                  Barclays Premier …  1.65  1.08  59  
## 20   151       146 Norwich City             Barclays Premier …  1.51  1.19  53.0

The rankings appear to skip spots, but we have to remember that this is a subset of the larger dataset, which ranks all international club teams (of which there are 640).

What is it for?

The data is still in rank order by Soccer Power Index (SPI), so if I just pull the top 5 teams, I’ll know which Premier League teams are the ones to watch. I can do this using slice_min() for the best (lowest-numbered) ranking:

slice_min(barclays_premier, order_by = rank, n = 5, with_ties = TRUE)
## # A tibble: 5 × 7
##    rank prev_rank name              league                    off   def   spi
##   <dbl>     <dbl> <chr>             <chr>                   <dbl> <dbl> <dbl>
## 1     1         1 Manchester City   Barclays Premier League  2.9   0.2   93.7
## 2     3         3 Liverpool         Barclays Premier League  2.95  0.29  92.8
## 3     4         4 Chelsea           Barclays Premier League  2.4   0.26  89.3
## 4    11        12 Arsenal           Barclays Premier League  2.22  0.52  82.2
## 5    14        15 Tottenham Hotspur Barclays Premier League  2.34  0.67  80.3

I can also do this by using slice_max() for highest SPI:

slice_max(barclays_premier, order_by = spi, n = 5, with_ties = TRUE)
## # A tibble: 5 × 7
##    rank prev_rank name              league                    off   def   spi
##   <dbl>     <dbl> <chr>             <chr>                   <dbl> <dbl> <dbl>
## 1     1         1 Manchester City   Barclays Premier League  2.9   0.2   93.7
## 2     3         3 Liverpool         Barclays Premier League  2.95  0.29  92.8
## 3     4         4 Chelsea           Barclays Premier League  2.4   0.26  89.3
## 4    11        12 Arsenal           Barclays Premier League  2.22  0.52  82.2
## 5    14        15 Tottenham Hotspur Barclays Premier League  2.34  0.67  80.3
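Both calls above pass with_ties = TRUE, which is also the default, but the top 5 here contain no ties, so the argument has no visible effect. The sketch below shows what it changes, using a few name and def values copied from the table above, where Tottenham Hotspur and Manchester United share a def of 0.67. The object names (defense, ties_kept, ties_dropped) are made up for this illustration.

```r
library(dplyr)

# A few rows copied from the table above (name and def only)
defense <- tibble::tribble(
  ~name,               ~def,
  "Arsenal",           0.52,
  "Tottenham Hotspur", 0.67,
  "Manchester United", 0.67,
  "Aston Villa",       0.64,
  "West Ham United",   0.73
)

# with_ties = TRUE keeps every row tied at the cutoff,
# so asking for 3 rows can return more than 3
ties_kept <- slice_min(defense, order_by = def, n = 3, with_ties = TRUE)

# with_ties = FALSE returns exactly n rows
ties_dropped <- slice_min(defense, order_by = def, n = 3, with_ties = FALSE)

nrow(ties_kept)
nrow(ties_dropped)
```

With ties kept, both teams at 0.67 are returned, so the result has one more row than requested. With ties dropped, only one of the two is kept.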

Is it helpful?

These functions are helpful when you want to quickly see the top or bottom of a list, for example to get a quick understanding of your data and how it is organized while you’re working. I imagine that combining them with filtering or grouping would give them even more power: for example, pulling these top 5 Premier League teams straight from the original data of 640 teams, or finding the top students from each of 6 classes.
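The "top students from each class" idea can be sketched by grouping before slicing. This is only a minimal illustration: the scores tibble, its column names, and every value in it are invented for the example, and it uses two classes instead of six to stay short.

```r
library(dplyr)

# Hypothetical scores, invented for illustration only
scores <- tibble::tribble(
  ~class, ~student, ~score,
  "A",    "s1",     88,
  "A",    "s2",     95,
  "A",    "s3",     71,
  "B",    "s4",     64,
  "B",    "s5",     90,
  "B",    "s6",     82
)

# slice_max() respects groups, so this keeps the top scorer in each class
top_per_class <- scores %>%
  group_by(class) %>%
  slice_max(order_by = score, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_per_class)
```

The same pattern should work for the soccer data: filtering spi_global_rankings to league == "Barclays Premier League" and then calling slice_max() on spi would give the top Premier League teams directly from the full table.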