ggplot2::geom_density()

Function of the Week: geom_density()

geom_density()

geom_density() is a function in the ggplot2 package for visualizing the distribution of a continuous numeric variable. It draws a kernel density estimate of the data, which can be read as a smoothed version of a histogram.

#load tidyverse up
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#example dataset
library(palmerpenguins)
data(penguins)
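
To make "kernel density estimate" a little more concrete, here is a minimal sketch using base R's density() with the defaults that stat_density() documents (Gaussian kernel, "nrd0" bandwidth, 512 evaluation points). This is an approximation of what the plot layer computes, not ggplot2's exact internals; the variable names mass, kde and area are made up for this illustration.

```r
library(palmerpenguins)

# Drop missing values first; density() stops on NA unless na.rm = TRUE
mass <- na.omit(penguins$body_mass_g)

# Gaussian kernel, default bandwidth rule, 512 equally spaced grid points
kde <- density(mass, bw = "nrd0", kernel = "gaussian", n = 512)

# Trapezoid-rule area under the curve; a density should integrate to about 1
area <- sum(diff(kde$x) * (head(kde$y, -1) + tail(kde$y, -1)) / 2)
area
```

The area being close to 1 is why the y-axis of a density plot shows densities rather than counts.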

What is it for?

One of the simplest and most classic ways to display the distribution of data is a histogram. The histogram below shows that the body mass of the penguins is skewed to the right, with a few possible outliers in the upper tail.

ggplot(penguins, aes(x=body_mass_g)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).

geom_density() does not remove outliers: every finite value contributes to the estimate, and only missing (non-finite) values are dropped, which is what the warning below reports. Instead of counting observations in bins, it smooths the distribution into a continuous curve whose peaks show where the values are concentrated (see below).

ggplot(penguins, aes(x=body_mass_g)) +
  geom_density()
## Warning: Removed 2 rows containing non-finite values (stat_density).
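
As a second example, here is a short sketch of two common variations: changing how much smoothing is applied, and drawing one curve per group. It relies on the documented adjust, alpha and na.rm arguments of geom_density(); the object names p_adjust and p_species are just for this illustration.

```r
library(ggplot2)
library(palmerpenguins)

# adjust multiplies the default bandwidth:
# values below 1 give a wigglier curve, values above 1 a smoother one
p_adjust <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density(adjust = 0.5, na.rm = TRUE)

# Mapping fill to species draws one semi-transparent curve per species
p_species <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.4, na.rm = TRUE)

p_adjust
p_species
```

Setting na.rm = TRUE drops the two missing values silently, so the warning shown above no longer appears.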

Is it helpful?

geom_density() displays the data as a continuous, smooth curve rather than as binned counts. Because the curve does not depend on bin width or bin placement, it can make features such as bimodality easier to see than in a histogram. The amount of smoothing is set by the bandwidth, so a bandwidth that is too large can also hide real peaks.
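
To compare the two views directly, the sketch below overlays a density curve on a histogram of flipper length, a variable that tends to show two peaks when all species are pooled. It assumes after_stat() is available (ggplot2 3.3.0 or later); the binwidth of 2 and the colours are arbitrary choices for this illustration, and p_overlay is a made-up name.

```r
library(ggplot2)
library(palmerpenguins)

p_overlay <- ggplot(penguins, aes(x = flipper_length_mm)) +
  # Put the histogram on the density scale so both layers share one y-axis
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "grey80", colour = "white", na.rm = TRUE) +
  # Kernel density estimate drawn on top of the bars
  geom_density(colour = "steelblue", na.rm = TRUE)

p_overlay
```

Changing binwidth shifts where the histogram's peaks appear to fall, while the density curve stays the same unless the bandwidth is changed.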