
Function of the Week: geom_density()


geom_density() is a function in the ggplot2 package for visualizing data. It is a common way to display the distribution of a numerical variable. It computes a kernel density estimate and draws it as a continuous curve. You can think of that curve as a smoothed version of a histogram, with a total area under the curve of 1.
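The base-R sketch below makes the idea of a kernel density estimate concrete. It assumes the defaults that ggplot2's stat_density() uses: a Gaussian kernel and the "nrd0" rule-of-thumb bandwidth, which are also the defaults of stats::density(). The vector `x` and the helper `kde()` are hypothetical and exist only for illustration. Each observation contributes a small bell-shaped bump, and the estimate at any point is the average of those bumps.

```r
# Hypothetical sample of eight measurements (made up for illustration)
x <- c(3.1, 3.4, 3.9, 4.2, 4.4, 5.0, 5.1, 6.8)

# Bandwidth from R's rule of thumb; "nrd0" is also stat_density()'s default
h <- bw.nrd0(x)

# Hypothetical helper: average of Gaussian bumps of width bw centred on each point
kde <- function(grid, data, bw) {
  sapply(grid, function(g) mean(dnorm((g - data) / bw)) / bw)
}

# Evaluate the estimate on a fine grid extending 3 bandwidths past the data
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)
manual <- kde(grid, x, h)

# stats::density() on the same grid gives nearly identical values
# (it uses a binned FFT approximation, so tiny differences remain)
builtin <- density(x, bw = h, kernel = "gaussian",
                   from = min(grid), to = max(grid), n = 512)
max(abs(manual - builtin$y))

# The area under the curve is approximately 1, as for any density
sum(manual) * (grid[2] - grid[1])
```

Passing a smaller or larger `bw` to `kde()` makes the curve wigglier or smoother. geom_density() exposes the same trade-off through its bandwidth settings.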

# load the tidyverse (this attaches ggplot2, among other packages)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# example dataset: the Palmer penguins data
library(palmerpenguins)

One of the simplest and most classic ways to display the distribution of data is a histogram. The histogram below shows that the distribution of penguin body mass is skewed to the right and that a few possible outliers are present.

ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
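The message above suggests choosing a bin width explicitly instead of relying on the default 30 bins. This minimal sketch assumes that the palmerpenguins package supplies `penguins`. The 250 g bin width is an arbitrary illustrative choice, not a recommended value. Setting `na.rm = TRUE` silences the warning, which refers to the two penguins with no recorded body mass.

```r
library(ggplot2)
library(palmerpenguins)

# Two penguins have no recorded body mass; these are the "2 rows" in the warning
sum(is.na(penguins$body_mass_g))

# Histogram with an explicit (illustrative) bin width of 250 g
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250, na.rm = TRUE)
p_hist

# Inspect the computed bins: every bar spans the same 250 g interval
bins <- ggplot_build(p_hist)$data[[1]]
head(bins[, c("xmin", "xmax", "count")])
```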

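geom_density() takes a different approach. Rather than counting observations in bins, it smooths the data into a continuous curve whose peaks show where values are concentrated. It does not remove outliers; isolated observations simply contribute low bumps in the tails of the curve. Like the histogram's warning, the warning below refers to the two penguins whose body mass is missing, not to outliers.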

ggplot(penguins, aes(x = body_mass_g)) +
  geom_density()
## Warning: Removed 2 rows containing non-finite values (stat_density).
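The kernel bandwidth controls how smooth the curve is, so it is worth trying a few values. This minimal sketch again assumes palmerpenguins. The `adjust` argument multiplies the default bandwidth: values below 1 give a bumpier curve and values above 1 a smoother one. `na.rm = TRUE` drops the two missing body masses quietly. The values 0.5 and 2 and the colours are arbitrary illustrative choices.

```r
library(ggplot2)
library(palmerpenguins)

# Overlay two density curves that differ only in bandwidth
p_dens <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density(adjust = 0.5, na.rm = TRUE, colour = "steelblue") +  # half the default bandwidth
  geom_density(adjust = 2, na.rm = TRUE, colour = "firebrick")      # twice the default bandwidth
p_dens

# Every non-missing observation is used: count = density * n, so the ratio recovers n
dens <- ggplot_build(p_dens)$data[[1]]
unique(round(dens$count / dens$density))
```

The last line should return the number of penguins with a recorded body mass. This confirms that no observations, outlying or otherwise, were discarded.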

In short, geom_density() represents the data as a continuous, smooth curve rather than as a set of bars. Its shape does not depend on where bin edges happen to fall. As a result, it can make features such as a bimodal distribution (two separate peaks) easier to see than a histogram, where that structure is not always as clearly defined.
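To see the bimodality point in action, the sketch below uses simulated data rather than the penguins. The hypothetical sample `sim` is drawn from two normal distributions centred at 0 and 5. The code overlays the density curve on a histogram drawn on the density scale. It then checks numerically that the estimated density dips between the two peaks.

```r
library(ggplot2)

set.seed(42)
# Hypothetical bimodal data: 500 draws around 0 and 500 draws around 5
sim <- data.frame(value = c(rnorm(500, mean = 0), rnorm(500, mean = 5)))

# Histogram on the density scale with the kernel density curve on top
ggplot(sim, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "grey80") +
  geom_density(colour = "steelblue")

# Same default estimate (Gaussian kernel, "nrd0" bandwidth) via stats::density()
d <- density(sim$value)

# Estimated density at the two centres and at the midpoint between them
heights <- approx(d$x, d$y, xout = c(0, 2.5, 5))$y
heights
```

When the groups are less clearly separated, the dip may be shallow, and its visibility depends on the bandwidth. It is therefore worth comparing a few `adjust` values before reading too much into a bump.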