{GGally} demo by Allison Horst (Bren)

The delightful {GGally} package, by Barret Schloerke and other contributors, makes it easier to explore relationships and patterns in multivariate data. In the following examples, we use several {GGally} functions to visualize the penguins data from the {palmerpenguins} package (install from GitHub for now; hopefully on CRAN soon).
library(tidyverse)
library(palmerpenguins)
library(skimr)
library(GGally)
1. Check out the penguins data!
First, let's try some of our go-to ways to explore data on the penguins object from {palmerpenguins}. The data were originally collected and published by Dr. Kristen Gorman and the Palmer Station LTER. Structural size measurements for nesting adults of three penguin species (Adélie, Chinstrap and Gentoo) were recorded on islands in the Palmer Archipelago, Antarctica, from 2007 to 2009.
# Bring it up in a new tab
# View(penguins)
# Summarize it
summary(penguins)
##      species           island    bill_length_mm  bill_depth_mm
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30
##                                  Mean   :43.92   Mean   :17.15
##                                  3rd Qu.:48.50   3rd Qu.:18.70
##                                  Max.   :59.60   Max.   :21.50
##                                  NA's   :2       NA's   :2
##  flipper_length_mm  body_mass_g      sex           year
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007
##  Median :197.0     Median :4050   NA's  : 11   Median :2008
##  Mean   :200.9     Mean   :4202                Mean   :2008
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009
##  Max.   :231.0     Max.   :6300                Max.   :2009
##  NA's   :2         NA's   :2
# Glimpse it
tibble::glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ade…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgers…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
## $ sex <fct> male, female, female, NA, female, male, female, mal…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 200…
# Skim it
skimr::skim(penguins)
Name | penguins |
---|---|
Number of rows | 344 |
Number of columns | 8 |
Column type frequency: | |
factor | 3 |
numeric | 5 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
species | 0 | 1.00 | FALSE | 3 | Ade: 152, Gen: 124, Chi: 68 |
island | 0 | 1.00 | FALSE | 3 | Bis: 168, Dre: 124, Tor: 52 |
sex | 11 | 0.97 | FALSE | 2 | mal: 168, fem: 165 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |
bill_depth_mm | 2 | 0.99 | 17.15 | 1.97 | 13.1 | 15.60 | 17.30 | 18.7 | 21.5 | ▅▅▇▇▂ |
flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |
body_mass_g | 2 | 0.99 | 4201.75 | 801.95 | 2700.0 | 3550.00 | 4050.00 | 4750.0 | 6300.0 | ▃▇▆▃▂ |
year | 0 | 1.00 | 2008.03 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.0 | 2009.0 | ▇▁▇▁▇ |
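The n_missing column above can also be reproduced directly, which is handy when deciding how to treat NAs before plotting. A minimal sketch using dplyr's across() and tidyr's drop_na() (both load with the tidyverse); the object names missing_counts and complete_penguins are just illustrative:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Number of NA values in each column of penguins
missing_counts <- penguins %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
missing_counts

# Keep only rows with no missing values in any column
complete_penguins <- penguins %>%
  drop_na()
nrow(complete_penguins)
```

The 11 rows with a missing sex include the 2 rows where every measurement is missing, so dropping incomplete rows removes 11 penguins in total.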
2. Visual overview of relationships with the {GGally} package
More on {GGally}: https://ggobi.github.io/ggally/
Summary: “GGally extends ggplot2 by adding several functions to reduce the complexity of combining geoms with transformed data. Some of these functions include a pairwise plot matrix, a scatterplot plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.”
A. ggpairs(): a pair plot!
Out of the box:
ggpairs(penguins)
That should make you go “What.” It’s cool, but a bit much. Instead, we can choose which variables to include in the pairs plot and map color onto a variable (here, species) to separate groups.
penguins %>%
select(species, bill_length_mm:body_mass_g) %>%
ggpairs(aes(color = species))
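ggpairs() can also pick variables itself through its columns argument, and the upper and lower arguments control what is drawn in each triangle of the matrix. A sketch, assuming the column positions shown by glimpse() above (species is column 1, the four measurements are columns 3 to 6); the panel choices and the name pairs_plot are illustrative, not the only option:

```r
library(ggplot2)
library(GGally)
library(palmerpenguins)

# Same variables as the select() version, chosen by position instead:
# column 1 = species, columns 3:6 = bill, flipper, and body mass measurements
pairs_plot <- ggpairs(
  penguins,
  columns = c(1, 3:6),
  mapping = aes(color = species),
  # Upper triangle: correlation text, shrunk a little so group labels fit
  upper = list(continuous = wrap("cor", size = 3)),
  # Lower triangle: semi-transparent points to reduce overplotting
  lower = list(continuous = wrap("points", alpha = 0.5))
)
pairs_plot
```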
B. ggbivariate(): plot an outcome variable against several explanatory variables
penguins %>%
ggbivariate(outcome = "body_mass_g",
explanatory = c("species","sex", "island","flipper_length_mm"))
An example with custom formatting:
penguins %>%
ggbivariate(outcome = "species",
explanatory = c("flipper_length_mm", "island", "sex"),
rowbar_args = list(
colour = "purple",
size = 4,
fontface = "bold",
label_format = scales::label_percent(accuracy = 1)
)) +
scale_fill_brewer(palette = 10) +
theme_minimal()
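To check the island percentages in that plot numerically, one option is to compute each species' share within each island with dplyr. This is a cross-check sketch, assuming the percentage bars are normalized within each level of the explanatory variable; the object name island_shares is hypothetical:

```r
library(dplyr)
library(palmerpenguins)

# Share of each species within each island (shares sum to 1 per island)
island_shares <- penguins %>%
  count(island, species) %>%
  group_by(island) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
island_shares
```

Torgersen should come out as 100% Adélie, since only Adélie penguins were sampled there.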
C. ggnostic(): model diagnostics
# Make a model:
penguin_lm <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm, data = penguins)
# Look at the diagnostics!
ggnostic(penguin_lm)
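Two details are worth knowing here. lm() drops any row with an NA in a model variable, so it helps to check how many penguins the model actually used; and ggnostic() accepts a columnsY argument to choose which diagnostics (columns produced by broom::augment()) are shown. A sketch, assuming the diagnostic names .resid and .cooksd; n_used and diag_plot are illustrative names:

```r
library(GGally)
library(palmerpenguins)

penguin_lm <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm,
                 data = penguins)

# Observations actually used in the fit (rows with NA are dropped by lm)
n_used <- nobs(penguin_lm)
n_used

# Show only residuals and Cook's distance against each explanatory variable
diag_plot <- ggnostic(penguin_lm, columnsY = c(".resid", ".cooksd"))
diag_plot
```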
D. ggtable(): nicely formatted count tables
# Counts, for each species, tallied by island & sex
penguins %>%
ggtable("species", c("island","sex"))
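ggtable() draws these counts as a plot; to get the same numbers as a data frame (for example, to verify a single cell), dplyr::count() is a quick alternative. A sketch for the species-by-island counts, where species_island is an illustrative name:

```r
library(dplyr)
library(palmerpenguins)

# One row per species x island combination that occurs in the data
species_island <- penguins %>%
  count(species, island)
species_island
```

Combinations that never occur, such as Gentoo on Dream, are dropped by default, which is why there are five rows rather than nine.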
And much more! Check out {GGally} for more tools to explore and visualize your multivariate data!
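For example, the parallel coordinates plot mentioned in the summary above is available as ggparcoord(). A minimal sketch, again assuming the column positions from glimpse() (measurements in columns 3 to 6, species in column 1); the scaling method, transparency, and the name pc_plot are illustrative choices:

```r
library(GGally)
library(palmerpenguins)

# One line per penguin across the four measurements, colored by species.
# scale = "uniminmax" rescales each variable to [0, 1] so they share one axis.
pc_plot <- ggparcoord(
  as.data.frame(penguins),  # plain data frame input
  columns = 3:6,
  groupColumn = 1,
  scale = "uniminmax",
  alphaLines = 0.3
)
pc_plot
```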