{GGally} demo by Allison Horst (Bren)

The delightful {GGally} package, by Barret Schloerke and other contributors, makes it easier to explore relationships and patterns in multivariate data. The following examples use several {GGally} functions to visualize data from the {palmerpenguins} package (install from GitHub for now; hopefully on CRAN soon).

library(tidyverse)
library(palmerpenguins)
library(skimr)
library(GGally)

1. Check out the penguins data!

First, some of our go-to ways to explore data, using the penguins object from {palmerpenguins}. The data were originally collected and published by Dr. Kristen Gorman and the Palmer Station LTER. Structural size measurements for nesting adults of three penguin species (Adélie, Chinstrap and Gentoo) were recorded on islands in the Palmer Archipelago, Antarctica, from 2007 to 2009.

# Bring it up in a new tab
# View(penguins)

# Summarize it 
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0     Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9     Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                Max.   :2009  
##  NA's   :2         NA's   :2
# Glimpse it
tibble::glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ade…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgers…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
## $ sex               <fct> male, female, female, NA, female, male, female, mal…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 200…
# Skim it 
skimr::skim(penguins)
Table 1: Data summary

Name                   penguins
Number of rows         344
Number of columns      8
_______________________
Column type frequency:
  factor               3
  numeric              5
________________________
Group variables        None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable      n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_length_mm             2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_depth_mm              2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_length_mm          2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass_g                2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                       0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇
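
The summaries above all flag missing values (2 in each measurement column, 11 in sex). Before plotting, it can help to check how many rows are fully complete. Here is a minimal base R sketch; the object names `na_per_column` and `n_complete` are made up for this illustration.

```r
library(palmerpenguins)

# Count missing values in each column
na_per_column <- colSums(is.na(penguins))
print(na_per_column)

# Count rows with no missing value in any column
n_complete <- sum(complete.cases(penguins))
print(n_complete)
```

Functions that plot or model these columns will typically warn about, or silently drop, the incomplete rows, so it is worth knowing the count up front.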

2. Visual overview of relationships with {GGally} package

More on {GGally}: https://ggobi.github.io/ggally/

Summary: “GGally extends ggplot2 by adding several functions to reduce the complexity of combining geoms with transformed data. Some of these functions include a pairwise plot matrix, a scatterplot plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.”

A. ggpairs(): a pair plot!

Out of the box:

ggpairs(penguins)

That should make you go “What.” It’s cool but a bit much. We can specify what we want in our pairs plot, and map color onto a variable to separate groups.

penguins %>% 
  select(species, bill_length_mm:body_mass_g) %>% 
  ggpairs(aes(color = species))
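
As a variation on the idea above, ggpairs() can also pick the variables itself through its `columns` argument, and the `upper` and `lower` arguments control what each triangle of the matrix shows. The sketch below is one possible setup, not the only one; `measure_cols` and `penguin_pairs` are names invented here. Unlike the select() version, species is used only for color and does not get its own row and column.

```r
library(ggplot2)
library(GGally)
library(palmerpenguins)

# Hypothetical helper vector: the four measurement columns to compare
measure_cols <- c("bill_length_mm", "bill_depth_mm",
                  "flipper_length_mm", "body_mass_g")

penguin_pairs <- ggpairs(
  penguins,
  columns = measure_cols,                # plot only these columns
  mapping = aes(color = species),        # color every panel by species
  upper = list(continuous = "cor"),      # correlations above the diagonal
  lower = list(continuous = "smooth")    # scatterplots with fitted lines below
)

penguin_pairs
```

Expect warnings about removed rows, since two penguins have no measurements recorded.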

B. ggbivariate(): plot outcome variable with several explanatory variables

penguins %>% 
  ggbivariate(outcome = "body_mass_g", 
              explanatory = c("species","sex", "island","flipper_length_mm"))

An example with custom formatting:

penguins %>% 
  ggbivariate(outcome = "species", 
              explanatory = c("flipper_length_mm", "island", "sex"),
              rowbar_args = list(
                colour = "purple",
                size = 4,
                fontface = "bold",
                label_format = scales::label_percent(accuracy = 1)
              )) +
  scale_fill_brewer(palette = 10) +
  theme_minimal()

C. ggnostic(): Model diagnostics

# Make a model: 
penguin_lm <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm, data = penguins)

# Look at the diagnostics! 
ggnostic(penguin_lm)
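
To see the kind of numbers behind a diagnostics plot like this, the per-observation quantities can be computed directly in base R. This is a sketch for cross-checking, not a description of how ggnostic() works internally; the `diagnostics` data frame and its column names are invented here.

```r
library(palmerpenguins)

penguin_lm <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm,
                 data = penguins)

# lm() drops rows with missing values, so there is one row per penguin
# that has all four measurements recorded
diagnostics <- data.frame(
  resid  = residuals(penguin_lm),      # observed minus fitted body mass
  hat    = hatvalues(penguin_lm),      # leverage of each observation
  cooksd = cooks.distance(penguin_lm)  # influence of each observation
)

nrow(diagnostics)

# The most influential observations, by Cook's distance
head(diagnostics[order(-diagnostics$cooksd), ])
```

The row count is lower than the 344 rows in penguins because the two penguins without measurements are excluded from the fit.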

D. ggtable(): Nicely formatted counts tables

# Counts, for each species, tallied by island & sex
penguins %>% 
  ggtable("species", c("island","sex"))
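
If you want the raw counts as well as the picture, base R's table() gives the same tallies. A small sketch for cross-checking; the object names `species_by_island` and `species_by_sex` are made up for this example.

```r
library(palmerpenguins)

# Counts of each species on each island
species_by_island <- table(penguins$species, penguins$island)
species_by_island

# Counts of each species by sex, keeping penguins whose sex was not recorded
species_by_sex <- table(penguins$species, penguins$sex, useNA = "ifany")
species_by_sex
```

The `useNA = "ifany"` argument adds an NA column so the 11 penguins with unrecorded sex are still counted.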

And much more! Check out {GGally} for more tools to explore and visualize your multivariate data!
