Graphic Design with ggplot2

Concepts of the {ggplot2} Package Pt. 1:
Data, Aesthetics, and Layers + Misc Stuff

Cédric Scherer // rstudio::conf // July 2022

Setup

The ggplot2 Package


… is an R package for visualizing data, created by Hadley Wickham in 2005

# install.packages("ggplot2")
library(ggplot2)


… is part of the {tidyverse}

# install.packages("tidyverse")
library(tidyverse)

The Grammar of {ggplot2}

The Grammar of {ggplot2}


Component          Function              Explanation
Data               ggplot(data)          The raw data that you want to visualise.
Aesthetics         aes()                 Aesthetic mappings between variables and visual properties.
Geometries         geom_*()              The geometric shapes representing the data.
Statistics         stat_*()              The statistical transformations applied to the data.
Scales             scale_*()             Maps between the data and the aesthetic dimensions.
Coordinate System  coord_*()             Maps data into the plane of the data rectangle.
Facets             facet_*()             The arrangement of the data into a grid of plots.
Visual Themes      theme() / theme_*()   The overall visual defaults of a plot.
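As a rough sketch of how these components stack up in code, the plot below touches each row of the table once. It uses the `mpg` dataset that ships with {ggplot2} rather than the workshop data, and the specific scale, facet, and theme choices are arbitrary picks for illustration.

```r
library(ggplot2)

p <- ggplot(mpg) +                                 # Data
  aes(x = displ, y = hwy, color = drv) +           # Aesthetics
  geom_point(alpha = .5) +                         # Geometries
  stat_smooth(method = "lm", formula = y ~ x) +    # Statistics
  scale_color_brewer(palette = "Dark2") +          # Scales
  coord_cartesian(ylim = c(10, 45)) +              # Coordinate system
  facet_wrap(~ year) +                             # Facets
  theme_light()                                    # Visual theme

p
```

Only the first three components are required to get a plot; the remaining ones fall back to defaults when omitted.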

A Basic ggplot Example

The Data

Bike sharing counts in London, UK, powered by TfL Open Data

  • covers the years 2015 and 2016
  • incl. weather data acquired from freemeteo.com
  • prepared by Hristo Mavrodiev for Kaggle
  • further modification by myself


bikes <- readr::read_csv(
  here::here("data", "london-bikes-custom.csv"),
  ## or: "https://raw.githubusercontent.com/z3tt/graphic-design-ggplot2/main/data/london-bikes-custom.csv"
  col_types = "Dcfffilllddddc"
)

bikes$season <- forcats::fct_inorder(bikes$season)

Variable      Description                                              Class
date          Date encoded as `YYYY-MM-DD`                             date
day_night     `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am)         character
year          `2015` or `2016`                                         factor
month         `1` (January) to `12` (December)                         factor
season        `winter`, `spring`, `summer`, or `autumn`                factor
count         Sum of reported bikes rented                             integer
is_workday    `TRUE` being Monday to Friday and no bank holiday        logical
is_weekend    `TRUE` being Saturday or Sunday                          logical
is_holiday    `TRUE` being a bank holiday in the UK                    logical
temp          Average air temperature (°C)                             double
temp_feel     Average feels like temperature (°C)                      double
humidity      Average air humidity (%)                                 double
wind_speed    Average wind speed (km/h)                                double
weather_type  Most common weather type                                 character

ggplot2::ggplot()

The help page of the ggplot() function.

Data

ggplot(data = bikes)

Aesthetic Mapping


= link variables to graphical properties

  • positions (x, y)
  • colors (color, fill)
  • shapes (shape, linetype)
  • size (size)
  • transparency (alpha)
  • groupings (group)

Aesthetic Mapping

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)

aesthetics

aes() outside as component

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)


aes() inside, explicit matching

ggplot(data = bikes, mapping = aes(x = temp_feel, y = count))


aes() inside, implicit matching

ggplot(bikes, aes(temp_feel, count))


aes() inside, mixed matching

ggplot(bikes, aes(x = temp_feel, y = count))
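To check that the four spellings really describe the same mapping, one can compare what ends up stored in the plot object. The sketch below uses the built-in `mpg` data instead of `bikes`; `mapped_vars()` is a small hypothetical helper written for this illustration, not part of {ggplot2}.

```r
library(ggplot2)

p_outside  <- ggplot(data = mpg) + aes(x = displ, y = hwy)
p_explicit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))
p_implicit <- ggplot(mpg, aes(displ, hwy))
p_mixed    <- ggplot(mpg, aes(x = displ, y = hwy))

# Hypothetical helper: the mapped variables of a plot as plain text
mapped_vars <- function(p) vapply(p$mapping, rlang::as_label, character(1))

mapped_vars(p_outside)
mapped_vars(p_implicit)
```

Which form to use is mostly a matter of taste; naming `x` and `y` explicitly tends to be easier to read once more aesthetics are added.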

Geometrical Layers

Geometries


= interpret aesthetics as graphical representations

  • points
  • lines
  • polygons
  • text labels

Geometries

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point()

Visual Properties of Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5,
    shape = "X",
    stroke = 1,
    size = 4
  )

Setting vs Mapping of Visual Properties

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )
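The difference between the two plots can also be inspected numerically. A minimal sketch, assuming the built-in `mpg` data as a stand-in for `bikes`: `layer_data()` returns the values a layer actually draws, so a constant set outside `aes()` shows up as one color, while a variable mapped inside `aes()` yields one color per category.

```r
library(ggplot2)

# Setting: one constant color for the whole layer (outside aes())
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "#28a87d", alpha = .5)

# Mapping: the color depends on a variable (inside aes())
p_map <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv), alpha = .5)

# Colors used by the point layer of each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```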

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = temp_feel > 20),
    alpha = .5
  )
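An expression inside `aes()` is evaluated on the data before it is mapped, so a logical comparison behaves like a two-level variable. The following sketch illustrates this with the built-in `mpg` data; the threshold of 30 is an arbitrary choice for the example.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = hwy > 30), alpha = .5)

# The comparison yields two color groups, one for FALSE and one for TRUE
table(layer_data(p)$colour)
```

By default the legend title is the expression itself; it can be replaced via `labs(color = ...)`.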

Your Turn!

  • Create a scatter plot of temp_feel vs temp.
    • Map the color of the points to clear weather.
    • Map the size of the points to count.
    • Turn the points into diamonds.
    • Bonus: What do you notice in the legend? How could you fix it?

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear"),
    alpha = .5,
    size = 2
  )

Mapping to Size

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 5,
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 9,
    alpha = .5
  )

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 23,
    alpha = .5
  )

An overview of a set of available shapes, ordered by their type of shape (e.g. points, triangles etc).

Source: Albert’s Blog

Setting a Constant Property

ggplot(
    bikes,
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(fill = weather_type == "clear",
        size = count),
    shape = 23,
    color = "black",
    alpha = .5
  )

Filter Data

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Filter Data

ggplot(
    bikes %>% filter(!is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )
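Why filtering helps can be seen on a small example. The sketch below uses made-up toy data with missing categories and base R's `subset()` in place of `dplyr::filter()` to stay self-contained: comparing `NA` with `"clear"` gives `NA`, which receives its own legend entry unless those rows are removed first.

```r
library(ggplot2)

# Toy data (invented for illustration) with missing weather types
toy <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 5, 4, 6),
  type = c("clear", "rain", NA, "clear", NA, "rain")
)

# Unfiltered: type == "clear" is NA for the incomplete rows
p_all <- ggplot(toy, aes(x, y)) +
  geom_point(aes(color = type == "clear"), size = 3)

# Filtered: only TRUE and FALSE remain
p_complete <- ggplot(subset(toy, !is.na(type)), aes(x, y)) +
  geom_point(aes(color = type == "clear"), size = 3)

nrow(layer_data(p_all))
nrow(layer_data(p_complete))
```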

Local vs. Global Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  )

Adding More Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Global Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Local Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )
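The effect of global versus local color encoding on the smoothing layer can be verified by counting the fitted lines. A small sketch, again assuming `mpg` as a stand-in dataset and adding an explicit `formula` only to silence the message from `geom_smooth()`:

```r
library(ggplot2)

# Global color mapping: both layers inherit it, so one fit per category
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Local color mapping: only the points are colored, one overall fit
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv), alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Number of fitted lines in the second layer of each plot
length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_local, 2)$group))
```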

The `group` Aesthetic

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    aes(group = day_night),
    method = "lm"
  )

Set Both as Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Overwrite Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm",
    color = "black"
  )
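Setting `color` in a layer replaces the inherited color mapping for that layer only, while the `group` aesthetic keeps splitting the data. The sketch below checks both on the built-in `mpg` data; grouping by `factor(year)` is an assumption made for the example and plays the role of `day_night`.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy,
                     color = drv,
                     group = factor(year))) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black")

smooth <- layer_data(p, 2)

unique(smooth$colour)         # the constant set in the layer
length(unique(smooth$group))  # grouping still follows `group`
```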

Statistical Layers

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")

ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")

ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
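That each `stat_*()` and `geom_*()` pair describes the same layer can be confirmed by comparing the computed data. A minimal sketch using the built-in `mpg` data (an assumption, the slides use `bikes`):

```r
library(ggplot2)

p_stat <- ggplot(mpg, aes(x = drv)) + stat_count(geom = "bar")
p_geom <- ggplot(mpg, aes(x = drv)) + geom_bar(stat = "count")

# Both layers compute the same bar heights
layer_data(p_stat)$count
layer_data(p_geom)$count
```

Every layer has both a stat and a geom; the two function families differ only in which of them is fixed by the function name and which is left as an argument.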

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary() 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary(
    fun.data = mean_se, ## the default
    geom = "pointrange"  ## the default
  ) 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  geom_boxplot() +
  stat_summary(
    fun = mean,
    geom = "point",
    color = "#28a87d",
    size = 3
  ) 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary(
    fun = mean, 
    fun.max = function(y) mean(y) + sd(y), 
    fun.min = function(y) mean(y) - sd(y) 
  ) 
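The summaries computed by `stat_summary()` can be cross-checked against a manual calculation. The following sketch assumes the built-in `mpg` data and compares the layer's mean ± standard deviation with values obtained via `tapply()`:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  stat_summary(
    fun = mean,
    fun.max = function(y) mean(y) + sd(y),
    fun.min = function(y) mean(y) - sd(y)
  )

# Values computed by the layer, ordered along the x axis
computed <- layer_data(p)
computed <- computed[order(computed$x), c("y", "ymin", "ymax")]

# The same summaries computed by hand
by_hand <- data.frame(
  y    = as.numeric(tapply(mpg$hwy, mpg$drv, mean)),
  ymin = as.numeric(tapply(mpg$hwy, mpg$drv, function(y) mean(y) - sd(y))),
  ymax = as.numeric(tapply(mpg$hwy, mpg$drv, function(y) mean(y) + sd(y)))
)

computed
by_hand
```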

Extending a ggplot

Store a ggplot as Object

g <-
  ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm",
    color = "black"
  )

class(g)
[1] "gg"     "ggplot"

Inspect a ggplot Object

g$data
# A tibble: 1,454 x 14
   date       day_night year  month season count is_workday is_weekend
   <date>     <chr>     <fct> <fct> <fct>  <int> <lgl>      <lgl>     
 1 2015-01-04 day       2015  1     winter  6830 FALSE      TRUE      
 2 2015-01-04 night     2015  1     winter  2404 FALSE      TRUE      
 3 2015-01-05 day       2015  1     winter 14763 TRUE       FALSE     
 4 2015-01-05 night     2015  1     winter  5609 TRUE       FALSE     
 5 2015-01-06 day       2015  1     winter 14501 TRUE       FALSE     
 6 2015-01-06 night     2015  1     winter  6112 TRUE       FALSE     
 7 2015-01-07 day       2015  1     winter 16358 TRUE       FALSE     
 8 2015-01-07 night     2015  1     winter  4706 TRUE       FALSE     
 9 2015-01-08 day       2015  1     winter  9971 TRUE       FALSE     
10 2015-01-08 night     2015  1     winter  5630 TRUE       FALSE     
# ... with 1,444 more rows, and 6 more variables: is_holiday <lgl>, temp <dbl>,
#   temp_feel <dbl>, humidity <dbl>, wind_speed <dbl>, weather_type <chr>

Inspect a ggplot Object

g$mapping
Aesthetic mapping: 
* `x`      -> `temp_feel`
* `y`      -> `count`
* `colour` -> `season`
* `group`  -> `day_night`

Extend a ggplot Object: Add Layers

g +
  geom_rug(
    alpha = .2
  )

Remove a Layer from the Legend

g +
  geom_rug(
    alpha = .2,
    show.legend = FALSE
  )

Extend a ggplot Object: Add Labels

g +
  xlab("Feels-like temperature (°C)") +
  ylab("Reported bike shares") +
  ggtitle("TfL bike sharing trends")

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "Feels-like temperature (°C)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends"
  )

Extend a ggplot Object: Add Labels

g <- g +
  labs(
    x = "Feels-like temperature (°C)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends",
    color = "Season:"
  )

g

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "Feels-like temperature (°C)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends",
    subtitle = "Reported bike rentals versus feels-like temperature in London",
    caption = "Data: TfL",
    color = "Season:",
    tag = "Fig. 1"
  )

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "",
    caption = "Data: TfL"
  )

g +
  labs(
    x = NULL,
    caption = "Data: TfL"
  )

A Polished ggplot Example

Extend a ggplot Object: Themes

g + theme_light()

g + theme_minimal()

Change the Theme Base Settings

g + theme_light(
  base_size = 14,
  base_family = "Roboto Condensed"
)

Set a Theme Globally

theme_set(theme_light())

g

Change the Theme Base Settings

theme_set(theme_light(
  base_size = 14,
  base_family = "Roboto Condensed"
))

g

{systemfonts}

# install.packages("systemfonts")

library(systemfonts)

system_fonts() %>%
  filter(str_detect(family, "Cabinet")) %>%
  pull(name) %>%
  sort()
 [1] "CabinetGrotesk-Black"      "CabinetGrotesk-Black"     
 [3] "CabinetGrotesk-Bold"       "CabinetGrotesk-Bold"      
 [5] "CabinetGrotesk-Extrabold"  "CabinetGrotesk-Extrabold" 
 [7] "CabinetGrotesk-Extralight" "CabinetGrotesk-Extralight"
 [9] "CabinetGrotesk-Light"      "CabinetGrotesk-Light"     
[11] "CabinetGrotesk-Medium"     "CabinetGrotesk-Medium"    
[13] "CabinetGrotesk-Regular"    "CabinetGrotesk-Regular"   
[15] "CabinetGrotesk-Thin"       "CabinetGrotesk-Thin"      

{systemfonts}

register_variant(
  name = "Cabinet Grotesk Black",
  family = "Cabinet Grotesk",
  weight = "heavy",
  features = font_feature(letters = "stylistic")
)

{systemfonts} + {ggplot2}

g +
  theme_light(
    base_size = 18,
    base_family = "Cabinet Grotesk Black"
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank()
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold")
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "top",
    plot.title.position = "plot"
  )

Overwrite Theme Settings Globally

theme_update(
  panel.grid.minor = element_blank(),
  plot.title = element_text(face = "bold"),
  legend.position = "top",
  plot.title.position = "plot"
)

g
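Because `theme_set()` and `theme_update()` change the theme for all subsequent plots in the session, it can be useful to keep the previous theme around. A small sketch, relying on the fact that `theme_set()` invisibly returns the theme that was active before:

```r
library(ggplot2)

# Store the previously active theme while switching to a new one
old <- theme_set(theme_light(base_size = 14))

# Modify individual elements of the currently active theme
theme_update(
  panel.grid.minor = element_blank(),
  legend.position = "top"
)

# Inspect the active setting
theme_get()$legend.position

# Restore the earlier theme when done
theme_set(old)
```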

Save the Graphic

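## pass the plot object explicitly
ggsave(g, filename = "my_plot.png")
## or save the last plot that was displayed
ggsave("my_plot.png")
## set the size (in inches by default) and the resolution
ggsave("my_plot.png", width = 8, height = 5, dpi = 600)
## vector graphic with dimensions in cm, using the Cairo PDF device
ggsave("my_plot.pdf", width = 20, height = 12, units = "cm", device = cairo_pdf)

## alternatively, open and close the graphics device manually
grDevices::cairo_pdf("my_plot.pdf", width = 10, height = 7)
g
dev.off()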
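For a quick check that saving works as expected, the sketch below writes a small plot to a temporary file. The plot and the temporary path are placeholders for illustration; the device is chosen from the `.pdf` file extension.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output location: a temporary file instead of the project folder
out_file <- tempfile(fileext = ".pdf")

ggsave(out_file, plot = p, width = 20, height = 12, units = "cm")

file.exists(out_file)
```

Passing `plot = p` explicitly avoids depending on which plot happened to be displayed last.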


A comparison of vector and raster graphics.

Modified from canva.com

How to Work with Aspect Ratios

  • don’t rely on the RStudio viewer pane!
  • once you have an “it’s getting close” prototype, settle on a plot size

  • Approach 1: save the file to disk and inspect it; go back to your IDE
    • tedious and time-consuming…

  • Approach 2: use a qmd or rmd with inline output and chunk settings
    • set fig.width and fig.height per chunk or globally

  • Approach 3: use our {camcorder} package
    • saves output from all ggplot() calls and displays it in the viewer pane

Setting Plot Sizes in Rmd Files

A screenshot of an exemplary Rmd file with two chunks with different settings of fig.width and fig.height.

Setting Plot Sizes via {camcorder}


A screenshot of an exemplary R script with a plot automatically saved and displayed in correct aspect ratio thanks to the camcorder package.

Recap

  • {ggplot2} is a powerful library for reproducible graphic design
  • the components follow a consistent syntax
  • each ggplot needs at least data, some aesthetics, and a layer
  • we set constant properties outside aes()
  • … and map data-related properties inside aes()
  • local settings and mappings override global properties
  • grouping allows applying layers for subsets
  • we can store a ggplot object and extend it afterwards
  • we can change the appearance for all plots with theme_set()
    and theme_update()

Exercises

Exercise 1

  • Open the script exercises/02_concepts_pt1_ex1.qmd.
  • Explore the TfL bike share data visually:
    create a time series of reported bike shares on weekend days
    • Highlight day and night encoded by colors and shapes.
    • Connect the points of each period with lines.
      • What is the difference between geom_line() and geom_path()?
    • Apply your favorite theme to the plot.
    • Add meaningful labels.
  • Save the plot as a vector graphic with a decent plot size.

Exercise 2

  • Open the script exercises/02_concepts_pt1_ex2.qmd.
  • Explore the TfL bike sharing data visually:
    create a boxplot of counts per weather type
    • Turn the plot into a jitter strip plot (random noise across the x axis)
    • Combine both chart types (jittered points on top of the boxplots)
    • Bonus: Sort the boxplot-jitter hybrid by median counts
    • Apply your favorite theme to the plot.
    • Add meaningful labels.
    • Bonus: Explore other chart types to visualize the distributions.
  • Save the plot as a vector graphic with a decent plot size.