- Intro and review of ggplot capabilities
- Fun with maps!
2026-03-13
ggplot basics
- ggplot follows a “grammar of graphics” that is a little different from the rest of R’s coding structure
- The main pieces are the data, aesthetics (aes), geoms, and formatting
- ggplot takes dataframes as the basic input, not an x vector and y vector
- geoms are different types of plot objects that you can add to the plot (e.g. points, lines, bars)
- The aes (short for aesthetics) command tells ggplot which variables in the dataset represent the x values, y values, color, size, etc.
- aes can go in the main ggplot call or within a geom
Try running this code for yourself! (Be sure you have downloaded the workshop folder and saved your script in the folder first!)
library(tidyverse)
library(this.path)  # provides here(), which builds paths relative to this script's folder

# Load data from csv
mobilityData = read_csv(here('data/Trips_by_Distance.csv'))

# Calculate the percent of the population staying home
mobilityData$PercentHome =
  100 * mobilityData$`Population Staying at Home` /
  (mobilityData$`Population Staying at Home` + mobilityData$`Population Not Staying at Home`)

# Plot percent staying home over time
ggplot(mobilityData, aes(x = Date, y = PercentHome)) +
  geom_point() +
  labs(title="Percent of Michiganders staying home over 2019 - 2020",
       x="", y="Percent of population staying home")
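To see that aes really can live in either place, here is a minimal sketch on a made-up dataframe (the `toy` data and its column names are invented for illustration and are not part of the workshop files). Both versions should draw the same points.

```r
library(ggplot2)

# Made-up data for illustration only (not the workshop dataset)
toy <- data.frame(day = 1:5, value = c(2, 4, 3, 6, 5))

# aes set once in the main ggplot call: every geom added later inherits it
p_global <- ggplot(toy, aes(x = day, y = value)) +
  geom_point()

# aes set inside the geom: only that geom uses it
p_local <- ggplot(toy) +
  geom_point(aes(x = day, y = value))

print(p_global)
print(p_local)
```

Putting aes in the main call saves typing when every geom shares the same x and y; putting it in the geom is handy when geoms need different variables.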
Let’s also add a rolling average (this will be another geom!), and format our colors a bit more.
Since we’ll have different y axis variables for our different geoms (regular and rolling average), we’ll move the aes command into our geoms.
Let’s also set the colors for our plot using a hex code!
Just in case you haven’t seen hex colors before!
They’re formatted like this: #RRGGBB (or sometimes #RRGGBBAA, where the last two digits set the transparency)
Each digit is hexadecimal, so it counts:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
For example: #008800, #00FF00, #9500AB, #00FFFF, #00AAAA
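If you want to check what a hex code means, base R can decode it for you. This is a small sketch using the color from the plots in this section; `col2rgb()`, `strtoi()`, and `rgb()` all come with base R.

```r
# col2rgb() decodes a hex color into red, green, and blue values (0-255)
col2rgb("#2255AA")

# Each pair of hex digits is one channel; strtoi() converts base 16 to base 10
strtoi(c("22", "55", "AA"), base = 16L)

# rgb() goes the other way, from channel values back to a hex code
rgb(34, 85, 170, maxColorValue = 255)
```

So #2255AA is a little red (34), some green (85), and a lot of blue (170).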
ggplot(mobilityData) +
  geom_point(aes(x = Date, y = PercentHome), color = "#2255AA", alpha = 0.25) +
  # linewidth sets the line thickness (ggplot2 3.4.0 and later; older versions used size)
  geom_line(aes(x = Date, y = zoo::rollmean(PercentHome, 7, fill = NA)),
            color = "#2255AA", linewidth = 1) +
  labs(title="Percent of Michiganders staying home over 2019 - 2020",
       x="", y="Percent of population staying home")
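If you are curious what the 7-day rolling average is doing, here is a rough sketch on ten made-up numbers. It uses base R's `stats::filter()` rather than zoo, and assumes the default centered window of `zoo::rollmean(x, 7, fill = NA)`, so treat it as an illustration of the idea rather than the exact zoo implementation.

```r
# Ten made-up daily values (illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Centered 7-point moving average: each value is the mean of itself,
# the 3 values before it, and the 3 values after it.
# stats:: is spelled out because the tidyverse also has a filter() function.
rolling <- as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 2))

# The first 3 and last 3 positions have no full window, so they are NA
rolling
```

Those NA values at the ends are why the rolling average line starts a few days after the points do.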
Exercise: Adjust the code to plot the number of trips taken each day over time, and change the color of the lines and points! (Don’t forget to adjust the labels too) You should see something like:
You can add horizontal and vertical lines with geom_hline and geom_vline, and you can add text annotations with geom_text:
ggplot(mobilityData) +
  geom_point(aes(x = Date, y = PercentHome), color = "#2255AA", alpha = 0.25) +
  geom_line(aes(x = Date, y = zoo::rollmean(PercentHome, 7, fill = NA)),
            color = "#2255AA", linewidth = 1) +
  geom_vline(aes(xintercept = as.Date("2020-03-01"))) +
  # Note we only give it an x intercept value since the rest is already set!
  geom_text(aes(x = as.Date("2020-03-01"), y = 40),
            label = "Pandemic starts", hjust = "left") +
  labs(title="Percent of Michiganders staying home over 2019 - 2020", x="",
       y="Percent of population staying home")
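The horizontal version works the same way. Here is a minimal sketch on made-up data (the `toy` dataframe is invented for illustration) that adds a dashed line at the average with geom_hline. For the label it uses `annotate()`, an alternative to geom_text that draws a single label at fixed coordinates instead of one per row of data.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(day = 1:10, value = c(3, 5, 4, 6, 8, 7, 9, 8, 10, 12))

p <- ggplot(toy, aes(x = day, y = value)) +
  geom_point() +
  # horizontal reference line at the mean of the made-up values
  geom_hline(yintercept = mean(toy$value), linetype = "dashed") +
  # annotate() draws one text label at the coordinates given
  annotate("text", x = 1, y = mean(toy$value) + 0.4,
           label = "Average", hjust = "left")

print(p)
```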
You can use the scales functions to set up how different dimensions of the data behave—the x and y axis behaviors, as well as how line colors and shape fills are set up (more on this in a bit).
For example, let’s make the x-axis breaks occur every 3 months and change the format.
Quick primer on date format strings: UNIX strftime date formats
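You can try format strings out on a single date with base R's `format()` before putting them in a plot. A small sketch follows; month names come out in your computer's language settings, so they may look different on your machine.

```r
d <- as.Date("2020-03-01")

format(d, "%Y-%m-%d")  # 4-digit year, 2-digit month, 2-digit day
format(d, "%m/%d/%y")  # 2-digit year
format(d, "%b %y")     # abbreviated month name, then 2-digit year
format(d, "%B %Y")     # full month name, then 4-digit year
```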
Okay, let’s add a scale:
ggplot(mobilityData) +
  geom_point(aes(x = Date, y = PercentHome), color = "#2255AA", alpha = 0.25) +
  geom_line(aes(x = Date, y = zoo::rollmean(PercentHome, 7, fill = NA)),
            color = "#2255AA", linewidth = 1) +
  geom_vline(aes(xintercept = as.Date("2020-03-01"))) +
  # Note we only give it an x intercept value since the rest is already set!
  geom_text(aes(x = as.Date("2020-03-01"), y = 40),
            label = "Pandemic starts", hjust = "left") +
  ### New scale ###
  # date_breaks sets the spacing between breaks; date_labels takes a strftime format string
  scale_x_date(date_breaks = "3 months", date_labels = "%b %y") +
  labs(title="Percent of Michiganders staying home over 2019 - 2020", x="",
       y="Percent of population staying home")
ggplot(mobilityData) +
  geom_line(aes(x = Date, y = zoo::rollmean(PercentHome, 7, fill = NA),
                color = "2020"), linewidth = 1) +
  # shift date by 1 year so we can plot 2019 and 2020 on the same 1 year span
  geom_line(aes(x = Date + 365, y = zoo::rollmean(PercentHome, 7, fill = NA),
                color = "2019"), linewidth = 1) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y",
               limits = c(as.Date("2020-01-01"), as.Date("2020-12-31"))) +
  # scale_y_continuous(limits = c(0,40)) + # try this if you want to play with the y axis too!
  scale_color_manual(values = c("#264653", "#2a9d8f", "#e9c46a", "#f4a261", "#e76f51")) +
  labs(title="Percent of Michiganders staying home for 2019 vs. 2020", x="",
       y="Percent of population staying home", color = "")
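In the plot above, color sits inside aes, so "2019" and "2020" are treated as labels and scale_color_manual decides which actual color each label gets. Here is a minimal sketch on made-up data (the `toy` dataframe is invented for illustration) that names the values so each label is pinned to a specific hex code.

```r
library(ggplot2)

# Made-up data for illustration only: two short series labelled by year
toy <- data.frame(
  day   = rep(1:5, times = 2),
  value = c(2, 4, 3, 6, 5, 1, 2, 2, 3, 4),
  year  = rep(c("2019", "2020"), each = 5)
)

p_color <- ggplot(toy, aes(x = day, y = value, color = year)) +
  geom_line(linewidth = 1) +
  # naming the values pins each color to a specific label
  scale_color_manual(values = c("2019" = "#264653", "2020" = "#e76f51")) +
  labs(color = "")

print(p_color)
```

Without the names, the colors are handed out in order, so an unnamed vector with extra colors (as in the plot above) simply uses the first ones.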
For overall formatting, you can also add themes! There are a ton of different ones: check out the cheatsheet for more, and the ggthemes package has even more options!
Themes also let you adjust overall properties, like font size for the whole plot, etc.
For example, let’s add a different theme to the last plot. Try one of: theme_bw(), theme_gray() (default theme), theme_dark(), theme_classic(), theme_light(), theme_linedraw(), theme_minimal(), or theme_void().
I’ll add theme_classic(base_size = 14) to my ggplot to change themes and increase the overall font size.
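Here is a minimal sketch of a theme in action on made-up data (the `toy` dataframe is invented for illustration). It also shows `theme()`, which adjusts individual pieces on top of a complete theme.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(day = 1:5, value = c(2, 4, 3, 6, 5))

p_themed <- ggplot(toy, aes(x = day, y = value)) +
  geom_line() +
  labs(title = "Made-up example") +
  # complete theme, with a larger base font size for the whole plot
  theme_classic(base_size = 14) +
  # theme() tweaks a single element; it goes after the complete theme
  theme(plot.title = element_text(face = "bold"))

print(p_themed)
```

Order matters here: adding a complete theme like theme_classic() after theme() would overwrite the tweak.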
There are so many other kinds of geoms! The cheatsheet has a fuller list, but a lot of the common ones are:
geom_smooth, geom_tile, geom_path, geom_ribbon
And there are packages that provide even more functionality, like the ggsankeyfier package for Sankey diagrams!
ggplot(mobilityData) + geom_histogram(aes(`Number of Trips`))
ggplot(mobilityData) +
  geom_col(aes(x = Date, y = `Number of Trips`), fill = "#44AAAA") +
  # note fill vs color: fill colors the inside of the bars, color would set their outline
  scale_x_date(limits = c(as.Date("2020-03-01"), as.Date("2020-05-31"))) +
  theme_linedraw()
library(ggrepel)

# Load & merge data
IncomeData = read_csv('data/StateIncomeData.csv')
LifespanData = read_csv('data/StateLifeExpectancy.csv')
MergeData = merge(IncomeData, LifespanData, by = "State")

# Plot!
ggplot(MergeData) +
  geom_point(aes(x = `Median household income`, y = Life.Expectancy,
                 size = Population/1000000), color = 'steelblue') +
  geom_label_repel(aes(x = `Median household income`, y = Life.Expectancy, label = State)) +
  # geom_label(aes(x = `Median household income`+1000, y = Life.Expectancy, label = State),
  #            hjust = "left") +
  labs(size = "Population (millions)")
Faceting lets you make panel plots based on the variables in your data (e.g. make a plot of cases, and facet_wrap will let you make a panel of case plots for all counties, preparedness regions, etc.). See the cheatsheet for more!
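As a minimal sketch of faceting, here is a made-up case count dataset with three invented counties (none of this is workshop data). facet_wrap splits the single plot into one panel per county.

```r
library(ggplot2)

# Made-up data for illustration only: 5 days of cases in 3 invented counties
toy <- data.frame(
  day    = rep(1:5, times = 3),
  cases  = c(1, 3, 2, 5, 4,  2, 2, 3, 3, 6,  0, 1, 1, 2, 2),
  county = rep(c("County A", "County B", "County C"), each = 5)
)

p_facet <- ggplot(toy, aes(x = day, y = cases)) +
  geom_line() +
  # one panel per value of county
  facet_wrap(~ county)

print(p_facet)
```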
For fancier panel/compound plots, try out the package cowplot!
ggsave("plot.png", width = 5, height = 5) saves the last plot as a 5 in x 5 in file named “plot.png” in the working directory. You can use all the usual image file formats (.jpg, etc.), and ggsave will figure out the format from the file name.
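Here is a minimal sketch of saving a specific plot rather than relying on whichever plot was drawn last. The `toy` data is made up, and the file goes to a temporary location (via `tempfile()`) so nothing in your workshop folder gets overwritten; in your own script you would use a real file name instead.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(day = 1:5, value = c(2, 4, 3, 6, 5))
p <- ggplot(toy, aes(x = day, y = value)) +
  geom_point()

# Temporary file path ending in .png, so ggsave writes a PNG
out_file <- tempfile(fileext = ".png")

# Passing plot = p saves that plot explicitly; width and height are in inches
ggsave(out_file, plot = p, width = 5, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```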