Plotting temporal data

Many packages exist to visualize temporal data (e.g., geological or evolutionary biological data). deeptime similarly has a few novel ways to help you plot your temporal data in useful ways. We’ll first load some packages and example data so we can demonstrate some of this functionality.

# Load deeptime
library(deeptime)
# Load other packages
library(ggplot2)
library(dplyr)
# Load palaeoverse for tetrapod occurrence data
library(palaeoverse)
data(tetrapods)

Plot occurrences through time

Do you have a bunch of temporal occurrences of taxa or some sort of geological phenomenon, but you don’t want to go through all of the pain of figuring out how to visualize those occurrences and their temporal ranges? And it needs to be customizable and have a pretty geological or stratigraphic timescale on the side? Well, then geom_points_range() is your friend!

geom_points_range() is like ggplot2::geom_pointrange() in that it plots points and their ranges. However, the “raw” data that goes into ggplot2::geom_pointrange() is the lower, upper, and the coordinates for an individual point for each group. First, we’re too lazy to calculate our own ranges! Second, only one point per group? But we have a whole bunch of occurrences for each group that need to be plotted! The difference with geom_points_range() is that the raw data it takes in is all of your grouped temporal data. It then does all of the work for you to plot those individual occurrences AND the ranges of those occurrences for each group. Let’s try it out with some Permian vertebrate occurrence data from the palaeoverse:

# sort the occurrences from most common genera to least common genera
# assume the age is just the mean of the max and min
occdf <- tetrapods %>%
  filter(accepted_rank == "genus") %>%
  select(occurrence_no, accepted_name, max_ma, min_ma) %>%
  mutate(accepted_name = reorder(accepted_name, accepted_name, length)) %>%
  arrange(desc(accepted_name)) %>%
  mutate(age = (max_ma + min_ma) / 2)
# get a reasonable subset of those occurrences
occdf <- occdf[1:300, ]

# plot those occurrences
ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name)) +
  theme_classic()

And then, of course, we want to add a timescale:

ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name)) +
  scale_x_reverse() +
  coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
            abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
  theme_classic()

What if we aren’t certain about some of our data points? Maybe we don’t necessarily know if they are assigned to the correct genus or perhaps we are uncertain about their age? Fortunately, we have all of the tools of ggplot available to us! First we’ll simulate some binary “certainty” values, then we’ll plot certainty as additional aesthetics:

occdf$certainty <- factor(sample(0:1, nrow(occdf), replace = TRUE))

ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name,
                        fill = certainty, linetype = certainty), shape = 21) +
  scale_x_reverse() +
  scale_fill_manual(values = c("white", "black")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
            abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
  theme_classic()

Finally, we can sort the taxa so that they are arranged in order of their earliest occurrence:

occdf$accepted_name <- reorder(occdf$accepted_name, occdf$age, max,
                               decreasing = TRUE)
ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name,
                        fill = certainty, linetype = certainty), shape = 21) +
  scale_x_reverse() +
  scale_fill_manual(values = c("white", "black")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
            abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
  theme_classic()

Note that our uncertain lines (0) always overlap with our certain lines (1), so there is a continuous line spanning each taxon’s entire range. However, if we tweak some of the data, we can change that, causing a gap in the ranges. Note that in this engineered example, the uncertain and certain ranges for Diictodon no longer overlap, so there is a gap between them:

oldest_certain <- occdf %>%
  filter(accepted_name == "Diictodon", certainty == 1) %>%
  pull(age) %>%
  max()

n_uncertain <- sum(occdf$accepted_name == "Diictodon" & occdf$certainty == 0)

# make the uncertain points all much older
occdf$age[occdf$accepted_name == "Diictodon" & occdf$certainty == 0] <-
  oldest_certain + runif(n_uncertain, 15, 30)

ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name,
                        fill = certainty, linetype = certainty), shape = 21) +
  scale_x_reverse() +
  scale_fill_manual(values = c("white", "black")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
            abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
  theme_classic()

However, if we want a line connecting these groups of points, we can fix this by using the background_line argument, which can be a list of aesthetic values to use for the background line segments:

ggplot(data = occdf) +
  geom_points_range(aes(x = age, y = accepted_name,
                        fill = certainty, linetype = certainty), shape = 21,
                    background_line = list(linetype = "dashed")) +
  scale_x_reverse() +
  scale_fill_manual(values = c("white", "black")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
            abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
  theme_classic()

Finally, while I’ve showcased this geom with the use case of plotting occurrence data, note that the potential usage for this function is much broader. Basically any set of data with a categorical and a continuous variable could be visualized like this (when appropriate).

Geological timescale color scales for ggplot

You may also want to color your data based on its age. deeptime has scale_color_geo() and scale_fill_geo() for this very purpose! Note that currently these scales only work with discrete data. The default behavior is for the color/fill aesthetic values to match the names of the intervals in dat. Here, we’ll use the coral_div_dis data from the first vignette tutorial:

ggplot(coral_div_dis, aes(x = n, y = diet, fill = period)) +
  geom_col() +
  scale_fill_geo(periods) +
  xlab("Coral Genera") +
  ylab("Diet") +
  theme_classic()