Off-label uses in ggplot2

  ggplot2, off-label

  Thomas Lin Pedersen

ggplot2 v3.3.4 landed on CRAN recently, and while every release of ggplot2 is cause for celebration, this was merely a patch release fixing a large number of bugs and so it came and went without much fanfare. However, for a couple of users this release brought an unwelcome and surprising change. We feel that this is a great opportunity to talk a bit about some of the topics that Hadley discussed in his rstudio::global(2021) keynote, particularly the nature of breaking changes.

The surprising use of ggsave()

We created ggsave() as an easy way to save a ggplot object to an image file, using the following API:

library(ggplot2)

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy))

ggsave("my_mpg_plot.png")

ggsave() is designed so that it automatically picks up the last created (or rendered) plot, and coupled with automatic graphic device selection determined from the file extension it provides a very lean API.
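
As a minimal sketch of those two defaults (assuming ggplot2 is installed; the temporary file paths and the explicit width and height are choices made for this illustration only), the same call can write either a PNG or a PDF depending purely on the extension:

```r
library(ggplot2)

ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

# Illustrative paths in the session's temporary directory
png_path <- tempfile(fileext = ".png")
pdf_path <- tempfile(fileext = ".pdf")

# No plot is passed, so ggsave() falls back to last_plot();
# the graphics device is chosen from the file extension
ggsave(png_path, width = 6, height = 4)
ggsave(pdf_path, width = 6, height = 4)
```

Setting width and height explicitly keeps the output size from depending on whichever graphics device happens to be open.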

The issue we will discuss in this blog post revolves around the use of ggsave() in the following manner:

library(ggplot2)

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  ggsave("my_mpg_plot.png")

Now, if this is the first time you’ve seen ggsave() being added to a plot, you are not alone. This certainly caught us by surprise. Prior to v3.3.4, this actually worked (more on that later), but with the recent release, running this code results in the following error:

Error: Can't add `ggsave("my_mpg_plot.png")` to a ggplot object.

If you were a user who had relied on this pattern for saving plots, it very much felt like we had removed a feature, pulling the rug out from under your script with no warning. However, this use of ggsave() had never been advertised in any of the documentation, and while it worked, it could not be considered a feature as such.

Off-label saving

We believe that this usage of ggsave() is the off-label use that Hadley talks about in his keynote. Off-label use means using functions in a way that only works by accident and is thus susceptible to breakage at any point due to changes in the code. Another common word for this is “a hack”, but that term often implies that the user is fully aware of the brittle nature of the setup. Off-label use can just as well be passed on between users to the point where some think that it is the correct, supported way of doing things (this was certainly the case with the above issue).

In the age of the pipe, it is easy to understand why this use was picked up and thought of as a real feature. +, however, is not %>% (or |>). It is a compositional operator meant to assemble the description of a plot. There is no execution of logic (besides the assembly) going on, and thus the idea of adding ggsave() makes neither theoretical nor practical sense. This is also the reason why we do not want to “fix” this issue and turn it into a regular feature.

Why did it work, why did it fail

For those interested in the cause of both the accidental functionality and its breakage, here follows a description. ggsave() can be used to save any plot object but defaults to the object returned by ggplot2::last_plot(). This function returns the last rendered or modified plot object. That means that whenever you add something to a plot, the result will be retrievable with last_plot(), but only until you manipulate or render another plot. When you add ggsave() to a plot, all the additions are resolved from the left, and at each step the result is pushed to the last_plot() store. When R reaches the ggsave() term, it evaluates the call and adds the return value to the plot. Since the expected plot is already present in the last_plot() store, the evaluation of ggsave() proceeds as expected. Prior to ggplot2 v3.3.4, ggsave() returned NULL, and adding NULL to a ggplot object is a no-op (i.e. it does nothing). The change that provoked the error is that, as of v3.3.4, ggsave() returns the path to the saved file invisibly, and adding a string to a plot object is an error.
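
The steps described above can be checked piece by piece. The following is only a rough sketch, assuming ggplot2 v3.3.4 or later; the variable names (tracks_latest, same_layers, out, res) and the temporary file path are made up for the illustration:

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

# Every `+` pushes its result to the last_plot() store
tracks_latest <- inherits(last_plot()$layers[[1]]$geom, "GeomPoint")

# Adding NULL leaves the plot unchanged, which is why the pattern used to "work"
same_layers <- length((p + NULL)$layers) == length(p$layers)

# Since v3.3.4, ggsave() returns the file path (invisibly) instead of NULL
out <- ggsave(tempfile(fileext = ".png"), plot = p, width = 6, height = 4)

# Adding a string to a plot is an error; capture the message rather than stopping
res <- tryCatch(p + out, error = function(e) conditionMessage(e))
```

Here res ends up holding the error message, while the file at out has already been written, which mirrors the order of events in the failing one-liner.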

Based on this understanding there are some interesting observations we can make: First, while you’ll get an error in v3.3.4, the plot is actually saved to a file since the error is thrown after the evaluation of ggsave(). This means that you can “fix” your code by putting the whole expression in a try() block (please don’t do this though 😬):

try(
  ggplot(mpg) + 
    geom_point(aes(x = displ, y = hwy)) + 
    ggsave("my_mpg_plot.png")
)

Another tidbit is that the perceived feature was extremely brittle, even when it worked. Consider the following code:

p1 <- ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy))

p2 <- ggplot(mpg) + 
  geom_bar(aes(x = cyl))

p1 + ggsave("scatterplot.png")
p2 + ggsave("barplot.png")

If you assumed that ggsave() could be added to a plot, you’d expect the above to be totally valid code, with scatterplot.png containing the plot from p1 and barplot.png containing the plot from p2. However, since ggsave() just fetched the last modified or rendered plot by default, both PNG files would be identical and contain the bar plot from p2.
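
For comparison, here is a sketch of the same task using the documented plot argument of ggsave(), which names the plot to save and so does not depend on last_plot() at all. The output paths in the temporary directory and the fixed width and height are assumptions made for this example:

```r
library(ggplot2)

p1 <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

p2 <- ggplot(mpg) +
  geom_bar(aes(x = cyl))

# Hypothetical output locations for this example
scatter_path <- file.path(tempdir(), "scatterplot.png")
bar_path <- file.path(tempdir(), "barplot.png")

# Naming the plot removes any dependence on last_plot()
ggsave(scatter_path, plot = p1, width = 6, height = 4)
ggsave(bar_path, plot = p2, width = 6, height = 4)
```

With the plot passed explicitly, the order in which p1 and p2 were created or modified no longer affects which image ends up in which file.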

Wrapping up

In the end, this short post is not intended to shame the users who used ggsave() in an unsupported way. ggplot2 is such a huge package that it is easy to pick up usage patterns without ever thinking about whether they are the correct way: if it works, it works. Instead, this post is meant to showcase how, even with rigorous testing and no breaking changes, an update can break someone's workflow, often to the surprise of the developer. Once a package becomes popular enough, even the slightest change in the code has the capacity for disruption.