rlang 0.4.0 introduced the curly-curly {{ operator to simplify writing functions around tidyverse pipelines. The minor update 0.4.3 of rlang makes it possible to use { and {{ to create result names in tidyverse verbs taking pairs of names and expressions.
Install the latest version of rlang to make the new feature globally available throughout the tidyverse:
install.packages("rlang")
Tunnelling data-variables with curly-curly
With the {{ operator you can tunnel data-variables (i.e. columns from the data frames) through arg-variables (function arguments):
library(tidyverse)
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise(avg = mean({{ var }}, na.rm = TRUE))
}
The tunnel makes it possible to supply variables from the data frame to your wrapper function:
iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#> Species avg
#> <fct> <dbl>
#> 1 setosa 3.43
#> 2 versicolor 2.77
#> 3 virginica 2.97
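For readers used to the older tidy eval idioms, the tunnel can be read as shorthand for capturing an argument with enquo() and then unquoting it with !!. The sketch below spells out that two-step pattern; the name mean_by_enquo is hypothetical and only serves as an illustration. It should return the same result as mean_by() above.

```r
library(dplyr)

# Hypothetical long-form equivalent of mean_by(), written without {{ }}.
mean_by_enquo <- function(data, by, var) {
  # Capture the expressions supplied by the caller, along with their environment
  by <- enquo(by)
  var <- enquo(var)

  data %>%
    group_by(!!by) %>%                           # unquote the grouping variable
    summarise(avg = mean(!!var, na.rm = TRUE))   # unquote the summarised variable
}

res <- iris %>% mean_by_enquo(Species, Sepal.Width)
res
```

The curly-curly form is usually preferable because the capture and the unquoting happen in one place, which leaves less room for forgetting one of the two steps.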
Without a tunnel, the ambiguity between data-variables and arg-variables causes R to complain about objects not found:
mean_by_no_tunnel <- function(data, by, var) {
  data %>%
    group_by(by) %>%
    summarise(avg = mean(var, na.rm = TRUE))
}
iris %>% mean_by_no_tunnel(Species, Sepal.Width)
#> Error: Must group by variables found in `.data`
#> * Column `by` is not found
That’s because of the ambiguity between the function argument by and the data-variable Species. R has no way of knowing that you meant the variable from the data frame.
Custom result names
In the example above, the result name is hard-coded to avg. This is an informative generic name, but returning a more specific name that reflects the context might make the function more helpful. For this reason, tidy eval functions taking dots (like dplyr::mutate(), dplyr::group_by(), or dplyr::summarise()) now support glue strings as result names.
Glue strings are implemented in the glue package. They are a flexible way of composing a string from components, interpolating R code within the string:
library(glue)
#>
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
name <- "Bianca"
glue("The result of `1 + 2` is {1 + 2}, so says {name}.")
#> The result of `1 + 2` is 3, so says Bianca.
You can now use glue strings in result names. Note that for technical reasons you need the walrus operator := instead of the usual =.
suffix <- "foo"
iris %>% summarise("prefix_{suffix}" := mean(Sepal.Width))
#> prefix_foo
#> 1 3.057333
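A glue string can contain more than one interpolation, so a result name can be assembled from several pieces. The following is a minimal sketch, assuming rlang 0.4.3 or later and the glue package are installed; the variables stat and unit are made up for the illustration.

```r
library(dplyr)

# Illustrative name components (not from the original post)
stat <- "mean"
unit <- "cm"

# Both {stat} and {unit} are interpolated into the result name
res <- iris %>%
  summarise("{stat}_sepal_width_{unit}" := mean(Sepal.Width))

names(res)
#> [1] "mean_sepal_width_cm"
```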
In addition to normal glue interpolation with {, you can also tunnel data-variables through function arguments with {{ inside the string:
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{ var }}" := mean({{ var }}, na.rm = TRUE))
}
iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#> Species Sepal.Width
#> <fct> <dbl>
#> 1 setosa 3.43
#> 2 versicolor 2.77
#> 3 virginica 2.97
And you can combine both forms of interpolation in the same glue string:
mean_by <- function(data, by, var, prefix = "avg") {
  data %>%
    group_by({{ by }}) %>%
    summarise("{prefix}_{{ var }}" := mean({{ var }}, na.rm = TRUE))
}
iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#> Species avg_Sepal.Width
#> <fct> <dbl>
#> 1 setosa 3.43
#> 2 versicolor 2.77
#> 3 virginica 2.97
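Since prefix is an ordinary argument, callers can override the default. The sketch below repeats the definition so that it runs on its own, then calls it with a different column and prefix; the values Petal.Length and "mean" are chosen only for illustration.

```r
library(dplyr)

# Same definition as above, repeated so the example is self-contained
mean_by <- function(data, by, var, prefix = "avg") {
  data %>%
    group_by({{ by }}) %>%
    summarise("{prefix}_{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

# Override the default prefix; the summary column should be
# named mean_Petal.Length instead of avg_Petal.Length
res <- iris %>% mean_by(Species, Petal.Length, prefix = "mean")
names(res)
```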
You can learn more about tunnelling variables in this RStudio::conf 2020 talk.
Acknowledgements
Read about other bugfixes and features from the 0.4.3 release in the changelog. Many thanks to all the contributors for this release!
@chendaniely, @clauswilke, @DavisVaughan, @enoshliang, @hadley, @ianmcook, @jennybc, @krlmlr, @lionel-, @moodymudskipper, @neelan29, @nick-youngblut, @nteetor, @romainfrancois, @TylerGrantSmith, @vspinu, and @yutannihilation