I’m very excited to announce the ninth and final blog post in the dplyr 1.0.0 series: dplyr 1.0.0 is now available from CRAN! Install it by running:
install.packages("dplyr")
Then load it with:
library(dplyr)
New features
dplyr 1.0.0 is chock-a-block with new features; so many, in fact, that we can’t fit them all into one post. So if you want to learn more about what’s new, we recommend reading our existing series of posts:
- Major lifecycle changes. This post focusses on the idea of the “function lifecycle” which helps you understand where functions in dplyr are going. Particularly important is the idea of a “superseded” function. A superseded function is not going away, but we no longer recommend using it in new code.
- New summarise() features. In summarise(), a single summary expression can now create both multiple rows and multiple columns. This significantly increases its power and flexibility.
- select(), rename(), and (new) relocate(). select() and rename() can now select by position, name, function of name, type, and any combination thereof. A new relocate() function makes it easy to change the position of columns.
- Working across() columns. A new across() function makes it much easier to apply the same operation to multiple columns. It supersedes the _if(), _at(), and _all() function variants.
- Working within rows. rowwise() has been renewed and revamped to make it easier to perform operations row-by-row. This makes it much easier to solve problems that previously required base::lapply(), purrr::map(), or friends.
- The role of the vctrs package. dplyr now makes heavy use of vctrs behind the scenes. This brings with it greater consistency and (hopefully!) more useful error messages.
- Last minute additions. summarise() now allows you to control how its results are grouped, and there’s a new family of functions designed for modifying rows.
You can see the full list of changes in the release notes.
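To give a flavour of a few of the features listed above, here is a minimal sketch on a toy data frame. The data and the object names (df, means, extremes, moved, totals) are invented for illustration and do not come from the linked posts. The sketch shows the multiple-column form of summarise() only.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data, invented for this sketch
df <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40)
)

# across(): apply the same summary to several columns at once.
# .groups = "drop" controls how the result is grouped.
means <- df %>%
  group_by(g) %>%
  summarise(across(c(x, y), mean), .groups = "drop")

# summarise(): a single expression that returns a data frame
# creates several columns in one go
extremes <- df %>%
  group_by(g) %>%
  summarise(tibble(x_min = min(x), x_max = max(x)), .groups = "drop")

# relocate(): move a column without listing all the others
moved <- df %>% relocate(y, .before = g)

# rowwise(): compute a value one row at a time
totals <- df %>%
  rowwise() %>%
  mutate(total = sum(c(x, y))) %>%
  ungroup()
```

Each result is an ordinary tibble, so you can print means, extremes, moved, and totals to compare them with df.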
New logo
dplyr has a new logo thanks to the talented Allison Horst!
(Stay tuned for details about how to get this sticker on to your laptop. We have some exciting news coming up!)
A small teaser
The best way to find out about all the cool new features dplyr has to offer is to read through the blog posts linked above. But thanks to inspiration from Daniel Anderson, here’s one example of fitting two different models by subgroup that shows off a bunch of cool features:
library(dplyr, warn.conflicts = FALSE)

models <- tibble::tribble(
  ~model_name,    ~formula,
  "length-width", Sepal.Length ~ Petal.Width + Petal.Length,
  "interaction",  Sepal.Length ~ Petal.Width * Petal.Length
)

iris %>%
  nest_by(Species) %>%
  left_join(models, by = character()) %>%
  rowwise(Species, model_name) %>%
  mutate(model = list(lm(formula, data = data))) %>%
  summarise(broom::glance(model))
#> `summarise()` regrouping output by 'Species', 'model_name' (override with `.groups` argument)
#> # A tibble: 6 x 13
#> # Groups: Species, model_name [6]
#> Species model_name r.squared adj.r.squared sigma statistic p.value df
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 setosa length-wi… 0.112 0.0739 0.339 2.96 6.18e- 2 3
#> 2 setosa interacti… 0.133 0.0760 0.339 2.34 8.54e- 2 4
#> 3 versic… length-wi… 0.574 0.556 0.344 31.7 1.92e- 9 3
#> 4 versic… interacti… 0.577 0.549 0.347 20.9 1.11e- 8 4
#> 5 virgin… length-wi… 0.747 0.736 0.327 69.3 9.50e-15 3
#> 6 virgin… interacti… 0.757 0.741 0.323 47.8 3.54e-14 4
#> # … with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
#> # df.residual <int>
Note the use of:
- The new nest_by(), which generates a nested data frame where each row represents one subgroup.
- In left_join(), by = character(), which now performs a Cartesian product, generating every combination of subgroup and model.
- The newly powerful summarise(), which summarises each model with the model fit statistics computed by broom::glance().
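If the pipeline above feels dense, the first two steps can be looked at in isolation. This is a rough sketch, not part of the original example: the labels table and the names nested and combos are hypothetical, and the ungroup() call is added only to make the join input a plain tibble. Note that later dplyr releases (1.1.0 onwards) provide cross_join() for this and emit a deprecation warning for by = character().

```r
library(dplyr, warn.conflicts = FALSE)

# nest_by(): one row per Species, with that group's rows stored
# in a list-column called `data`
nested <- iris %>% nest_by(Species)

# Hypothetical two-row lookup table, used only to show the
# Cartesian product
labels <- tibble(model_name = c("m1", "m2"))

# by = character() pairs every row of `nested` with every row of
# `labels`: 3 species x 2 labels = 6 rows
combos <- nested %>%
  ungroup() %>%
  left_join(labels, by = character())
```

Printing nested shows three rows, one per species, and combos shows each species repeated once per label.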
Acknowledgements
dplyr 1.0.0 has been one of the biggest projects that we, as a team, have ever tackled. Almost everyone in the tidyverse team has been involved in some capacity. Special thanks go to Romain François, who in his role as primary developer has been working on this release for over six months, and to Lionel Henry and Davis Vaughan for all their work on the vctrs package. Jim Hester’s work on running revdep checks in the cloud also made a big impact on our ability to understand failure modes.
A big thanks to all 137 members of the dplyr community who helped make this release possible by finding bugs, discussing issues, and writing code: @AdaemmerP, @adelarue, @ahernnelson, @alaataleb111, @antoine-sachet, @atusy, @Auld-Greg, @b-rodrigues, @batpigandme, @bedantaguru, @benjaminschlegel, @benjbuch, @bergsmat, @billdenney, @brianmsm, @bwiernik, @caldwellst, @cat-zeppelin, @chillywings, @clauswilke, @colearendt, @DanChaltiel, @danoreper, @danzafar, @davidbaniadam, @DavisVaughan, @dblodgett-usgs, @ddsjoberg, @deschen1, @dfrankow, @DiegoKoz, @dkahle, @DzimitryM, @earowang, @echasnovski, @edwindj, @elbersb, @elcega, @ericemc3, @espinielli, @FedericoConcas, @FlukeAndFeather, @GegznaV, @gergness, @ggrothendieck, @glennmschultz, @gowerc, @greg-minshall, @gregorp, @ha0ye, @hadley, @Harrison4192, @henry090, @hughjonesd, @ianmcook, @ismailmuller, @isteves, @its-gazza, @j450h1, @Jagadeeshkb, @jarauh, @jason-liu-cs, @jayqi, @JBGruber, @jemus42, @jennybc, @jflournoy, @jhuntergit, @JohannesNE, @jzadra, @karldw, @kassambara, @klin333, @knausb, @kriemo, @krispiepage, @krlmlr, @kvasilopoulos, @larry77, @leonawicz, @lionel-, @lorenzwalthert, @LudvigOlsen, @madlogos, @markdly, @markfairbanks, @meghapsimatrix, @meixiaba, @melissagwolf, @mgirlich, @Michael-Sheppard, @mikmart, @mine-cetinkaya-rundel, @mir-cat, @mjsmith037, @mlane3, @msberends, @msgoussi, @nefissakhd, @nick-youngblut, @nzbart, @pavel-shliaha, @pdbailey0, @pnacht, @ponnet, @r2evans, @ramnathv, @randy3k, @richardjtelford, @romainfrancois, @rorynolan, @ryanvoyack, @selesnow, @selin1st, @sewouter, @sfirke, @SimonDedman, @sjmgarnier, @smingerson, @stefanocoretta, @strengejacke, @tfkillian, @tilltnet, @tonyvibe, @topepo, @torockel, @trinker, @tungmilan, @tzakharko, @uasolo, @werkstattcodes, @wlandau, @xiaoa6435, @yiluheihei, @yutannihilation, @zenggyu, and @zkamvar.