Introduction
We’re delighted to announce the release of dplyr 0.8.2 on CRAN 🍉 !
This is a minor maintenance release in the 0.8.*
series, addressing a collection of
issues since the
0.8.1 and
0.8.0 versions.
top_n() and top_frac()
top_n() has been around for a long time in dplyr, as a convenient wrapper around filter() and min_rank(), to select top (or bottom) entries in each group of a tibble.
In this release,
top_n() is no longer
limited to a constant number of entries per group, its n
argument is now quoted
to be evaluated later in the context of the group.
Here are the top half countries, i.e. n() / 2
, in terms of life expectancy in 2007.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
gapminder::gapminder %>%
filter(year == 2007) %>%
group_by(continent) %>%
top_n(n() / 2, lifeExp)
#> # A tibble: 70 x 6
#> # Groups: continent [5]
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Algeria Africa 2007 72.3 33333216 6223.
#> 2 Argentina Americas 2007 75.3 40301927 12779.
#> 3 Australia Oceania 2007 81.2 20434176 34435.
#> 4 Austria Europe 2007 79.8 8199783 36126.
#> 5 Bahrain Asia 2007 75.6 708573 29796.
#> 6 Belgium Europe 2007 79.4 10392226 33693.
#> 7 Benin Africa 2007 56.7 8078314 1441.
#> 8 Canada Americas 2007 80.7 33390141 36319.
#> 9 Chile Americas 2007 78.6 16284741 13172.
#> 10 China Asia 2007 73.0 1318683096 4959.
#> # … with 60 more rows
top_frac() is new convenience shortcut for the top n percent, i.e.
gapminder::gapminder %>%
filter(year == 2007) %>%
group_by(continent) %>%
top_frac(0.5, lifeExp)
#> # A tibble: 70 x 6
#> # Groups: continent [5]
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Algeria Africa 2007 72.3 33333216 6223.
#> 2 Argentina Americas 2007 75.3 40301927 12779.
#> 3 Australia Oceania 2007 81.2 20434176 34435.
#> 4 Austria Europe 2007 79.8 8199783 36126.
#> 5 Bahrain Asia 2007 75.6 708573 29796.
#> 6 Belgium Europe 2007 79.4 10392226 33693.
#> 7 Benin Africa 2007 56.7 8078314 1441.
#> 8 Canada Americas 2007 80.7 33390141 36319.
#> 9 Chile Americas 2007 78.6 16284741 13172.
#> 10 China Asia 2007 73.0 1318683096 4959.
#> # … with 60 more rows
tbl_vars() and group_cols()
tbl_vars() now returns a dplyr_sel_vars
object that keeps track of the grouping variables. This information powers
group_cols(), which can now be used
in every function that uses tidy selection of columns.
Functions in the tidyverse and beyond may start to use the tbl_vars()/ group_cols() duo, starting from tidyr and this pull request
# pak::pkg_install("tidyverse/tidyr#668")
iris %>%
group_by(Species) %>%
tidyr::gather("flower_att", "measurement", -group_cols())
#> # A tibble: 600 x 3
#> # Groups: Species [3]
#> Species flower_att measurement
#> <fct> <chr> <dbl>
#> 1 setosa Sepal.Length 5.1
#> 2 setosa Sepal.Length 4.9
#> 3 setosa Sepal.Length 4.7
#> 4 setosa Sepal.Length 4.6
#> 5 setosa Sepal.Length 5
#> 6 setosa Sepal.Length 5.4
#> 7 setosa Sepal.Length 4.6
#> 8 setosa Sepal.Length 5
#> 9 setosa Sepal.Length 4.4
#> 10 setosa Sepal.Length 4.9
#> # … with 590 more rows
group_split(), group_map(), group_modify()
group_split() always keeps
a ptype
attribute to track the prototype of the splits.
mtcars %>%
group_by(cyl) %>%
filter(hp < 0) %>%
group_split()
#> list()
#> attr(,"ptype")
#> # A tibble: 0 x 11
#> # … with 11 variables: mpg <dbl>, cyl <dbl>, disp <dbl>, hp <dbl>,
#> # drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>,
#> # carb <dbl>
group_map() and group_modify() benefit from this in the edge case where there are no groups.
mtcars %>%
group_by(cyl) %>%
filter(hp < 0) %>%
group_map(~.x)
#> list()
#> attr(,"ptype")
#> # A tibble: 0 x 10
#> # … with 10 variables: mpg <dbl>, disp <dbl>, hp <dbl>, drat <dbl>,
#> # wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>, carb <dbl>
mtcars %>%
group_by(cyl) %>%
filter(hp < 0) %>%
group_modify(~.x)
#> # A tibble: 0 x 11
#> # Groups: cyl [0]
#> # … with 11 variables: cyl <dbl>, mpg <dbl>, disp <dbl>, hp <dbl>,
#> # drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>,
#> # carb <dbl>
Thanks
Thanks to all contributors for this release.
@abirasathiy, @ajkroeg, @alejandroschuler, @anuj2054, @arider2, @arielfuentes, @artidata, @BenPVD, @bkmontgom, @brodieG, @cderv, @clanker, @clemenshug, @CSheehan1, @danielecook, @dannyparsons, @daskandalis, @davidbaniadam, @DavisVaughan, @deliciouslytyped, @earowang, @fkatharina, @hadley, @Hardervidertsie, @iago-pssjd, @IndrajeetPatil, @jackdolgin, @jangorecki, @jimhester, @jjesusfilho, @jonjhitchcock, @jxu, @krlmlr, @laresbernardo, @lionel-, @LukeGoodsell, @madmark81, @MarkusBerroth, @matheus-donato, @mattfidler, @MatthieuStigler, @md0u80c9, @michaelhogersosis, @MikeJohnPage, @MJL9588, @moodymudskipper, @mwillumz, @Nelson-Gon, @qdread, @randomgambit, @rcorty, @romainfrancois, @romatik, @spressi, @sstoeckl, @stephLH, @urskalbitzer, @vpanfilov, and @ZahraEconomist.