We’re chuffed to announce the release of dbplyr 2.3.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.
You can install it from CRAN with:
install.packages("{package}")
This post will highlight some of the most important new features in 2.3.0: eliminating subqueries for many verb combinations, better errors, and a handful of new translations. As usual, this release comes with a large number of improvements to translations for individual backends and you can see the full list in the release notes
SQL optimisation
dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combinations of verbs no longer require subqueries:
*_join()
+select()
andselect()
+*_join()
.-
mutate()
+filter()
andfilter()
+filter()
. -
select()
/mutate()
/filter()
+distinct()
. -
summarise()
+filter()
now translates toHAVING
. left/inner_join()
+left/inner_join()
.
Here are a couple of examples of queries that are now much more compact:
lf1 <- lazy_frame(x = 1, a = "a", .name = "lf1")
lf2 <- lazy_frame(x = 1, b = "b", .name = "lf2")
lf3 <- lazy_frame(x = 1, c = "c", .name = "lf3")
lf1 |>
left_join(lf2, by = "x") |>
left_join(lf3, by = "x") |>
select(b, c)
#> <SQL>
#> SELECT `b`, `c`
#> FROM `lf1`
#> LEFT JOIN `lf2`
#> ON (`lf1`.`x` = `lf2`.`x`)
#> LEFT JOIN `lf3`
#> ON (`lf1`.`x` = `lf3`.`x`)
lf1 |>
group_by(x) |>
summarise(a = mean(a, na.rm = TRUE), n = n()) |>
filter(n > 5)
#> <SQL>
#> SELECT `x`, AVG(`a`) AS `a`, COUNT(*) AS `n`
#> FROM `lf1`
#> GROUP BY `x`
#> HAVING (COUNT(*) > 5.0)
(As ususal in these blog posts, I’m using
lazy_frame()
to focus on the SQL generation, without having to set up a dummy database.)
Additionally, where possible, dbplyr now uses SELECT *
after a join instead of explicitly selecting every column.
Improved errors
Variables that aren’t found in either the data or in the environment now produce an error:
lf <- lazy_frame(x = 1,y = 2)
lf |> mutate(x = z + 1)
#> Error in `mutate()`:
#> ! Problem while computing `x = z + 1`
#> Caused by error:
#> ! Object `z` not found.
(Previously they were silently translated to SQL variables.)
We’ve also generally reviewed the error messages to ensure they show more clearly where the error happened:
lf |> mutate(x = y %/% 1)
#> Error in `purrr::pmap()` at dbplyr/R/lazy-select-query.R:282:2:
#> ℹ In index: 1.
#> ℹ With name: x.
#> Caused by error in `y %/% 1`:
#> ! %/% is not available in this SQL variant
lf |> mutate(across(x:y, "a"))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(x:y, "a")`
#> Caused by error in `across()`:
#> ! `.fns` must be a NULL, a function, formula, or list
New translations
stringr::str_like()
(new in stringr 1.5.0) is translated to LIKE
:
lf1 |>
filter(stringr::str_like(a, "abc"))
#> <SQL>
#> SELECT *
#> FROM `lf1`
#> WHERE (`a` LIKE 'abc')
dbplyr 2.3.0 is also supports features coming in dplyr 1.1.0:
- The
.by
argument is supported as alternative togroup_by()
. - Passing
...
toacross()
is deprecated because the evaluation timing of...
is ambiguous. - New
pick()
andcase_match()
functions are translated. -
case_when()
now supports the.default
argument.
This version does not support the new join_by()
syntax, but we’re working on it, and we’ll release an update after dplyr 1.1.0 is out.
Acknowledgements
The vast majority of this release (particularly the SQL optimisations) are from Maximilian Girlich; thanks so much for your continued work on this package.
We’d also like to thank all 74 contributors who help in someway, whether it was filing issues or contributing code and documentation: @a4sberg, @ablack3, @akgold, @aleighbrown, @andreassoteriadesmoj, @apalacio9502, @baileych, @barnesparker, @bhuvanesh1707, @bkraft4257, @bobbymc0, @brian-law-rstudio, @bthe, @But2ene, @capitantyler, @carlganz, @cboettig, @chwpearse, @copernican, @DSLituiev, @ehudtr7, @eitsupi, @ejneer, @eutwt, @ewright-vcan, @fabkury, @fh-afrachioni, @fh-mthomson, @filipemsc, @gadenbuie, @gbouzill, @giocomai, @hadley, @hershelm, @iangow, @iMissile, @IndrajeetPatil, @j-wester, @Janlow, @jasonmhoule, @jensmassberg, @jmbarbone, @joe-rodd, @kongdd, @krlmlr, @lschneiderbauer, @machow, @mgarbuzov, @mgirlich, @MichaelChirico, @moodymudskipper, @multimeric, @namarkus, @noamross, @NZambranoc, @oriolarques, @overmar, @owenjonesuob, @p-schaefer, @rohitg33, @rowrowrowyourboat, @rsund, @samssann, @samterfa, @schradj, @scvail195, @slhck, @splaisan, @stephenashton-dhsc, @ThomasMorland, @thothal, @viswaduttp, @XoliloX, and @yuhenghuang.