We are very excited to announce that dbplyr 1.2.0 is now available on CRAN! dbplyr is the database backend for dplyr. It interacts with databases directly by translating dplyr code into SQL queries. dbplyr is part of an expanding ecosystem of packages meant to help analysts access and analyze data in databases. To learn more, visit our website dedicated to all things R and databases: http://db.rstudio.com/
Install the latest version of dbplyr with:
install.packages("dbplyr")
What’s new?
Here are some highlights of new features and fixes found in this release. To see the full details please refer to the release notes:
New custom translation for Microsoft Access and Teradata
Amazon Redshift connections now point to the PostgreSQL translation.
- Adds support for two new database R packages. These new packages are fully DBI-compliant and tested with DBItest. We recommend to use these instead of older packages:
- ROracle connections now point to the Oracle translation.
Cache query results using
copy_to
. This feature is very useful because after cached, the results of a long running query can be iteratively analyzed without having to wait for the same base query to run over and over.stringr functions:
str_length()
,str_to_upper()
,str_to_lower()
,str_replace_all()
,str_detect()
, andstr_trim()
are now supported.in_schema()
should now work in more places, particularly incopy_to()
For those who need to extract the resulting SQL query from dplyr code, there is now a new function called
remote_query()
. Unlikeshow_query()
,remote_query()
returns a character object, and unlikesql_render()
, the new function automatically detects the type of database attached to the given dplyr code and translates accordingly. There are four additional functions that provide a standard API to get metadata about a remotetbl
, these are:remote_name()
,remote_con()
,remote_src()
, andremote_query_plan()
.Adds support for temporary tables in Microsoft SQL Server. Additionally, the issue of certain operators working only in
mutate()
or only infilter()
has been resolved.Two new functions for developers:
sql_aggregate()
andwin_aggregate()
. They are for generating SQL and windowed SQL functions for aggregates.
Caching results
The copy_to()
function is now able to cache the results of a set of dplyr database transformations. An additional advantage is that copy_to()
uses the database to run and cache the data, so there is no data being transmitted back to R. For this feature to work, the user needs sufficient rights to create temporary tables in the database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, rownames_to_column(mtcars), "mtcars")
only_auto <- mtcars_db %>%
filter(am == 1) %>%
select(mpg, cyl, am)
auto <- copy_to(con, only_auto, "auto") # New in dbplyr 1.2.0
head(auto)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.19.3 [:memory:]
#> mpg cyl am
#> <dbl> <dbl> <dbl>
#> 1 21.0 6.00 1.00
#> 2 21.0 6.00 1.00
#> 3 22.8 4.00 1.00
#> 4 32.4 4.00 1.00
#> 5 30.4 4.00 1.00
#> 6 33.9 4.00 1.00
If the set of dplyr transformations need to be cached to a different database, copy_to()
is smart enough to download the data into memory in R, and then uploads the data to the target database.
second_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
second_auto <- copy_to(second_con, only_auto, "auto") # New in dbplyr 1.2.0
head(second_auto)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.19.3 [:memory:]
#> mpg cyl am
#> <dbl> <dbl> <dbl>
#> 1 21.0 6.00 1.00
#> 2 21.0 6.00 1.00
#> 3 22.8 4.00 1.00
#> 4 32.4 4.00 1.00
#> 5 30.4 4.00 1.00
#> 6 33.9 4.00 1.00
stringr functions
Six stringr functions are now supported. Regular expression support varies from database to database, but most simple regular expressions should be ok. The functions are:
str_length()
str_to_upper()
str_to_lower()
str_replace_all()
str_detect()
str_trim()
tbl(con, "mtcars") %>%
head() %>%
select(rowname) %>%
mutate(
s1 = str_length(rowname), # New in dbplyr 1.2.0
s2 = str_to_upper(rowname), # New in dbplyr 1.2.0
s3 = str_to_lower(rowname), # New in dbplyr 1.2.0
s4 = str_replace_all(rowname, "M", "X"), # New in dbplyr 1.2.0
s5 = str_detect(rowname, "M"), # New in dbplyr 1.2.0
s6 = str_trim(rowname) # New in dbplyr 1.2.0
)
#> # Source: lazy query [?? x 7]
#> # Database: sqlite 3.19.3 [:memory:]
#> rowname s1 s2 s3 s4 s5 s6
#> <chr> <int> <chr> <chr> <chr> <int> <chr>
#> 1 Mazda RX4 9 MAZDA RX4 mazda r~ Xazda ~ 0 Mazda ~
#> 2 Mazda RX4 Wag 13 MAZDA RX4 WAG mazda r~ Xazda ~ 0 Mazda ~
#> 3 Datsun 710 10 DATSUN 710 datsun ~ Datsun~ 0 Datsun~
#> 4 Hornet 4 Drive 14 HORNET 4 DRIVE hornet ~ Hornet~ 0 Hornet~
#> 5 Hornet Sportabout 17 HORNET SPORTABOUT hornet ~ Hornet~ 0 Hornet~
#> 6 Valiant 7 VALIANT valiant Valiant 0 Valiant
Contributors
A big thanks goes to those who made this release possible by contributing code or documentation: @DavisVaughan, @baileych, @Hong-Revo, @cwarden, @zozlak, @denismaciel, @jonassundman, @wibeasley, @mungojam, @hoxo-m, @dpprdan,@jimhester, @krlmlr, and @javierluraschi. Also, a thank you to those who helped up improve this package by suggesting features and reporting issues: @drf5n, @larspijnappel, @lbakerIsazi, @mtmorgan, @refik, @sicarul, and @vitallish