bench is now available on CRAN!
The goal of bench is to benchmark code by tracking execution time, memory allocations, and garbage collections.
Install the latest version with:
install.packages("bench")
Usage
Benchmarks can be run with bench::mark(), which takes one or more expressions to benchmark against each other.
library(bench)
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
bench::mark() will throw an error if the results are not equivalent, so you don’t accidentally benchmark non-equivalent code.
bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  subset(dat, x > 500))
#> Error: Each result must equal the first result:
#> `dat[dat$x > 500, ]` does not equal `dat[which(dat$x > 499), ]`
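If you do want to time expressions whose results differ, the equivalence check can be turned off. A minimal sketch, assuming the documented check argument of bench::mark():
# check = FALSE skips the equivalence check, so intentionally
# non-equivalent expressions can still be timed against each other.
bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  check = FALSE)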
Results are easy to interpret, with human-readable units in a rectangular data frame.
bnch <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
bnch
#> # A tibble: 3 x 10
#> expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time
#> <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm>
#> 1 dat[dat$x > 500, ] 300µs 347µs 321µs 1.26ms 2884. 416KB 55 949 329ms
#> 2 dat[which(dat$x > 500), ] 230µs 281µs 259µs 1.12ms 3563. 357KB 52 1156 324ms
#> 3 subset(dat, x > 500) 374µs 461µs 420µs 1.52ms 2169. 548KB 43 803 370ms
By default the summary uses absolute measures; however, relative results can be obtained by using relative = TRUE in your call to bench::mark() or by calling summary(relative = TRUE) on the results.
summary(bnch, relative = TRUE)
#> # A tibble: 3 x 10
#> expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 dat[dat$x > 500, ] 1.30 1.24 1.24 1.13 1.33 1.16 1.28 1.18 1.01
#> 2 dat[which(dat$x > 500), ] 1 1 1 1 1.64 1 1.21 1.44 1
#> 3 subset(dat, x > 500) 1.63 1.64 1.62 1.36 1 1.53 1 1 1.14
bench::press() is used to run benchmarks against a grid of parameters. Provide setup and benchmarking code as a single unnamed argument, then define sets of values as named arguments. The full combination of values will be expanded, and the benchmarks are then pressed together in the result. This allows you to benchmark a set of expressions across a wide variety of input sizes, perform replications, and carry out other useful tasks.
set.seed(42)
create_df <- function(rows, cols) {
  as.data.frame(setNames(
    replicate(cols, runif(rows, 1, 1000), simplify = FALSE),
    rep_len(c("x", letters), cols)))
}
results <- bench::press(
  rows = c(10000, 100000),
  cols = c(10, 100),
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_iterations = 100,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)
results
#> # A tibble: 12 x 12
#> expression rows cols min mean median max `itr/sec` mem_alloc n_gc n_itr total_time
#> <chr> <dbl> <dbl> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm>
#> 1 bracket 10000 10 830µs 1.06ms 987.08µs 2.29ms 940. 1.17MB 18 304 323.47ms
#> 2 which 10000 10 447.96µs 652.94µs 564.73µs 1.6ms 1532. 827.04KB 21 551 359.77ms
#> 3 subset 10000 10 906.91µs 1.15ms 1.04ms 2.27ms 866. 1.28MB 21 320 369.44ms
#> 4 bracket 100000 10 14.96ms 17.34ms 17.39ms 19.95ms 57.7 11.54MB 46 54 936.47ms
#> 5 which 100000 10 9.09ms 11.24ms 11.04ms 15.25ms 89.0 7.91MB 32 68 764.24ms
#> 6 subset 100000 10 14.76ms 16.86ms 16.07ms 20.74ms 59.3 12.68MB 46 54 910.46ms
#> 7 bracket 10000 100 7.19ms 9.16ms 8.76ms 13ms 109. 9.71MB 34 66 604.84ms
#> 8 which 10000 100 2.74ms 4.17ms 3.98ms 8.17ms 240. 5.91MB 19 81 338.03ms
#> 9 subset 10000 100 7.19ms 9.63ms 9.46ms 12.54ms 104. 9.84MB 35 65 626.03ms
#> 10 bracket 100000 100 100.19ms 111.1ms 111.08ms 121.63ms 9.00 97.47MB 83 21 2.33s
#> 11 which 100000 100 54.19ms 59.62ms 59.36ms 65.77ms 16.8 59.51MB 36 64 3.82s
#> 12 subset 100000 100 103.36ms 113.58ms 111.83ms 134ms 8.80 98.62MB 84 16 1.82s
Plotting
ggplot2::autoplot() can be used to generate an informative default plot. This plot is colored by GC level (0, 1, or 2) and faceted by parameters (if any). By default it generates a beeswarm plot; however, you can also specify other plot types (jitter, ridge, boxplot, violin). See ?autoplot.bench_mark for full details. This gives you a nice overview of the runs and allows you to gauge the effects of garbage collection on the results.
ggplot2::autoplot(results)
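If a beeswarm is not what you want, the other plot types can be selected by name. A minimal sketch, assuming the type argument described in ?autoplot.bench_mark:
# Pick one of the alternative plot types, e.g. a violin plot.
ggplot2::autoplot(results, type = "violin")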
You can also produce fully custom plots by un-nesting the results and working with the data directly. In this case we are exploring how the amount of memory allocated by each expression interacts with the time taken to run.
library(tidyverse)
results %>%
  unnest() %>%
  filter(gc == "none") %>%
  ggplot(aes(x = mem_alloc, y = time, color = expression)) +
  geom_point() +
  scale_color_brewer(type = "qual", palette = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "grey50")
Compared to existing methods
Compared to other methods such as system.time(), rbenchmark, tictoc, or microbenchmark, we feel bench has a number of benefits.
- Uses the highest precision APIs available for each operating system (often nanosecond-level).
- Tracks memory allocations for each expression.
- Tracks the number and type of R garbage collections per run.
- Verifies equality of expression results by default, to avoid accidentally benchmarking non-equivalent code.
- Uses adaptive stopping by default, running each expression for a set amount of time rather than for a specific number of iterations (see the sketch after this list).
- Runs expressions in batches and calculates summary statistics after filtering out iterations with garbage collections. This allows you to isolate the performance and effects of garbage collection on running time (for more details see Neal 2014).
- Allows benchmarking across a grid of input values with bench::press().
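As an illustration of the points above, both the adaptive stopping and the garbage collection filtering can be tuned. A minimal sketch, assuming the documented min_time argument of bench::mark() and filter_gc argument of summary():
# Run each expression for at least one second of total time,
# rather than for a fixed number of iterations.
bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  min_time = 1)

# Include iterations that triggered garbage collections in the
# summary statistics (they are filtered out by default).
summary(bnch, filter_gc = FALSE)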
Dependency load
When the development version of bench was introduced, a few people expressed concern over the number of dependencies in the package. I will attempt to explain why these dependencies exist and why the true load may be less than you might think.
While bench currently has 19 dependencies, only 8 of these are hard dependencies; that is, they are needed to install the package. Of these 8 hard dependencies, 3 (methods, stats, utils) are base packages installed with R. Of the 5 remaining packages, 3 have no additional dependencies (glue, profmem, rlang). The two remaining packages (tibble and pillar) are used to provide nice printing of the times and memory sizes, and to support the list columns used to store the timings, garbage collections, and allocations. These are major features of the bench package, and it would not work without these dependencies.
The remaining 11 packages are soft dependencies, used either for testing or for optional functionality, most notably plotting. They will not be installed unless explicitly requested.
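You can verify these counts yourself with base R. A minimal sketch, assuming CRAN metadata retrieved via available.packages(); the exact numbers will drift as the package evolves:
# Query CRAN metadata for the current dependency lists.
db <- available.packages(repos = "https://cran.r-project.org")

# Hard dependencies: needed to install the package.
tools::package_dependencies("bench", db = db,
  which = c("Depends", "Imports", "LinkingTo"))

# Soft dependencies: only installed when explicitly requested.
tools::package_dependencies("bench", db = db, which = "Suggests")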
The microbenchmark package is a good alternative for those looking for a package with only base dependencies.
Feedback wanted!
We hope bench is a useful tool for benchmarking short expressions of code. Please open GitHub issues for any feature requests or bugs.
Learn more about bench at http://bench.r-lib.org
A big thanks goes to all the community members who contributed code and opened issues for this release! @espinielli, @hadley, @HughParsonage, @jasonserviss, @jimhester, @jonocarroll, @lionel-, @MilesMcBain, @njtierney, and @zkamvar