We’re indubitably amped to announce the release of tune 1.2.0, a package for hyperparameter tuning in the tidymodels framework.
You can install it from CRAN, along with the rest of the core packages in tidymodels, using the tidymodels meta-package:
install.packages("tidymodels")
The 1.2.0 release of tune introduces support for two major features that we've already written about on the tidyverse blog.
While those features got their own blog posts, there are several more features in this release that we thought were worth calling out. This post will highlight improvements to our support for parallel processing, the introduction of support for percentile confidence intervals for performance metrics, and a few other bits and bobs. You can see a full list of changes in the release notes.
Throughout this post, I’ll refer to the example of tuning an XGBoost model to predict the fuel efficiency of various car models. I hear this is already a well-explored modeling problem, but alas:
library(tidymodels)

set.seed(2024)
xgb_res <-
  tune_grid(
    boost_tree(mode = "regression", mtry = tune(), learn_rate = tune()),
    mpg ~ .,
    bootstraps(mtcars),
    control = control_grid(save_pred = TRUE)
  )
Note that we've used the control option save_pred = TRUE to indicate that we want to save the predictions from our resampled models in the tuning results. Both int_pctl() and compute_metrics() below will need those predictions. The metrics for our resampled model look like so:
collect_metrics(xgb_res)
#> # A tibble: 20 × 8
#> mtry learn_rate .metric .estimator mean n std_err .config
#> <int> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 2 0.00204 rmse standard 19.7 25 0.262 Preprocessor1_Model01
#> 2 2 0.00204 rsq standard 0.659 25 0.0314 Preprocessor1_Model01
#> 3 6 0.00859 rmse standard 18.0 25 0.260 Preprocessor1_Model02
#> 4 6 0.00859 rsq standard 0.607 25 0.0270 Preprocessor1_Model02
#> 5 3 0.0276 rmse standard 14.0 25 0.267 Preprocessor1_Model03
#> 6 3 0.0276 rsq standard 0.710 25 0.0237 Preprocessor1_Model03
#> # ℹ 14 more rows
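As an aside, since the predictions were saved, you can peek at them directly with collect_predictions(); this is the raw material that int_pctl() and compute_metrics() will work from:
collect_predictions(xgb_res)
# returns one row per held-out prediction, resample, and candidate model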
Modernized support for parallel processing
The tidymodels framework has long supported evaluating models in parallel using the foreach package. This release of tune has introduced support for parallelism using the futureverse framework, and we will begin deprecating our support for foreach in a coming release.
To tune a model in parallel with foreach, a user would load a parallel backend package (usually with a name like library(doBackend)) and then register it with foreach (with a function call like registerDoBackend()). The tune package would then detect the registered backend and take it from there. For example, the code to distribute the above tuning process across 10 cores with foreach would look like:
library(doMC)
registerDoMC(cores = 10)

set.seed(2024)
xgb_res <-
  tune_grid(
    boost_tree(mode = "regression", mtry = tune(), learn_rate = tune()),
    mpg ~ .,
    bootstraps(mtcars),
    control = control_grid(save_pred = TRUE)
  )
The code to do so with future is similarly simple. Users first load the future package and then specify a plan(), which dictates how computations will be distributed. For example, the code to distribute the above tuning process across 10 cores with future looks like:
library(future)
plan(multisession, workers = 10)

set.seed(2024)
xgb_res <-
  tune_grid(
    boost_tree(mode = "regression", mtry = tune(), learn_rate = tune()),
    mpg ~ .,
    bootstraps(mtcars),
    control = control_grid(save_pred = TRUE)
  )
For users, the transition to parallelism with future has several benefits:
- The futureverse presently supports a greater number of parallelism technologies and has been more likely to receive implementations for new ones.
- Once foreach is fully deprecated, users will be able to use the interactive logger when tuning in parallel.
From our perspective, transitioning our parallelism support to future makes our packages much more maintainable, reducing complexity in random number generation, error handling, and progress reporting.
In an upcoming release of the package, you'll see a deprecation warning when a foreach parallel backend is registered but no future plan has been specified, so start transitioning your code sooner rather than later! A few common plan() options are sketched below.
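If you're unsure which plan to choose, the future package offers several; the following is a minimal sketch, where the worker counts and host names are placeholders for your own setup:
library(future)

# run workers in separate R sessions on the local machine
plan(multisession, workers = 10)

# or, on Unix-alikes, fork the current R session
plan(multicore, workers = 10)

# or distribute the work across other machines (host names are placeholders)
plan(cluster, workers = c("node1", "node2"))

# return to sequential processing when you're done
plan(sequential)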
Percentile confidence intervals
Following up on changes in the most recent rsample release, tune has introduced a method for int_pctl() that calculates percentile confidence intervals for performance metrics. To calculate a 90% confidence interval for the values of each performance metric returned in collect_metrics(), we'd write:
set.seed(2024)
int_pctl(xgb_res, alpha = .1)
#> # A tibble: 20 × 8
#> .metric .estimator .lower .estimate .upper .config mtry learn_rate
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <dbl>
#> 1 rmse bootstrap 18.1 19.9 22.0 Preprocessor1_Mod… 2 0.00204
#> 2 rsq bootstrap 0.570 0.679 0.778 Preprocessor1_Mod… 2 0.00204
#> 3 rmse bootstrap 16.6 18.3 19.9 Preprocessor1_Mod… 6 0.00859
#> 4 rsq bootstrap 0.548 0.665 0.765 Preprocessor1_Mod… 6 0.00859
#> 5 rmse bootstrap 12.5 14.1 15.9 Preprocessor1_Mod… 3 0.0276
#> 6 rsq bootstrap 0.622 0.720 0.818 Preprocessor1_Mod… 3 0.0276
#> # ℹ 14 more rows
Note that the output has the same number of rows as the collect_metrics() output: one for each unique pair of metric and model configuration.
This is especially helpful for validation sets. Other resampling methods generate replicated performance statistics, so we can compute simple interval estimates from their mean and standard error. A validation set produces only one estimate, though, so these bootstrap intervals are probably the best option for obtaining interval estimates in that case.
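As a rough sketch of that validation-set workflow, assuming rsample's initial_validation_split() and validation_set() on the same example data:
set.seed(2024)
car_split <- initial_validation_split(mtcars)
car_val <- validation_set(car_split)

val_res <-
  tune_grid(
    boost_tree(mode = "regression", mtry = tune(), learn_rate = tune()),
    mpg ~ .,
    car_val,
    control = control_grid(save_pred = TRUE)
  )

# bootstrap the saved validation-set predictions to get interval estimates
set.seed(2024)
int_pctl(val_res, alpha = .1)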
Breaking change: relocation of ellipses
We've made a breaking change to the argument order of several functions in the package (and in downstream packages like finetune and workflowsets). Ellipses (...) are now used consistently across the package to require that optional arguments be named. For functions that previously had unused ellipses at the end of their signatures, the ellipses have been moved to follow the last argument without a default value, and several functions that previously had no ellipses in their signatures have gained them. This applies to methods for augment(), collect_predictions(), collect_metrics(), select_best(), show_best(), and conf_mat_resampled().
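In practice, this means that code passing those optional arguments by position now needs to name them. A minimal sketch with show_best() (check each function's documentation for its exact signature):
# previously, this may have worked by position:
# show_best(xgb_res, "rmse", 3)

# now, optional arguments after the ellipses must be named:
show_best(xgb_res, metric = "rmse", n = 3)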
Compute new metrics without re-fitting
We've also added a new function, compute_metrics(), that allows for calculating metrics that were not used when evaluating against resamples. For example, consider our xgb_res object. Since we didn't supply any metrics to evaluate, and this model is a regression model, tidymodels selected RMSE and R² as defaults:
collect_metrics(xgb_res)
#> # A tibble: 20 × 8
#> mtry learn_rate .metric .estimator mean n std_err .config
#> <int> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 2 0.00204 rmse standard 19.7 25 0.262 Preprocessor1_Model01
#> 2 2 0.00204 rsq standard 0.659 25 0.0314 Preprocessor1_Model01
#> 3 6 0.00859 rmse standard 18.0 25 0.260 Preprocessor1_Model02
#> 4 6 0.00859 rsq standard 0.607 25 0.0270 Preprocessor1_Model02
#> 5 3 0.0276 rmse standard 14.0 25 0.267 Preprocessor1_Model03
#> 6 3 0.0276 rsq standard 0.710 25 0.0237 Preprocessor1_Model03
#> # ℹ 14 more rows
In the past, if you wanted to evaluate that workflow against a performance metric that you hadn't included in your tune_grid() run, you'd need to re-run tune_grid(), fitting models and predicting new values all over again. Now, using the compute_metrics() function, you can take the tune_grid() output you've already generated and compute any number of new metrics without fitting any more models, as long as you used the control option save_pred = TRUE when tuning.
So, say I want to additionally calculate the Huber loss and the mean absolute percent error. I just pass those metrics along with the tuning result to compute_metrics(), and the result looks just like the collect_metrics() output for the originally calculated metrics:
compute_metrics(xgb_res, metric_set(huber_loss, mape))
#> # A tibble: 20 × 8
#> mtry learn_rate .metric .estimator mean n std_err .config
#> <int> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 2 0.00204 huber_loss standard 18.3 25 0.232 Preprocessor1_Mode…
#> 2 2 0.00204 mape standard 94.4 25 0.0685 Preprocessor1_Mode…
#> 3 6 0.00859 huber_loss standard 16.7 25 0.229 Preprocessor1_Mode…
#> 4 6 0.00859 mape standard 85.7 25 0.178 Preprocessor1_Mode…
#> 5 3 0.0276 huber_loss standard 12.6 25 0.230 Preprocessor1_Mode…
#> 6 3 0.0276 mape standard 64.4 25 0.435 Preprocessor1_Mode…
#> # ℹ 14 more rows
Easily pivot resampled metrics
Finally, the collect_metrics() method for tune results recently gained a new argument, type, indicating the shape of the returned metrics. The default, type = "long", is the same shape as before. Setting type = "wide" gives each metric its own column, making it easier to compare metrics across different models.
collect_metrics(xgb_res, type = "wide")
#> # A tibble: 10 × 5
#> mtry learn_rate .config rmse rsq
#> <int> <dbl> <chr> <dbl> <dbl>
#> 1 2 0.00204 Preprocessor1_Model01 19.7 0.659
#> 2 6 0.00859 Preprocessor1_Model02 18.0 0.607
#> 3 3 0.0276 Preprocessor1_Model03 14.0 0.710
#> 4 2 0.0371 Preprocessor1_Model04 12.3 0.728
#> 5 5 0.00539 Preprocessor1_Model05 18.8 0.595
#> 6 9 0.0110 Preprocessor1_Model06 17.4 0.577
#> # ℹ 4 more rows
Under the hood, this is indeed just a pivot_wider() call. We've found that it's time-consuming and error-prone to programmatically determine the identifying columns when pivoting resampled metrics, so we've localized and thoroughly tested the code we use to do so with this feature.
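For intuition, the wide format above can be roughly reproduced from the long metrics with a pivot; a sketch (the actual method determines the identifying columns for you):
collect_metrics(xgb_res) |>
  dplyr::select(mtry, learn_rate, .config, .metric, mean) |>
  tidyr::pivot_wider(names_from = .metric, values_from = mean)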
More love for the Brier score
Tuning and resampling functions use default metrics when the user does not specify a custom metric set. For regression models, these are RMSE and R². For classification models, the defaults were accuracy and the area under the ROC curve; we've now added the Brier score to the default classification metric list.
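In other words, leaving the metrics argument unset for a classification model is now roughly equivalent to supplying a metric set like this one (brier_class() is yardstick's Brier score for class probabilities):
# approximately the new default metrics for classification models
metric_set(accuracy, roc_auc, brier_class)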
Acknowledgements
As always, we’re appreciative of the community contributors who helped make this release happen: @AlbertoImg, @dramanica, @epiheather, @joranE, @jrosell, @jxu, @kbodwin, @kenraywilliams, @KJT-Habitat, @lionel-, @marcozanotti, @MasterLuke84, @mikemahoney218, @PathosEthosLogos, and @Peter4801.