New tidymodels releases for July 2021

  workflows, parsnip, workflowsets, tune, discrim, finetune, hardhat

  Max Kuhn

The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. Earlier this year, we started regular updates here on the tidyverse blog summarizing recent developments in the tidymodels ecosystem. You can check out the tidymodels tag to find all tidymodels blog posts here, including those that focus on a single package or more major releases. The purpose of these roundup posts is to keep you informed about any releases you may have missed and useful new functionality as we maintain these packages.

Recently, we had a series of CRAN releases: hardhat, workflows, parsnip, tune, finetune, workflowsets, and discrim. These were coordinated because of some cross-package improvements. This blog post summarizes the changes.

Object extraction

The tidymodels team decided that we needed a consistent set of APIs for extracting things from objects. For example, a parsnip model contains the underlying model fit based on the engine. A linear_reg() model with the "lm" engine contains an lm object. There were some existing functions to do this (mostly named pull_*()) but they were fairly inconsistent and were not generics.

We added the following functions: extract_fit_engine(), extract_fit_parsnip(), extract_mold(), extract_numeric(), extract_preprocessor(), extract_recipe(), extract_spec_parsnip(), extract_workflow(), and extract_workflow_set_result().

The nice thing about this change is that a function such as extract_recipe() can be used with objects created by the tune, workflows, or workflowsets packages.

The existing pull_*() methods have been soft-deprecated and will stick around for a while.
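As a rough sketch of how these generics line up across packages, the example below fits the same linear model once as a bare parsnip model and once inside a workflow, then applies the extract_*() functions to each. The data set, the model formula, and the step_normalize() preprocessing step are illustrative assumptions and do not come from the release notes.

```r
library(parsnip)
library(recipes)
library(workflows)

# A parsnip fit wraps the engine's own model object (here, an lm fit).
lm_fit <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + disp, data = mtcars)

class(extract_fit_engine(lm_fit))

# The same generics also work on a fitted workflow.
car_rec <-
  recipe(mpg ~ wt + disp, data = mtcars) %>%
  step_normalize(all_numeric_predictors())

car_wflow_fit <-
  workflow() %>%
  add_recipe(car_rec) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

extract_recipe(car_wflow_fit)        # the trained recipe
extract_spec_parsnip(car_wflow_fit)  # the model specification
extract_fit_parsnip(car_wflow_fit)   # the parsnip model fit
extract_fit_engine(car_wflow_fit)    # the underlying lm object
```

Because the functions are generics, code that calls extract_fit_engine() does not need to know whether it was handed a parsnip fit or a workflow.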

Better model documentation

One issue that we’ve seen in the parsnip documentation is that there is just so much on each model page. It can be intimidating and difficult to find that one piece of information that you were looking for.

We’ve reorganized the model pages so that there are now sub-pages for each engine. For example, when you use ?linear_reg, the help page has a dynamic list of engines from parsnip or any parsnip-adjacent package that has been loaded. Here is what the pkgdown site looks like:

[Screenshot: the engine list on the linear_reg() page of the parsnip pkgdown site]

There is a similar dynamic list in the See Also section.

Each engine page provides basic information about tuning parameters, modes, preprocessing requirements, and anything else that we think is relevant. For example, for the C5.0 engine for boost_tree():

[Screenshot: the C5.0 engine page for boost_tree()]

Finally, the existing parsnip documentation didn’t show the actual fitting and/or prediction in action. A new pkgdown article has worked examples demonstrating the use of parsnip models on real data. Here is a screenshot for MARS regression via the earth package:

[Screenshot: worked example of MARS regression via the earth package]

We think that these changes will greatly improve the whole parsnip experience, especially for new users.

Simpler parsnip and workflows interfaces

Our good friend and colleague David Robinson had some great ideas for specific improvements for our APIs. After some discussion, both of his suggestions were implemented.

First, we enabled a default engine for parsnip models (you may have noticed this in the screenshots above). This produces simpler code for some model functions and, if a model has a single mode, fitting is as concise as

# use lm() for regression
linear_reg() %>% fit(mpg ~ ., data = mtcars)
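To make the default concrete, the sketch below compares a fit that relies on the default engine with one that sets the engine explicitly. It assumes the default for linear_reg() is "lm", as the comment above indicates; the object names are made up for illustration.

```r
library(parsnip)

# Rely on the default engine.
implicit_fit <- linear_reg() %>% fit(mpg ~ ., data = mtcars)

# Spell the engine out, as was required before this release.
explicit_fit <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ ., data = mtcars)

# The specification records which engine will be used.
linear_reg()$engine

# Both routes should produce the same coefficients.
all.equal(
  coef(extract_fit_engine(implicit_fit)),
  coef(extract_fit_engine(explicit_fit))
)
```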

Another nice feature is more succinct piping for workflows. A preprocessor, such as a formula or recipe, can be piped into workflow() now. Also, there is an optional second argument in that function for the model specification.

Instead of

car_rec <- 
  recipe(mpg ~ ., data = mtcars) %>% 
  step_ns(disp, deg_free = 5)

car_wflow <- 
  workflow() %>% 
  add_recipe(car_rec) %>% 
  add_model(linear_reg())

you can now use

car_wflow <- 
  recipe(mpg ~ ., data = mtcars) %>% 
  step_ns(disp, deg_free = 5) %>% 
  workflow(linear_reg()) 
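The same shortcut should work when the preprocessor is a formula. The following is a minimal sketch, with an illustrative formula and object names, that passes both the formula and the model specification directly to workflow() and then fits and predicts as usual.

```r
library(parsnip)
library(workflows)

# First argument: the preprocessor (a formula here).
# Second argument: the model specification.
form_wflow <- workflow(mpg ~ wt + disp, linear_reg())

form_fit <- fit(form_wflow, data = mtcars)

# Predictions come back as a tibble with a .pred column.
form_pred <- predict(form_fit, new_data = head(mtcars, 3))
form_pred
```

The add_formula(), add_recipe(), and add_model() functions are still available when a workflow needs to be built up or modified step by step.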

If you are on the fence about using tidymodels, David’s blog post does an excellent job encapsulating the benefits of our approach, so give it a read.

Other changes

parsnip now has a generalized additive model function gen_additive_mod()! There is currently one engine (mgcv).
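A small sketch of the new function, assuming the mgcv package is installed: the mode is set explicitly, and smooth terms are written in the formula with mgcv's s() syntax. The formula and the value of k are illustrative choices only.

```r
library(parsnip)

gam_spec <-
  gen_additive_mod() %>%
  set_engine("mgcv") %>%
  set_mode("regression")

# s() marks a smooth term; wt enters the model linearly.
gam_fit <- gam_spec %>% fit(mpg ~ wt + s(disp, k = 5), data = mtcars)

gam_pred <- predict(gam_fit, new_data = head(mtcars, 3))
gam_pred
```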

The tune package now has better control over random numbers. Previously, in some cases, the RNGkind was changed after tuning a model.

The discrim package has the new parsnip-like documentation and new model engines. The shrunken discriminant analysis method of Ahdesmaki and Strimmer (2010) was added as an engine to discrim_linear(), and the newly resurrected sparsediscrim package allowed us to include new engines for discrim_linear() and discrim_quad().
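One way to see which engines are registered, without needing the engine packages themselves, is parsnip's show_engines(). This is only a sketch. The exact rows returned depend on the installed versions of parsnip and discrim.

```r
library(discrim)  # also attaches parsnip

# List the engines currently registered for each model type.
lda_engines <- show_engines("discrim_linear")
qda_engines <- show_engines("discrim_quad")

lda_engines
qda_engines
```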

Acknowledgements

We’d like to thank everyone who has contributed to these packages since their last release:

hardhat: @cregouby, @DavisVaughan, @DiabbZegpi, @hfrick, @jwijffels, @LasWin, and @topepo.

workflows: @DavisVaughan, @dgrtwo, @EmilHvitfeldt, @LiamBlake, and @topepo.

parsnip: @cgoo4, @dgrtwo, @EmilHvitfeldt, @graysonwhite, @hfrick, @juliasilge, @mdancho84, @RaymondBalise, @topepo, and @yutannihilation.

tune: @amazongodman, @brshallo, @dpanyard, @EmilHvitfeldt, @juliasilge, @klin333, @mbac, @PathosEthosLogos, @tjcason, @topepo, and @yogat3ch.

finetune: @DavisVaughan, @hfrick, @hnagaty, @lukasal, @Mayalaroz, @mrkaye97, @shinyquant, @skeydan, and @topepo.

workflowsets: @amazongodman, @jonthegeek, @juliasilge, @oskasf, @topepo, and @yogat3ch.

discrim: @topepo.