The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles.
Since the beginning of 2021, we have been publishing quarterly updates here on the tidyverse blog summarizing what’s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the tidymodels tag to find all tidymodels blog posts, including our roundup posts as well as those that are more focused.
Since our last update, we have had some larger releases that you can read about in these posts.
This post will update you on which packages have changed and which improvements you should know about that haven’t been covered in the posts above.
Here’s a list of the packages and their News sections:
Let’s look at a few specific updates.
## Quiet linear SVM models
Previously, when you fit a linear SVM model, you would get a message that you could not suppress.
```r
library(parsnip)
library(modeldata)

res <-
  svm_linear(mode = "classification", engine = "kernlab") |>
  fit(Class ~ ., data = two_class_dat)
#> Setting default kernel parameters
```
This message by itself was not very useful, and there was no reasonable way to turn it off. We have silenced it to hopefully alleviate some of the noise that came from using this method.
```r
library(parsnip)
library(modeldata)
#>
#> Attaching package: 'modeldata'
#> The following object is masked from 'package:datasets':
#>
#>     penguins

res <-
  svm_linear(mode = "classification", engine = "kernlab") |>
  fit(Class ~ ., data = two_class_dat)

res
#> parsnip model object
#>
#> Support Vector Machine object of class "ksvm"
#>
#> SV type: C-svc  (classification)
#>  parameter : cost C = 1
#>
#> Linear (vanilla) kernel function.
#>
#> Number of Support Vectors : 361
#>
#> Objective Function Value : -357.1487
#> Training error : 0.178255
#> Probability model included.
```
## Fewer numeric overflow issues in brulee
The brulee package has been improved to help avoid numeric overflow in its loss functions. Several changes were made to deal with this type of issue:

- Starting values are now drawn from a Gaussian distribution (instead of a uniform distribution) with a smaller standard deviation.
- The results always contain the initial results, to use as a fallback if there is overflow during the first epoch.
- `brulee_mlp()` has two additional gradient clipping parameters, `grad_value_clip` and `grad_norm_clip`, that prevent these issues.
- The early stopping warning was changed to “Early stopping occurred at epoch {X} due to numerical overflow of the loss function.”
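As a rough sketch of how the clipping options might be used (the threshold values below are illustrative, and `grad_norm_clip` is assumed here to be the norm-based companion to the value-based clipping argument named above, mirroring torch’s gradient clipping utilities):

```r
library(brulee)
library(modeldata)

# Illustrative sketch: fit an MLP while clipping gradients to guard
# against numeric overflow in the loss. The clipping thresholds are
# arbitrary example values, not recommendations.
overflow_safe_fit <- brulee_mlp(
  Class ~ .,
  data = two_class_dat,
  epochs = 50,
  grad_value_clip = 1,  # clip the magnitude of each gradient element
  grad_norm_clip = 5    # rescale gradients whose overall norm is too large
)
```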
## Additional torch optimizers in brulee
Several additional optimizers have been added: "ADAMw", "Adadelta", "Adagrad", and "RMSprop". Previously, the only options were "SGD" and "LBFGS".

## Acknowledgements
We want to sincerely thank everyone who contributed to these packages since their previous versions:
- dials: @brendad8, @hfrick, @topepo, and @Wander03.
- parsnip: @chillerb, @EmilHvitfeldt, @jmgirard, @topepo, and @ZWael.
- rsample: @abichat, @hfrick, @mkiang, and @vincentarelbundock.
- recipes: @EmilHvitfeldt, @SimonDedman, and @topepo.
- probably: @abichat, @ayueme, @dchiu911, @EmilHvitfeldt, @frankiethull, @gaborcsardi, @hfrick, @Jeffrothschild, @jgaeb, @jrwinget, @mark-burdon, @martinhulin, @simonpcouch, @teunbrand, @topepo, @wjakethompson, and @yellowbridge.
- brulee: @genec1, @talegari, and @topepo.