Along with the release of parsnip, there are new versions of many tidymodels packages: recipes, yardstick, embed, tidyposterior, and tidymodels.
We made the conscious choice to add all of the breaking changes now instead of spreading them out over a few versions. The biggest changes are in yardstick
and recipes
and these are described below.
One change applies across all of these packages: broom is no longer used to obtain the tidy S3 methods. Instead, the generics package is imported, which reduces dependencies.
yardstick
This is a large release for yardstick, with more metrics, grouped data frame integration, multiclass metric support, and a few breaking changes.
Breaking changes
These changes were made with the intention of standardizing both the API and the output of each metric.
All metrics now return a tibble rather than a single numeric value. This sets the groundwork for allowing metrics to be used with grouped data frames, and allows more informative output to be returned from each metric.
To preserve some of the old behavior, _vec()
functions have been added for each
metric. These take vectors as inputs and return a single numeric result.
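As a minimal sketch of the two interfaces (the toy data frame `preds` is made up for this illustration), the data frame method returns a tibble, while the `_vec()` variant returns a bare number:

```r
library(yardstick)

# Hypothetical toy predictions, invented for this illustration
preds <- data.frame(
  truth    = factor(c("a", "a", "b", "b"), levels = c("a", "b")),
  estimate = factor(c("a", "b", "b", "b"), levels = c("a", "b"))
)

# Data frame interface: returns a one-row tibble with
# .metric, .estimator, and .estimate columns
acc_tbl <- accuracy(preds, truth, estimate)

# Vector interface: returns a single numeric value
acc_num <- accuracy_vec(preds$truth, preds$estimate)
```

Three of the four toy predictions are correct, so both calls report an accuracy of 0.75; only the container differs.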
A number of small breaking changes have been made to be in line with the tidymodels
model implementation principles. These include: mnLogLoss()
being renamed to
mn_log_loss()
, the na.rm
argument being renamed to na_rm
, and other similar changes that reflect a standardization that is being implemented across the entire tidymodels ecosystem. All of the changes are documented in the
NEWS.
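To illustrate the renamed argument, here is a small sketch using `rmse_vec()` on invented data (the vectors `truth` and `estimate` are hypothetical). It only demonstrates the `na_rm` spelling, not the full list of renames:

```r
library(yardstick)

# Hypothetical numeric outcome with one missing value
truth    <- c(1, 2, 3, NA)
estimate <- c(1.5, 2, 2.5, 4)

# The argument is spelled na_rm (snake case) rather than na.rm
rmse_complete <- rmse_vec(truth, estimate, na_rm = TRUE)

# With na_rm = FALSE, the missing value propagates to the result
rmse_missing <- rmse_vec(truth, estimate, na_rm = FALSE)
```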
Multiclass metrics
A multiclass model is a classification model that has more than two potential outputs. Until now, the only metric with multiclass support was
accuracy()
because its definition extends naturally into the multiclass world. Now, all metrics have some form of multiclass support through the concepts of macro and micro averaging. To learn about how these types of averaging work, read the new
vignette.
As an example, the following data set has columns for an observed multiclass result, the predicted class, individual class probability predictions, and the current resample (out of 10).
library(yardstick)
library(dplyr)
data("hpc_cv")
hpc_single_resample <- filter(hpc_cv, Resample == "Fold01")
head(hpc_single_resample, n = 1)
#> obs pred VF F M L Resample
#> 1 VF VF 0.914 0.0779 0.00848 1.99e-05 Fold01
# The outcome has 4 potential values
unique(hpc_single_resample$obs)
#> [1] VF F M L
#> Levels: VF F M L
yardstick will automatically detect that the input is from a multiclass model, and will choose to use macro averaging by default in most cases.
precision(hpc_single_resample, obs, pred)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 precision macro 0.637
To tell yardstick metrics to use a different variant of averaging, use the estimator
argument to specify "macro"
, "micro"
or "macro_weighted"
averaging, among
others depending on the metric.
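For instance, the default can be overridden as in the sketch below, which rebuilds the single-resample data so that it runs on its own. The object names `micro_prec` and `weighted_prec` are chosen here for illustration:

```r
library(yardstick)
library(dplyr)

data("hpc_cv")
hpc_single_resample <- filter(hpc_cv, Resample == "Fold01")

# Micro averaging pools the counts from all classes before computing precision
micro_prec <- precision(hpc_single_resample, obs, pred, estimator = "micro")

# Weighted macro averaging weights each class by its observed frequency
weighted_prec <- precision(
  hpc_single_resample, obs, pred,
  estimator = "macro_weighted"
)
```

The `.estimator` column of each result records which averaging method was used, so results from different calls can be combined without losing track of how they were computed.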
Grouped data frames
To calculate metrics on multiple resamples at once, yardstick now recognizes grouped data frames and calculates the metric on each group separately.
hpc_grouped <- hpc_cv %>%
group_by(Resample)
hpc_grouped %>%
pr_auc(obs, VF:L) %>%
head(3)
#> # A tibble: 3 x 4
#> Resample .metric .estimator .estimate
#> <chr> <chr> <chr> <dbl>
#> 1 Fold01 pr_auc macro 0.595
#> 2 Fold02 pr_auc macro 0.599
#> 3 Fold03 pr_auc macro 0.682
Combined with
metric_set()
, a new function for combining multiple metrics into one function call, this workflow makes calculating a large number of metrics over multiple resamples a quick task. We encourage you to check out the example section of
metric_set()
's help page if you are interested in learning more.
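A short sketch of that workflow, assuming three class metrics are of interest (the name `class_metrics` is arbitrary):

```r
library(yardstick)
library(dplyr)

data("hpc_cv")

# Bundle several class metrics into a single function
class_metrics <- metric_set(accuracy, kap, precision)

# One call computes every metric within every resample
resample_metrics <- hpc_cv %>%
  group_by(Resample) %>%
  class_metrics(truth = obs, estimate = pred)
```

The result has one row per metric per resample, which makes it straightforward to summarize or plot afterwards.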
Curve functions
Four new “curve” functions have been added to compute the full ROC curve, precision-recall curve, lift curve, and gain curve. Each of these functions has a corresponding ggplot2::autoplot()
method. Combined with the grouped data frame support, this greatly simplifies some aspects of visualizing model performance.
library(ggplot2)
hpc_grouped %>%
roc_curve(obs, VF:L) %>%
autoplot() +
ggtitle("One-VS-All ROC Curve", subtitle = "Computed for each resample")
New metrics and vignettes
The following metrics are new in this release:
mape(), kap(), detection_prevalence(), bal_accuracy(), roc_curve(), pr_curve(), gain_curve(), lift_curve(), and gain_capture().
There are also three new vignettes. One, already mentioned above, describes multiclass averaging. The other two cover the three main metric types in yardstick and how to implement custom metrics for personal use.
recipes
Breaking changes
One big change was to make the argument names more consistent with tidyverse standards, as well as with dials
and other packages. For example,
step_pca()
now has an argument num_comp
that replaces the previous num
argument. This will pay off later when we enable the detection of tuning parameters and the automatic determination of grid values or parameter ranges. The biggest name change is in
bake()
; newdata
is now new_data
. For the time being, a warning will be issued when newdata
is used, but that won’t last past the next version. The list of name changes is detailed
here.
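The two renamed arguments mentioned above can be seen together in this sketch, which uses the built-in `mtcars` data purely as a stand-in:

```r
library(recipes)

pca_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = 2)  # previously `num`

pca_prepped <- prep(pca_rec, training = mtcars)

# `new_data` replaces the old `newdata` argument
pca_scores <- bake(pca_prepped, new_data = mtcars)
```

The baked result keeps the outcome and replaces the predictors with the two requested components, `PC1` and `PC2`.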
In recipes, variables can have different roles (e.g. “predictor” or “outcome”). Beyond those set by the package, roles are largely user specified and can be pretty much anything. Previously, only a single role was allowed. The new version of recipes expands the number of roles per column. This now means that
add_role()
will append roles, and the new function
update_role()
will reset them. It also changes how the summary()
results for a recipe are returned, since a variable can now occupy multiple rows (one per role).
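The difference between the two functions might look like this; the role names `"id"` and `"size measure"` are arbitrary labels invented for the example:

```r
library(recipes)

role_rec <- recipe(mpg ~ ., data = mtcars) %>%
  # update_role() replaces the existing "predictor" role of cyl
  update_role(cyl, new_role = "id") %>%
  # add_role() appends a second role to disp, keeping "predictor"
  add_role(disp, new_role = "size measure")

role_summary <- summary(role_rec)

# disp now appears twice, once per role
role_summary[role_summary$variable == "disp", ]
```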
A feature that we will be working on in the next version is to be able to reference (and use) previous steps. For example, if you center some variables, you might want to uncenter them at a later step. For this future feature, this version of recipes
mandates an ID field for each step. The ID can be anything, but the current convention is to use the step name followed by random characters (e.g. "center_irqtH"
).
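An ID can also be supplied by hand through the step's `id` argument, as in this sketch (the value `"center_predictors"` is just an illustrative choice). The `tidy()` method for a recipe lists the ID of each step:

```r
library(recipes)

id_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors(), id = "center_predictors")

# One row per step, including an `id` column
step_info <- tidy(id_rec)
step_info$id
```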
Another change was to default the
prep()
option retain
to TRUE
. We (and others) found that this was always being done anyway, since it allows
juice()
to get the processed training set at no extra cost. The downside is that, if the training set is large, you carry a large copy of the data inside the recipe. When the verbose
option is turned on, a message is printed showing the size of the training set, i.e.:
“The retained training set is ~ 20.0 Mb in memory.”
This size estimate is approximate since the base R function object.size()
is used, which does not count objects in any environments that are carried along.
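A minimal sketch of the retained training set in use, again with `mtcars` as a stand-in data set:

```r
library(recipes)

center_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors())

# retain = TRUE is now the default; it is spelled out here for clarity
center_prepped <- prep(center_rec, training = mtcars, retain = TRUE)

# The processed training set comes back without re-running the steps
train_processed <- juice(center_prepped)
```

If memory is a concern, passing `retain = FALSE` to `prep()` avoids carrying the copy, at the cost of not being able to call `juice()` afterwards.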
Finally, a number of steps check for duplicate names and will throw an error during prep()
if this occurs. This behavior may slightly change in the future due to changes in the tibble
package related to how unique names should be treated when creating data frames.
New steps
A big new feature in this version of recipes
is the addition of dplyr
-related steps:
step_arrange()
,
step_filter()
,
step_mutate()
,
step_sample()
, and
step_slice()
. They follow their dplyr
analogs.
step_sample()
covers both dplyr::sample_n()
and dplyr::sample_frac()
. Other new steps include:
- step_integer() converts data to ordered integers, similar to LabelEncoder.
- step_geodist() can be used to calculate the distance between geocodes and a single reference location.
- step_nnmf() computes the non-negative matrix factorization components for non-negative data.
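As a rough illustration of the dplyr-related steps, the sketch below adds a derived column and drops some rows from the training set. The column name `wt_kg` and the filter condition are invented for the example:

```r
library(recipes)

dplyr_rec <- recipe(mpg ~ ., data = mtcars) %>%
  # mtcars records wt in units of 1000 lbs; convert to kilograms
  step_mutate(wt_kg = wt * 453.592) %>%
  # keep only cars with more than four cylinders
  step_filter(cyl > 4)

dplyr_prepped <- prep(dplyr_rec, training = mtcars)
processed <- juice(dplyr_prepped)
```

Both steps take the same kind of expressions as `dplyr::mutate()` and `dplyr::filter()`.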
List-columns are also supported in recipes
now.
summary.recipe()
now shows type
column values as “list” and these can be selected using has_type("list")
. When printing the recipe, a row is labeled as missing when its entire list element is missing (e.g. is.na(list[[i]])
is TRUE
). If the list element has some non-missing values, it is not counted as missing.
There are also bug fixes and other small changes that can be found in the News file.
rsample
A function
initial_time_split()
was added. It can be used to create ordered initial splits and would be appropriate for time series data.
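A small sketch of an ordered split, using an invented daily series (the data frame `series` is hypothetical):

```r
library(rsample)

# Hypothetical daily series, invented for this illustration
series <- data.frame(day = 1:100, value = sqrt(1:100))

# The first 75% of rows go to training, the remaining rows to testing
time_split <- initial_time_split(series, prop = 0.75)
train_set <- training(time_split)
test_set  <- testing(time_split)
```

Unlike a random initial split, the rows are not shuffled, so every training observation precedes every testing observation.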
Breaking change: the recipes-related prepper() function was moved to the recipes package. This makes rsample's install footprint much smaller.
Finally, rsplit
objects have a better representation inside of tibbles when the sample sizes are large.
embed
The tensorflow function
step_embed()
can now handle callbacks to keras
. This enables a few different features, including stopping when a convergence criterion is met.
tidymodels
We added
parsnip
and
dials
to the core set of packages and bumped all packages up to their current versions.