We’re excited to announce the release of workflows 0.2.0. workflows is a tidymodels package for bundling a model specification from parsnip with a preprocessor, such as a formula or recipe. Bundling the two streamlines the model fitting workflow and combines nicely with tune for hyperparameter tuning.
You can install it from CRAN with:
install.packages("workflows")
Adding variables to a workflow
The main change in this release of workflows is the introduction of a new preprocessor method: add_variables(). This adds a third method to specify model terms, in addition to add_formula() and add_recipe(). add_variables() has a tidyselect interface, where outcomes are specified using bare column names, followed by predictors.
library(parsnip)
library(workflows)

linear_spec <- linear_reg() %>%
  set_engine("lm")

wf <- workflow() %>%
  add_model(linear_spec) %>%
  add_variables(outcomes = mpg, predictors = c(cyl, disp))
wf
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: mpg
#> Predictors: c(cyl, disp)
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
model <- fit(wf, mtcars)
mold <- pull_workflow_mold(model)
mold$predictors
#> # A tibble: 32 x 2
#> cyl disp
#> <dbl> <dbl>
#> 1 6 160
#> 2 6 160
#> 3 4 108
#> 4 6 258
#> 5 8 360
#> 6 6 225
#> 7 8 360
#> 8 4 147.
#> 9 4 141.
#> 10 6 168.
#> # … with 22 more rows
mold$outcomes
#> # A tibble: 32 x 1
#> mpg
#> <dbl>
#> 1 21
#> 2 21
#> 3 22.8
#> 4 21.4
#> 5 18.7
#> 6 18.1
#> 7 14.3
#> 8 24.4
#> 9 22.8
#> 10 19.2
#> # … with 22 more rows
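Because add_variables() uses a tidyselect interface, selection helpers should work in place of explicit column names. The following is a minimal sketch rather than part of the release notes: the names wf_sel, fit_sel, and mold_sel are made up for illustration, and the helper is called with an explicit tidyselect:: prefix on the assumption that tidyselect is installed but not attached.

```r
library(parsnip)
library(workflows)

# Select predictors with a tidyselect helper instead of naming each column.
# In mtcars, the columns starting with "d" are disp and drat.
wf_sel <- workflow() %>%
  add_model(set_engine(linear_reg(), "lm")) %>%
  add_variables(
    outcomes = mpg,
    predictors = c(cyl, tidyselect::starts_with("d"))
  )

fit_sel <- fit(wf_sel, mtcars)

# pull_workflow_mold() is the accessor used elsewhere in this post.
mold_sel <- pull_workflow_mold(fit_sel)
names(mold_sel$predictors)
```

The selection is only resolved against the data when fit() is called, so the same workflow could be reused with a different data frame that has matching columns.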
outcomes are removed before predictors is evaluated, which means that formula specifications like y ~ . can be easily reproduced as:
workflow() %>%
  add_variables(mpg, everything())
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: None
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: mpg
#> Predictors: everything()
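One way to check that the outcome really is dropped before the predictors are selected is to fit such a workflow and inspect the mold. This is a sketch under a few assumptions: a linear model is added only so that the workflow can be fit, the object names (wf_all, fit_all, mold_all) are illustrative, and everything() is called with a tidyselect:: prefix in case the helper is not attached.

```r
library(parsnip)
library(workflows)

# Equivalent in spirit to the formula mpg ~ .
wf_all <- workflow() %>%
  add_model(set_engine(linear_reg(), "lm")) %>%
  add_variables(outcomes = mpg, predictors = tidyselect::everything())

fit_all <- fit(wf_all, mtcars)
mold_all <- pull_workflow_mold(fit_all)

# The outcome should not appear among the predictors.
"mpg" %in% names(mold_all$predictors)

# Columns of mtcars that are missing from the predictors.
setdiff(names(mtcars), names(mold_all$predictors))
```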
Importantly, add_variables() doesn’t do any preprocessing to your columns whatsoever. This is in contrast to add_formula(), which uses the standard model.matrix() machinery from R, and add_recipe(), which will recipes::prep() the recipe for you. add_variables() is especially useful when you aren’t using a recipe, but you do have S3 columns that you don’t want run through model.matrix() for fear of losing the S3 class, like with Date columns.
library(modeltime)
arima_spec <- arima_reg() %>%
  set_engine("arima")

df <- data.frame(
  y = sample(5),
  date = as.Date("2019-01-01") + 0:4
)

wf <- workflow() %>%
  add_variables(y, date) %>%
  add_model(arima_spec)
arima_model <- fit(wf, df)
#> frequency = 1 observations per 1 day
arima_model
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: arima_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: y
#> Predictors: date
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Series: outcome
#> ARIMA(0,0,0) with non-zero mean
#>
#> Coefficients:
#> mean
#> 3.0000
#> s.e. 0.6325
#>
#> sigma^2 estimated as 2.5: log likelihood=-8.83
#> AIC=21.66 AICc=27.66 BIC=20.87
mold <- pull_workflow_mold(arima_model)
mold$predictors
#> # A tibble: 5 x 1
#> date
#> <date>
#> 1 2019-01-01
#> 2 2019-01-02
#> 3 2019-01-03
#> 4 2019-01-04
#> 5 2019-01-05
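For contrast, the class loss that model.matrix() causes can be seen with base R alone. This small sketch calls model.matrix() directly rather than going through add_formula(), so it illustrates the underlying machinery, not the exact code path workflows uses; df_dates and mm are illustrative names.

```r
df_dates <- data.frame(
  y = c(3, 1, 4, 5, 2),
  date = as.Date("2019-01-01") + 0:4
)

# The column is a Date in the original data frame.
class(df_dates$date)

# model.matrix() returns a plain numeric matrix, so the Date class is lost
# and the dates become day counts.
mm <- model.matrix(~ date, data = df_dates)
class(mm[, "date"])
```

A model such as the ARIMA specification above needs real dates to infer the frequency of the series, which is why passing the column through untouched matters.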
Tune
workflows created with add_variables() do not work with the current CRAN version of tune (0.1.1). However, the development version of tune does support them, and you can install it until a new version of tune reaches CRAN.
devtools::install_github("tidymodels/tune")
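If you are unsure which version of tune you have, a quick check along these lines may help. It is only a sketch: it assumes that any tune version newer than 0.1.1 (including the development version) carries the add_variables() support, and the variable names are illustrative.

```r
# Assumption: support arrives in any tune version newer than 0.1.1.
last_unsupported <- "0.1.1"

if (requireNamespace("tune", quietly = TRUE)) {
  tune_version <- utils::packageVersion("tune")
  supports_add_variables <- tune_version > last_unsupported
} else {
  # tune is not installed, so there is nothing to compare.
  tune_version <- NA
  supports_add_variables <- NA
}

supports_add_variables
```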
Acknowledgements
Thanks to the three contributors who helped with this version of workflows: @EmilHvitfeldt, @mdancho84, and @RaviHela!