We’re excited to announce the release of broom 0.7.0 on CRAN!
broom is a package for summarizing statistical model objects in tidy tibbles. While several compatibility updates have been released in recent months, this is the first major update to broom in almost two years. This update includes many new tidier methods, bug fixes, improvements to existing tidier methods and their documentation, and improvements to maintainability and internal consistency. The full list of changes is available in the package release notes.
This release was made possible in part by the RStudio internship program, which has allowed one of us (Simon Couch) to work on broom full-time for the last month.
You can install the most recent broom update with the following code:
install.packages("broom")
Then attach it for use with:
library(broom)
We’ll outline some of the more notable changes below!
New Tidier Methods
For one, this release includes support for several new model objects—many of these additions came from first-time contributors to broom!
- anova objects from the car package
- pam objects from the cluster package
- drm objects from the drc package
- summary_emm objects from the emmeans package
- epi.2by2 objects from the epiR package
- fixest objects from the fixest package
- regsubsets objects from the leaps package
- lm.beta objects from the lm.beta package
- rma objects from the metafor package
- mfx, logitmfx, negbinmfx, poissonmfx, probitmfx, and betamfx objects from the mfx package
- lmrob and glmrob objects from the robustbase package
- sarlm objects from the spatialreg package
- speedglm objects from the speedglm package
- svyglm objects from the survey package
- We have restored a simplified version of glance.aov().
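As a quick illustration of the restored glance.aov(), here is a minimal sketch that fits a one-way ANOVA on the built-in mtcars data. The choice of data and formula (mpg by number of cylinders) is only an assumption for the example; the point is that glance() collapses the fitted aov object into a single-row tibble of model-level summaries.

```r
library(broom)

# Illustrative one-way ANOVA: mpg by number of cylinders (mtcars ships with R)
fit_aov <- aov(mpg ~ factor(cyl), data = mtcars)

# glance() returns a one-row tibble of model-level statistics
g <- glance(fit_aov)
g
```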
Improvements and Bug Fixes for Existing Tidiers
This update also features many bug fixes and improvements to existing tidiers. Some of the more notable ones:
- Many improvements to the consistency of augment.*() methods:
  - If you pass a dataset to augment() via the data or newdata arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values: previously, augment() would drop rows containing NA, which should no longer be the case. As a result, augment.*() methods no longer accept an na.action argument.
  - In previous versions, several augment.*() methods inherited the augment.lm() method but required additions to the augment.lm() method itself. We have shifted away from this approach in favor of re-implementing many augment.*() methods as standalone methods that make use of internal helper functions. As a result, augment.lm() and some related methods now have deprecated (previously unused) arguments.
  - The .resid column in the output of augment.*() methods is now consistently defined as y - y_hat.
  - augment() tries to give an informative error when data isn't the original training data.
- Several glance.*() methods have been refactored to return a one-row tibble even when the model matrix is rank-deficient.
- Many glance() methods now return an nobs column, which contains the number of data points used to fit the model!
- Various warnings resulting from changes to the tidyr API in v1.0.0 have been fixed.
- Added options to provide additional columns in the outputs of glance.biglm(), tidy.felm(), tidy.lmsobj(), tidy.lmodel2(), tidy.polr(), tidy.prcomp(), tidy.zoo(), and tidy_optim().
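To make the augment() and glance() changes concrete, here is a minimal sketch using an ordinary lm() fit. The mtcars model and the small new_cars data frame (with a deliberately missing wt value) are illustrative assumptions, not examples from the release notes. It checks that a row containing NA is kept rather than dropped, that .resid equals the observed response minus the fitted value, and that glance() reports nobs.

```r
library(broom)

# Illustrative model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new data; the second row has a missing predictor value
new_cars <- data.frame(wt = c(2.5, NA, 3.2), hp = c(110, 150, 175))

# The augmented output keeps all three rows; row 2 gets an NA prediction
aug_new <- augment(fit, newdata = new_cars)
aug_new

# On the training data, .resid is the observed mpg minus .fitted
aug <- augment(fit)
all.equal(unname(aug$.resid), unname(aug$mpg - aug$.fitted))

# glance() includes the number of observations used in the fit
glance(fit)$nobs
```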
Breaking Changes and Deprecations
This release also contains a number of breaking changes and deprecations meant to improve maintainability and internal consistency.
- We have changed how we report degrees of freedom for lm objects. This is especially important for instructors in statistics courses. Previously, the df column in glance.lm() reported the rank of the design matrix. Now it reports the degrees of freedom of the numerator for the overall F-statistic. This is equal to the rank of the model matrix minus one (unless you omit an intercept column), so the new df should be the old df minus one.
- We are moving away from supporting summary.*() objects. In particular, we have removed tidy.summary.lm() as part of a major overhaul of internals. Instead of calling tidy() on summary-like objects, please call tidy() directly on model objects moving forward.
- We have removed all support for the quick argument in tidy() methods. This simplifies internals and is for maintainability purposes. We anticipate this will not affect many users, as few people seemed to use it. If this majorly cramps your style, let us know, as we are considering a new verb to return only model parameters. In the meantime, stats::coef() together with tibble::enframe() provides most of the functionality of tidy(..., quick = TRUE).
- All conf.int arguments now default to FALSE, and all conf.level arguments now default to 0.95. This should primarily affect tidy.survreg(), which previously always returned confidence intervals, although there are some others.
- Tidiers for emmeans objects use the arguments conf.int and conf.level instead of relying on the argument names native to the emmeans::summary() methods (i.e., infer and level). Similarly, multcomp tidiers now include a call to summary(), as previous behavior was akin to setting the now-removed argument quick = TRUE. Both families of tidiers now use the adj.p.value column name when appropriate. Finally, emmeans, multcomp, and TukeyHSD tidiers now consistently use the column names contrast and null.value instead of comparison, level1 and level2, or lhs and rhs.
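A few of these breaking changes can be seen side by side on one model. The sketch below assumes an illustrative lm() fit on mtcars with two predictors, so the design matrix has rank 3 (intercept, wt, hp) and glance()$df should come out as 3 - 1 = 2. It also calls tidy() on the model rather than on summary(fit), shows that confidence intervals appear only when requested, and uses coef() with tibble::enframe() as a rough stand-in for the removed quick = TRUE output.

```r
library(broom)
library(tibble)

# Illustrative model: intercept + wt + hp gives a design matrix of rank 3
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Numerator df of the overall F-statistic: rank (3) minus one
glance(fit)$df

# Call tidy() on the model object itself, not on summary(fit)
tidy(fit)

# Confidence intervals are added only when conf.int = TRUE
tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Approximates the old tidy(fit, quick = TRUE): one row per coefficient
coefs <- enframe(coef(fit), name = "term", value = "estimate")
coefs
```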
This release of broom also deprecates several helper functions as well as tidier methods for a number of non-model objects, each in favor of more principled approaches from other packages (outlined in the NEWS file). Notably, though, tidiers have been deprecated for data frames, rowwise data frames, vectors, and matrices. Further, we have moved forward with the planned transfer of tidiers for mixed models to broom.mixed.
Other Changes
Almost all unit testing for the package is now supported by the modeltests package!
Also, we have revised several vignettes and moved them to the tidymodels website. For backward compatibility, the existing vignettes will now simply link to the revised versions.
Finally, the package’s website has moved from its previous tidyverse domain to broom.tidymodels.org.
Looking Forward
Most notably, the broom dev team is changing the process for adding new tidying methods to the package. We now ask that issues/PRs requesting support for new model objects be directed to the model-owning package (i.e., the package that the model is exported from) rather than to broom. If the maintainers of those packages are unable or unwilling to provide tidying methods in the model-owning package, it might be possible to add the new tidier to broom. broom is near its limit of tidiers; adding more may make the package unsustainable.
For developers exporting tidying methods directly from model-owning packages, we are actively working to provide resources that both ease the process of writing new tidier methods and reduce the dependency burden of taking on broom generics and helpers. As for the first point, we recently posted an article on the tidymodels website providing notes on best practices for writing tidiers. This article will be kept up to date as we develop new resources for easing the process of writing new tidier methods. As for the latter, the r-lib/generics package provides lightweight dependencies for the main broom generics. We hope to soon provide a coherent suite of helper functions for use in external broom methods.
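For a sense of what exporting tidiers from a model-owning package looks like, here is a minimal sketch that defines methods for the generics package's tidy() and glance() generics without depending on broom. The mean_model class and its fit_mean_model() constructor are hypothetical, invented only for this example. Inside a real package you would typically import the generic (for instance with roxygen's @importFrom generics tidy) and register the method with @export; here the methods are simply defined at top level so S3 dispatch can find them.

```r
library(tibble)

# Hypothetical "model": the sample mean of y with its standard error
fit_mean_model <- function(y) {
  structure(
    list(estimate = mean(y), std.error = sd(y) / sqrt(length(y)), n = length(y)),
    class = "mean_model"
  )
}

# tidy() method: one row per estimated parameter
tidy.mean_model <- function(x, ...) {
  tibble(term = "(mean)", estimate = x$estimate, std.error = x$std.error)
}

# glance() method: a single row of model-level information
glance.mean_model <- function(x, ...) {
  tibble(nobs = x$n)
}

m <- fit_mean_model(mtcars$mpg)
generics::tidy(m)
generics::glance(m)
```

Because broom re-exports these same generics, users who attach broom would get the same methods when calling tidy() or glance().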
We anticipate that the most active development on the broom package, looking forward, will center on improving augment() methods. We are also hoping to change our CRAN release cycle to provide incremental updates every several months rather than major changes every couple of years.
Contributors
This release features work and input from over 140 contributors (over 50 of them for their first time) since the last major release. See the package release notes for more specific notes on contributions. Thank you all for your thoughtful comments, patience, and hard work!
@abbylsmith, @acoppock, @ajb5d, @aloy, @AndrewKostandy, @angusmoore, @anniew, @aperaltasantos, @asbates, @asondhi, @asreece, @atyre2, @bachmeil, @batpigandme, @bbolker, @benjbuch, @bfgray3, @BibeFiu, @billdenney, @BrianOB, @briatte, @bruc, @brunaw, @brunolucian, @bschneidr, @carlislerainey, @CGMossa, @CharlesNaylor, @ChuliangXiao, @cimentadaj, @crsh, @cwang23, @DavisVaughan, @dchiu911, @ddsjoberg, @dgrtwo, @dmenne, @dylanjm, @ecohen13, @economer, @EDiLD, @ekatko1, @ellessenne, @ethchr, @florencevdubois, @GegznaV, @gershomtripp, @grantmcdermott, @gregmacfarlane, @hadley, @haozhu233, @hasenbratan, @HenrikBengtsson, @hermandr, @hideaki, @hughjonesd, @iago-pssjd, @ifellows, @IndrajeetPatil, @Inferrator, @istvan60, @jamesmartherus, @JanLauGe, @jasonyang5, @jaspercooper, @jcfisher, @jennybc, @jessecambon, @jkylearmstrongibx, @jmuhlenkamp, @JulianMutz, @Jungpin, @jwilber, @jyuu, @karissawhiting, @karldw, @khailper, @krauskae, @kuriwaki, @kyusque, @KZARCA, @Laura-O, @ldlpdx, @ldmahoney, @lilymedina, @llendway, @lrose1, @ltobalina, @LukasWallrich, @lukesonnet, @lwjohnst86, @malcolmbarrett, @margarethannum, @mariusbarth, @MatthieuStigler, @mattle24, @mattpollock, @mattwarkentin, @mine-cetinkaya-rundel, @mkirzon, @mlaviolet, @Move87, @namarkus, @nlubock, @nmjakobsen, @ns-1m, @nt-williams, @oij11, @petrhrobar, @PirateGrunt, @pjpaulpj, @pkq, @poppymiller, @QuLogic, @randomgambit, @riinuots, @RobertoMuriel, @Roisin-White, @romainfrancois, @rsbivand, @serina-robinson, @shabbybanks, @Silver-Fang, @Sim19, @simonpcouch, @sjackson1236, @softloud, @stefvanbuuren, @strengejacke, @sushmitavgopalan16, @tcuongd, @thisisnic, @topepo, @tyluRp, @vincentarelbundock, @vjcitn, @vnijs, @weiyangtham, @william3031, @x249wang, @xieguagua, @yrosseel, and @zoews