gtsummary 2.0.0
New Features
- Clearer error messages have been introduced throughout the package. We've adopted {cli} for all our messaging to users, with the goal of returning a clear message in every scenario.
- Added functions tbl_wide_summary() and tbl_ard_wide_summary() for simple summaries across multiple columns.
- The {gt} package is now the default printer for all Quarto and R markdown output formats.
  - Previously, when printing a gtsummary table in a Quarto or R markdown document, we would detect the output format and convert to gt, flextable, or kable to provide the best-looking table. The {gt} package has matured and provides lovely tables for nearly all output types, and we have now made {gt} the default table drawing tool for all gtsummary tables. These output types are still supported.
- Previously, to report a single statistic with additional precision in a tbl_summary() table, you needed to specify the precision of every summary statistic for that variable. Now, you can update only the statistic of interest with a named list, e.g. tbl_summary(digits = age ~ list(sd = 2)).
- New functions tbl_ard_summary() and tbl_ard_continuous() have been added. These provide general tools for creating bespoke summary tables. Rather than accepting a data frame, these functions accept an ARD object (Analysis Results Dataset, often created with the {cards} or {cardx} packages). The ARD objects align with the emerging CDISC Analysis Results Standard. ARDs are now used throughout the package; see below under the "Internal Storage" heading.
- The default add_global_p(anova_fun) argument value has been updated to global_pvalue_fun(), which is an S3 generic. The default method still calls car::Anova() for the calculation. Methods for tidycmprsk::crr() and geepack::geeglm() have been added that wrap aod::wald.test(), as these regression model types are not supported by car::Anova().
- The add_ci.tbl_summary() S3 method has been updated with new ways to calculate the confidence interval: Wald with and without continuity correction, Agresti-Coull, and Jeffreys.
- Added a family of functions, label_style_*(), that are similar to the style_*() functions, except they return a styling function rather than a styled value.
- Functions tbl_summary() and tbl_svysummary() have gained the missing_stat argument, which gives users greater control over the statistics presented in the missing row of a summary table.
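A minimal sketch combining several of the features above, assuming gtsummary 2.0.0 or later and its bundled trial data set. The variable choices, the "jeffreys" interval, and the two-digit p-value formatting are illustrative assumptions rather than recommended settings; the add_ci() method names are taken from its help page and are worth checking against your installed version.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, response),
    statistic = age ~ "{mean} ({sd})",
    # Only the SD of age gets 2 decimal places; the mean keeps its default
    digits = age ~ list(sd = 2),
    # Always show a missing row, reporting the count and percent missing
    missing = "always",
    missing_stat = "{N_miss} ({p_miss}%)"
  ) |>
  # label_style_pvalue() returns a function, which is what pvalue_fun expects
  add_p(pvalue_fun = label_style_pvalue(digits = 2)) |>
  # t-based interval for continuous rows; Jeffreys interval for proportions
  add_ci(method = list(all_continuous() ~ "t.test", all_categorical() ~ "jeffreys"))

tbl
```

Because label_style_pvalue() returns a function instead of a formatted string, the same styler can be reused across add_p(), add_global_p(), or a theme without repeating the digits setting.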
Internal Storage
- Greater consistency has been put in place for all calculated statistics in gtsummary. Previously, each function handled its own calculations and transformed these statistics into the data frames that would be printed. Now each function first prepares an Analysis Results Dataset (ARD), and ARDs are converted to gtsummary structures using bridge functions (prefixed with brdg_*()). The bridge functions will be exported to allow anyone to more easily extend gtsummary functions.
  - These ARDs are now used to calculate the summary statistics for nearly every function in gtsummary. The raw summary statistics are saved in .$cards.
  - Users who previously accessed the internals of a gtsummary object will find the structure has been updated, and this may be an important breaking change.
- Calculations that require other packages have been placed in another package called {cardx}. This package creates ARD objects with the calculated statistics.
- In tbl_regression(), .$model_obj is no longer returned with the object. The modeling object is, and always has been, available in .$inputs$x.
- When the gtsummary package was first written, the gt package was not on CRAN, and the version that was available could not merge columns. Due to these limitations, the "ci" column was added to show the combined "conf.low" and "conf.high" columns. Column merging in both the gt and gtsummary packages has matured over the years, and we are now adopting a more modern approach by using these features. As a result, the "ci" column will eventually be dropped from .$table_body. With column merging, conf.low and conf.high remain numeric, and we can continue to update how these columns are formatted. Review ?deprecated_ci_column for details.
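A short sketch of where the stored results now live, assuming gtsummary 2.0.0 and the bundled trial data; the logistic model formula is an arbitrary example chosen only to produce a tbl_regression() object.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))
# The ARD(s) used to build the table are kept in the cards element
tbl$cards

mod <- glm(response ~ age + grade, data = trial, family = binomial)
tbl_reg <- tbl_regression(mod, exponentiate = TRUE)
# The model object is in inputs$x (model_obj is no longer returned)
class(tbl_reg$inputs$x)
# conf.low and conf.high remain numeric columns in table_body
tbl_reg$table_body[c("label", "estimate", "conf.low", "conf.high")]
```

Code that previously read .$model_obj or parsed the character "ci" column can switch to .$inputs$x and the numeric conf.low/conf.high columns shown here.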
Documentation
- The vignettes "FAQ+Gallery", "tbl_summary() Tutorial", "tbl_regression() Tutorial", and "Quarto+R Markdown" have been converted to articles. The URLs on the website have not changed for these pages, but the vignettes are no longer bundled in the package. This change lets us provide better documentation using tools that don't need to be included in the package.
Minor Improvements
- Argument add_p.tbl_summary(adj.vars) was added to more easily add p-values that are adjusted/stratified by other columns in a data frame.
- Messaging and checks have been improved when tidyselect is invoked in the package, i.e. when the tilde is used to select variables, as in age ~ "Patient Age". The set of variables that can be selected is now reduced to the variables present in the table. For example, if you have a summary table of patient age (and only patient age), where age is a single column from a data set of many columns, and you misspell age (aggge ~ "Patient Age"), the error message will now ask if you meant "age" instead of listing every column in the data set.
  - Note that, as before, you can circumvent tidyselect by using a named list, e.g. list(age = "Patient Age").
- Added the following methods for calculating differences in add_difference.tbl_summary(): Hedges' g, paired data Cohen's d, and paired data Hedges' g. All three are powered by the {effectsize} package.
- The counts in the header of tbl_summary(by) tables now appear on a new line, e.g. "**{level}** \nN = {n}".
- In tbl_summary(), the default calculation for quantiles (e.g. statistics of the form "p25" or "p75") has been updated to quantile(type = 2).
- In tbl_summary(), dates and times previously showed only the minimum and maximum values by default. They are now treated like all other continuous summaries and share their default statistics: the median and IQR.
- Previously, indentation was handled with modify_table_styling(text_format = c("indent", "indent2")), which would indent a cell 4 and 8 spaces, respectively. Handling of indentation has been migrated to modify_table_styling(indent = integer()), and by default, the label column is indented to zero spaces. This makes it easier to indent a group of rows.
- The inputs for modify_table_styling(undo_text_format) have been updated to mirror its counterpart modify_table_styling(text_format) and no longer accept TRUE or FALSE.
- The values passed in tbl_summary(value) are now only checked for columns that are summary type "dichotomous".
- The gtsummary selecting functions, e.g. all_categorical(), all_continuous(), etc., are now simplified by wrapping tidyselect::where(), which was not available when these functions were originally written. Previously, these functions would error if used out of context; they now select no columns when used out of context.
- The design-based t-test has been added as a method for add_difference.tbl_svysummary() and is now the default for continuous variables.
- When add_ci() is run after add_overall(), the overall column is now populated with the confidence interval. (#1569)
- Added a pkgdown_print.gtsummary() method that is only registered when the pkgdown package is loaded. This enables printing of gtsummary tables on the pkgdown site in the Examples section. (#1771)
- The package now uses the updated survey::svyquantile() function, introduced in survey v4.1, to calculate quantiles.
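To see why the new quantile default can change reported percentiles, the base R sketch below (no gtsummary required) compares quantile(type = 2) with R's default, type = 7, on a small made-up vector. Type 2 inverts the empirical CDF and averages the two neighbouring order statistics when n * p is a whole number, whereas type 7 interpolates linearly.

```r
x <- c(1, 2, 3, 4)
probs <- c(0.25, 0.5, 0.75)

# Type 2: inverse of the empirical CDF, averaging at discontinuities
q_type2 <- quantile(x, probs = probs, type = 2)
# Type 7: R's default, linear interpolation between order statistics
q_type7 <- quantile(x, probs = probs, type = 7)

rbind(type2 = q_type2, type7 = q_type7)
# type2 values: 1.5, 2.5, 3.5
# type7 values: 1.75, 2.5, 3.25
```

The medians agree here, but the quartiles differ, so IQRs reported by earlier versions may not match those from this release exactly.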
Bug fixes
- Fix in add_difference() for paired t-tests. Previously, the sign of the reported difference depended on which group appeared first in the source data. The function now consistently reports the difference as the first group mean minus the second group mean. (#1557)
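As a quick sanity check on the sign convention described above, base R's t.test() with paired = TRUE also reports the first argument minus the second. The measurements below are made up for illustration, and this is plain base R, not gtsummary's internal code.

```r
# Made-up paired measurements on the same five subjects
before <- c(140, 135, 150, 128, 142)
after  <- c(132, 130, 145, 125, 136)

res <- t.test(before, after, paired = TRUE)
# Estimate is the mean of (before - after), so it is positive here
res$estimate
# For paired data this equals the first group mean minus the second
mean(before) - mean(after)
```

Swapping the two arguments flips the sign of the estimate, which is exactly the ambiguity the fix removes by always using the first group as the reference.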
Lifecycle changes
- A couple of small changes have been made to the default summary type in tbl_summary().
  - If a column is all NA_character_ in tbl_summary(), the default summary type is now "continuous", where previously it was "dichotomous".
  - Previously, in tbl_summary(), variables with values c(0, 1), c("no", "yes"), c("No", "Yes"), or c("NO", "YES") would default to a dichotomous summary with the 1 or yes level shown in the table. This occurred even when, for example, only 0 was observed. In this release, the level shown for dichotomous variables must be observed, OR the unobserved level must either be explicitly defined in a factor or come from a logical vector. This means that a character vector of all "yes" or all "no" values will default to a categorical summary instead of dichotomous.
- When using the tbl_summary(value) argument, we no longer allow unobserved levels to be used unless it is an unobserved factor level or a logical level.
- The quiet argument has been deprecated throughout the package, except in tbl_stack(). Documentation has been updated to ensure clarity in all methods.
- The inline_text(level) argument now expects a character value.
- The tbl_butcher(include) argument now only accepts character vectors.
- The following theme elements have been deprecated:
  - These theme elements will eventually be removed from the package: 'tbl_summary-arg:label', 'add_p.tbl_summary-arg:pvalue_fun', 'tbl_regression-arg:pvalue_fun', 'tbl_regression-chr:tidy_columns'.
    - The pvalue_fun elements should switch to the package-wide theme for p-value styling: 'pkgwide-fn:pvalue_fun'.
  - These theme elements have been removed from the package immediately due to structural changes within the package: 'tbl_summary-str:continuous_stat', 'tbl_summary-str:categorical_stat'.
    - The default statistics can still be modified with 'tbl_summary-arg:statistic'.
- The add_p(test = ~'aov') test is now deprecated, as identical results can be obtained with add_p(test = ~'oneway.test', test.args = ~list(var.equal = TRUE)).
- Previously, add_p.tbl_summary() would coerce various data types to classes compatible with some base R tests. For example, we would convert difftime classes to general numeric before passing to wilcox.test(). We have eliminated type- and class-specific handling in these functions, and it is now left to the user to pass data compatible with the functions that calculate the p-values, or to create a custom test that wraps wilcox.test() and performs the conversion. This change is effective immediately.
- Arguments modify_header(update), modify_footnote(update), modify_spanning_header(update), and modify_fmt_fun(update) have been deprecated. Use dynamic dots instead, e.g. modify_header(...), which has been the preferred method for passing updates for a few years.
- Function continuous_summary() has been deprecated immediately. Apologies for the inconvenience of the immediate deprecation: the way the function originally worked is not compatible with the updated internal structures. In most cases, users can use the tbl_continuous() function instead.
- Arguments add_stat(fmt_fun, header, footnote, new_col_name) have been deprecated since v1.4.0 (2021-04-13). They have now been fully removed from the package.
- Global options have been deprecated in gtsummary since v1.3.1 (2020-06-02). They have now been fully removed from the package.
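- The modify_header(stat_by) argument was deprecated in v1.3.6 (2021-01-08) and has now been fully removed from the package.
- Use of the vars() selector was first deprecated in v1.2.5 (2020-02-11), and the deprecation messaging was escalated in June 2022. This use is now defunct, and the function will soon no longer be exported.
- The as_flextable() function was deprecated in v1.3.3 (2020-08-11) and has now been fully removed from the package; use as_flex_table() instead.
- Custom selectors all_numeric(), all_character(), all_integer(), all_double(), all_logical(), and all_factor() were deprecated in v1.3.6 (2021-01-08) and have now been fully removed from the package. These functions were added before the tidyselect::where() function was released, which is a replacement for all of them.
- The modify_cols_merge() function was renamed to modify_column_merge() in v1.6.1 (2022-06-22) to match the other function names. The deprecation has been upgraded from a warning to an error.
- There is a change in the theme_gtsummary_journal("qjecon") theme for gt output. The journal prefers to present regression coefficients above their standard errors. To achieve this placement in a gt table, we were taking advantage of a bug or feature (depending on your point of view) that allowed this placement when a gt table was output to HTML, and HTML only. The gt package is now working on a proper solution for line breaks within a cell, and until that feature is available, we are not using our hack. There is no change to this theme for the other table engine packages.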
theme for gt output. The journal prefers to present regression coefficients above their standard errors. To achieve this placement in gt table, we were taking advantage of a bug or feature (depending on your point of view) that allowed this placement when a gt table was output to HTML and HTML only. The gt package is now working on a proper solution for linebreaks within a cell, and until that feature is active, we are not using our hack. There is no change for this theme for the other tabling engine packages.