gtsummary 2.0.0

@ddsjoberg ddsjoberg released this 23 Jul 15:34

New Features

  • Clearer error messages have been introduced throughout the package. We've adopted {cli} for all our messaging to users. Our goal was to return a clear message to users for all scenarios.

  • Added functions tbl_wide_summary() and tbl_ard_wide_summary() for simple summaries across multiple columns.

  • The {gt} package is now the default printer for all Quarto and R markdown output formats.

    • Previously, when printing a gtsummary table in a Quarto or R markdown document, we would detect the output format and convert to gt, flextable, or kable to provide the best-looking table. The {gt} package has matured and provides lovely tables for nearly all output types, and we have now made {gt} the default table drawing tool for all gtsummary tables. The flextable and kable output types are still supported.
  • Previously, if we wanted a single statistic to be reported to additional levels of precision in a tbl_summary() table, we would need to specify the precision of every summary statistic for a variable. Now, we can simply update the one statistic we're interested in with a named list or vector: tbl_summary(digits = age ~ list(sd = 2)).

  • New functions tbl_ard_summary() and tbl_ard_continuous() have been added. These provide general tools for creating bespoke summary tables. Rather than accepting a data frame, these functions accept an ARD object (Analysis Results Dataset often created with the {cards} or {cardx} packages). The ARD objects align with the emerging CDISC Analysis Results Standard. ARDs are now used throughout the package. See below under the "Internal Storage" heading.

  • The default add_global_p(anova_fun) argument value has been updated to global_pvalue_fun(), which is an S3 generic. The default method still calls car::Anova() for the calculation. Methods for tidycmprsk::crr() and geepack::geeglm() have been added that wrap aod::wald.test() as these regression model types are not supported by car::Anova().

  • The add_ci.tbl_summary() S3 method has been updated with new ways to calculate the confidence interval: Wald with and without continuity correction, Agresti-Coull, and Jeffreys.

  • Added a family of functions, label_style_*(), that are similar to the style_*() functions except that they return a styling function rather than a styled value.

  • Functions tbl_summary() and tbl_svysummary() have gained the missing_stat argument, which gives users great control over the statistics presented in the missing row of a summary table.
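
A few of the features above can be sketched together. This is a minimal illustration, not an excerpt from the package documentation: it assumes the `trial` example data set that ships with gtsummary, and the particular statistics chosen for `statistic` and `missing_stat` are only one possible choice.

```r
library(gtsummary)

# `trial` is the example data set shipped with gtsummary (an assumption of
# this sketch; any data frame with a numeric column would do).
tbl <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    # Only the SD is given 2 decimal places; the mean keeps its default.
    digits = age ~ list(sd = 2),
    # Report both the count and the percent in the missing row.
    missing_stat = "{N_miss} ({p_miss}%)"
  )

# label_style_number() returns a styling function ...
fmt <- label_style_number(digits = 1)
fmt(3.14159)

# ... whereas style_number() returns the styled value directly.
style_number(3.14159, digits = 1)
```

The two styling calls at the end should produce the same string; the `label_style_*()` form is convenient when a function, rather than a value, has to be passed to an argument.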

Internal Storage

  • Greater consistency has been put in place for all calculated statistics in gtsummary. Previously, each function handled its own calculations and transformed these statistics into data frames that would be printed. Now each function will first prepare an Analysis Result Dataset (ARD), and ARDs are converted to gtsummary structures using bridge functions (prefixed with brdg_*()). The bridge functions will be exported to allow anyone to more easily extend gtsummary functions.

    • These ARDs are now used to calculate the summary statistics for nearly every function in gtsummary. The raw summary statistics are saved in .$cards.
    • Users who previously accessed the internals of a gtsummary object will find the structure has been updated, and this may be an important breaking change.
  • Calculations that require other packages have been placed in another package called {cardx}. This package creates ARD objects with the calculated statistics.

  • In tbl_regression(), the .$model_obj is no longer returned with the object. The modeling object is, and always has been, available in .$inputs$x.

  • When the gtsummary package was first written, the gt package was not on CRAN and the version of the package that was available did not have the ability to merge columns. Due to these limitations, the "ci" column was added to show the combined "conf.low" and "conf.high" columns. Column merging in both gt and gtsummary packages has matured over the years, and we are now adopting a more modern approach by using these features. As a result, the "ci" column will eventually be dropped from .$table_body. By using column merging, the conf.low and conf.high columns remain numeric and we can continue to update how these columns are formatted. Review ?deprecated_ci_column for details.
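
For users who inspect gtsummary objects directly, the elements mentioned above can be reached roughly as follows. This is a sketch under the assumption that the `trial` example data set is used and that a simple logistic regression stands in for the model; the exact shape of what is stored in `.$cards` is not shown here because it may differ between functions.

```r
library(gtsummary)

# Raw summary statistics are stored as ARDs in .$cards.
tbl <- tbl_summary(trial, include = age)
tbl$cards

# The model is no longer in .$model_obj; it is available in .$inputs$x.
mod <- glm(response ~ age, data = trial, family = binomial)
reg <- tbl_regression(mod)
fit <- reg$inputs$x
class(fit)

# conf.low and conf.high are kept as separate columns in .$table_body,
# so code should rely on these rather than on the "ci" column.
c("conf.low", "conf.high") %in% names(reg$table_body)
```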

Documentation

  • The vignettes "FAQ+Gallery", tbl_summary() Tutorial, tbl_regression() Tutorial, and Quarto+R Markdown have been converted to articles. The URLs on the website have not changed for these pages, but the vignettes are no longer bundled in the package. This change allows us to provide better documentation, utilizing more tools that don't need to be included in the package.

Minor Improvements

  • Argument add_p.tbl_summary(adj.vars) was added to more easily add p-values that are adjusted/stratified by other columns in a data frame.

  • Messaging and checks have been improved when tidyselect is invoked in the package, i.e. when the tilde is used to select variables, as in age ~ "Patient Age". The subset of variables that can be selected is now reduced to the variables present in the table. For example, if you have a summary table of patient age (and only patient age), and age is a single column from a data set of many columns and you misspell age (aggge ~ "Patient Age"), the error message will now ask if you meant "age" instead of listing every column in the data set.

    • Note that as before, you can circumvent tidyselect by using a named list, e.g. list(age = "Patient Age").
  • Added the following methods for calculating differences in add_difference.tbl_summary(): Hedges' g, paired-data Cohen's d, and paired-data Hedges' g. All three are powered by the {effectsize} package.

  • The counts in the header of tbl_summary(by) tables now appear on a new line, e.g. "**{level}** \nN = {n}".

  • In tbl_summary(), the default calculation for quantiles (e.g. statistics of the form "p25" or "p75") has been updated to use quantile(type = 2).

  • In tbl_summary(), dates and times previously showed only the minimum and maximum values by default. They are now treated like all other continuous summaries and share their default statistics of the median and IQR.

  • Previously, indentation was handled with modify_table_styling(text_format = c("indent", "indent2")), which would indent a cell 4 and 8 spaces, respectively. Handling of indentation has been migrated to modify_table_styling(indent = integer()), and by default, the label column is indented to zero spaces. This makes it easier to indent a group of rows.

  • The inputs for modify_table_styling(undo_text_format) have been updated to mirror its counterpart modify_table_styling(text_format), and the argument no longer accepts TRUE or FALSE.

  • The values passed in tbl_summary(value) are now only checked for columns that are summary type "dichotomous".

  • The gtsummary selecting functions, e.g. all_categorical(), all_continuous(), etc., are now simplified by wrapping tidyselect::where(), which was not available when these functions were originally written. Previously, these functions would error if used out of context; they now select no columns instead.

  • The design-based t-test has been added as a possible method for add_difference.tbl_svysummary() and is now the default for continuous variables.

  • When add_ci() is run after add_overall(), the overall column is now populated with the confidence interval. (#1569)

  • Added pkgdown_print.gtsummary() method that is only registered when the pkgdown package is loaded. This enables printing of gtsummary tables on the pkgdown site in the Examples section. (#1771)

  • The package now uses the updated survey::svyquantile() function to calculate quantiles, which was introduced in survey v4.1.
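
Two of the items above lend themselves to a short illustration. The first part uses only base R to show how the type 2 quantile definition can differ from R's default (type 7) on a small vector; the vector is made up for the example. The second part shows the named-list form of `label`, which bypasses tidyselect, and assumes the `trial` example data set.

```r
# Type 2 averages at discontinuities of the empirical CDF, while R's
# default (type 7) interpolates linearly between order statistics.
x <- c(1, 2, 3, 4)
q_type2 <- quantile(x, probs = 0.25, type = 2, names = FALSE)  # 1.5
q_type7 <- quantile(x, probs = 0.25, type = 7, names = FALSE)  # 1.75

library(gtsummary)

# A named list avoids tidyselect entirely, so no selection step is involved.
tbl <- tbl_summary(trial, include = age, label = list(age = "Patient Age"))
```

Because of the change in quantile type, quartiles reported by tbl_summary() may shift slightly relative to earlier releases, particularly for small samples.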

Bug fixes

  • Fix in add_difference() for paired t-tests. Previously, the sign of the reported difference depended on which group appeared first in the source data. The function has been updated to consistently report the difference as the first group mean minus the second group mean. (#1557)

Lifecycle changes

  • A couple of small changes to the default summary type in tbl_summary() have been made.

    • If a column is all NA_character_ in tbl_summary(), the default summary type is now "continuous", where previously it was "dichotomous".
    • Previously, in tbl_summary(), variables that were c(0, 1), c("no", "yes"), c("No", "Yes"), and c("NO", "YES") would default to a dichotomous summary with the 1 and yes level being shown in the table. This would occur even in the case when, for example, only 0 was observed. In this release, the line shown for dichotomous variables must be observed OR the unobserved level must be either explicitly defined in a factor or be a logical vector. This means that a character vector of all "yes" or all "no" values will default to a categorical summary instead of dichotomous.
  • When using the tbl_summary(value) argument, we no longer allow unobserved levels to be used unless it is an unobserved factor level or logical level.

  • The quiet argument has been deprecated throughout the package, except in tbl_stack(). Documentation has been updated to ensure clarity in all methods.

  • The inline_text(level) argument now expects a character value.

  • The tbl_butcher(include) argument now only accepts character vectors.

  • The following theme elements have been deprecated:

    • These theme elements will eventually be removed from the package: 'tbl_summary-arg:label', 'add_p.tbl_summary-arg:pvalue_fun', 'tbl_regression-arg:pvalue_fun', 'tbl_regression-chr:tidy_columns'.
      • The pvalue_fun elements should switch to the package-wide theme for p-value styling--'pkgwide-fn:pvalue_fun'.
    • These theme elements have been removed from the package immediately due to structural changes within the package: 'tbl_summary-str:continuous_stat', 'tbl_summary-str:categorical_stat'.
      • The default statistics can still be modified with 'tbl_summary-arg:statistic'
  • The add_p(test = ~'aov') test is now deprecated as identical results can be obtained with add_p(test = ~'oneway.test', test.args = ~list(var.equal = TRUE)).

  • Previously, add_p.tbl_summary() would coerce various data types to classes compatible with some base R tests. For example, we would convert difftime classes to general numeric before passing to wilcox.test(). We have eliminated type- and class-specific handling in these functions, and it is now left to the user to pass data compatible with the functions that calculate the p-values or to create a custom test that wraps wilcox.test() and performs the conversion. This change is effective immediately.

  • Arguments modify_header(update), modify_footnote(update), modify_spanning_header(update), and modify_fmt_fun(update) have been deprecated. Use dynamic dots instead, e.g. modify_header(...), which has been the preferred method for passing updates for a few years.

  • Function continuous_summary() has been deprecated immediately. Apologies for the inconvenience of the immediate deprecation. The way the function originally worked is not compatible with the updated internal structures. In most cases, users can use the tbl_continuous() function instead.

  • Arguments add_stat(fmt_fun, header, footnote, new_col_name) have been deprecated since v1.4.0 (2021-04-13). They have now been fully removed from the package.

  • Global options have been deprecated in gtsummary since v1.3.1 (2020-06-02). They have now been fully removed from the package.

  • The modify_header(stat_by) argument was deprecated in v1.3.6 (2021-01-08), and has now been fully removed from the package.

  • Use of the vars() selector was first removed in v1.2.5 (2020-02-11), and the messaging about the deprecation was kicked up in June 2022. This use is now defunct and the function will soon no longer be exported.

  • The as_flextable() function was deprecated in v1.3.3 (2020-08-11), and has now been fully removed from the package.

  • The custom selectors all_numeric(), all_character(), all_integer(), all_double(), all_logical(), and all_factor() were deprecated in v1.3.6 (2021-01-08), and have now been fully removed from the package. These functions were added before the tidyselect::where() function was released, which is a replacement for all of them.

  • The modify_cols_merge() function was renamed to modify_column_merge() in v1.6.1 (2022-06-22) to match the other function names. The deprecation has been upgraded from a warning to an error.

  • There is a change in the theme_gtsummary_journal("qjecon") theme for gt output. The journal prefers to present regression coefficients above their standard errors. To achieve this placement in a gt table, we were taking advantage of a bug or feature (depending on your point of view) that allowed this placement when a gt table was output to HTML and HTML only. The gt package is now working on a proper solution for line breaks within a cell, and until that feature is active, we are not using our hack. There is no change for this theme for the other tabling engine packages.
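
For code affected by the deprecations above, the replacements might look like the following. This is a hedged migration sketch rather than an official guide: it assumes the `trial` example data set, and the selected columns and header text are arbitrary.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = c(age, marker)) |>
  # Replacement for the deprecated add_p(test = ~ "aov").
  add_p(
    test = all_continuous() ~ "oneway.test",
    test.args = all_continuous() ~ list(var.equal = TRUE)
  ) |>
  # Dynamic dots instead of the deprecated modify_header(update = ...).
  modify_header(label = "**Characteristic**")

# tidyselect::where() replaces the removed all_numeric() style selectors.
tbl_num <- tbl_summary(trial, include = tidyselect::where(is.numeric))
```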