- In `dfSummary()`:
  - It is now possible to control which statistics to show in the Freqs / Values column (see `help("st_options", "summarytools")` for examples)
  - In html outputs, tables are better aligned horizontally (categories >> counts >> charts); if misalignment occurs, adjusting `graph.magnif` should resolve it
  - List-type columns and `Inf` values no longer generate errors
  - `tmp.img.dir` can be left as `NA` when `style = "grid"`
  - Fixed typo in attribute name `Dataf.rame.label`
  - Removal of grouping variables is now consistent across all languages
- In `descr()`:
  - Fixed headings being shown even if `headings=FALSE` (when using `stby()` or `dplyr::group_by()`)
- In `ctable()`:
  - Fixed row/column names not always properly displayed
  - Fixed risk ratios showing when only odds ratios should
  - Fixed error when `prop="none"` with integer data
- Selected heading elements can be totally omitted in one of two ways:
  - Setting their value to an empty string using `print()` or `view()` parameters (in `?print.summarytools`, refer to the list of arguments that can be used to override heading elements)
  - Using `define_keywords()` and setting the heading's label to an empty string
- Improved functionality for customized terms / translations (see `vignette("introduction", "summarytools")` for details)
- `fix-valign.tex` is now in the includes directory for use with R Markdown when creating pdf documents with `dfSummary()` outputs - see `vignette("rmarkdown", )`
- Navigation links and a table of contents were added to the introductory vignette, making it easier to navigate
- Style "jira" has been added to reflect pander's support for it.
- Documentation has been reviewed and improved.
- In `dfSummary()`:
  - When generating a `dfSummary()` in Rmarkdown using `method = "render"`, it is possible to set `tmp.img.dir = NA`. It must still be defined (not as `NA`) when `method = "pander"` and `style = "grid"`.
  - Grouping variable(s) are now excluded from results when using `stby()` or `dplyr::group_by()`. Use `keep.grp.vars = TRUE` to replicate previous behavior.
  - Removed an extra (empty) line in text graphs
- In `ctable()` and `freq()`:
  - Fixed bug with integers
- The `ctable.round.digits` option was added to the list of `st_options()` (there is already a global `round.digits` option, but it uses `2` as default, while `1` is a more sensible value for `ctable()`).
- `print.summarytools()` now removes titles from headings when keyword "title.function" is set to `NA` or empty string.
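
As an illustration of the `ctable.round.digits` option described above, the following minimal sketch sets the option through `st_options()`, reads it back, and restores the previous value. It assumes that summarytools (0.9.9 or later) is installed and uses the bundled `tobacco` dataset; the names `old_digits`, `current_digits` and `ct` are illustrative only.

```r
library(summarytools)

# Remember the current value so it can be restored afterwards
old_digits <- st_options("ctable.round.digits")

# Ask ctable() to display proportions with one decimal
st_options(ctable.round.digits = 1)
current_digits <- st_options("ctable.round.digits")

# Cross-tabulation built from the package's example data
ct <- ctable(tobacco$smoker, tobacco$diseased)
print(ct)

# Restore the previous setting
st_options(ctable.round.digits = old_digits)
```

Because the global `round.digits` option is left untouched, other functions such as `descr()` keep their own rounding.
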
Version 0.9.8 is essentially the CRAN release of the GitHub-only 0.9.7 release, which saw gradual changes implemented over the course of several months. See the changes listed under 0.9.7 for changes since the last CRAN release (0.9.6).
GitHub-only release - this was a constantly evolving version to be eventually released as 0.9.8 on CRAN when it reached maturity.
- Added shortcut function `stview()` pointing to `summarytools::view()`. This avoids potential conflicts with other packages using the more and more popular `view()` function (notably, tibble, part of the tidyverse family, defines `view()` as an alias for `View()`)
- Enforced adequate number formatting using `format()` internally, so that using the following optional arguments with any core function or with `print()` or `view()` will produce expected results:
  - decimal.mark, big.mark, small.mark
  - nsmall, digits
  - scientific
  - big.interval, small.interval (limited support)
- Fixed a bug arising when an object created using a language other than the active one (`st_options("lang")`) was displayed
- Improved string encoding behavior
- Added global option "char.split" to control maximum number of characters allowed in `descr()` and `ctable()` column headings showing variable names
- html footnotes are now always enclosed within a `<p>` tag
- Updated hex logo and added a favicon in html reports
- Simplified and improved performance of `what.is()`
- In `dfSummary()`:
  - Added support for list-type columns
  - Improved performance by optimizing barcode detection and blank character replacements, which are the two main bottlenecks
  - Fixed a bug with barcode detection
  - Changed default value of round.numbers to 1 (which was de facto applied)
  - round.numbers doesn't affect proportions - only 1 decimal is shown, always
  - Made slight adjustments to the html graphs appearance
  - Improved alignment of Freq cell when numerical values are shown
  - Replaced "!" with "*" for rounded-values notice
  - Fixed issue where grouped `dfSummary()` tables would end up nested in one another
  - Added a check for numerical variables having infinitesimal variability, in which case a linear transformation is applied to obtain better histograms
- In `descr()`:
  - Added the `order` argument that gives the option to display variables in their order of appearance in the data or in a custom order (as opposed to the default behavior which is to display them alphabetically sorted)
- In `freq()`, values for the `order` argument are now singular (backward compatibility is preserved for now)
  - levels --> level
  - names --> name
- Three global options (set via `st_options()`) were added:
  - dfSummary.style ("multiline" by default; can also be set to "grid")
  - freq.cumul (TRUE by default; set to FALSE to hide cumulative proportions)
  - freq.ignore.threshold (25 by default; when feeding `freq()` a whole data frame, this number determines how many distinct values are allowed for numerical variables. Above that number, the variable will be ignored)
- In `ctable()`:
  - Added Odds Ratio and Risk Ratio (aka Relative Risk) statistics with 95% C.I.'s
  - Fixed issue with chi-square statistic not reporting appropriate values
  - Fixed html alignment of statistics below the table (now centering based on table width as it should)
- In `dfSummary()`, fixed an issue arising when a very large range of numeric values exists in a column
- Eliminated automatic check for X11 capabilities as it caused problems on some systems; the user can instead set global option `st_options(use.x11 = FALSE)` if encountering problems
- To simplify installation on Unix-like systems (including Mac OS), the `RCurl::base64Encode()` function used to create ascii-encoded graphs in html documents isn't used anymore; `base64enc::base64encode()` is used instead
- When saving outputs to .Rmd documents, 'plain.ascii' is now automatically set to FALSE and 'style' is automatically set to "rmarkdown", in accordance with the way .md documents are generated
- Fixed bug arising with data frames called "data"
- `freq.silent` was added to global options
- Weights are now supported for `freq()` used in conjunction with `stby()` or `dplyr::group_by()`
- Weights are also supported for `ctable()` used in conjunction with `stby()` (but not with `dplyr::group_by()`)
- Improvements and fixes for `dfSummary()`:
  - Fixed null graphic device appearing in RGui and non-GUI interfaces
  - Calling `summarytools::dfSummary()` (without loading the package) is now possible
  - Improved Rmarkdown compatibility
- Improvements and fixes for `descr()`:
  - When using `descr()` with `stby()`, results are no longer assembled into a single table if more than one grouping variable is used
  - Fixed bug arising when using `stby()` with several grouping variables
- Added support for dplyr's `group_by()` function as an alternative to `stby()`
- Added support for magrittr `%$%` operator
- Added support for pipeR `%>>%` operator
- `freq()` recognizes factor level "(Missing)" from `forcats::fct_explicit_na` as `NA`'s
- For `freq()` objects, `collapse` boolean parameter has been added as an experimental feature (must be set in the `print()` method)
- Improved output when grouping by more than one variable, either with `stby()` or `dplyr::group_by()`
- `tb()` supports objects having several grouping variables
- `tb()` has an added parameter "na.rm" for `freq()` objects
- Improved how `descr()` deals with empty vectors and invalid weights
- Fixed a problem with `freq()` arising when using sampling weights while no missing values were present
- Function `tb()` turns `freq()` and `descr()` outputs into "tidy" tibbles
- Function `define_keywords()` allows defining translatable terms in a GUI and optionally saving the results in a csv file (through Save File... dialog)
- Function `use_custom_lang()` replaces `useTranslations()` and triggers an Open File... dialog when no argument is supplied
- In `freq()`:
  - A new parameter `cumul` allows turning on or off cumulative proportions
  - The order parameter values "names", "freq", and "levels" now have their counterparts "-names" (or "names-"), "-freq" and "-levels"
  - A new parameter `rows` has been added; it allows subsetting the output table either with a numeric vector, a character vector, or a single search string (regular expression)
- In `ctable()`:
  - Added support for weights
  - Added logical argument "chisq.test" to display chi-square results below the cross-tabulation table
- In `dfSummary()`, added content specific to email addresses: valid, invalid, duplicates
- Added translations: Portuguese ("pt"), Turkish ("tr"), and Russian ("ru")
- `byst()` had to be dropped because of issues related to objects names; only `stby()` is accepted from now on
- `useTranslations()` has been replaced by `use_custom_lang()`
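
The `freq()` parameters and the `tb()` function listed above can be combined as in the following sketch. It is only an illustration, assuming a recent summarytools installation and its bundled `tobacco` dataset; the object names (`f_sorted`, `f_top`, `f_tidy`) are arbitrary, and `order = "freq"` uses the singular spelling adopted in later versions.

```r
library(summarytools)

# Frequency table ordered by count, without cumulative proportions
f_sorted <- freq(tobacco$age.gr, order = "freq", cumul = FALSE)
print(f_sorted)

# Same table, restricted to its first two rows via the 'rows' parameter
f_top <- freq(tobacco$age.gr, order = "freq", rows = 1:2)
print(f_top)

# Turn a freq() result into a "tidy" tibble for further processing
f_tidy <- tb(freq(tobacco$gender))
print(f_tidy)
```

The tibble returned by `tb()` can then be passed to other tidyverse tools, which is not practical with the printed table.
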
No changes (re-submission of 0.9.1 to CRAN)
For users updating solely from CRAN, this is a major update. Many changes were introduced since version 0.8.8 (versions 0.8.9 and 0.9.0 were released solely on GitHub). Please refer to the README file, the two vignettes and the information below for all the details.
- `stby()`, a summarytools-specific version of `by()`, is introduced. It is highly recommended that you use it instead of `by()`; its syntax is identical and it greatly simplifies the printing of the generated objects
- In `dfSummary()`:
  - 'max.tbl.height' allows printing summaries in scrollable windows (useful in .Rmd when a data frame contains numerous variables)
  - Setting 'tmp.img.dir' allows the inclusion of png graphs in Rmarkdown documents when using pander method in combination with arguments plain.ascii = FALSE and style = "grid"
  - Platform-specific png device types are used, improving image quality
- Several examples are added to all main functions; use the `example()` function to access them
- Output translations are introduced. For instance, setting `st_options(lang='fr')` gives access to French translations. Spanish ('es') translations are also available.
- Function `useTranslations()` (which in later versions becomes `use_custom_lang()`) allows using custom translations
- In `descr()`, the weight variable (when used) is automatically removed from the list of variables to analyze
- In `dfSummary()`, images are processed using functions from the magick package, improving the general layout of the output tables
- Improved support for magrittr operators
- In `dfSummary()`:
  - Number of columns and number of duplicates added to the headings section
  - Integer sequences as well as UPC/EAN codes are detected and identified
  - Statistics for unary / binary data are simplified
  - Dimensions of the bars in barplots now reflect frequencies relative to the whole dataset, allowing comparisons across variables
- In `descr()`, the 'stats' parameter accepts values "common" and "fivenum"
- With `st_options()`, setting multiple options at once is now possible; all options have their own parameter (the legacy way of setting options is still supported)
- More parameters can be overridden when calling `print()` or `view()` - refer to the `print()` method's documentation to learn more
- The 'omit.headings' parameter is replaced by the more straightforward (and still boolean) 'headings'. The former is still supported but will disappear in a future release (possibly 0.9.2)
- Row subsetting is no longer displayed in the headings section, as it was error-prone
Special thanks to Paul Feitsma for his numerous suggestions.
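
The multi-option form of `st_options()` and the "common" and "fivenum" values of the `descr()` 'stats' parameter, both described above, might be used as in this sketch. It assumes an installed summarytools and its bundled `tobacco` dataset; the object names (`old`, `d_common`, `d_fivenum`) are illustrative.

```r
library(summarytools)

# Save current values so they can be restored at the end
old <- list(round.digits = st_options("round.digits"),
            headings     = st_options("headings"))

# Set several global options in a single call
st_options(round.digits = 1, headings = FALSE)

# Restrict descr() to predefined sets of statistics
d_common  <- descr(tobacco$BMI, stats = "common")
d_fivenum <- descr(tobacco$BMI, stats = "fivenum")
print(d_common)
print(d_fivenum)

# Restore the previous settings
st_options(round.digits = old$round.digits, headings = old$headings)
```
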
- Fixed character encoding issues
- In `dfSummary()`:
  - Fixed an issue with `dfSummary()` where reported percentages could exceed 100% under specific circumstances
  - Fixed issue with groups not being properly updated when used in Shiny apps
- Fixed an issue with `dfSummary()` when missing values were present along with whole numbers
- Fixed an issue with `descr()` where group was not shown for the first group when omitting headings
- In `dfSummary()`:
  - "Label" column now shows proper line breaks in the html versions
  - In lists of values / frequencies, digits are omitted for integer variables and for numerics containing whole numbers only
  - Fixed an error when using the pipe (`%>%`) operator
- In `descr()`, fixed a calculation error for coefficient of variation (cv)
- In `ctable()` html outputs, '<' and '>' are properly escaped when appearing in row or column names
- In `dfSummary()`:
  - Time intervals in `dfSummary()` now use `lubridate::as.period()`
  - Line feeds in ASCII barplots are now displayed correctly
  - String trimming is applied consistently
- In `ctable()`, argument 'useNA' now correctly accepts value "no"
- Method for calculating number of bins in `dfSummary()` histograms changed (from `nclass.FD()` to `nclass.Sturges()`)
- Removed extra space in `dfSummary()` with time objects
- Allowed one more value for frequency counts in `dfSummary()` when "[1 other value]" was displayed; this actual value is now displayed instead
- Introduced global options with `st_options()`
- New logical options for `freq()`: 'totals' and 'display.nas'
- In `descr()`, Q1 and Q3 were added; also, the order in the 'stats' argument is now reflected in the output table
- In all functions, argument 'split.table' becomes 'split.tables' as per changes in the 'pander' package
- Argument 'omit.headings' added to all main functions
- In `dfSummary()`:
  - Number alignment improvements
  - Fixed frequencies not appearing when value 0 was present
  - Fixed arguments 'valid.col' and 'na.col' being unresponsive
  - Fixed warnings when generating graphics columns
  - 'graph.magnif' argument now allows shrinking or enlarging graphs
- For `print()` and `view()`, argument 'html.table.class' is now called 'table.classes' and its usage is simplified (please refer to the corresponding help files for details)
- CSS class 'st-small' can be used to make html tables smaller by slightly reducing font size and cell padding
- Added support for Date / POSIXt objects in `dfSummary()`
- Improved support for `lapply()` when used with `freq()`
- Fixed performance issue with numerical data having a very large range
- Fixed missing line feeds in `dfSummary()` bar charts
- Fixed issue with all-NA factors in `dfSummary()`
- Fixed issues with graphs in `dfSummary()`
- Added vignettes
- Improved handling of negative column indexing by the parsing function
- Removed accentuated characters from docs
- Introducing graphs in `dfSummary()`
- Added rudimentary support for `lapply()` to be used with `freq()`
- Improved alignment of numbers in both html and ASCII tables
- Cleanup in css and upgrade to Bootstrap 4 beta
- Improved support for `by()` and `with()` with `descr()`, `freq()`, and `ctable()`
In `dfSummary()`, parameter name 'display.labels' has been changed to 'labels.col' for consistency reasons. Also, see Notes for Version 0.6.9 about the 'file' parameter.
GitHub-only release
- Improved alignment in cells having counts + proportions
- Updated vignette to reflect latest changes and added examples using the example datasets "exams" and "tobacco"
- `dfSummary()`'s last column now includes counts and percentages for both valid and missing data
- Internal change: Roxygen2 is now used to generate documentation
GitHub-only release
- Introduced `ctable()` for cross-tabulations
- Extended support for printing objects created using `by()` and/or `with()`: variable names, labels and by-groups are now displayed correctly
- `view()` is now more than just a wrapper function for the `print()` method; it is the function to use when printing an object created with `by()`
- Appending to summarytools-generated html files is now possible
- Most pander options stored in summarytools objects can be overridden by `print()` or `view()`
- `freq()` has a new parameter, 'order', allowing rows to be ordered by count rather than by value
- Alignment of numbers in `descr()` observations table has been improved
The 'file' parameter must now be used with `print()` or `view()`; its use with other functions is now deprecated.
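
As a hedged illustration of the note above, the sketch below creates an object first and passes 'file' to `print()` afterwards. It assumes a recent summarytools installation and the bundled `tobacco` dataset, and it relies on the assumption that an `.html` file extension is enough to produce an html file; `out_file` is a hypothetical temporary path.

```r
library(summarytools)

# Temporary destination for the html output (illustrative path)
out_file <- tempfile(fileext = ".html")

# Create the object first, without any 'file' argument
f <- freq(tobacco$gender)

# Write the results to disk through print()
print(f, file = out_file)

# Check that the file was created
file.exists(out_file)
```
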
- Improved the way `dfSummary()` reports frequencies for character variables
- Fixed problems with outputs when using weights
- Added hash markup to table headings for better markdown integration
- Added an option to the `print()` method to suppress the footnote in HTML outputs
- Fixed a problem with `dfSummary()` which arose when number of factor levels exceeded max.distinct.values
- Added Introductory vignette
- Fixed markdown output that would not render strings such as
- Improved multiline tables line feeds
- Improved sample datasets
- Removed Bootstrap content not likely to be used
- Changed the way `method = "browser"` sends file path to browser for better cross-platform compatibility
- Improved results when using `by()`
GitHub-only release
- Function `descr()` now supports weights
- Output from `what.is()` has been simplified
- Other changes are transparent to the user, but make the internals more consistent across functions
- `freq()` now supports weights.
- Better knitr integration
- Added sample datasets
- `view()` allows opening HTML tables in RStudio's Viewer
- `desc()` is renamed `descr()`
- Returned objects are now of class "summarytools" and have several attributes that are used by `print.summarytools()`
- `print.summarytools()` has argument 'method' that can be one of "pander", "viewer", or "browser", the last two being used to display an HTML version of the output, using Bootstrap's CSS (https://getbootstrap.com)
- Row indexing is "detected" and reported
- Rounding occurs when results are displayed (non-rounded results are stored)
- Argument 'echo' is deprecated
- `unistats()` is now called `desc()`
- `frequencies()` is now called `freq()`
- `desc()` now accepts data frames as first argument; factors and character columns will be ignored
- `desc()` results tables can be transposed
- `freq()` returns a matrix-table rather than a list
- rapportools is used instead of Hmisc for variable labels
- Function `properties()` was removed.
Initial Release