diff --git a/DESCRIPTION b/DESCRIPTION index bc4fd437..f7f32f8d 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: collapse Title: Advanced and Fast Data Transformation -Version: 1.8.3 +Version: 1.8.4 Authors@R: c( person("Sebastian", "Krantz", role = c("aut", "cre"), email = "sebastian.krantz@graduateinstitute.ch"), diff --git a/NEWS.md b/NEWS.md index 785f8371..30856a5c 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,7 @@ +# collapse 1.8.4 + +* Makevars text substitution hack to have CRAN accept a package that combines C, C++ and OpenMP. Thanks also to @MichaelChirico for pointing me in the right direction. + # collapse 1.8.3 * Significant speed improvement in `qF/qG` (factor-generation) for character vectors with more than 100,000 obs and many levels if `sort = TRUE` (the default). For details see the `method` argument of `?qF`. diff --git a/docs/404.html b/docs/404.html index 0b74fc34..52f9b30f 100644 --- a/docs/404.html +++ b/docs/404.html @@ -32,7 +32,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index bee7ad95..3082fb0b 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -17,7 +17,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/articles/collapse_and_data.table.html b/docs/articles/collapse_and_data.table.html index d57e69c9..d23873df 100644 --- a/docs/articles/collapse_and_data.table.html +++ b/docs/articles/collapse_and_data.table.html @@ -31,7 +31,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/articles/collapse_and_dplyr.html b/docs/articles/collapse_and_dplyr.html index f5b0cf12..5a199771 100644 --- a/docs/articles/collapse_and_dplyr.html +++ b/docs/articles/collapse_and_dplyr.html @@ -31,7 +31,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/articles/collapse_and_plm.html b/docs/articles/collapse_and_plm.html index 265940cf..6e91555c 100644 --- a/docs/articles/collapse_and_plm.html +++ b/docs/articles/collapse_and_plm.html @@ -31,7 +31,7 @@ collapse - 1.8.3 + 1.8.4 diff --git 
a/docs/articles/collapse_and_sf.html b/docs/articles/collapse_and_sf.html index 4ce5f256..a38999b5 100644 --- a/docs/articles/collapse_and_sf.html +++ b/docs/articles/collapse_and_sf.html @@ -31,7 +31,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/articles/collapse_intro.html b/docs/articles/collapse_intro.html index fcc9c149..09d2c026 100644 --- a/docs/articles/collapse_intro.html +++ b/docs/articles/collapse_intro.html @@ -31,7 +31,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/articles/index.html b/docs/articles/index.html index 36144571..ae691354 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -71,7 +71,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/authors.html b/docs/authors.html index ab0c1fd6..de174c2a 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -17,7 +17,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/index.html b/docs/index.html index ee2fc4de..2317bfd5 100644 --- a/docs/index.html +++ b/docs/index.html @@ -60,7 +60,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/news/collapse1.7.digest.html b/docs/news/collapse1.7.digest.html index 1f068a88..1189d0a2 100644 --- a/docs/news/collapse1.7.digest.html +++ b/docs/news/collapse1.7.digest.html @@ -18,7 +18,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/news/index.html b/docs/news/index.html index 89e4f5f6..82c67db7 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 @@ -61,128 +61,488 @@

Changelog

Source: NEWS.md +
+ +
-
-
-
-

collapse 1.8.0, released mid-May 2022, brings enhanced support for indexed computations on time series and panel data by introducing flexible ‘indexed_frame’ and ‘indexed_series’ classes and surrounding infrastructure, sets a modest start to OpenMP multithreading as well as data transformation by reference in statistical functions, and enhances the package’s descriptive statistics toolset.

+

collapse 1.8.0, released mid-May 2022, brings enhanced +support for indexed computations on time series and panel data by +introducing flexible ‘indexed_frame’ and ‘indexed_series’ classes and +surrounding infrastructure, sets a modest start to OpenMP multithreading +as well as data transformation by reference in statistical functions, +and enhances the package’s descriptive statistics toolset.

Changes to functionality

-

Bug Fixes

-

Additions

Improvements

-
-
-
-
-
-
-

collapse 1.7.0, released mid January 2022, brings major improvements in the computational backend of the package, its data manipulation capabilities, and a whole set of new functions that enable more flexible and memory efficient R programming - significantly enhancing the language itself. For the vast majority of code, updating to 1.7 should not cause any problems.

+

collapse 1.7.0, released mid January 2022, brings major +improvements in the computational backend of the package, its data +manipulation capabilities, and a whole set of new functions that enable +more flexible and memory efficient R programming - significantly +enhancing the language itself. For the vast majority of code, updating +to 1.7 should not cause any problems.

Changes to functionality

-

Bug Fixes

-

Additions

Basic Computational Infrastructure
-
  • Function group was added, providing a low-level interface to a new unordered grouping algorithm based on hashing in C and optimized for R’s data structures. The algorithm was heavily inspired by the great kit package of Morgan Jacob, and now feeds into the package through multiple central functions (including GRP / fgroup_by, funique and qF) when invoked with argument sort = FALSE. It is also used in internal groupings performed in data transformation functions such as fwithin (when no factor or ‘GRP’ object is provided to the g argument). The speed of the algorithm is very promising (often superior to radixorder), and it could be used in more places still. I welcome any feedback on its performance on different datasets.

  • -
  • Function gsplit provides an efficient alternative to split based on grouping objects. It is used as a new backend to rsplit (which also supports data frames) as well as BY, collap, fsummarise and fmutate - for more efficient grouped operations with functions external to the package.

  • -
  • Added multiple functions to facilitate memory efficient programming (written in C). These include elementary mathematical operations by reference (setop, %+=%, %-=%, %*=%, %/=%), supporting computations involving integers and doubles on vectors, matrices and data frames (including row-wise operations via setop) with no copies at all. Furthermore, there is a set of functions which check a single value against a vector without generating logical vectors: whichv, whichNA (operators %==% and %!=% which return indices and are significantly faster than ==, especially inside functions like fsubset), anyv and allv (allNA was already added before). Finally, functions setv and copyv speed up operations involving the replacement of a value (x[x == 5] <- 6) or of a sequence of values from an equally sized object (x[x == 5] <- y[x == 5], or x[ind] <- y[ind] where ind could be pre-computed vectors or indices) in vectors and data frames without generating any logical vectors or materializing vector subsets.

  • -
  • Function vlengths was added as a more efficient alternative to lengths (without method dispatch, simply coded in C).

  • -
  • Function massign provides a multivariate version of assign (written in C, and supporting all basic vector types). In addition, the operator %=% was added as an efficient multiple assignment operator. (It is called %=% and not %<-% to facilitate the translation of Matlab or Python code into R, and because the zeallot package already provides multiple-assignment operators (%<-% and %->%), which are significantly more versatile, but orders of magnitude slower than %=%.)

  • +
    • Function group was added, providing a low-level +interface to a new unordered grouping algorithm based on hashing in C +and optimized for R’s data structures. The algorithm was heavily +inspired by the great kit package of Morgan Jacob, and now +feeds into the package through multiple central functions (including +GRP / fgroup_by, funique and +qF) when invoked with argument sort = FALSE. +It is also used in internal groupings performed in data transformation +functions such as fwithin (when no factor or ‘GRP’ object +is provided to the g argument). The speed of the algorithm +is very promising (often superior to radixorder), and it +could be used in more places still. I welcome any feedback on its +performance on different datasets.

    • +
    • Function gsplit provides an efficient alternative to +split based on grouping objects. It is used as a new +backend to rsplit (which also supports data frames) as well +as BY, collap, fsummarise and +fmutate - for more efficient grouped operations with +functions external to the package.

    • +
    • Added multiple functions to facilitate memory efficient +programming (written in C). These include elementary mathematical +operations by reference (setop, %+=%, +%-=%, %*=%, %/=%), supporting +computations involving integers and doubles on vectors, matrices and +data frames (including row-wise operations via setop) with +no copies at all. Furthermore, there is a set of functions which check a single +value against a vector without generating logical vectors: +whichv, whichNA (operators %==% +and %!=% which return indices and are significantly faster +than ==, especially inside functions like +fsubset), anyv and allv +(allNA was already added before). Finally, functions +setv and copyv speed up operations involving +the replacement of a value (x[x == 5] <- 6) or of a +sequence of values from an equally sized object +(x[x == 5] <- y[x == 5], or +x[ind] <- y[ind] where ind could be +pre-computed vectors or indices) in vectors and data frames without +generating any logical vectors or materializing vector subsets.

    • +
    • Function vlengths was added as a more efficient +alternative to lengths (without method dispatch, simply +coded in C).

    • +
    • Function massign provides a multivariate version of +assign (written in C, and supporting all basic vector +types). In addition, the operator %=% was added as an +efficient multiple assignment operator. (It is called %=% +and not %<-% to facilitate the translation of Matlab or +Python code into R, and because the zeallot package +already provides multiple-assignment operators (%<-% and +%->%), which are significantly more versatile, but +orders of magnitude slower than %=%.)
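
The helpers listed above can be tried out with a short sketch like the following. It assumes collapse >= 1.7.0 is installed; the toy vectors and the variable names (`x`, `z`, `p`, `q`) are made up for illustration and do not come from the changelog.

```r
library(collapse)  # assumption: collapse >= 1.7.0 is installed

x <- c(1, 5, 3, 5)

# Positions equal to 5, found without materializing the logical vector x == 5
idx  <- whichv(x, 5)
idx2 <- x %==% 5          # operator form of the same lookup

has5 <- anyv(x, 5)        # is any element equal to 5?
all5 <- allv(x, 5)        # are all elements equal to 5?

# copyv() returns a modified copy; setv() would do the same replacement in place
y <- copyv(x, 5, 6)

# Arithmetic by reference: z itself is modified, no copy is made
z <- rep(1, 3)
z %+=% 10

# vlengths() as a dispatch-free alternative to lengths()
len <- vlengths(list(a = 1:3, b = letters))

# Multiple assignment into the calling environment (hypothetical names p and q)
c("p", "q") %=% list(10, 20)
```

Because the by-reference operators modify their left-hand side in place, any other variable pointing at the same vector will see the change as well, so they are best used on objects you own.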

High-Level Features
-
  • Fully fledged fmutate function that provides functionality analogous to dplyr::mutate (sequential evaluation of arguments, including arbitrary tagged expressions and across statements). fmutate is optimized to work together with the package’s Fast Statistical and Data Transformation Functions, yielding fast, vectorized execution, but also benefits from gsplit for other operations.

  • -
  • across() function implemented for use inside fsummarise and fmutate. It is also optimized for Fast Statistical and Data Transformation Functions, but performs well with other functions too. It has an additional argument, .apply = FALSE, which will apply functions to the entire subset of the data instead of individual columns, and thus allows for nesting tibbles and estimating models or correlation matrices by groups etc. across() also supports an arbitrary number of additional arguments which are split and evaluated by groups if necessary. Multiple across() statements can be combined with tagged vector expressions in a single call to fsummarise or fmutate. Thus the computational framework is pretty general and similar to data.table, although less efficient with big datasets.

  • -
  • Added functions relabel and setrelabel to make interactive dealing with variable labels a bit easier. Note that both functions operate by reference. (Through vlabels<- which is implemented in C. Taking a shallow copy of the data frame is useless in this case because variable labels are attributes of the columns, not of the frame). The only difference between the two is that setrelabel returns the result invisibly.

  • -
  • Function shortcuts rnm and mtt added for frename and fmutate. across can also be abbreviated using acr.

  • -
  • Added two options that can be invoked before loading of the package to change the namespace: options(collapse_mask = c(...)) can be set to export copies of selected (or all) functions in the package that start with f, removing the leading f, e.g. fsubset -> subset (both fsubset and subset will be exported). This allows masking base R and dplyr functions (even basic functions such as sum, mean, unique etc. if desired) with collapse’s fast functions, facilitating the optimization of existing code and allowing you to work with collapse using a more natural namespace. The package has been internally insulated against such changes, but of course they might have major effects on existing code. Also options(collapse_F_to_FALSE = FALSE) can be invoked to get rid of the lead operator F, which masks base::F (an issue raised by some people who like to use T/F instead of TRUE/FALSE). Read the help page ?collapse-options for more information.

  • +
    • Fully fledged fmutate function that provides +functionality analogous to dplyr::mutate (sequential +evaluation of arguments, including arbitrary tagged expressions and +across statements). fmutate is optimized to +work together with the package’s Fast Statistical and Data +Transformation Functions, yielding fast, vectorized execution, but +also benefits from gsplit for other operations.

    • +
    • across() function implemented for use inside +fsummarise and fmutate. It is also optimized +for Fast Statistical and Data Transformation Functions, but +performs well with other functions too. It has an additional argument, +.apply = FALSE, which will apply functions to the entire +subset of the data instead of individual columns, and thus allows for +nesting tibbles and estimating models or correlation matrices by groups +etc. across() also supports an arbitrary number of +additional arguments which are split and evaluated by groups if +necessary. Multiple across() statements can be combined +with tagged vector expressions in a single call to +fsummarise or fmutate. Thus the computational +framework is pretty general and similar to data.table, although +less efficient with big datasets.

    • +
    • Added functions relabel and setrelabel +to make interactive dealing with variable labels a bit easier. Note that +both functions operate by reference. (Through vlabels<- +which is implemented in C. Taking a shallow copy of the data frame is +useless in this case because variable labels are attributes of the +columns, not of the frame). The only difference between the two is that +setrelabel returns the result invisibly.

    • +
    • Function shortcuts rnm and mtt added +for frename and fmutate. across +can also be abbreviated using acr.

    • +
    • Added two options that can be invoked before loading of the +package to change the namespace: +options(collapse_mask = c(...)) can be set to export copies +of selected (or all) functions in the package that start with +f, removing the leading f, +e.g. fsubset -> subset (both +fsubset and subset will be exported). This +allows masking base R and dplyr functions (even basic functions such as +sum, mean, unique etc. if +desired) with collapse’s fast functions, facilitating the +optimization of existing code and allowing you to work with +collapse using a more natural namespace. The package has been +internally insulated against such changes, but of course they might have +major effects on existing code. Also +options(collapse_F_to_FALSE = FALSE) can be invoked to get +rid of the lead operator F, which masks +base::F (an issue raised by some people who like to use +T/F instead of +TRUE/FALSE). Read the help page +?collapse-options for more information.
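
As a rough illustration of the fmutate, fsummarise and across() features described above, the following sketch uses the built-in `mtcars` data. It assumes collapse >= 1.7.0 is installed; the output column names `mean_wt` and `mpg_dm` are hypothetical and chosen only for this example.

```r
library(collapse)  # assumption: collapse >= 1.7.0 is installed

g <- fgroup_by(mtcars, cyl)   # grouped frame; groups are sorted by default

# A tagged expression and an across() statement combined in one fsummarise() call
res <- fsummarise(g,
                  mean_wt = fmean(wt),
                  across(c(mpg, hp), fmean))

# Grouped transformation: centre mpg on its group mean, then drop the grouping
dm <- fungroup(fmutate(g, mpg_dm = fwithin(mpg)))
```

Nested calls are used here instead of a pipe so that the sketch does not depend on a particular R version or pipe operator.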

Improvements

-
-

Changes to Functionality

-

Bug Fixes

@@ -342,94 +1159,273 @@

Bug Fixes

+unlist2d produced a subsetting error if an empty list +was present in the list-tree. This is now fixed, empty or +NULL elements in the list-tree are simply ignored +(#99).

Additions

-

Improvements

-

A small patch for 1.5.0 that:

-
-

collapse 1.5.0, released early January 2021, presents important refinements and some additional functionality.

+

collapse 1.5.0, released early January 2021, presents +important refinements and some additional functionality.

Back to CRAN

-

Bug Fixes

-

Changes to Functionality

-

Additions

-

Improvements

-
-

collapse 1.4.1 is a small patch for 1.4.0 that:

-
-

collapse 1.4.0, released early November 2020, presents some important refinements, particularly in the domain of attribute handling, as well as some additional functionality. The changes make collapse smarter, more broadly compatible and more secure, and should not break existing code.

+

collapse 1.4.0, released early November 2020, presents some +important refinements, particularly in the domain of attribute handling, +as well as some additional functionality. The changes make +collapse smarter, more broadly compatible and more secure, and +should not break existing code.

Changes to Functionality

-

Additions

-

Improvements

-
-

collapse 1.3.2, released mid September 2020:

-
-

collapse 1.3.1, released end of August 2020, is a patch for v1.3.0 that takes care of some unit test failures on certain operating systems (mostly because of numeric precision issues). It provides no changes to the code or functionality.

+

collapse 1.3.1, released end of August 2020, is a patch for v1.3.0 +that takes care of some unit test failures on certain operating systems +(mostly because of numeric precision issues). It provides no changes to +the code or functionality.

-

collapse 1.3.0, released mid August 2020:

+

collapse 1.3.0, released mid August 2020: +

Changes to Functionality

-

Additions

-

Improvements

-
-

collapse 1.2.1, released end of May 2020:

-
-

collapse 1.2.0, released mid May 2020:

+

collapse 1.2.0, released mid May 2020: +

Changes to Functionality

-

Bug Fixes

-

Additions

Improvements

-
-

collapse 1.1.0 released early April 2020:

-
-
@@ -595,7 +1812,8 @@

collapse Package Options

  • "fast-fun" adds the functions contained in the macro: .FAST_FUN.

  • "fast-stat-fun" adds the functions contained in the macro: .FAST_STAT_FUN.

  • "fast-trfm-fun" adds the functions contained in: setdiff(.FAST_FUN, .FAST_STAT_FUN).

  • -
  • "all" turns on all of the above, and additionally exports a function n() for use in summarise and mutate.

  • +
  • "all" turns on all of the above, and additionally exports a function n() for use in summarise and mutate.

  • Note that none of these options will impact internal collapse code, but they may change the way your programs run. "manip" is probably the safest option to start with. Specifying "fast-fun", "fast-stat-fun", "fast-trfm-fun" or "all" is ambitious, as these options replace basic R functions like sum and max, introducing collapse's na.rm = TRUE default and different behavior for matrices and data frames. These options also change some internal macros so that base R functions like sum or max called inside fsummarise, fmutate or collap will also receive vectorized execution. In other words, if you put options(collapse_mask = "all") before loading the package, and you have a collapse-compatible line of dplyr code like wlddev |> group_by(region, income) |> summarise(across(PCGDP:POP, sum)), this will now receive fully optimized execution. Note however that because of collapse's na.rm = TRUE default, the result will be different unless you add na.rm = FALSE.

    -

    In general, this option is for your convenience, if you want to write visually more appealing code or you want to translate existing dplyr code to collapse. Use with care! Note that the option takes effect upon loading the package (code is in the .onLoad file), not upon attaching it, so it needs to be set before any function from the package is accessed in any way by any code you run. A safe way to enable it is by using a .Rprofile file in your user or project directory (see also here or here, the user-level file is located at file.path(Sys.getenv("HOME"), ".Rprofile") and can be edited using file.edit(file.path(Sys.getenv("HOME"), ".Rprofile"))), or by using a .fastverse configuration file in the project directory.

    +

    In general, this option is for your convenience, if you want to write visually more appealing code or you want to translate existing dplyr code to collapse. Use with care! Note that the option takes effect upon loading the package (code is in the .onLoad function), not upon attaching it, so it needs to be set before any function from the package is accessed in any way by any code you run. A safe way to enable it is by using a .Rprofile file in your user or project directory (see also here or here, the user-level file is located at file.path(Sys.getenv("HOME"), ".Rprofile") and can be edited using file.edit(file.path(Sys.getenv("HOME"), ".Rprofile"))), or by using a .fastverse configuration file in the project directory.
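
    The na.rm caveat mentioned above can be made concrete with a small sketch. It deliberately does not set collapse_mask, so that base sum and collapse's fsum can be compared side by side; the vector `x` is made up for illustration, and collapse is assumed to be installed.

```r
library(collapse)  # assumption: collapse is installed; no masking option is set here

x <- c(1, NA, 2)

base_sum    <- sum(x)                  # base R: na.rm = FALSE by default, so NA
fast_sum    <- fsum(x)                 # collapse: na.rm = TRUE by default, NA is skipped
fast_strict <- fsum(x, na.rm = FALSE)  # propagates the NA, like the base R call
```

    With a masking option such as "all" in effect, a plain sum(x) call would presumably behave like fast_sum here, which is why adding na.rm = FALSE matters when translating existing code.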

    @@ -120,7 +120,8 @@

    See also

    -

    Site built with pkgdown 2.0.2.

    +

    Site built with pkgdown +2.0.2.

    diff --git a/docs/reference/collapse-package.html b/docs/reference/collapse-package.html index 1b1f7f9b..e06371dd 100644 --- a/docs/reference/collapse-package.html +++ b/docs/reference/collapse-package.html @@ -25,7 +25,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/collapse-renamed.html b/docs/reference/collapse-renamed.html index 794866b6..daf86768 100644 --- a/docs/reference/collapse-renamed.html +++ b/docs/reference/collapse-renamed.html @@ -72,7 +72,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/colorder.html b/docs/reference/colorder.html index 0de42b05..b811c226 100644 --- a/docs/reference/colorder.html +++ b/docs/reference/colorder.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/dapply.html b/docs/reference/dapply.html index 1395d63e..44faa238 100644 --- a/docs/reference/dapply.html +++ b/docs/reference/dapply.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/data-transformations.html b/docs/reference/data-transformations.html index 4f778b1a..90d53835 100644 --- a/docs/reference/data-transformations.html +++ b/docs/reference/data-transformations.html @@ -48,7 +48,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/descr.html b/docs/reference/descr.html index a1023d53..23c94839 100644 --- a/docs/reference/descr.html +++ b/docs/reference/descr.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/efficient-programming.html b/docs/reference/efficient-programming.html index 017c6649..a2991fe0 100644 --- a/docs/reference/efficient-programming.html +++ b/docs/reference/efficient-programming.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/extract_list.html b/docs/reference/extract_list.html index 9610f83c..5403482d 100644 --- a/docs/reference/extract_list.html +++ b/docs/reference/extract_list.html @@ -25,7 +25,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/fFtest.html b/docs/reference/fFtest.html index 
fa28c3c6..a39e6b30 100644 --- a/docs/reference/fFtest.html +++ b/docs/reference/fFtest.html @@ -19,7 +19,7 @@ collapse - 1.8.3 + 1.8.4 diff --git a/docs/reference/fast-data-manipulation.html b/docs/reference/fast-data-manipulation.html index 3770511a..32c559bf 100644 --- a/docs/reference/fast-data-manipulation.html +++ b/docs/reference/fast-data-manipulation.html @@ -1,7 +1,7 @@ - Fast Data Manipulation — fast-data-manipulation • collapseFast Data Manipulation — fast-data-manipulation • collapse collapse - 1.8.3 + 1.8.4 @@ -78,10 +78,10 @@

    Fast Data Manipulation

    collapse provides the following functions for fast manipulation of (mostly) data frames.