-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stringsAsFactors default is FALSE on R 4.0+ / not generic solution for factor levels #130
Comments
Saw this coming a couple months ago. Relevant background: |
Yes I saw that blog also, and expected we would have to update soilDB shortly. uncode() only converts those columns that match the ColumnPhysicalNames from within NASIS's metadata domains into factors. All other character data columns, such as taxonname, are left as characters. I'm not sure converting the stringsAsFactors argument into an option is the way to go, as that obscures the functionality from your typical user, but I'd be open to if that is the consensus. We should probably still document that functionality in each functions help file details section. If so it won't really reduce the documentation overhead. I really think it's preferable to maintain parity with base R, therefore I suggest we change the argument name from stringsAsFactors to as.is. Although I think in issue #53 it was decided that the default should be TRUE. As to the issue of the resulting factor ordering produced by uncode(). I think the best long-term solution would be to create a subroutine within uncode() to correct the ordering for those factors deemed to be 'out of order'. Rather than modify within each and every function where it occurs. The same solution might be used for those variables that should be excluded. I had considered the possibility of going to the source, and fixing the ordering in NASIS with the help of George or Kyle, but... decided that probably wasn't a good idea as it would destroy any backward compatibility with old data. |
Not sure how it suddenly causes a problem, since stringsAsFactors has been FALSE by default since R 4+, and this just started testing bad last week. Comparison toggling stringsAsFactors back to former default daff::diff_data(
soilDB::get_cosoilmoist_from_NASISWebReport(
"EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded",
stringsAsFactors = FALSE
),
soilDB::get_cosoilmoist_from_NASISWebReport(
"EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded",
stringsAsFactors = TRUE
)
)
#> Loading required namespace: rvest
#> Warning in eval(substitute(expr), e): NAs introduced by coercion
#> Warning in eval(substitute(expr), e): NAs introduced by coercion
#> Daff Comparison: 'soilDB::get_cosoilmoist_from_NASISWebReport("EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded", ' ' stringsAsFactors = FALSE)' vs. 'soilDB::get_cosoilmoist_from_NASISWebReport("EVAL - MLRA 111A - Ross silt loam, 0 to 2 percent slopes, frequently flooded", ' ' stringsAsFactors = TRUE)'
#> First 6 and last 6 patch lines:
#> nationalmusym muname
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> -> 5d8l Ross silt loam
#> ... ... ...
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> -> 2w56h Ross silt loam, 0 to 2 percent slopes, frequently flooded
#> dmuiid coiid compname comppct_r drainagecl month
#> -> 119696 238422 Ross 100 well jan
#> -> 119696 238422 Ross 100 well feb
#> -> 119696 238422 Ross 100 well mar
#> -> 119696 238422 Ross 100 well apr
#> -> 119696 238422 Ross 100 well may
#> -> 119696 238422 Ross 100 well jun
#> ... ... ... ... ... ... ...
#> -> 749825 2494482 Sloan 5 very poorly sep
#> -> 749825 2494482 Sloan 5 very poorly oct
#> -> 749825 2494482 Sloan 5 very poorly nov
#> -> 749825 2494482 Sloan 5 very poorly nov
#> -> 749825 2494482 Sloan 5 very poorly dec
#> -> 749825 2494482 Sloan 5 very poorly dec
#> flodfreqcl floddurcl pondfreqcl ponddurcl
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->frequent very brief Not populated->none <NA>
#> -> Not populated->very rare very brief Not populated->none <NA>
#> ... ... ... ... ...
#> -> Not populated <NA> Not populated <NA>
#> -> Not populated <NA> Not populated <NA>
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> -> Not populated->frequent brief Not populated->occasional brief
#> cosoilmoistiid dept_l dept_r dept_h depb_l depb_r depb_h
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> -> NA NA->201 NA->201 NA->201 NA->201 NA->201 NA->201
#> ... ... ... ... ... ... ... ...
#> -> 23327632 0 0 0 200 200 200
#> -> 23327616 0 0 0 200 200 200
#> -> 23327627 0 0 0 0 7 15
#> -> 23327628 0 7 15 200 200 200
#> -> 23327630 0 0 0 0 7 15
#> -> 23327629 0 7 15 200 200 200
#> status
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> -> NULL->Not populated
#> ... ...
#> -> NULL->moist
#> -> NULL->moist
#> -> NULL->moist
#> -> NULL->wet
#> -> NULL->moist
#> -> NULL->wet |
Thanks Andrew. I looked into and fixed. I'll try and examine how do deal with all of the other functions shortly. |
- default.stringsAsFactors() has been deprecated;
* Deprecate `stringsAsFactors` argument to `uncode()` #130 - default.stringsAsFactors() has been deprecated; * fetchHenry: backward compatibility with R<4.0 * Add soilDB.NASIS.DomainsAsFactor option and helper function NASISDomainsAsFactor() to facilitate deprecation of base R stringsAsFactors option * Docs
We are passing around
default.stringsAsFactors()
to a lot of functions, each with its own arguments, flow-control, and documentation to manage. As far as I can tell (@smroecker please correct me if I am wrong), this is all to handle howuncode()
converts or does not convert characters to factors, using the NASIS domains as the default levels. We have discussed at great length in #53, but some issues remain:droplevels()
usually required to clean-up output)uncode()
should probably use an option to reduce (41 functions below) argument / documentation overheadtaxonname
) should never be converted to factorsfetchNASIS
)Functions to Fix
fetchNASIS_components.R:.fetchNASIS_components <- function(SS=TRUE, rmHzErrors=TRUE, fill = FALSE, stringsAsFactors = default.stringsAsFactors()) {
fetchNASIS_pedons.R:.fetchNASIS_pedons <- function(SS=TRUE, rmHzErrors=TRUE, nullFragsAreZero=TRUE, soilColorState='moist', lab=FALSE, stringsAsFactors = default.stringsAsFactors()) {
fetchNASIS_report.R:.get_site_from_NASISReport <- function(url = NULL, nullFragsAreZero = TRUE, stringsAsFactors = default.stringsAsFactors()
fetchNASIS_report.R:.get_pediagfeatures_from_NASISTemp <- function(stringsAsFactors = default.stringsAsFactors()
fetchVegdata.R:fetchVegdata <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_component_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_legend_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_lmuaoverlap_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_mapunit_from_NASIS <- function(SS = TRUE, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_component_correlation_data_from_NASIS_db <- function(SS=TRUE, dropAdditional=TRUE, dropNotRepresentative=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_component_copm_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_component_esd_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_data_from_NASIS_db.R:get_comonth_from_NASIS_db <- function(SS=TRUE, fill=FALSE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_component_from_NASISWebReport <- function(projectname, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_chorizon_from_NASISWebReport <- function(projectname, fill = FALSE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_legend_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_lmuaoverlap_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_mapunit_from_NASISWebReport <- function(areasymbol, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_projectmapunit_from_NASISWebReport <- function(projectname, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_LIMS.R:get_projectmapunit2_from_NASISWebReport <- function(mlrassoarea, fiscalyear, projectname, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_SDA.R:get_legend_from_SDA <- function(WHERE = NULL, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_component_from_SDA.R:get_lmuaoverlap_from_SDA <- function(WHERE = NULL, droplevels = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_concentrations_from_NASIS_db.R:get_concentrations_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_cosoilmoist_from_LIMS.R:get_cosoilmoist_from_NASISWebReport <- function(projectname, impute = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_cosoilmoist_from_NASIS.R:get_cosoilmoist_from_NASIS <- function(impute = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_extended_data_from_NASIS_db.R:get_extended_data_from_NASIS_db <- function(SS=TRUE, nullFragsAreZero=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_hz_data_from_NASIS_db.R:get_hz_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_phfmp_from_NASIS_db.R:get_phfmp_from_NASIS_db <- function(SS = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_projectmapunit_from_NASIS.R:get_projectmapunit_from_NASIS <- function(SS = TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_site_data_from_NASIS_db.R:get_site_data_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_soilseries_from_NASIS.R:get_soilseries_from_NASIS <- function(stringsAsFactors = default.stringsAsFactors()) {
get_soilseries_from_NASIS.R:get_soilseries_from_NASISWebReport <- function(soils, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R:get_vegplot_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_location_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_trhi_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_species_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_transect_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_transpecies_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_tree_si_summary_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_tree_si_details_from_NASIS_db <- function(SS=TRUE, stringsAsFactors = default.stringsAsFactors()) {
get_vegplot_data_from_NASIS_db.R: get_vegplot_textnote_from_NASIS_db <- function(SS=TRUE, fixLineEndings=TRUE, stringsAsFactors = default.stringsAsFactors()) {
The text was updated successfully, but these errors were encountered: