ID is not returned when testing for empty string #14

Open
Rekyt opened this issue Feb 14, 2023 · 1 comment

Rekyt commented Feb 14, 2023

I'm working through an example with hundreds of thousands of names.
To recover the names easily, I pass a data.frame (with an ID column) as input to the package.
Because there are so many names, some of them are empty, but they still correspond to rows of the original, uncorrected data and I would like to recover them.

However, empty strings are silently removed from the input and never appear in the output, while NAs are returned without their associated IDs.

# Problem with IDs when using empty name 
taxa_frame = data.frame(
  ID = paste0("test-", 1:4),
  name = c(NA, "Helianthus", "", " ")
)

matched = TNRS::TNRS(taxa_frame)

# The row with the empty string is dropped entirely; the NA row is returned
# but loses its ID, while the whitespace-only name keeps its ID
matched[, 1:5]
#>       ID Name_submitted Unmatched_terms Overall_score Name_matched_id
#> 1   <NA>          FALSE           FALSE            NA                
#> 2 test-2     Helianthus                             1          668749
#> 3 test-4                                           NA

Created on 2023-02-14 with reprex v2.0.2
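To see which submitted rows are affected in a large run, one option is to compare the IDs that were sent with the IDs that come back. The sketch below is a hypothetical helper (`missing_ids` is not part of the TNRS package), and the `returned` data frame is built by hand to mirror the `ID` column of the reprex output above rather than coming from a live `TNRS::TNRS()` call.

```r
# Hypothetical helper: return the IDs that were submitted but do not come back
# in the response. NA IDs in the response (like the NA row in the reprex)
# cannot be matched to any submitted ID, so they are dropped before comparing.
missing_ids <- function(request, response) {
  setdiff(request$ID, response$ID[!is.na(response$ID)])
}

# Illustrative input, same as the reprex
taxa_frame <- data.frame(
  ID = paste0("test-", 1:4),
  name = c(NA, "Helianthus", "", " "),
  stringsAsFactors = FALSE
)

# Hand-built stand-in for the ID column of the response shown above
returned <- data.frame(ID = c(NA, "test-2", "test-4"), stringsAsFactors = FALSE)

missing_ids(taxa_frame, returned)
#> [1] "test-1" "test-3"
```

Here `test-1` is the NA name (returned, but without its ID) and `test-3` is the empty string (dropped entirely).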

@ojalaquellueva
Member

@Rekyt @bmaitner This issue is closely related to #15. It comes from how the Perl controller (correction: in the core code, not the API) prepares the request before submitting parallel batches to the (non-parallel) batch-processing application, which in turn submits each name individually to the PHP+MySQL name resolver. Empty strings are stripped because they can crash the resolver. I'm not sure what is happening with NAs; I will need to look closely at how R NAs are transformed as they pass from R to PHP to Perl to PHP to MySQL and back.

In any case, the best way to handle this seems to be to store the user's original request (names plus optional IDs) unaltered as an array, then stitch it back together with the response after the entire request has been processed. That way, rows missing from the response because of empty strings or NAs would still be present, in their original form, in the data returned to the user. I don't think my skills are up to modifying the controller (it's not my code), but perhaps I could handle it in PHP on the API end. I'll take a look and see what I can do...
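The "keep the original request, then stitch the response back on" idea can be sketched on the client side in R. This is only an illustration of the technique, not how TNRS implements it (the maintainer suggests doing it in PHP on the API end): `fake_resolver` and `resolve_preserving_rows` are hypothetical names, the fake resolver simply drops NA and empty names to imitate the reported behaviour, and the sketch assumes IDs are unique and non-missing.

```r
# Hypothetical stand-in for the resolver (NOT the TNRS API): for illustration
# it drops rows whose name is NA or an empty string.
fake_resolver <- function(request) {
  kept <- request[!is.na(request$name) & nzchar(request$name), , drop = FALSE]
  data.frame(
    ID = kept$ID,
    Name_matched = trimws(kept$name),
    stringsAsFactors = FALSE
  )
}

# Keep the original request unaltered, resolve it, then left-join the response
# back onto the request by ID, so rows missing from the response reappear
# with NA in the result columns. Assumes IDs are unique and non-missing.
resolve_preserving_rows <- function(request, resolver) {
  response <- resolver(request)
  out <- merge(request, response, by = "ID", all.x = TRUE, sort = FALSE)
  # merge() does not preserve input order, so restore the request's row order
  out <- out[match(request$ID, out$ID), , drop = FALSE]
  rownames(out) <- NULL
  out
}

taxa_frame <- data.frame(
  ID = paste0("test-", 1:4),
  name = c(NA, "Helianthus", "", " "),
  stringsAsFactors = FALSE
)
res <- resolve_preserving_rows(taxa_frame, fake_resolver)
res
```

All four IDs come back in their original order; `test-1` (NA) and `test-3` (empty string) carry NA in `Name_matched` instead of disappearing. A left join on the user-supplied ID is used because it never depends on the resolver preserving row order or row count.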
