TypeError: no implicit conversion of String into Integer #131

Open
sentry-io bot opened this issue Feb 28, 2022 · 6 comments
@sentry-io

sentry-io bot commented Feb 28, 2022

Sentry Issue: LUPO-1KW

TypeError: no implicit conversion of String into Integer
  app/models/event.rb:625:in `new'
    meta = Bolognese::Metadata.new(input: id, from: "crossref")
  app/models/event.rb:625:in `import_doi'
    meta = Bolognese::Metadata.new(input: id, from: "crossref")
  app/jobs/other_doi_by_id_job.rb:17:in `perform'
    Event.import_doi(id, options)
  config/initializers/_shoryuken.rb:11:in `block in call'
    Raven.capture(tags: tags, extra: context) { yield }
  config/initializers/_shoryuken.rb:11:in `call'
    Raven.capture(tags: tags, extra: context) { yield }
...
(73 additional frame(s) were not displayed)
@svogt0511 svogt0511 self-assigned this Apr 25, 2022
@svogt0511
Contributor

svogt0511 commented Apr 26, 2022

This is really 3 issues:

  1. Some Crossref DOIs fail to import for the reason detailed below:

It looks like line 310 of crossref_reader.rb in bolognese raises an exception because it cannot handle more than one funder_identifier. An example is 10.1093/mnras/stab1131, which has an array of funder_identifiers for one funder. This causes dig to raise 'no implicit conversion of String into Integer' when trying to get the funder_identifier, so the import fails.

image.png

  2. Crossref DOIs with multiple award_numbers for a funder only import the first award number. This happens in bolognese crossref_reader.rb, in the lines before line 310. An example is DOI 10.3390/ijms23084442. This screengrab shows the original Crossref metadata.

image.png

This screengrab shows the metadata in the DB:

image.png

  3. This may or may not be an issue; if it is, it should be documented and fixed separately from this issue. It concerns reducing the number of AWS SQS messages in the levriero queue, that is, recognizing when a request has failed and not making related requests, or killing related requests that result from an initial failed request. This would help alleviate the load on levriero and allow resources to be used more efficiently. For example, DataDog shows a number of requests for adding 'references' metadata to the DOI 10.1093/mnras/stab1131. They take several days to work their way out of the queue, but they fail because the original DOI import failed. See the DataDog log for an illustration.

Clarification: item 3 is related to the overall issue documented here because the failure to import Crossref DOI metadata allows numerous related requests to be queued that can never be handled successfully, since the original request (importing the DOI metadata) failed.
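
For reference, the error in item 1 can be reproduced in plain Ruby. This is only a minimal sketch: the hash shapes, the placeholder identifier values, and the `funder_identifier` helper are assumptions made for illustration, not the actual bolognese code or the real parsed Crossref structure. It shows that `dig` with a String key works while the nested value is a Hash, and raises once that value becomes an Array.

```ruby
# Hypothetical, simplified stand-ins for one parsed funder entry.
# The real structure produced by bolognese's XML parsing may differ.
single = {
  "name" => "funder_name",
  "assertion" => { "name" => "funder_identifier", "__content__" => "funder-id-1" }
}
multiple = {
  "name" => "funder_name",
  "assertion" => [
    { "name" => "funder_identifier", "__content__" => "funder-id-1" },
    { "name" => "funder_identifier", "__content__" => "funder-id-2" }
  ]
}

# Hypothetical helper: assumes "assertion" is always a Hash.
def funder_identifier(funder)
  funder.dig("assertion", "__content__")
end

puts funder_identifier(single)

begin
  funder_identifier(multiple)
rescue TypeError => e
  # Array#dig expects an Integer index, so the String key is rejected.
  puts "#{e.class}: #{e.message}"
end
```

The second call fails because `Hash#dig` hands the remaining key to `Array#dig`, which only accepts integer indexes.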

LINKS TO CROSSREF METADATA:

@richardhallett
Contributor

So this is most likely just a slightly different Crossref metadata case that bolognese does not handle.
The easiest way to fix this is to create a spec within bolognese using a fixture of some metadata that currently fails. You can then change the appropriate code to get it to work, for both 1 and 2.

For 1. I'd need to run the spec to see what the values are, but something seems odd, as there is an Array.wrap around assertion which should cope with multiple assertions.
I'm not overly familiar with Crossref metadata, but it's probably all just a case of handling whether it's an array or not (probably more Array.wrap). It will be easier once you set up a spec case.
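
As a rough illustration of the "array or not" handling, here is a minimal sketch. The `wrap` method is a plain-Ruby stand-in for ActiveSupport's `Array.wrap` so the example runs without Rails; the `funder_identifiers` helper, the hash shapes, and the placeholder values are assumptions and do not reflect the actual code in crossref_reader.rb.

```ruby
# Plain-Ruby stand-in for ActiveSupport's Array.wrap, so this runs without Rails.
# Kernel#Array is not a substitute here: Array({ "a" => 1 }) returns [["a", 1]].
def wrap(value)
  return [] if value.nil?
  return value if value.is_a?(Array)

  [value]
end

# Hypothetical helper: always returns an Array of identifier strings,
# whether the parsed input holds zero, one, or many assertions.
def funder_identifiers(funder)
  wrap(funder["assertion"])
    .select { |a| a["name"] == "funder_identifier" }
    .map { |a| a["__content__"] }
end

one = { "assertion" => { "name" => "funder_identifier", "__content__" => "funder-id-1" } }
many = {
  "assertion" => [
    { "name" => "funder_identifier", "__content__" => "funder-id-1" },
    { "name" => "funder_identifier", "__content__" => "funder-id-2" }
  ]
}
none = {}

p funder_identifiers(one)
p funder_identifiers(many)
p funder_identifiers(none)
```

Normalizing to an Array before reading the values means the same code path covers all three cases, and no String key ever reaches `Array#dig`.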

For 3. This is too big for a single issue and revolves around the architecture of EventData, I'd leave it out of this for now.

@svogt0511
Contributor

@richardhallett, what do you mean by 'run the spec'?

@richardhallett
Contributor

As in, create a test with RSpec against a fake fixture (a metadata file). Then you can actually do some debugging to find out what the values are, even with a simple debug printout to the console.

@svogt0511 svogt0511 transferred this issue from datacite/lupo Apr 30, 2022
@svogt0511
Contributor

svogt0511 commented May 1, 2022

This is a work-in-progress. I have committed a fix to the feature branch that allows the Crossref XML reader to handle multiple funder_identifiers without throwing an exception. Currently it reads through all of the funder_identifiers and uses the last value. For multiple funder_identifiers, I am not sure what the final representation (in the DB) should be, or what the writers find acceptable so that any translation to other metadata schemas is correct.

Same problem with award numbers. There can be multiple award numbers, but we only use the last one.

@svogt0511
Contributor

svogt0511 commented May 2, 2022

Still a work-in-progress. I discovered a couple more issues with funder information.

  1. The xml from crossref is sending 'funderIdentifierType' as 'Organization', but crossref_reader.rb changes that to 'Crossref Funder ID'.
  2. Last funderIdentifier is the one that is picked up. If there is more than one, the rest are skipped.
  3. Last awardNumber is the one that is picked up. If there are more, the rest are skipped.
  4. If there are both funderIdentifiers and award numbers, the last thing to be read is picked up, the rest are skipped.
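
The "last value wins" behaviour in items 2 to 4 can be sketched as follows. This is an illustration only: the flat list of assertions, the placeholder values, and the output keys are assumptions, and emitting one funding reference per award number is just one possible representation, not a decision made in this issue.

```ruby
# Hypothetical, simplified assertions for one funder with two award numbers.
assertions = [
  { "name" => "funder_name", "__content__" => "Example Funder" },
  { "name" => "award_number", "__content__" => "AWARD-1" },
  { "name" => "award_number", "__content__" => "AWARD-2" }
]

# Overwriting pattern: each match replaces the previous one,
# so only the last award number survives.
last_award = nil
assertions.each do |a|
  last_award = a["__content__"] if a["name"] == "award_number"
end
p last_award

# Collecting pattern: keep every award number.
award_numbers = assertions
  .select { |a| a["name"] == "award_number" }
  .map { |a| a["__content__"] }
p award_numbers

# One possible (assumed) representation: one funding reference per award number.
funder_name = assertions.find { |a| a["name"] == "funder_name" }["__content__"]
funding_references = award_numbers.map do |number|
  { "funderName" => funder_name, "awardNumber" => number }
end
p funding_references
```

Whether repeated funding references are acceptable to the writers is exactly the open question raised above, so the last step should be read as a placeholder.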

Two tests in spec/readers/crossref_reader_spec.rb (at lines 107 and 196) cover DOIs that have these issues.

See 'Zhejiang Innovation Team Grant' in the Crossref XML for 10.3390/ijms23084442. (multiple award numbers with funder identifier)
See 'New York University University of Notre Dame' or 'Ohio State University Pennsylvania State University Universidad Nacional Autónoma de México University of Arizona University of Colorado Boulder' in the Crossref XML for 10.1093/mnras/stab1131. (multiple funder identifiers)
