Some funderIdentifiers are not indexed in ElasticSearch #142

Open
KellyStathis opened this issue Aug 24, 2022 · 2 comments · Fixed by #143

@KellyStathis

Describe the bug

Expected Behaviour

If a funderIdentifier is in the XML, it should be indexed in ElasticSearch and therefore show in the REST API response.

Current Behaviour

Some funderIdentifiers are not indexed.

This was identified because there are more records with funderIdentifierType than funderIdentifier in the index, even though funderIdentifierType cannot exist without funderIdentifier.
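
A minimal sketch of that consistency check, assuming each record's fundingReferences are available as a list of dicts shaped like the REST API response. The function name and the second sample entry are hypothetical and only there for illustration.

```python
def missing_funder_identifier(funding_references):
    """Return funding references that carry a type but no identifier."""
    return [
        ref
        for ref in funding_references
        if ref.get("funderIdentifierType") and not ref.get("funderIdentifier")
    ]


refs = [
    {
        # Shape taken from the API response for the affected DOI.
        "funderName": "Natural Environment Research Council, UK Research & Innovation",
        "awardNumber": "NE/L002434/1",
        "funderIdentifierType": "ISNI",
    },
    {
        # Hypothetical well-formed entry, for contrast only.
        "funderName": "Example Funder",
        "funderIdentifier": "https://example.org/funder/1",
        "funderIdentifierType": "Other",
    },
]

inconsistent = missing_funder_identifier(refs)
```

Any entry returned by this check has a funderIdentifierType without a funderIdentifier, which the schema does not allow.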

Steps to Reproduce

  1. Go to https://api.datacite.org/dois/10.5285/8e59f849-5b93-438e-a5e0-3c65636f9053 - see fundingReference is missing funderIdentifier:
{
"awardUri": "http://gotw.nerc.ac.uk/list_full.asp?pcode=NE%2FL002434%2F1",
"awardTitle": "NERC GW4+ Doctoral Training Partnership studentship",
"funderName": "Natural Environment Research Council, UK Research & Innovation",
"awardNumber": "NE/L002434/1",
"funderIdentifierType": "ISNI"
}
  2. Decode the record's XML from base64. It does include the funderIdentifier:
<fundingReference>
      <funderName>Natural Environment Research Council, UK Research &amp; Innovation</funderName>
      <funderIdentifier funderIdentifierType="ISNI">0000 0001 2181 0377</funderIdentifier>
      <awardNumber awardURI="http://gotw.nerc.ac.uk/list_full.asp?pcode=NE%2FL002434%2F1">NE/L002434/1</awardNumber>
      <awardTitle>NERC GW4+ Doctoral Training Partnership studentship</awardTitle>
    </fundingReference>
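
The decoding step can be sketched with the Python standard library. This assumes the base64-encoded XML string has already been fetched from the record. The helper name is hypothetical, and tag names are matched without their namespace so that it works whether or not the XML declares one.

```python
import base64
import xml.etree.ElementTree as ET


def funder_identifiers(xml_b64):
    """Decode base64 XML and return (type, value) for each funderIdentifier."""
    root = ET.fromstring(base64.b64decode(xml_b64))
    found = []
    for el in root.iter():
        # Drop any "{namespace}" prefix before comparing the tag name.
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "funderIdentifier":
            found.append((el.get("funderIdentifierType"), (el.text or "").strip()))
    return found


# The fragment from this issue, re-encoded here only to make the sketch runnable.
sample_xml = """<fundingReference>
  <funderName>Natural Environment Research Council, UK Research &amp; Innovation</funderName>
  <funderIdentifier funderIdentifierType="ISNI">0000 0001 2181 0377</funderIdentifier>
  <awardNumber awardURI="http://gotw.nerc.ac.uk/list_full.asp?pcode=NE%2FL002434%2F1">NE/L002434/1</awardNumber>
  <awardTitle>NERC GW4+ Doctoral Training Partnership studentship</awardTitle>
</fundingReference>"""
encoded = base64.b64encode(sample_xml.encode("utf-8")).decode("ascii")

result = funder_identifiers(encoded)
```

A non-empty result for a record whose API response lacks funderIdentifier confirms that the value is lost after submission, not missing from the XML.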

Context (Environment)

Screenshots

n/a

Further details

n/a

Proposal

Hypothesis

As @richardhallett identified, it looks like funderIdentifier is being validated as a URL in bolognese:
https://github.com/datacite/bolognese/blob/master/lib/bolognese/readers/datacite_reader.rb#L161. Because this funderIdentifier is a bare ISNI and not a URL, it is dropped during parsing and never reaches the index.

Possible Implementation

Ideally ISNIs would be entered as URLs, e.g.: https://isni.org/isni/0000000121810377

However, this is not enforced by the XSD. When funderIdentifiers are not entered as URLs, our parsing could be more forgiving so as not to exclude them from indexing entirely.
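
To illustrate what "more forgiving" could mean, here is a rough sketch in Python. bolognese itself is Ruby, and this is not the change made in #143. The function name, the ISNI pattern, and the fallback of keeping the raw value are all assumptions for discussion.

```python
import re
from urllib.parse import urlparse

# Assumption: an ISNI is 16 characters, 15 digits plus a check character (digit or X).
ISNI_PATTERN = re.compile(r"\d{15}[\dX]")


def normalize_funder_identifier(value, identifier_type=None):
    """Return a usable identifier instead of dropping non-URL values."""
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None

    # Already a URL: keep it unchanged.
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return value

    # Bare ISNI such as "0000 0001 2181 0377": build the isni.org URL form.
    if identifier_type == "ISNI":
        compact = re.sub(r"[\s-]", "", value).upper()
        if ISNI_PATTERN.fullmatch(compact):
            return "https://isni.org/isni/" + compact

    # Anything else: keep the raw value so it is still indexed.
    return value


normalized = normalize_funder_identifier("0000 0001 2181 0377", "ISNI")
```

The key design choice is the last line of the function: an identifier that cannot be normalized is passed through as entered, so it still appears in the index alongside its funderIdentifierType.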

@KellyStathis
Author

Related: #137, which discusses normalization logic for ISNI.

@codycooperross
Contributor

This bug is fixed for new metadata imports. Existing indexed metadata without funderIdentifiers will need to be reimported for the fix to be reflected in the API.
