Call for review metric: Gen2_FM_F3.md #38

Open
markwilkinson opened this issue Feb 20, 2019 · 9 comments
Labels: bug, Metric Proposal (a user-submitted proposal for a metric - to be reviewed by all)

Comments

@markwilkinson
Member

Please review this new second-generation metric.

@markwilkinson added the "Metric Proposal" label (a user-submitted proposal for a metric - to be reviewed by all) Feb 20, 2019
@markwilkinson
Member Author

The test for schema:mainEntity is not valid. mainEntity points to a block of metadata that DOES NOT necessarily contain the identifier.

@markwilkinson
Member Author

markwilkinson commented Mar 25, 2019

A better test would be mainEntity -> identifier (or one of its sub-properties: accountId, confirmationNumber, duns, flightNumber, globalLocationNumber, gtin12, gtin13, gtin14, gtin8, isbn, issn, legislationIdentifier, leiCode, orderNumber, productID, serialNumber, sku, taxID).
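
To make the proposed mainEntity -> identifier test concrete, here is a minimal sketch over a JSON-LD record. It is not the Evaluator's code: the helper name `main_entity_identifiers`, the key-matching rule, and the placeholder DOI `10.1234/example` are all assumptions made for illustration.

```python
import json

# schema:identifier plus the sub-properties named in the comment above.
IDENTIFIER_KEYS = {
    "identifier", "accountId", "confirmationNumber", "duns", "flightNumber",
    "globalLocationNumber", "gtin12", "gtin13", "gtin14", "gtin8", "isbn",
    "issn", "legislationIdentifier", "leiCode", "orderNumber", "productID",
    "serialNumber", "sku", "taxID",
}


def main_entity_identifiers(doc):
    """Return identifier values found directly under mainEntity (hypothetical helper)."""
    entity = doc.get("mainEntity")
    entities = entity if isinstance(entity, list) else [entity]
    found = []
    for item in entities:
        if not isinstance(item, dict):
            continue  # a bare IRI string (or a missing value) has no nested identifier
        for key, value in item.items():
            # Reduce "schema:identifier" or "http://schema.org/identifier" to "identifier".
            local = key.rsplit(":", 1)[-1].rsplit("/", 1)[-1]
            if local in IDENTIFIER_KEYS:
                found.extend(value if isinstance(value, list) else [value])
    return found


# Placeholder record; the DOI below is made up for the example.
record = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "mainEntity": {"@type": "Dataset", "identifier": "https://doi.org/10.1234/example"}
}
""")
print(main_entity_identifiers(record))
```

A record whose mainEntity is only an IRI string, or that has no mainEntity at all, yields an empty list, which is the case the earlier comment warns about.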

@DanBerrios

@markwilkinson Hi Mark. We register our dataset DOIs with DataCite and pass DataCite various dataset metadata at registration time (currently using schema.org predicates). We do not pass DataCite any of the predicates this metric searches for, so our records fail this metric. Where did you get the list of valid and required predicates for dataset-type objects? I am looking at the Nature Sci Data guidance in https://www.nature.com/articles/s41597-019-0031-8.pdf and don't see the schema.org predicates tested by this metric in their example dataset metadata or explicitly mentioned in their recommendations. @jlbales

@markwilkinson
Member Author

Hi Dan,

Anything that has a DOI should pass this test! There may be something else failing... can you send me an example of a DOI you find is failing this test?

That article makes good suggestions, and the test follows those suggestions (and more!). Unfortunately, there is no such thing as a 'list of valid predicates', since nobody has the authority to say what is 'valid'. As such, my list comes from a survey of what people are using "in the real world". I make no claim to validity... I only claim that, based on usage, an agent that was looking for data would usually be able to find it if it looked for a predicate on that list.

Please send me an example of what you are seeing, and I will try to troubleshoot the test.

Cheers!

@DanBerrios

@markwilkinson Sure: see https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/evaluations/5118. In the output of the F3 test, the last part says:

FAILURE: Was unable to locate the data identifier in the metadata using any (common) property/predicate reserved for this purpose. Tested the following ["http://www.w3.org/ns/ldp#contains", "http://xmlns.com/foaf/0.1/primaryTopic", "http://purl.obolibrary.org/obo/IAO_0000136", "http://purl.obolibrary.org/obo/IAO:0000136", "https://www.w3.org/ns/ldp#contains", "https://xmlns.com/foaf/0.1/primaryTopic", "http://schema.org/mainEntity", "http://schema.org/codeRepository", "http://schema.org/distribution", "https://schema.org/mainEntity", "https://schema.org/codeRepository", "https://schema.org/distribution", "http://www.w3.org/ns/dcat#distribution", "https://www.w3.org/ns/dcat#distribution", "http://www.w3.org/ns/dcat#dataset", "https://www.w3.org/ns/dcat#dataset", "http://www.w3.org/ns/dcat#downloadURL", "https://www.w3.org/ns/dcat#downloadURL", "http://www.w3.org/ns/dcat#accessURL", "https://www.w3.org/ns/dcat#accessURL", "http://semanticscience.org/resource/SIO_000332", "http://semanticscience.org/resource/is-about", "https://semanticscience.org/resource/SIO_000332", "https://semanticscience.org/resource/is-about", "https://purl.obolibrary.org/obo/IAO_0000136"]

That is the list of predicates I was referring to as being checked. We lack embedded metadata on our page and it looks from this output like we need to use at least one of those predicates when we do embed the metadata for the DOI and the DOI itself on the page.
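
As a rough illustration of the kind of check this output describes (not the Evaluator's actual code), the sketch below scans a list of (subject, predicate, object) triples for a reduced subset of the predicates above. The helper `data_pointers` and the example.org triples are hypothetical.

```python
# A reduced subset of the predicates listed in the FAILURE message above,
# written without the scheme so both forms can be generated.
BASE_PREDICATES = [
    "schema.org/mainEntity",
    "schema.org/distribution",
    "www.w3.org/ns/dcat#distribution",
    "www.w3.org/ns/dcat#downloadURL",
    "xmlns.com/foaf/0.1/primaryTopic",
]

# For the predicates in this subset, the output lists both http and https forms.
DATA_PREDICATES = {
    f"{scheme}://{path}" for path in BASE_PREDICATES for scheme in ("http", "https")
}


def data_pointers(triples):
    """Return the objects of triples whose predicate is in DATA_PREDICATES."""
    return [obj for (_subj, pred, obj) in triples if pred in DATA_PREDICATES]


# Hypothetical metadata: the first record uses none of the tested predicates.
without_pointer = [
    ("https://example.org/ds/1", "http://schema.org/name", "Example dataset"),
]
with_pointer = without_pointer + [
    ("https://example.org/ds/1", "http://schema.org/distribution",
     "https://example.org/ds/1/data.csv"),
]

print(data_pointers(without_pointer))  # nothing found, analogous to the FAILURE above
print(data_pointers(with_pointer))
```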

@markwilkinson
Member Author

Yes, I see. You're injecting data/metadata via script, and the DOI provider has no information at all.

Unfortunately, there's not much I can do to resolve this problem... I'm not inclined to train my harvester to run scripts, since it explores arbitrary pages and isn't in such a protected space as a browser.

Note that the predicates it is searching for (the list you copied and pasted above) are the predicates that point at the data (your CEL.gz records on that page). The DOI, which should also appear somewhere in the page, would require a different predicate (likely schema:identifier or dc:identifier).
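
For illustration only, here is a sketch of how a harvester that does not execute scripts could still read metadata that is embedded statically as JSON-LD, with schema:identifier carrying the DOI and schema:distribution pointing at the data. The class and function names, the placeholder DOI, and the example.org URL are assumptions, and this is not how the Evaluator's harvester is necessarily implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def static_jsonld(html):
    """Return JSON-LD documents present in the raw HTML, without running any script."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical landing page; the DOI and download URL are placeholders.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "identifier": "https://doi.org/10.1234/example",
 "distribution": {"@type": "DataDownload",
                  "contentUrl": "https://example.org/data/sample.CEL.gz"}}
</script>
</head><body></body></html>
"""

docs = static_jsonld(page)
print(docs[0]["identifier"])                  # the identifier of the record
print(docs[0]["distribution"]["contentUrl"])  # the pointer to the data
```

Metadata added to the page by a script at load time would never appear in `docs`, since only the raw HTML is parsed.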

Sorry I can't help more!

@DanBerrios

@markwilkinson Ugh, I should have explained before asking you to take a look. Yes, we have not yet embedded our metadata on the dataset landing page, but we are planning to do that very soon. DataCite, the DOI provider, DOES in fact have the metadata for the dataset associated with this DOI (you can see it here: https://api.datacite.org/dois/application/vnd.datacite.datacite+json/10.26030/cwan-7h58 ), but the schema.org predicates that we currently give to DataCite don't include any from the list that the F3 test output says it is looking for (e.g., we don't have a schema.org:mainEntity predicate). What I was asking was whether the list of predicates in the test output comes from a published set of predicates required to pass F3-based tests.

@markwilkinson
Member Author

Interesting... it looks like several of the DataCite content types are not responding at the moment: if you request Turtle or RDF/XML, it fails, but if you request JSON-LD it succeeds. That's why I thought it wasn't providing any metadata at all!
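
A minimal sketch of how one might reproduce that comparison with HTTP content negotiation follows. The DOI is the one from the link above; resolving it through https://doi.org/, the exact Accept values, and the helper names are assumptions, not a record of what the harvester actually sends.

```python
import urllib.request

# DOI taken from the DataCite link above; the doi.org resolver URL is an assumption.
DOI_URL = "https://doi.org/10.26030/cwan-7h58"

# Standard media types for the three serializations mentioned in the comment.
ACCEPT_TYPES = {
    "turtle": "text/turtle",
    "rdf/xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
}


def build_requests(url=DOI_URL):
    """Build one request per serialization, differing only in the Accept header."""
    return {
        name: urllib.request.Request(url, headers={"Accept": mime})
        for name, mime in ACCEPT_TYPES.items()
    }


def probe(url=DOI_URL, timeout=10):
    """Send each request and report an HTTP status or an error string (uses the network)."""
    results = {}
    for name, request in build_requests(url).items():
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[name] = response.status
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            results[name] = f"failed: {err}"
    return results


# Building the requests is offline; call probe() to perform the live checks.
for name, request in build_requests().items():
    print(name, request.get_header("Accept"))
```

Calling `probe()` performs live requests, so its results depend on the state of the service at the time it is run.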

Yes, if you're using schema, then mainEntity is one of the few choices (there are other choices for e.g. code repositories, but not for data)

Cheers!

@DanBerrios

@markwilkinson Where did you get the list of predicates this metric test checks? Can you provide the reference? I don't see schema.org:mainEntity in the Nature citation roadmap paper for any types, including dataset types.
