idot:AccessPattern #100

Open
micheldumontier opened this issue Jan 26, 2015 · 8 comments

@micheldumontier
Member

On the call today we discussed concerns about the suitability of the formulation of the idot:AccessPattern. In particular, we are concerned that appending the idot:identifierPattern to the idot:accessPattern is underspecified and could lead to errors.
Let's take the Gene Ontology (http://identifiers.org/go/) as an example. The idot:identifierPattern is ^GO:\d{7}$

This identifier pattern does not work for the original ontology URI, which is of the form
http://purl.obolibrary.org/obo/GO_\d+$

This identifier pattern is also not correct for Bio2RDF, where the 'GO' prefix should be lowercase 'go'; the correct access pattern would be a regex of the form http://bio2rdf.org/go:\d{7}$

I propose that each instance of an access pattern carry a predicate that specifies the regex pattern.
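
To make the concern concrete, here is a minimal sketch in Python. The `splice` helper is hypothetical: it assumes a consumer builds a URI regex by dropping the anchors from the identifier pattern and appending the rest to the access-pattern base. That is only one plausible reading of "appending", not a documented procedure.

```python
import re

# idot:identifierPattern for GO, as quoted above.
IDENTIFIER_PATTERN = r"^GO:\d{7}$"


def splice(access_base: str, identifier_pattern: str) -> str:
    """Hypothetical splicing rule: escape the base URI, strip the identifier
    pattern's own anchors, and anchor the combined expression instead."""
    core = identifier_pattern.lstrip("^").rstrip("$")
    return "^" + re.escape(access_base) + core + "$"


def accepts(access_base: str, uri: str) -> bool:
    """True if the spliced pattern matches the given URI."""
    return re.match(splice(access_base, IDENTIFIER_PATTERN), uri) is not None


# identifiers.org appends the identifier unchanged, so splicing works.
print(accepts("http://identifiers.org/go/",
              "http://identifiers.org/go/GO:0006915"))
# The OBO PURL uses an underscore, so the spliced pattern rejects it.
print(accepts("http://purl.obolibrary.org/obo/",
              "http://purl.obolibrary.org/obo/GO_0006915"))
# Bio2RDF lowercases the prefix, so the spliced pattern rejects it too.
print(accepts("http://bio2rdf.org/",
              "http://bio2rdf.org/go:0006915"))
```

Only the first call succeeds, which is the underspecification described above: the same identifier pattern cannot be reused across providers that rewrite the identifier.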

@micheldumontier
Member Author

Sent an email to the identifiers.org group on Jan 26. No response as yet.

@perkeo
Contributor

perkeo commented Feb 18, 2015

Hi,

Sorry about the delay in response, and thank you for the reminder!

Firstly, going back to the Gene Ontology example that you gave: Gene Ontology defines its identifier as being equivalent to the 'GlobalID', which consists of a 'GO' prefix and a numerical 'LocalID', separated by a colon [1]. This identifier is used by official Gene Ontology resources [2], by both BioPortal and OLS, and is by far the most prevalent form found in publications and cross-references.

The OBO Foundry has a policy for the creation of URIs [3], which dictates the transformation of the colon into an underscore. While this policy is not enforced, it is recommended (when using URIs). Hence there is a mixture of ontologies that do and do not implement it.
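
As a small illustration of the colon-to-underscore rule, the sketch below converts a GO identifier to an OBO-style URI. The function name `obo_uri` and the simple `PREFIX:digits` check are assumptions made for this example; the base `http://purl.obolibrary.org/obo/` is the one quoted earlier in this thread.

```python
import re

OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_uri(global_id: str) -> str:
    """Turn 'PREFIX:LocalID' into an OBO-style URI by replacing the colon
    with an underscore. The accepted shape is an assumption for this sketch."""
    if not re.fullmatch(r"[A-Za-z]+:\d+", global_id):
        raise ValueError(f"not a PREFIX:LocalID identifier: {global_id!r}")
    prefix, local_id = global_id.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


print(obo_uri("GO:0006915"))  # http://purl.obolibrary.org/obo/GO_0006915
```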

Anyway, for the Identifiers.org registry, our aim is to store the regular expression reflecting the identifiers assigned by the data provider. If no documentation is available describing identifier strategies, we make an informed decision based on existing identifiers and common practice within the user community.

We originally captured this pattern for our own use, for example to provide users information on potentially malformed URIs. Of course, if we can extend this feature to be more useful to the community at large, then we would encourage them to give us feedback. If there is a clear and demonstrable need from our users to store identifier patterns at the level of individual resources and identification schemes, then we can add it to our roadmap for future development.

However, as far as I understand, none of this should affect the dataset description document, as long as the definitions of the idot terms you wish to use are clear and cover your needs.

Cheers,

[1] http://wiki.geneontology.org/index.php/Identifiers
[2] http://amigo.geneontology.org/amigo/term/GO:0006915 (official GO resource)
[3] http://www.obofoundry.org/id-policy.shtml

@AlasdairGray

I don't think that we should be focusing on the GO example here. What we are really looking for is a property which allows for the specification of the complete URI pattern where the regex is used to capture the identifier part.

As I understand it, identifiers.org make use of two properties – idot:accessPattern and idot:identifierPattern – to construct the URI. However, we have no formal way of specifying that the two properties need to be spliced together.

I think that what we are looking for is a single property that would allow for the specification of the whole pattern as a regex; something like

:chembl xxx:accessIdentifierPattern "^http://rdf.ebi.ac.uk/resource/chembl/CHEMBL\\d+" .

VoID's void:uriRegexPattern doesn't quite meet our needs since

  1. It entails that the data must be in RDF and we might be linking to a web page
  2. It focuses on the access pattern part rather than the identifier part
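
A rough sketch of how a consumer might use such a single property, in Python. It assumes the Turtle literal has already been parsed, so the `\\d` written in the serialization arrives as `\d`. The `conforms` helper is illustrative and not part of any idot definition.

```python
import re

# Value of the proposed property after Turtle unescaping.
ACCESS_IDENTIFIER_PATTERN = r"^http://rdf.ebi.ac.uk/resource/chembl/CHEMBL\d+"


def conforms(uri: str, pattern: str = ACCESS_IDENTIFIER_PATTERN) -> bool:
    """True if the whole URI matches the pattern. fullmatch is used because
    the pattern as proposed has a start anchor but no end anchor."""
    return re.fullmatch(pattern, uri) is not None


print(conforms("http://rdf.ebi.ac.uk/resource/chembl/CHEMBL25"))          # True
print(conforms("http://rdf.ebi.ac.uk/resource/chembl/CHEMBL25/extra"))    # False
print(conforms("http://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL25"))  # False
```

Note that the dots in the pattern are unescaped, so they match any character; a stricter pattern would write them as `\.`.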

@micheldumontier
Member Author

+1

AlasdairGray added this to the Publication milestone Mar 2, 2015

@micheldumontier
Member Author

@perkeo We discussed the issue and have proposed idot:accessIdentifierPattern as an attribute of an instance of idot:AccessPattern. See commit c30d6c9.

@micheldumontier
Member Author

@perkeo would you be able to add an entry to the identifiers.org ontology document?

@AlasdairGray

@micheldumontier Do we need to have the following in the example?

<http://www.ebi.ac.uk/chembl/compound/inspect/>
    idot:primarySource true ;
    dct:format "text/html" ;
    dct:publisher <http://www.ebi.ac.uk> ;
    idot:accessIdentifierPattern "^http://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://identifiers.org/chembl.compound/>
    dct:format "text/html" ;
    idot:accessIdentifierPattern "^http://identifiers.org/chembl.compound/CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://bio2rdf.org/chembl:>
    dct:format "application/rdf+xml" ;
    dct:publisher <http://bio2rdf.org> ;
    idot:accessIdentifierPattern "^http://bio2rdf.org/chembl:CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://linkedchemistry.info/chembl/chemblid>
    dct:format "application/rdf+xml" ;
    idot:accessIdentifierPattern "^http://linkedchemistry.info/chembl/CHEMBL\\d+" ;
    a idot:AccessPattern .
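
For illustration, the sketch below shows how the patterns in this example could be used to find which idot:AccessPattern a given URI belongs to. The dictionary is copied by hand from the Turtle above (with `\\d` unescaped to `\d`), and `find_access_patterns` is a hypothetical helper, not an existing API.

```python
import re

# idot:accessIdentifierPattern values from the example, keyed by the
# subject of each idot:AccessPattern.
ACCESS_PATTERNS = {
    "http://www.ebi.ac.uk/chembl/compound/inspect/":
        r"^http://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL\d+",
    "http://identifiers.org/chembl.compound/":
        r"^http://identifiers.org/chembl.compound/CHEMBL\d+",
    "http://bio2rdf.org/chembl:":
        r"^http://bio2rdf.org/chembl:CHEMBL\d+",
    "http://linkedchemistry.info/chembl/chemblid":
        r"^http://linkedchemistry.info/chembl/CHEMBL\d+",
}


def find_access_patterns(uri: str) -> list[str]:
    """Return the subjects of all access patterns whose regex matches the full URI."""
    return [subject for subject, pattern in ACCESS_PATTERNS.items()
            if re.fullmatch(pattern, uri)]


print(find_access_patterns("http://bio2rdf.org/chembl:CHEMBL25"))
# ['http://bio2rdf.org/chembl:']
print(find_access_patterns("http://bio2rdf.org/chembl:chembl25"))
# []
```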

@micheldumontier
Member Author

For completeness, yes, we should include it.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford
University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
