Extension to support templated example queries #32

Open
dssib opened this issue Aug 28, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@dssib
Collaborator

dssib commented Aug 28, 2024

It would be useful to be able to include example SPARQL templated queries, allowing contributors to point to queries that can populate templates (e.g. species names), similar to what we use in the BioQuery interface: https://biosoda.expasy.org/bioquery-dbgi/?search=Q2

dssib added the enhancement (New feature or request) label Aug 28, 2024
dssib changed the title from "Extension to support template queries (similar to" to "Extension to support templated example queries" Aug 28, 2024
@vemonet
Member

vemonet commented Sep 3, 2024

If we go down this road, we might want to adopt the "standard" used by https://grlc.io: add a _ after the ? of the templated variable, e.g. ?_specie_iri

This should not break any of the existing tests (since the templated variable will be treated as a regular variable by parsers).

But we would need to figure out some predicate that points to the query used to populate the templated variable, and an intermediary object that links the SPARQL query used for completion to the variable ID (in my example ?_specie_iri).

Example templated enumeration query with grlc: https://github.com/CLARIAH/grlc-queries/blob/master/enumerate.rq
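
For illustration, here is a minimal sketch of how a tool might spot ?_-prefixed variables in a query string. It relies on a plain regular expression rather than a SPARQL parser, so it would also match inside string literals, IRIs or comments. The helper name `find_template_variables` and the example predicate are hypothetical; they are not part of grlc or of this repository.

```python
import re

# Matches SPARQL variables whose name starts with an underscore, e.g. ?_specie_iri.
# Assumption: a regex is good enough for illustration; a real tool should use a parser.
TEMPLATE_VAR = re.compile(r"\?(_\w+)")


def find_template_variables(query: str) -> list[str]:
    """Return the distinct ?_-prefixed variables, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in TEMPLATE_VAR.finditer(query):
        seen.setdefault(match.group(1), None)
    return [f"?{name}" for name in seen]


# Hypothetical query; the predicate IRI is a placeholder.
query = """
SELECT ?gene WHERE {
    ?gene <http://example.org/inSpecies> ?_specie_iri .
}
"""
print(find_template_variables(query))
```

Because the marked variable is still an ordinary SPARQL variable, the query above stays parseable by any standard tool.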

@vemonet
Member

vemonet commented Sep 5, 2024

Complete example of what it could look like; ex:001 would be in its own separate file:

@prefix ex: <https://www.bgee.org/sparql/.well-known/sparql-examples/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:030 a sh:SPARQLExecutable,
        sh:SPARQLSelectExecutable ;
    rdfs:comment "Anatomical entities for ?species at the young adult developmental stage"@en ;
    sh:prefixes _:sparql_examples_prefixes ;
    sh:select """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?anatomicalEntity ?stageName {
    ?condition genex:hasAnatomicalEntity ?anatEntity ;
        genex:hasDevelopmentalStage ?stage ;
        obo:RO_0002162 ?species .
    ?anatEntity rdfs:label ?anatomicalEntity .
    ?stage rdfs:label ?stageName .
    FILTER ( CONTAINS(lcase(?stageName), "young adult") )
}""" ;
    schema:target <https://www.bgee.org/sparql/> .


ex:030_001 a ex:TemplatedQueryLink ;
    ex:templatedQuery ex:030 ;
    ex:variableInTemplatedQuery "?species" ;
    ex:getValueFromDatasourceQuery ex:001 ;
    ex:variableInDatasourceQuery "?species" ;
    ex:labelVariableInDatasourceQuery "?commonName" .


ex:001 a sh:SPARQLExecutable,
        sh:SPARQLSelectExecutable ;
    rdfs:comment "What are the species present in Bgee?"@en ;
    sh:prefixes _:sparql_examples_prefixes ;
    sh:select """PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?species ?commonName WHERE {
    ?species a up:Taxon ;
        up:rank up:Species ;
        up:commonName ?commonName .
}""" ;
    schema:target <https://sparql.uniprot.org/sparql/>,
        <https://www.bgee.org/sparql/>, <https://sparql.omabrowser.org/sparql/> .

labelVariableInDatasourceQuery would be optional (it lets us show a human-readable label to the user in the template, while still doing the matching on URIs behind the scenes).

This could still be tested with the current code; we just need to add a shape to test for the ex:TemplatedQueryLink. If we want to be thorough, we could add some custom checks to see if the ex:variableInTemplatedQuery, ex:variableInDatasourceQuery and ex:labelVariableInDatasourceQuery are actually present in the targeted queries.
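
A rough sketch of what such a custom check could look like, assuming the link has already been loaded into a plain dictionary keyed by the local names of the predicates above. The function names and the abbreviated query strings are hypothetical, and variables are extracted with a regular expression rather than a SPARQL parser.

```python
import re


def variables_in(query: str) -> set[str]:
    """Collect variable names (without the leading ? or $) found in a query string."""
    return set(re.findall(r"[?$](\w+)", query))


def check_link(link: dict[str, str], templated_query: str, datasource_query: str) -> list[str]:
    """Return a list of problems; an empty list means the link looks consistent."""
    problems = []
    expectations = [
        ("variableInTemplatedQuery", templated_query, True),
        ("variableInDatasourceQuery", datasource_query, True),
        ("labelVariableInDatasourceQuery", datasource_query, False),  # optional
    ]
    for key, query, required in expectations:
        value = link.get(key)
        if value is None:
            if required:
                problems.append(f"missing {key}")
            continue
        if value.lstrip("?$") not in variables_in(query):
            problems.append(f"{key} {value} not found in query")
    return problems


# Abbreviated versions of ex:030 and ex:001 (prefix declarations omitted).
templated = "SELECT DISTINCT ?anatomicalEntity { ?condition obo:RO_0002162 ?species . }"
datasource = "SELECT ?species ?commonName WHERE { ?species up:commonName ?commonName . }"
link = {
    "variableInTemplatedQuery": "?species",
    "variableInDatasourceQuery": "?species",
    "labelVariableInDatasourceQuery": "?commonName",
}
print(check_link(link, templated, datasource))
```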

Feel free to propose better options for the type/predicates :)

@mpagni12
Collaborator

mpagni12 commented Oct 2, 2024

> If we go down this road, we might want to adopt the "standard" used by https://grlc.io: add a _ after the ? of the templated variable, e.g. ?_specie_iri
>
> This should not break any of the existing tests (since the templated variable will be treated as a regular variable by parsers).
>
> But we would need to figure out some predicate that points to the query used to populate the templated variable, and an intermediary object that links the SPARQL query used for completion to the variable ID (in my example ?_specie_iri).
>
> Example templated enumeration query with grlc: https://github.com/CLARIAH/grlc-queries/blob/master/enumerate.rq

IMHO, the critical point is that queries must remain valid SPARQL in any case. This precludes syntax extensions, e.g. $$VARIABLE. The ?_ notation is OK in this respect. However, the ?_ notation forces us to decide which variables could/must be replaced at the time of writing the query, which may not scale well when composing many queries together. Hence I would not make the ?_ notation mandatory.
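
To illustrate that any ordinary variable can be fixed at run time without special notation, here is a minimal sketch that pins a variable with a standard SPARQL VALUES clause. It is only one possible approach, not something this repository implements: `bind_variable` is a hypothetical helper, the IRI is a placeholder, and the sketch assumes that the last closing brace of the query text closes the outermost WHERE group.

```python
def bind_variable(query: str, variable: str, iri: str) -> str:
    """Pin `variable` to `iri` by adding a VALUES clause before the last '}'.

    Assumption: the last '}' closes the outermost WHERE group (no trailing
    VALUES block and no '}' in a trailing comment).
    """
    # Reject characters that cannot appear in a plain IRI reference.
    if any(ch in iri for ch in '<>"{} \n'):
        raise ValueError(f"not a plain IRI: {iri!r}")
    closing = query.rfind("}")
    if closing == -1:
        raise ValueError("query has no group graph pattern")
    clause = f"    VALUES {variable} {{ <{iri}> }}\n"
    return query[:closing] + clause + query[closing:]


# Abbreviated version of ex:030; note that ?species carries no special marker.
query = """PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?anatEntity {
    ?condition genex:hasAnatomicalEntity ?anatEntity ;
        obo:RO_0002162 ?species .
}"""

# Placeholder IRI, for illustration only.
print(bind_variable(query, "?species", "http://example.org/taxon/1"))
```

Both the original and the bound query are plain SPARQL, so the same query can serve as a regular example and as a template.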

@mpagni12
Collaborator

mpagni12 commented Oct 2, 2024

rdfs:comment "Anatomical entities for ?species at the young adult developmental stage"@en ;

This is not an ordinary comment as ?species is meant to be interpreted. I would create a new property to account for this, say

ex:commentToBeInterpretedBySIBTools rdfs:subPropertyOf rdfs:comment .

In general, I am of the opinion that one should be extra careful when recycling existing vocabulary.
When in doubt, new classes and properties should be created, relying on rdfs:subClassOf and rdfs:subPropertyOf or their OWL friends.

@mpagni12
Collaborator

mpagni12 commented Oct 2, 2024

ex:030_001 a ex:TemplatedQueryLink ;
    ex:templatedQuery ex:030 ;
    ex:variableInTemplatedQuery "?species" ;
    ex:getValueFromDatasourceQuery ex:001 ;
    ex:variableInDatasourceQuery "?species" ;
    ex:labelVariableInDatasourceQuery "?commonName" .

mmmh

ex:030_001 a ex:TemplatedQueryLink ;
    ex:templatedQuery ex:030 ;
    ex:templatedVariable [
        ex:variableInTemplatedQuery "?species" ;
        ex:getValueFromDatasourceQuery ex:001 ;
        ex:variableInDatasourceQuery "?species" ; # to serve as a unique key, not visible in the UI
        ex:labelVariableInDatasourceQuery "?commonName" # visible to the end user, and possibly including some HTML
    ],
    [ ... another independent variable ] .

The case of dependent variables might also be considered: referring to different variables from the same SPARQL query will restrict the allowed combinations.
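
A small sketch of what this restriction could mean in a UI, assuming two template variables are populated from the result rows of one datasource query. The rows and IRIs below are made-up placeholders, and `allowed_values` is a hypothetical helper.

```python
# Hypothetical result rows of ONE datasource query returning two variables.
rows = [
    {"species": "http://example.org/species/A", "stage": "http://example.org/stage/1"},
    {"species": "http://example.org/species/A", "stage": "http://example.org/stage/2"},
    {"species": "http://example.org/species/B", "stage": "http://example.org/stage/2"},
]


def allowed_values(rows: list[dict[str, str]], variable: str, chosen: dict[str, str]) -> list[str]:
    """Values still available for `variable`, given the bindings already chosen."""
    return sorted({
        row[variable]
        for row in rows
        if all(row[name] == value for name, value in chosen.items())
    })


# With nothing chosen, every stage is offered.
print(allowed_values(rows, "stage", {}))
# Once species B is chosen, only the stages co-occurring with B remain.
print(allowed_values(rows, "stage", {"species": "http://example.org/species/B"}))
```

Two independent variables would instead each come from their own datasource query, so choosing one would not filter the other.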

@tarcisiotmf
Collaborator

I think it would be better to distinguish such template queries from the others. We could then define a new type, ex:SPARQLExecutableTemplate, as a subclass of sh:SPARQLExecutable. I would also vote for using "template" instead of "templated" (since template is originally a noun, not a verb).

@mpagni12
Collaborator

mpagni12 commented Oct 7, 2024

> I think it would be better to distinguish such template queries from the others. We could then define a new type, ex:SPARQLExecutableTemplate, as a subclass of sh:SPARQLExecutable. I would also vote for using "template" instead of "templated" (since template is originally a noun, not a verb).

As long as the syntax of SPARQL queries is not "extended", I don't see a good reason to distinguish template queries from "regular" queries. A regular query from today may become a template query of tomorrow, without any change.

On the other hand, I am also in favour of defining a new dedicated type, a subclass of sh:SPARQLExecutable. It can be seen as a preventive measure, to avoid interfering with later uses of SHACL in the same triplestore.

@tarcisiotmf
Collaborator

tarcisiotmf commented Oct 8, 2024

The reason is that the two are of a different nature. For example, the question (description) of a template contains a variable ("Anatomical entities in ?species..."), which should not be the case in the traditional use of a question answering system. It might therefore require some extra preprocessing to get them right. Moreover, IMO they are different (possibly complementary) use cases: pairs of questions and queries on the one hand, and query templates with question templates on the other, so a clear distinction would be better. For instance, several questions/SPARQL queries can be derived from one template query, and a template query can even be "templated" in different ways. For the example above: "Anatomical entities for ?species at the young adult developmental stage" or "Anatomical entities for ?species at developmental ?stage.", the latter with two variables.
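
As a sketch of the extra preprocessing mentioned above, a question template could be turned into concrete questions by replacing each ?variable with a human-readable label. The function name `render_question` and the labels are hypothetical; variables without a label are left untouched.

```python
import re


def render_question(template: str, labels: dict[str, str]) -> str:
    """Replace each ?variable in a question template with its label, if one is given."""

    def substitute(match: re.Match) -> str:
        # Keep the original ?variable when no label is available.
        return labels.get(match.group(1), match.group(0))

    return re.sub(r"\?(\w+)", substitute, template)


one_variable = "Anatomical entities for ?species at the young adult developmental stage"
two_variables = "Anatomical entities for ?species at developmental ?stage."

# Labels are placeholders, for illustration only.
print(render_question(one_variable, {"species": "mouse"}))
print(render_question(two_variables, {"species": "mouse"}))
```

The second call leaves ?stage in place, which shows how the same query could be "templated" with one or two variables.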

@mpagni12
Collaborator

You are right about the description part.

4 participants