
Export all federated queries to create a real-world benchmark for federated queries #40

Open
vemonet opened this issue Oct 10, 2024 · 4 comments
Assignees: vemonet
Labels: documentation (Improvements or additions to documentation), good first issue (Good for newcomers)

Comments

@vemonet
Member

vemonet commented Oct 10, 2024

This repository contains a lot of complex federated queries to large endpoints.

It would be interesting to provide instructions to easily export all federated queries and build a benchmark that federated query systems could use.

A comparable existing benchmark is: https://github.com/dice-group/LargeRDFBench

Unlike it, this benchmark would consist of queries that are actually used in the real world.

@vemonet vemonet added the documentation and good first issue labels Oct 10, 2024
@vemonet vemonet self-assigned this Oct 10, 2024
@constraintAutomaton

constraintAutomaton commented Oct 28, 2024

@vemonet, I made this script to extract the queries:

https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor

I changed the queries provided in the repo because they do not seem to work with the data model, and used this one instead:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?queryID ?federatedEndpoint ?comment ?query ?target  WHERE {
  ?queryID sh:select ?query .
  ?queryID spex:federatesWith ?federatedEndpoint .
  ?queryID rdfs:comment ?comment .
  ?queryID <https://schema.org/target> ?target
}
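A minimal Python sketch of running this query and flattening its results, assuming the endpoint accepts a form-encoded POST per the SPARQL 1.1 Protocol and returns the standard SPARQL 1.1 JSON results format (`application/sparql-results+json`). The helper names `run_query` and `flatten_bindings` are hypothetical and not part of the extractor repo; the endpoint URL is whatever store the compiled examples were loaded into.

```python
import json
import urllib.parse
import urllib.request

QUERY = """PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?queryID ?federatedEndpoint ?comment ?query ?target WHERE {
  ?queryID sh:select ?query .
  ?queryID spex:federatesWith ?federatedEndpoint .
  ?queryID rdfs:comment ?comment .
  ?queryID <https://schema.org/target> ?target
}"""


def run_query(endpoint: str, query: str = QUERY) -> dict:
    """POST the query as a form field and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint,
        data=data,  # providing data makes urllib send a POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten_bindings(results: dict) -> list[dict[str, str]]:
    """Turn each binding into a plain {variable: value} dict.

    Unbound variables are simply absent from a binding in this format,
    so they are skipped rather than set to None.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]
```

Usage would look like `rows = flatten_bindings(run_query(endpoint_url))`, where `endpoint_url` is an assumption for illustration. Each row then holds one (query, federated endpoint) pair, so a query federating over two endpoints appears twice.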

At least on my side, no query had more than one <https://schema.org/target>, and the number of spex:federatesWith values seems to match the number of endpoints in the federation.

  • Edit: I updated the query because I was getting the queries whose federation had at least 3 endpoints instead of 2.
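These observations can be checked mechanically. A hedged Python sketch, assuming the results were already flattened into one dict per row keyed by the query's variable names (`queryID`, `federatedEndpoint`, `target`). It groups rows by query, reports any query with more than one target, and keeps queries whose federation spans at least `min_endpoints` endpoints. `summarize_federation` and the `min_endpoints` threshold are hypothetical, not taken from the extractor repo.

```python
from collections import defaultdict


def summarize_federation(rows, min_endpoints=2):
    """Group result rows by query ID and check the target/federation assumptions.

    Returns a pair (federated, multi_target):
      federated: {queryID: sorted endpoints} for queries with >= min_endpoints endpoints
      multi_target: sorted query IDs that have more than one schema:target
    """
    endpoints = defaultdict(set)
    targets = defaultdict(set)
    for row in rows:
        qid = row["queryID"]
        endpoints[qid].add(row["federatedEndpoint"])
        targets[qid].add(row["target"])

    multi_target = sorted(qid for qid, t in targets.items() if len(t) > 1)
    federated = {
        qid: sorted(eps)
        for qid, eps in endpoints.items()
        if len(eps) >= min_endpoints
    }
    return federated, multi_target
```

An empty `multi_target` list confirms the single-target observation; comparing the lengths in `federated` against the expected federation sizes covers the second one.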

@constraintAutomaton

Maybe I can document how I did it and provide my repo as an example, after some cleanup, unless I made a mistake somewhere.

@vemonet
Member Author

vemonet commented Oct 29, 2024

Thanks @constraintAutomaton, that's nice! A few remarks:

  • You forgot to also add the URL of the main endpoint on which the query is expected to run
  • It would be better to put all queries under a specific key, so we can iterate over them directly without having to filter out the metadata key
  • It seems like you are using the old convertToOneTurtle.sh bash script to compile all queries (https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor/blob/main/init.sh); I would recommend using sparql-examples-utils.jar as documented in the README.md
  • This one is more of a detail, but maybe use federatesWith instead of federatedEndpoint, to be consistent with the predicate currently used

Something a bit like:

{
  "queries": [
    {
      "uri": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
      "endpoint": "https://www.bgee.org/sparql/",
      "query": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX up: <http://purl.uniprot.org/core/>\nPREFIX genex: <http://purl.org/genex#>\nPREFIX obo: <http://purl.obolibrary.org/obo/>\nPREFIX orth: <http://purl.org/net/orth#>\nPREFIX dcterms: <http://purl.org/dc/terms/>\nPREFIX sio: <http://semanticscience.org/resource/>\n\nSELECT DISTINCT ?flyEnsemblGene ?orthologTaxon ?orthologEnsemblGene ?orthologOmaLink WHERE {\n\t{\n        SELECT DISTINCT ?gene ?flyEnsemblGene {\n        ?gene a orth:Gene ;\n            genex:isExpressedIn/rdfs:label 'eye' ;\n            orth:organism/obo:RO_0002162 ?taxon ;\n            dcterms:identifier ?flyEnsemblGene .\n        ?taxon up:commonName 'fruit fly' .\n        } LIMIT 100\n    }\n    SERVICE <https://sparql.omabrowser.org/sparql> {\n        ?protein2 a orth:Protein .\n        ?protein1 a orth:Protein .\n        ?clusterPrimates a orth:OrthologsCluster .\n        ?cluster a orth:OrthologsCluster ;\n            orth:hasHomologousMember ?node1 ;\n            orth:hasHomologousMember ?node2 .\n        ?node1 orth:hasHomologousMember* ?protein1 .\n        ?node2 orth:hasHomologousMember* ?clusterPrimates .\n        ?clusterPrimates orth:hasHomologousMember* ?protein2 .\n        ?protein1 sio:SIO_010079 ?gene . # is encoded by\n        ?protein2 rdfs:seeAlso ?orthologOmaLink ;\n            orth:organism/obo:RO_0002162 ?orthologTaxonUri ;\n            sio:SIO_010079 ?orthologGene . # is encoded by\n        ?clusterPrimates orth:hasTaxonomicRange ?taxRange .\n        ?taxRange orth:taxRange 'Primates' .\n        FILTER ( ?node1 != ?node2 )\n    }\n    ?orthologTaxonUri up:commonName ?orthologTaxon .\n    ?orthologGene dcterms:identifier ?orthologEnsemblGene .\n}",
      "description": "Which are the genes in Primates orthologous to a gene that is expressed in the fruit fly's eye?",
      "federatesWith": [
        "https://www.bgee.org/sparql/",
        "https://sparql.omabrowser.org/sparql"
      ]
    },
    ...
  ],
  "metadata": ...
}
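A minimal Python sketch of producing the structure proposed above from flattened query results, assuming one dict per result row keyed by the variables of the extraction query (`queryID`, `target`, `query`, `comment`, `federatedEndpoint`). Output keys follow the example (`uri`, `endpoint`, `query`, `description`, `federatesWith`); the `metadata` contents are left to the caller because the example does not specify them. `build_benchmark` is a hypothetical name, and the sample query text is abbreviated for illustration.

```python
import json


def build_benchmark(rows, metadata=None):
    """Collapse per-endpoint result rows into one entry per query under a "queries" key."""
    entries = {}  # queryID -> entry; dicts keep insertion order
    for row in rows:
        entry = entries.setdefault(
            row["queryID"],
            {
                "uri": row["queryID"],
                "endpoint": row["target"],  # main endpoint the query is expected to run on
                "query": row["query"],
                "description": row["comment"],
                "federatesWith": [],
            },
        )
        # One row per federated endpoint: collect them without duplicates.
        if row["federatedEndpoint"] not in entry["federatesWith"]:
            entry["federatesWith"].append(row["federatedEndpoint"])
    return {"queries": list(entries.values()), "metadata": metadata or {}}


rows = [
    {
        "queryID": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
        "target": "https://www.bgee.org/sparql/",
        "query": "SELECT DISTINCT ?flyEnsemblGene ...",  # abbreviated sample text
        "comment": "Which are the genes in Primates orthologous to a gene "
                   "that is expressed in the fruit fly's eye?",
        "federatedEndpoint": endpoint,
    }
    for endpoint in ("https://www.bgee.org/sparql/", "https://sparql.omabrowser.org/sparql")
]
benchmark = build_benchmark(rows)

# Every query sits under "queries", so consumers iterate without skipping "metadata".
for q in benchmark["queries"]:
    print(q["endpoint"], q["federatesWith"])
print(json.dumps(benchmark, indent=2))
```

Keying entries by `queryID` is what turns the one-row-per-endpoint SPARQL results into a single record with a `federatesWith` list.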

@constraintAutomaton

Thanks @vemonet! I've made the changes.
