Using a Cache: Intermediate Query Results vs Full Query Results #19

Open
mistermboy opened this issue Jul 19, 2021 · 1 comment

@mistermboy
Member

Another approach to avoiding timeouts and reducing query times would be to save the query results in a cache.
There are two possibilities:

  • Save the intermediate query results
  • Save the full query results

For example, for the country authors query, we could save only the results of the first query, which gets all the people from a country, and then run the rest of the query against them. Alternatively, we could save the result of the whole query, which already contains all the authors from a country.
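As a rough illustration of the second possibility (caching the full results), here is a minimal sketch. The in-memory `Map`, the function names, and the `runQuery` callback are all assumptions made for the example, not part of the project's code; a real deployment would use a persistent store.

```javascript
// Hypothetical in-memory cache; stands in for a persistent store.
const fullResultCache = new Map();

// Build a cache key from the query name and its parameter (e.g. a country ID).
function cacheKey(queryName, param) {
  return `${queryName}:${param}`;
}

// Return the cached promise of results, or run the query once and remember it.
// `runQuery` is a placeholder for whatever actually calls the SPARQL endpoint.
function getFullResults(queryName, param, runQuery) {
  const key = cacheKey(queryName, param);
  if (!fullResultCache.has(key)) {
    // The executor runs immediately; a synchronous throw becomes a rejection.
    const pending = new Promise((resolve) => resolve(runQuery(param)));
    // Evict failed lookups so a timeout is not cached forever.
    pending.catch(() => fullResultCache.delete(key));
    fullResultCache.set(key, pending);
  }
  return fullResultCache.get(key);
}
```

Storing the promise rather than the resolved value means two concurrent requests for the same country share a single query against the endpoint.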

For the first option, we would need to figure out a way of injecting the cached intermediate results into the query as a Solution Set in order to execute the resulting query later. Maybe this could be useful -> https://github-wiki-see.page/m/blazegraph/database/wiki/SPARQL_Update. However, the intermediate query is not the most expensive part of the workload compared with the rest of the query, so this might not greatly improve performance.
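One way to picture the injection step, without relying on Blazegraph's named solution sets, is to paste the cached IDs into a standard SPARQL `VALUES` block. This is a different technique from the Solution Set idea above, shown only as a sketch: the helper names are hypothetical, and the second-stage pattern (works whose author, `wdt:P50`, is one of the cached people) is an assumed stand-in for the real country authors query.

```javascript
// Turn cached entity IDs (e.g. "Q42") into a SPARQL VALUES block.
// Anything that does not look like a Wikidata item ID is dropped.
function valuesClause(variable, qids) {
  const valid = qids.filter((id) => /^Q[1-9]\d*$/.test(id));
  const terms = valid.map((id) => `wd:${id}`).join(" ");
  return `VALUES ?${variable} { ${terms} }`;
}

// Hypothetical second stage: keep only the cached people who authored a work.
// The query shape is an assumption, not the project's actual query.
function secondStageQuery(cachedPeople) {
  return [
    "SELECT DISTINCT ?person WHERE {",
    `  ${valuesClause("person", cachedPeople)}`,
    "  ?work wdt:P50 ?person .",
    "}",
  ].join("\n");
}
```

With several thousand cached people the resulting query text becomes long, so the list would probably need to be sent in batches.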

For the other option, we would save just the final results of the query. This may look like a lot of information to store, but it is actually less than for the intermediate results. For example, for the country authors query with Luxembourg as the country, we get 6026 humans for the intermediate query vs 452 humans for the full query (about 7.5% as many).
One of the issues we may face with this approach of saving the final results relates to the map views and similar. If we go this way, we are no longer sending queries to Wikidata, so we cannot expect to get the results as a map or a graph. However, there is a way of drawing all these views from the query results using the Wikidata query service dist (more docs about the result views)

@ExarcaFidalgo
Collaborator

Created a NodeJS script that takes advantage of the partial querying to go through the configured queries and parameters and save the results in MongoDB as JSON. A collection is created in the database for each query, and a document with the obtained data for each parameter.
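The storage layout described above (one collection per query, one document per parameter) could look roughly like this. It is only a sketch: the function name, the lower-cased collection name, and the document fields (`_id`, `savedAt`, `count`, `results`) are assumptions, not the script's actual schema.

```javascript
// Map one (query, parameter, results) triple onto the layout described above:
// the query picks the collection, the parameter identifies the document.
function toCacheEntry(queryName, param, results) {
  return {
    collection: queryName.toLowerCase(), // assumed naming convention
    document: {
      _id: param,                        // one document per parameter
      savedAt: new Date().toISOString(), // when the results were cached
      count: results.length,
      results,                           // the JSON obtained from the query
    },
  };
}
```

With the official `mongodb` driver, such an entry could then be written with `db.collection(entry.collection).replaceOne({ _id: param }, entry.document, { upsert: true })`, so that re-running the script overwrites the previous document instead of duplicating it.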

A trial took place with COUNTRY_AUTHORS, passing all members of the European Union as parameters and using the unordered subsetting version (therefore, with inconsistencies). The process took 5553 seconds, approximately an hour and a half, to complete.

The related documents in the MongoDB database occupy 81.04 MB. Note that the complete data would be somewhat larger, since we are losing about 2-5% of the nodes per query.
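To put a rough number on "somewhat larger", the arithmetic can be sketched as follows. It assumes that storage grows linearly with the number of nodes, which is a simplification, and the function name is made up for the example.

```javascript
// Estimate the size of the complete data from the stored size and the
// fraction of nodes lost, assuming size is proportional to node count.
function estimateFullSizeMB(storedMB, lossFraction) {
  return storedMB / (1 - lossFraction);
}

const storedMB = 81.04;                             // figure reported above
const lowMB = estimateFullSizeMB(storedMB, 0.02);   // 2% of nodes lost
const highMB = estimateFullSizeMB(storedMB, 0.05);  // 5% of nodes lost

console.log(`${lowMB.toFixed(1)} MB to ${highMB.toFixed(1)} MB`);
```

Under that assumption the complete data for the trial would be roughly 82.7 to 85.3 MB.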

