Enhanced support for lists in queries, results, and graph mutations #151
Comments
Notation3 has a number of List builtins that might serve as the basis for SPARQL builtins.
Good point, I forgot about those.
Looks quite a bit like XPath 3.1 array functions.
SWObjects has a notion of generators which can be used in place of a Collection. Instead of writing e.g. `SELECT ?flag { ?flag :colors (:blue :white :red) }` (France, maybe a few others; the US is in the other order), you can ask …
The … and should probably include `AT_LEAST`. Bindings produced by a List-to-Rows function marked the result set as ordered (which prohibits certain optimizations).
Access is a no-brainer. Several that come to mind: `AT(?list, ?index)`, which, given a list pointer, retrieves the item at the `?index`'d position, or NULL if nothing is found at that index. I'm inclined to see mutations as imperatives: MUTATION { … Mutations obviously open up a larger can of worms.
Many of the N3 list builtins could be considered as a basis for SPARQL methods.
Wouldn't it make sense to continue to be aligned with XPath? |
Noting that @namedgraph also said …, it looks like it's time to do some homework. An apparent but unstated aspect of SPARQL's semantics is that new variables can only be bound in a few places: …
This excludes … XPath operations produce sequences (maybe node-sets back in 1.0?). The core path expressions like …
In order to implement any functions or operators to flatten list members, I think we first have to postulate an … As far as I can see, the use of e.g. … 🔴 This is the best of my recollection right now, but @afs and others are likely to correct some mistakes. Edits welcome.
@ericprud XSLT/XPath 1.0 had node-sets, 2.0 introduced sequences, and 3.0 introduced arrays and maps.
Looks like this issue overlaps with #46 at least partially.
SPARQL Nested Tables is another, not completely overlapping, proposal that could solve some of the issues discussed above; it could definitely help with coercing graph query results into a tabular format.
Also seems like some relationship to #6 |
Why?
Other databases and query languages support list operations and returning lists of items; SPARQL does not. Creating or modifying RDF lists in a SPARQL CONSTRUCT/UPDATE can be painful. Furthermore, when variables are aggregated, the connections between the aggregated variables are lost.
While SPARQL is a pattern-matching language and list operations arguably aren't pattern-matching operations, when it comes to returning data it would help both users and applications to receive results collected into rows. That would let people return meaningful rows of data and avoid returning duplicate data, and it could make query results easier to understand and parse when one row per pattern match isn't desired.
Proposed Solution
Example Data
Consider the following example data:
RDF List to SPARQL Result List
It would be convenient to add a function, perhaps `AS_LIST`, that explicitly interprets a URI (or a variable bound to a URI or blank node) representing an RDF List as the contents of that list. This would allow the full contents of an RDF List to be returned as a single variable binding, which would be convenient for post-processing. These two example queries would both return this result:
( ex:Sally ex:Timmy ex:Lucy )
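A minimal sketch of what `AS_LIST` has to do, assuming triples are held as plain `(subject, predicate, object)` string tuples: follow the `rdf:first`/`rdf:rest` chain from the list head until `rdf:nil`. The `as_list` name, the `ex:children` predicate and the data are hypothetical illustrations, not part of any SPARQL API.

```python
# Sketch of the proposed AS_LIST: walk an RDF collection (rdf:first /
# rdf:rest chain) and return its members as a Python list.
# Hypothetical: the function name, the ex:children predicate and the data.

RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def as_list(triples, head):
    """Return the members of the RDF list starting at `head`.

    Raises ValueError if a cell lacks exactly one rdf:first and one
    rdf:rest, or if the chain loops back on itself.
    """
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"{node} is not a well-formed list cell")
        items.append(firsts[0])
        node = rests[0]
    return items


# Hypothetical data: ex:Bob ex:children ( ex:Sally ex:Timmy ex:Lucy ) .
data = {
    ("ex:Bob", "ex:children", "_:c1"),
    ("_:c1", RDF_FIRST, "ex:Sally"), ("_:c1", RDF_REST, "_:c2"),
    ("_:c2", RDF_FIRST, "ex:Timmy"), ("_:c2", RDF_REST, "_:c3"),
    ("_:c3", RDF_FIRST, "ex:Lucy"), ("_:c3", RDF_REST, RDF_NIL),
}

list_head = next(o for s, p, o in data if s == "ex:Bob" and p == "ex:children")
print(as_list(data, list_head))  # ['ex:Sally', 'ex:Timmy', 'ex:Lucy']
```

Rejecting malformed or cyclic lists rather than silently truncating them is only one possible choice; the proposal does not say how `AS_LIST` should behave on bad input.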
Rows to List (and Vice Versa)
It would be convenient to have the functionality provided by functions like PostgreSQL's `ARRAY_AGG` or Cypher's `COLLECT`, transforming data spread across multiple rows in one column into a column of lists as an aggregation operation. It would also be nice to support ordering inside the aggregate.
Query:
Result:
ex:Bob  ( ex:Jane ex:Sue ex:Jim )
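As a rough model of the requested aggregate (not how PostgreSQL or Cypher actually implement it), `COLLECT` can be thought of as grouping solution rows and gathering one variable into a list, optionally sorted by another variable first. The `collect` function, the variable names and the `rank` ordering key are hypothetical.

```python
from collections import defaultdict


def collect(rows, group_var, value_var, as_var, order_var=None):
    """Group solution rows by `group_var` and gather `value_var` into a list.

    Rows are dicts from variable names to terms. When `order_var` is given,
    each group's values are sorted by it first, modelling ordering inside
    the aggregate. Rows where `value_var` is unbound are skipped (an assumed
    choice); an unbound group key (None) forms its own group.
    """
    groups = defaultdict(list)  # insertion-ordered, so groups keep row order
    for row in rows:
        if value_var in row:
            groups[row.get(group_var)].append(row)
    result = []
    for key, members in groups.items():
        if order_var is not None:
            members = sorted(members, key=lambda r: r[order_var])
        result.append({group_var: key, as_var: [r[value_var] for r in members]})
    return result


# Hypothetical rows, e.g. from a pattern binding ?person, ?friend and ?rank.
rows = [
    {"person": "ex:Bob", "friend": "ex:Sue", "rank": 2},
    {"person": "ex:Bob", "friend": "ex:Jim", "rank": 3},
    {"person": "ex:Bob", "friend": "ex:Jane", "rank": 1},
]
print(collect(rows, "person", "friend", "friends", order_var="rank"))
# [{'person': 'ex:Bob', 'friends': ['ex:Jane', 'ex:Sue', 'ex:Jim']}]
```

Without `order_var`, the list simply follows the order in which solutions arrive, which SPARQL does not otherwise guarantee; that is the argument for ordering inside the aggregate.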
Similarly, it would be convenient to have the functionality provided by functions like PostgreSQL's `UNNEST` or Cypher's `UNWIND`, transforming a column of lists into one row per list element. This could behave somewhat like `BIND`, except that it could generate multiple result rows.
Query:
Result:
ex:Bob  ex:Sally
ex:Bob  ex:Timmy
ex:Bob  ex:Lucy
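A sketch of the `UNWIND`/`UNNEST` behaviour described above, assuming rows are dicts and list values are Python lists; `unwind` and the variable names are hypothetical. Unlike `BIND`, which adds one binding to each row, it is a generator that can yield zero, one or many rows per input row.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def unwind(rows: Iterable[Row], list_var: str, out_var: str) -> Iterator[Row]:
    """Emit one copy of each row per element of row[list_var], with the
    element bound to out_var. An empty or unbound list yields no rows,
    which is how Cypher's UNWIND treats an empty list."""
    for row in rows:
        for item in row.get(list_var) or []:
            new_row = dict(row)  # keep the row's other bindings
            new_row[out_var] = item
            yield new_row


rows = [{"person": "ex:Bob", "children": ["ex:Sally", "ex:Timmy", "ex:Lucy"]}]
for r in unwind(rows, "children", "child"):
    print(r["person"], r["child"])
# ex:Bob ex:Sally
# ex:Bob ex:Timmy
# ex:Bob ex:Lucy
```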
In the above example, the list `AS_LIST(?listIdentifier)` was passed to `UNWIND`. It would be convenient if `UNWIND` interpreted a non-list argument (a URI, or a variable bound to a URI or blank node, that represents an RDF List) as the contents of that list before unwinding, i.e. passed the argument to `AS_LIST` first if it is not already a list. That would enable these two queries to return this result:
ex:Sally
ex:Timmy
ex:Lucy
This extra convenience may be dropped if it would have a negative impact on query evaluation performance.
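The coercion rule amounts to a type check in front of `UNWIND`: a value that is already a list is used as-is, and anything else is treated as an RDF list head. A sketch under the same hypothetical triple representation as before; `unwind_source` is an invented name, and this `as_list` assumes each cell has at most one `rdf:first` and one `rdf:rest`.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def as_list(triples, head):
    """Follow rdf:first / rdf:rest from `head` to rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"{head} does not head a well-formed RDF list")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


def unwind_source(value, triples):
    """Elements that UNWIND would iterate over for `value`.

    A value that is already a list is used as-is (the cheap path); any
    other term is assumed to identify an RDF list and goes through as_list.
    """
    if isinstance(value, list):
        return value
    return as_list(triples, value)


# Hypothetical data: ex:Bob ex:children ( ex:Sally ex:Timmy ex:Lucy ) .
data = {
    ("ex:Bob", "ex:children", "_:c1"),
    ("_:c1", RDF_FIRST, "ex:Sally"), ("_:c1", RDF_REST, "_:c2"),
    ("_:c2", RDF_FIRST, "ex:Timmy"), ("_:c2", RDF_REST, "_:c3"),
    ("_:c3", RDF_FIRST, "ex:Lucy"), ("_:c3", RDF_REST, RDF_NIL),
}
print(unwind_source(["ex:Sally"], data))  # ['ex:Sally']
print(unwind_source("_:c1", data))        # ['ex:Sally', 'ex:Timmy', 'ex:Lucy']
```

In this toy model the extra check is a single type test per value; the real cost is the list traversal, which an explicit `AS_LIST` call would pay anyway. A term that does not head a list raises an error here, but whether it should instead produce no rows is an open design question.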
`IN` Operator
It would be convenient to extend the `IN` operator to accept a list, given by its identifier or referenced by a variable. For example, the following two queries should both return:
true
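One way to pin down `IN` over a list value is to reuse SPARQL 1.1's existing `IN` rule: the result is true as soon as a member compares equal, and an error only if no member matched and some comparison raised an error. A sketch, where `sparql_equals` is a deliberately simplified, hypothetical stand-in for SPARQL's `=` operator mapping (not its actual rules) and `in_list` is an invented name:

```python
class ExprError(Exception):
    """Stand-in for a SPARQL expression evaluation error."""


def sparql_equals(a, b):
    # Simplified, hypothetical model of "=": comparing a number with a
    # non-number raises an error; otherwise plain equality.
    a_num = isinstance(a, (int, float))
    b_num = isinstance(b, (int, float))
    if a_num != b_num:
        raise ExprError(f"cannot compare {a!r} with {b!r}")
    return a == b


def in_list(value, items):
    """`value IN items` where items is a list value (e.g. from AS_LIST or
    COLLECT). Returns on the first match, so it short-circuits."""
    error = None
    for item in items:
        try:
            if sparql_equals(value, item):
                return True
        except ExprError as exc:
            error = exc  # remember it; a later member may still match
    if error is not None:
        raise error
    return False


children = ["ex:Sally", "ex:Timmy", "ex:Lucy"]
print(in_list("ex:Timmy", children))  # True
print(in_list("ex:Jane", children))   # False
```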
As the current syntax expects a parenthesized expression list after `IN`, putting a URI or variable after `IN` as above to denote a resource corresponding to an RDF List should not cause ambiguity with the current syntax (in a functional sense, at least). An alternative or additional option would be to apply `AS_LIST` first, so that the variable passed in is bound to a list. I don't know whether this would be more or less performant; that likely depends on how many times the list is accessed and on opportunities for short-circuiting. However, this would also make it possible to use `IN` with a list that was previously `COLLECT`ed, which could be very useful. If this were done, the above two queries would become the following.
Creating RDF Lists in CONSTRUCT/UPDATE Queries
It is not convenient to create RDF Lists in CONSTRUCT or UPDATE queries, especially if the length is variable. If a variable is bound to a list (such as by using `COLLECT` or `AS_LIST` above), it should be possible to add it to the graph directly as an RDF List. For example:
Query:
Result:
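Adding a list-valued variable to a graph amounts to generating the `rdf:first`/`rdf:rest` cells that Turtle's `( … )` shorthand abbreviates. A sketch, assuming fresh blank-node labels come from a simple counter; `list_to_triples`, `make_bnode_factory` and the `ex:friends` predicate are hypothetical:

```python
import itertools

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def make_bnode_factory(prefix="b"):
    """Return a function producing fresh labels _:b0, _:b1, ..."""
    counter = itertools.count()
    return lambda: f"_:{prefix}{next(counter)}"


def list_to_triples(items, fresh_bnode):
    """Encode `items` as an RDF collection; return (head, triples).

    An empty list is just rdf:nil with no triples, like Turtle's `()`.
    """
    cells = [fresh_bnode() for _ in items]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_FIRST, item))
        triples.append((cell, RDF_REST, rest))
    return (cells[0] if cells else RDF_NIL), triples


# A template like { ?person ex:friends ?friendList } (hypothetical syntax),
# with ?friendList bound to a list, would then emit these triples:
head, cells = list_to_triples(["ex:Jane", "ex:Sue", "ex:Jim"], make_bnode_factory())
for triple in [("ex:Bob", "ex:friends", head)] + cells:
    print(*triple)
# ex:Bob ex:friends _:b0
# _:b0 rdf:first ex:Jane
# _:b0 rdf:rest _:b1
# _:b1 rdf:first ex:Sue
# _:b1 rdf:rest _:b2
# _:b2 rdf:first ex:Jim
# _:b2 rdf:rest rdf:nil
```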
Replacing RDF Lists in UPDATE Queries
The same capabilities could be used to more easily replace lists in INSERT/DELETE queries. For example, suppose that in the example data the friends were accidentally swapped with the children.
Running this query:
should result in the following data in the store:
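In triple terms, replacing a list means deleting the link to the old head plus every old cell, then inserting freshly built cells; otherwise orphaned `rdf:first`/`rdf:rest` triples are left behind. A sketch, assuming the old list's cells are not shared with any other list; the helper names and the `ex:children`/`ex:friends` predicates are hypothetical:

```python
import itertools

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def make_bnode_factory(prefix):
    counter = itertools.count()
    return lambda: f"_:{prefix}{next(counter)}"


def object_of(graph, subject, predicate):
    objs = [o for s, p, o in graph if s == subject and p == predicate]
    if len(objs) != 1:
        raise ValueError(f"expected one object for {subject} {predicate}")
    return objs[0]


def walk_list(graph, head):
    """Return (items, cell_triples) for the RDF list starting at head."""
    items, cell_triples, seen, node = [], set(), set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        cell = {t for t in graph if t[0] == node and t[1] in (RDF_FIRST, RDF_REST)}
        firsts = [o for _, p, o in cell if p == RDF_FIRST]
        rests = [o for _, p, o in cell if p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"{node} is not a well-formed list cell")
        items.append(firsts[0])
        cell_triples |= cell
        node = rests[0]
    return items, cell_triples


def replace_list(graph, subject, predicate, new_items, fresh_bnode):
    """Return a new graph where the list object of (subject, predicate)
    is replaced by an RDF list of new_items (old cells are deleted)."""
    old_head = object_of(graph, subject, predicate)
    _, old_cells = walk_list(graph, old_head)
    cells = [fresh_bnode() for _ in new_items]
    inserted = {(subject, predicate, cells[0] if cells else RDF_NIL)}
    for i, item in enumerate(new_items):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        inserted |= {(cells[i], RDF_FIRST, item), (cells[i], RDF_REST, rest)}
    deleted = {(subject, predicate, old_head)} | old_cells
    return (set(graph) - deleted) | inserted


# Hypothetical data with the two lists swapped by mistake.
graph = {
    ("ex:Bob", "ex:children", "_:c1"),
    ("_:c1", RDF_FIRST, "ex:Jane"), ("_:c1", RDF_REST, "_:c2"),
    ("_:c2", RDF_FIRST, "ex:Sue"), ("_:c2", RDF_REST, "_:c3"),
    ("_:c3", RDF_FIRST, "ex:Jim"), ("_:c3", RDF_REST, RDF_NIL),
    ("ex:Bob", "ex:friends", "_:f1"),
    ("_:f1", RDF_FIRST, "ex:Sally"), ("_:f1", RDF_REST, "_:f2"),
    ("_:f2", RDF_FIRST, "ex:Timmy"), ("_:f2", RDF_REST, "_:f3"),
    ("_:f3", RDF_FIRST, "ex:Lucy"), ("_:f3", RDF_REST, RDF_NIL),
}

# Read both lists before changing anything, then swap them back.
kids, _ = walk_list(graph, object_of(graph, "ex:Bob", "ex:friends"))
pals, _ = walk_list(graph, object_of(graph, "ex:Bob", "ex:children"))
fresh = make_bnode_factory("n")
graph = replace_list(graph, "ex:Bob", "ex:children", kids, fresh)
graph = replace_list(graph, "ex:Bob", "ex:friends", pals, fresh)
print(walk_list(graph, object_of(graph, "ex:Bob", "ex:children"))[0])
# ['ex:Sally', 'ex:Timmy', 'ex:Lucy']
```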
Binding lists using `BIND` and `VALUES`
If a variable can have a list as its binding, then `BIND` and `VALUES` should be extended to supply a list as a binding directly. Consequently, libraries that make SPARQL requests would be able to pre-bind lists to variables. Furthermore, it would be possible to create a list where each element is an already bound variable.
Advanced: List "Mutations" and Access
Some less critical but potentially useful features would be the ability to create new lists by modifying the contents of an existing list (such as appending an item or inserting an item at a specified position) and to access values in a list by index. These kinds of features are (in my personal opinion) much lower priority than `AS_LIST`/`COLLECT`/`UNNEST`.
Others?
I'm sure there are other useful list operations I'm not thinking of at the moment that would be worth including.
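For the lower-priority access and "mutation" operations, one option is to keep them non-destructive: each returns a new list value and leaves the original binding unchanged, matching the side-effect-free style of SPARQL expressions. A sketch, assuming 0-based indexes and using `None` for the "NULL if nothing is found" case suggested in the thread for `AT(?list, ?index)`; `append_item` and `insert_at` are hypothetical names:

```python
from typing import Optional, Sequence, Tuple

Term = str


def at(lst: Sequence[Term], index: int) -> Optional[Term]:
    """AT(?list, ?index): item at a 0-based index, or None (unbound)
    when there is nothing at that position, including negative indexes."""
    if 0 <= index < len(lst):
        return lst[index]
    return None


def append_item(lst: Sequence[Term], item: Term) -> Tuple[Term, ...]:
    """Return a new list with item added at the end."""
    return tuple(lst) + (item,)


def insert_at(lst: Sequence[Term], index: int, item: Term) -> Tuple[Term, ...]:
    """Return a new list with item inserted before position index."""
    if not 0 <= index <= len(lst):
        raise IndexError(f"insert position {index} out of range")
    return tuple(lst[:index]) + (item,) + tuple(lst[index:])


children = ("ex:Sally", "ex:Lucy")
print(at(children, 1))                     # ex:Lucy
print(at(children, 5))                     # None
print(insert_at(children, 1, "ex:Timmy"))  # ('ex:Sally', 'ex:Timmy', 'ex:Lucy')
print(append_item(children, "ex:Timmy"))   # ('ex:Sally', 'ex:Lucy', 'ex:Timmy')
print(children)                            # ('ex:Sally', 'ex:Lucy')
```

Whether indexes should be 0-based, as in many programming languages, or 1-based, as in XPath's array functions, is left open by the proposal.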
Considerations for backward compatibility
Aside possibly from the syntax for `IN` expressions, nothing jumps out at me as affecting backward compatibility. Perhaps someone more familiar with the inner workings of SPARQL algebra and query evaluation would find additional issues.