SPARQL-friendly lists #46

Open
dbooth-boston opened this issue Dec 7, 2018 · 28 comments
Labels: query (Extends the Query spec)


@dbooth-boston
Collaborator

It is very hard[7] to query RDF lists using standard SPARQL while returning the order of the items. This inability to conveniently handle such a basic data construct seems brain-dead to developers who have grown to take lists for granted.

"On my wish list are . . . generic structures like nested lists as first class citizens"
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0170.html

IDEA: Jena's list:index property

Apache Jena offers one potential (though non-standard) way to ease this pain, by defining a list:index property:
https://jena.apache.org/documentation/query/rdf_lists.html

IDEA: Add lists as a fundamental concept in RDF

As proposed by David Wood and James Leigh prior to the RDF 1.1 work.[8]
https://www.w3.org/2009/12/rdf-ws/papers/ws14

@william-vw

+1M. See also here (issue 3): http://manu.sporny.org/2014/json-ld-origins-2/ ...

Note that it could be straightforward to add extra semantics, i.e., on top of a triple-based representation, to implement these kinds of list predicates.

@VladimirAlexiev
Contributor

+1. cc @azaroth42

dbooth-boston transferred this issue from w3c/EasierRDF on Apr 3, 2019
@RickMoynihan

SHACL also makes use of lists to express paths, so improving list support might make SHACL processing easier too.

JervenBolleman added the labels protocol (Improving sending queries over the wire) and query (Extends the Query spec) on Apr 4, 2019
@jaw111
Contributor

jaw111 commented Apr 8, 2019

Would the VALUES OF syntax proposed by @cygri in #6 be appropriate here?

Example:

VALUES (?item ?idx) OF splitList(("travel" "iceland" "winter"))

The returned results are equivalent to:

VALUES (?item ?idx) {
    ("travel" 1)
    ("iceland" 2)
    ("winter" 3)
}

@cygri

cygri commented Apr 8, 2019

@jaw111 I don't quite understand the syntax you're using here. The proposal for VALUES OF only allows normal SPARQL expressions as arguments of the multi-value function, so a list wouldn't be allowed there.

Is the intention to use it like splitList(?x) where ?x would have been earlier bound to the first blank node of a list in the active graph? So, data:

<articles/1234> ex:tagList ("travel" "iceland" "winter").

And query:

SELECT ?tag ?idx {
    <articles/1234> ex:tagList ?tags
    VALUES (?tag ?idx) OF listMembers(?tags)
}

This would give the result you listed, and would cover the functionality provided by Jena's list:member and list:index property functions.
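A small Python model of the proposed evaluation, for illustration only. The triple pattern binds ?tags to the list head. listMembers(?tags) then extends that solution with one row per member, numbered from 1 to match the VALUES table in the earlier example. listMembers is a proposal in this thread, not an existing function. The term encoding and the helper names are assumptions of the sketch.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def members(spo, head):
    """Yield the members of an RDF list in order, following rdf:rest to rdf:nil."""
    node = head
    while node != RDF_NIL:
        yield spo[(node, RDF_FIRST)]
        node = spo[(node, RDF_REST)]

def list_members(solutions, spo, list_var, item_var, idx_var):
    """Hypothetical VALUES (?item ?idx) OF listMembers(?list): one row per member."""
    for sol in solutions:
        for idx, item in enumerate(members(spo, sol[list_var]), start=1):
            yield {**sol, item_var: item, idx_var: idx}

# <articles/1234> ex:tagList ("travel" "iceland" "winter") written out as triples
data = [
    ("<articles/1234>", "ex:tagList", "_:b0"),
    ("_:b0", RDF_FIRST, "travel"), ("_:b0", RDF_REST, "_:b1"),
    ("_:b1", RDF_FIRST, "iceland"), ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "winter"), ("_:b2", RDF_REST, RDF_NIL),
]
# Index by (subject, predicate); assumes rdf:first/rdf:rest are single-valued.
spo = {(s, p): o for s, p, o in data}

# Step 1: <articles/1234> ex:tagList ?tags
solutions = [{"tags": o} for s, p, o in data
             if (s, p) == ("<articles/1234>", "ex:tagList")]

# Step 2: VALUES (?tag ?idx) OF listMembers(?tags)
for row in list_members(solutions, spo, "tags", "tag", "idx"):
    print(row["tag"], row["idx"])   # travel 1, iceland 2, winter 3
```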

@tayloj

tayloj commented Apr 8, 2019

Some discussion on the mailing list about length-bounded property paths seems relevant too, since a path like ?list rdf:rest{n}/rdf:first ?item returns the nth element of a list (with zero-based indexing).
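Here is a minimal Python sketch of why the path gives zero-based indexing. It assumes that {n} means "exactly n repetitions". Following rdf:rest zero times stays on the head cell, and that cell's rdf:first is the first item. Fixed-length quantifiers are not part of the final SPARQL 1.1 property paths. The helper names and the triple encoding are made up for the sketch.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def step(triples, nodes, pred):
    """One path step: every object reachable from a node in `nodes` via `pred`."""
    return {o for s, p, o in triples if s in nodes and p == pred}

def nth_item(triples, head, n):
    """Evaluate  head rdf:rest{n}/rdf:first ?item  with zero-based n."""
    nodes = {head}
    for _ in range(n):                        # rdf:rest{n}: exactly n rdf:rest steps
        nodes = step(triples, nodes, RDF_REST)
    return step(triples, nodes, RDF_FIRST)    # then a single rdf:first step

# ("travel" "iceland" "winter") as the triples the Turtle collection abbreviates
triples = [
    ("_:b0", RDF_FIRST, "travel"), ("_:b0", RDF_REST, "_:b1"),
    ("_:b1", RDF_FIRST, "iceland"), ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "winter"), ("_:b2", RDF_REST, RDF_NIL),
]
print(nth_item(triples, "_:b0", 0))  # {'travel'}
print(nth_item(triples, "_:b0", 2))  # {'winter'}
print(nth_item(triples, "_:b0", 3))  # set(): rdf:nil has no rdf:first
```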

@jaw111
Copy link
Contributor

jaw111 commented Apr 8, 2019

@cygri you are correct, using a list there does not make much sense. Must have missed a trick earlier.

@tayloj expanding on your suggestion, how about using a variable instead of an integer for the path length? So a path like ?list rdf:rest{?n}/rdf:first ?item returns a set of solutions where the ?n variable is bound to the index.
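The hypothetical evaluator below is one way to picture the proposed rdf:rest{?n} form. It advances along rdf:rest one step at a time and emits a solution for each rdf:first it meets, binding ?n to the number of steps taken. Because ?n is unbound, the evaluator needs its own stopping rule. This sketch stops when nothing new is reachable, which also cuts off a cyclic rdf:rest chain, or at an optional max_n. rdf:rest{?n} is not valid SPARQL, and the function name is invented.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def rest_n_first(triples, head, max_n=None):
    """Hypothetical evaluation of  head rdf:rest{?n}/rdf:first ?item.

    Yields (n, item) pairs. Cells already visited are not revisited, so a
    cyclic rdf:rest chain terminates, and each cell is reported only at the
    smallest step count that reaches it.
    """
    frontier, seen, n = {head}, set(), 0
    while frontier and (max_n is None or n <= max_n):
        for s, p, o in triples:
            if s in frontier and p == RDF_FIRST:
                yield n, o
        seen |= frontier
        # next frontier: one more rdf:rest step, minus cells already visited
        frontier = {o for s, p, o in triples if s in frontier and p == RDF_REST} - seen
        n += 1

triples = [
    ("_:b0", RDF_FIRST, "travel"), ("_:b0", RDF_REST, "_:b1"),
    ("_:b1", RDF_FIRST, "iceland"), ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "winter"), ("_:b2", RDF_REST, RDF_NIL),
]
print(list(rest_n_first(triples, "_:b0")))
# [(0, 'travel'), (1, 'iceland'), (2, 'winter')]
```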

@tayloj
Copy link

tayloj commented Apr 8, 2019

@jaw111 That'd certainly be useful, but I have no idea how feasible it is. I think there were already difficulties in implementing {n,m} quantifiers efficiently even with fixed values. Moving to a variable is probably even more complicated. But I'd definitely use it if it were available.

@TallTed
Member

TallTed commented Apr 30, 2019

I think there were already difficulties in implementing {n,m} quantifiers efficiently even with fixed values.

FWIW, Virtuoso still supports the {n,m} property path quantifiers. (This is not a comment on the rdf:rest{?n} suggestion from @jaw111.)

afs removed the protocol label (Improving sending queries over the wire) on May 1, 2019
@kasei
Copy link
Collaborator

kasei commented May 1, 2019

@TallTed does Virtuoso use the bag semantics of expanding that to a BGP/union equivalent, or the set semantics of just limiting the length of a + path?

@jaw111
Copy link
Contributor

jaw111 commented May 1, 2019 via email

@TallTed
Copy link
Member

TallTed commented May 1, 2019

does Virtuoso use the bag semantics of expanding [the {n,m} property path quantifiers] to a BGP/union equivalent, or the set semantics of just limiting the length of a + path?

@kasei - Good question, to which I don't immediately have the answer. @IvanMikhailov or @kidehen may be able to shed some light.

@kidehen

kidehen commented May 1, 2019

@kasei ,

Are we talking about what's exemplified by the following query?

SELECT DISTINCT *
WHERE {
  ?s a <http://dbpedia.org/ontology/AcademicJournal> ;
     rdf:type{1,3} ?o .
}
LIMIT 50

Live Results Link.

/cc @TallTed

@kasei
Copy link
Collaborator

kasei commented May 1, 2019

@kidehen Yes, except for the DISTINCT which will mask the difference. It seems that it's using the bag semantics of BGP/union expansion, which can have some challenges with cardinality for larger values of the path quantifiers (and as I recall was one of the big issues that prevented this from being included in SPARQL 1.1).
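A toy graph reproduces the difference kasei describes. This Python sketch evaluates pred{n,m} in two ways. The first is a UNION of fixed-length paths under bag semantics. The second is a set of endpoints, like a length-limited + path. The diamond-shaped data and the function names are invented for illustration. A chain of k diamonds shows the bag count growing as 2**k, which is the cardinality problem for larger quantifiers.

```python
from collections import Counter

def successors(triples, pred, node):
    return [o for s, p, o in triples if s == node and p == pred]

def bag_semantics(triples, pred, start, n, m):
    """pred{n,m} expanded to a UNION of fixed-length sequence paths.

    Every distinct route is its own solution, so an endpoint reached by
    several routes is counted several times.
    """
    counts = Counter()
    for length in range(n, m + 1):
        frontier = [start]                    # a list: duplicates are kept
        for _ in range(length):
            frontier = [o for node in frontier for o in successors(triples, pred, node)]
        counts.update(frontier)
    return counts

def set_semantics(triples, pred, start, n, m):
    """A length-limited '+' path: each reachable endpoint appears once."""
    found, frontier = set(), {start}
    for length in range(1, m + 1):
        frontier = {o for node in frontier for o in successors(triples, pred, node)}
        if length >= n:
            found |= frontier
    if n == 0:
        found.add(start)                      # zero-length path
    return found

def diamond_chain(k):
    """n0 -> {l0, r0} -> n1 -> {l1, r1} -> n2 ... : 2**k routes from n0 to nk."""
    triples = []
    for i in range(k):
        for mid in (f"l{i}", f"r{i}"):
            triples += [(f"n{i}", "p", mid), (mid, "p", f"n{i + 1}")]
    return triples

g = [("a", "p", "b"), ("a", "p", "c"), ("b", "p", "d"), ("c", "p", "d"), ("d", "p", "e")]
print(sum(bag_semantics(g, "p", "a", 1, 3).values()))   # 6 solutions
print(sorted(set_semantics(g, "p", "a", 1, 3)))         # ['b', 'c', 'd', 'e']
print(bag_semantics(diamond_chain(10), "p", "n0", 20, 20)["n10"])  # 1024
```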

@kidehen

kidehen commented May 1, 2019

Okay, here's the query solution link without DISTINCT :)

@ktk

ktk commented Oct 14, 2019

We use sh:in for validation of data cubes in RDF. Unfortunately it is pretty much impossible to generate such a list in SPARQL, at least I could not figure out how.

The list functions in Jena seem to only access lists, not manipulate or create them. Is there any prior work on what creating and manipulating lists could look like?

I don't have much know-how about designing such things, but what I tried (and failed with) was:

CONSTRUCT {
  <something> sh:in ( ?listMembers ) .
}

So, pretty much the Turtle collection syntax. ?listMembers could be a normal set, and if we use a SELECT subquery it could also be ordered before being used in the CONSTRUCT. I would also imagine being able to add more variables, just as I can add more entries in Turtle syntax.

Am I completely missing something here that prevents this approach from working?

There is obviously more missing, like removing an entry and adding a new one, but I'm not sure how much of that is realistic in a language like SPARQL.

By the way why is this called collection in Turtle and not list?
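On the CONSTRUCT attempt above: the query is legal, because collections are allowed in CONSTRUCT templates. However, the template is instantiated once per solution, and its blank nodes get fresh labels each time. The Python model below shows the consequence. It uses a simplified construct() helper, not any engine's API, and assumes the collection is expanded into ordinary rdf:first/rdf:rest triples. Three solutions give three separate one-element lists attached with sh:in, not one three-element list.

```python
import itertools

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def construct(template, solutions):
    """Simplified CONSTRUCT: instantiate the template once per solution.

    Blank-node labels ("_:x") get a fresh label for every solution, variables
    ("?x") are substituted, and triples with an unbound variable are skipped.
    """
    fresh = itertools.count()
    graph = set()
    for sol in solutions:
        bnodes = {}                                   # scoped to this solution
        for triple in template:
            terms = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in bnodes:
                        bnodes[term] = f"_:g{next(fresh)}"
                    terms.append(bnodes[term])
                elif term.startswith("?"):
                    terms.append(sol.get(term[1:]))   # None if unbound
                else:
                    terms.append(term)
            if None not in terms:
                graph.add(tuple(terms))
    return graph

# <something> sh:in ( ?listMembers ) written out as the triples it abbreviates
template = [
    ("<something>", "sh:in", "_:l"),
    ("_:l", RDF_FIRST, "?listMembers"),
    ("_:l", RDF_REST, RDF_NIL),
]
solutions = [{"listMembers": v} for v in ("a", "b", "c")]
graph = construct(template, solutions)
print(sorted(o for s, p, o in graph if p == "sh:in"))
# ['_:g0', '_:g1', '_:g2']: three separate one-element lists
```

Building a single list from all solutions needs an aggregation step (or a list-valued aggregate) first, which standard SPARQL does not provide.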

@ktk
Copy link

ktk commented Oct 14, 2019

Bob DuCharme had a blog post that showed some standard manipulations. It works to some extent but is not really nice from a syntactic-sugar point of view: http://www.snee.com/bobdc.blog/2014/04/rdf-lists-and-sparql.html

@ktk
Copy link

ktk commented Oct 15, 2019

@afs
Copy link
Collaborator

afs commented Oct 15, 2019

From some time ago: https://afs.github.io/rdf-lists-sparql . Lesson - it's painful.

One avenue is to add to the basic SPARQL data model - lists and sets (and paths) - beyond RDF terms. This is a large change, including result set formats, but I think it is worth exploring.

@ericprud
Copy link
Member

In SWObjects, I extended triple pattern matching with some generators. One of those was "MEMBERS(?var)" (example use) which joined the current binding with the argument (?var above) bound to each member of the list.

I mentioned it to Lee F during the SPARQL 1.1 WG and he said the syntax gave him hives. I used this a lot, especially from the command line, e.g. to sequentially walk test manifest entries, with no skin conditions that couldn't be explained by prolonged puberty.

@TallTed
Copy link
Member

TallTed commented Oct 21, 2019

@ktk

By the way why is this called collection in Turtle and not list?

List is most commonly understood to mean ordered list, while collection is most commonly understood to mean unordered list. (Yes, both list and collection may have both ordered and unordered variants, but the most common intuitive default is as I said.) Unordered membership is far easier to handle due to various other aspects of RDF and DBMS, and for many reasons (not least being WG time constraints) that ease was important in the development of these specs.

@JervenBolleman
Copy link
Collaborator

JervenBolleman commented Oct 23, 2019

Stardog has talked about a possible extension, at least a list-valued equivalent of group_concat, which would affect the result formats more than anything else.
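A few lines of Python show why a list-valued aggregate would matter. GROUP_CONCAT flattens the members into one string, so splitting it back apart breaks as soon as a member contains the separator. group_list() is a hypothetical stand-in for a list aggregate. Note also that SPARQL does not define the order in which GROUP_CONCAT joins its values.

```python
from collections import defaultdict

def group_values(rows, key, value):
    """Collect `value` per `key`, keeping the members as separate items."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return groups

def group_concat(rows, key, value, sep=" "):
    """GROUP_CONCAT-style: members are joined into one string per group."""
    return {k: sep.join(vs) for k, vs in group_values(rows, key, value).items()}

def group_list(rows, key, value):
    """Hypothetical list aggregate: members stay distinct list items."""
    return dict(group_values(rows, key, value))

rows = [
    {"article": "a1", "tag": "travel"},
    {"article": "a1", "tag": "new york"},
    {"article": "a1", "tag": "winter"},
]
joined = group_concat(rows, "article", "tag")["a1"]
print(joined.split(" "))                          # ['travel', 'new', 'york', 'winter']
print(group_list(rows, "article", "tag")["a1"])   # ['travel', 'new york', 'winter']
```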

@ktk
Copy link

ktk commented Oct 25, 2019

@JervenBolleman that is an interesting one, thanks. Our workaround is to concatenate values with GROUP_CONCAT in SPARQL and then build the lists in a separate code step, so this feels very natural to me. Some questions based on that proposal:

  • how would this be handled in CONSTRUCT, simply as a collection?
  • Could I "concat" arrays?
  • wouldn't it make sense to have something like slice() (see MDN) instead of get()?
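On the slice() question: get() can be expressed as a special case of slice(). The sketch below models list values as Python lists. It assumes slice follows the conventions MDN documents for Array.prototype.slice: the end is exclusive, negative indices count from the end, and out-of-range bounds are clamped. Python slicing happens to share those conventions. All function names are hypothetical.

```python
def list_slice(items, start, end=None):
    """Hypothetical slice(): end-exclusive, negative indices count from the end,
    and out-of-range bounds are clamped instead of raising an error."""
    return items[start:end]

def list_get(items, i):
    """get() as a one-element slice; returns None when there is no such item."""
    end = None if i == -1 else i + 1     # slice(-1, 0) would be empty
    part = list_slice(items, i, end)
    return part[0] if part else None

def list_concat(*lists):
    """Hypothetical concat(): one list containing all members in order."""
    return [x for lst in lists for x in lst]

tags = ["travel", "iceland", "winter", "night"]
print(list_slice(tags, 1, 3))          # ['iceland', 'winter']
print(list_slice(tags, -2))            # ['winter', 'night']
print(list_get(tags, -1))              # night
print(list_get(tags, 10))              # None
print(list_concat(["a"], ["b", "c"]))  # ['a', 'b', 'c']
```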

@albertmeronyo
Copy link

As per @ktk 's suggestion I'm linking here the slides I used today at ISWC to talk about our work on RDF Lists: https://www.slideshare.net/albertmeronyo/modelling-and-querying-lists-in-rdf-a-pragmatic-study

I went into the presentation unaware of this thread :-) So I just subscribed cc/ @enridaga

@ktk
Copy link

ktk commented Dec 11, 2021

Just noticed that Stardog provides nice basic member functions for lists, I like what I see https://docs.stardog.com/query-stardog/#rdf-list-functions

@ericprud
Copy link
Member

It seems to me that if you have the freedom to extend SPARQL, there are good reasons to write these as operators in the query language rather than as magic predicates embedded in triple patterns:

  1. leverage syntax and function composition, e.g. BIND (LENGTH(:literalList) AS ?length) instead of :literalList stardog:list:length ?length. The former can be combined with any other function available in SPARQL 1.2.
  2. separate SPARQL operations from asserted triples. The magic triple representation is shorter, but it can be easily missed when nestled in with a bunch of triple constraints which correspond to asserted triples. In addition to aiding human recognition, it will be easier to verify completeness of query re-writers (e.g. SPARQL to SQL) if these operations have their own syntactic constructs.
  3. reject unsupported queries. A SPARQL 1.1 engine will reject a query with a LENGTH operator while it would silently fail to match a query with a stardog:list:length predicate.

One advantage to magic predicates is that such a query can pass seamlessly through a naive SPARQL pipeline processor (e.g. a tool which parses the query for bound variables, issues it verbatim, and renders the results in a nice HTML table). Unless SPARQL 1.2 were committed to being syntactically compatible with SPARQL 1.1, I don't think syntactic compatibility of list features compensates for the advantages of SPARQL list operators.
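Point 3 in the list above can be illustrated with a toy Python model of an engine that only knows SPARQL 1.1. A magic predicate is just another triple pattern, so it silently matches nothing. An unknown operator, by contrast, can be refused outright. The engine, its function table and the exception are invented for illustration. A real parser would reject an unknown keyword at parse time rather than at evaluation.

```python
class UnsupportedFunction(Exception):
    """Raised when a query uses an operator the engine does not implement."""

# A hypothetical SPARQL 1.1-only engine: only STRLEN is known here.
KNOWN_FUNCTIONS = {"STRLEN": len}

def match(graph, subject, predicate):
    """Triple pattern  subject predicate ?o  against the stored triples."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def call(name, arg):
    """Evaluate a built-in call, refusing names the engine does not know."""
    if name not in KNOWN_FUNCTIONS:
        raise UnsupportedFunction(name)
    return KNOWN_FUNCTIONS[name](arg)

graph = {(":literalList", "rdf:first", "a"), (":literalList", "rdf:rest", "rdf:nil")}

# Magic predicate: no such triple is stored, so the result is simply empty.
print(match(graph, ":literalList", "stardog:list:length"))   # []

# Operator: the engine can tell the user it does not support LENGTH.
try:
    call("LENGTH", ":literalList")
except UnsupportedFunction as err:
    print("unsupported:", err)                                # unsupported: LENGTH
```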

@namedgraph
Copy link

Pat Hayes on first-class list semantics (or the lack of it):

https://lists.w3.org/Archives/Public/semantic-web/2022Sep/0001.html

@VladimirAlexiev
Copy link
Contributor

Use case: convert SHACL property attachments to domain/range.

Very easy to do for schema:domainIncludes, schema:rangeIncludes because these are polymorphic (multivalued):

insert {
  ?prop schema:domainIncludes ?domain; schema:rangeIncludes ?range
} where {
  {[a sh:NodeShape; sh:property/sh:path ?prop; sh:targetClass ?domain]} union
  {[a sh:PropertyShape; sh:path ?prop; sh:class|sh:datatype ?range]} 
}

Much harder to do for RDFS+OWL because one needs to construct lists, e.g.:

:propP   rdfs:domain         [a owl:Class; owl:unionOf (:classX :classY :classZ)].

@jaw111's example https://gist.github.com/jaw111/1b149fd1111f774a3613f10955686617 shows how to do a similar thing (but it produces SHACL as the final result, and I think it's a bit erroneous).
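The list-building step this use case needs is sketched below in Python rather than SPARQL, with invented helper names. It collects the domain classes per property and keeps a single class as a plain rdfs:domain. Several classes are wrapped in an owl:Class whose owl:unionOf is an rdf:first/rdf:rest chain. The union is needed because several separate rdfs:domain triples would mean the intersection of those classes. Sorting only makes the output deterministic; the order of a union list carries no meaning.

```python
import itertools
from collections import defaultdict

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def domains_to_owl(pairs):
    """Turn (property, domain class) pairs into rdfs:domain triples,
    using owl:unionOf lists for properties with several domains."""
    fresh = itertools.count()
    by_prop = defaultdict(set)
    for prop, cls in pairs:
        by_prop[prop].add(cls)
    triples = []
    for prop in sorted(by_prop):
        classes = sorted(by_prop[prop])
        if len(classes) == 1:                    # no list needed
            triples.append((prop, "rdfs:domain", classes[0]))
            continue
        union = f"_:b{next(fresh)}"
        triples += [(prop, "rdfs:domain", union), (union, "rdf:type", "owl:Class")]
        cells = [f"_:b{next(fresh)}" for _ in classes]   # one list cell per class
        triples.append((union, "owl:unionOf", cells[0]))
        for cell, cls, nxt in zip(cells, classes, cells[1:] + [RDF_NIL]):
            triples += [(cell, RDF_FIRST, cls), (cell, RDF_REST, nxt)]
    return triples

pairs = [(":propP", ":classX"), (":propP", ":classY"),
         (":propP", ":classZ"), (":propQ", ":classX")]
for triple in domains_to_owl(pairs):
    print(*triple)
```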
