Skip to content

Commit

Permalink
adds the document for SEP 0009 (SPARQL CDTs)
Browse files Browse the repository at this point in the history
  • Loading branch information
rdfguy authored and afs committed Aug 29, 2024
1 parent 7019eca commit fed9aed
Showing 1 changed file with 69 additions and 0 deletions.
69 changes: 69 additions & 0 deletions SEP/SEP-0009/sep-0009.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# SPARQL Extensions for Composite Datatypes (Lists and Maps)

* Short name: SPARQL CDTs
* SEP Number: [SEP-0009](sep-0009.md)
* Author: [Olaf Hartig](https://github.com/hartig) (joint work with [Gregory Todd Williams](https://github.com/kasei) and other members of the Amazon Neptune team)

Related repository: https://github.com/awslabs/SPARQL-CDTs

## Abstract
This proposal introduces an approach to representing generic forms of composite values (lists and maps, in particular) as literals in RDF,
and corresponding extensions of the SPARQL language.
These extensions include an aggregation function to produce such composite values,
functions to operate on such composite values in expressions,
and a new operator to transform such composite values into their individual components.

## Motivation
In contrast to many other popular data representation forms and their query languages,
RDF and SPARQL lack built-in support for generic types of composite values such as lists and maps.
Instead, RDF introduces so-called [containers](https://www.w3.org/TR/rdf-schema/#ch_containervocab) and [collections](https://www.w3.org/TR/rdf-schema/#ch_collectionvocab),
which allow users to model composite values through a dedicated vocabulary _on top_ of the core data model.
This approach has several drawbacks when compared to the alternative of representing composite values as compact, self-contained objects.
Especially, it can easily become verbose and bloat the storage footprint;
moreover, extracting information from such containers and collections in SPARQL queries is cumbersome and can even be tricky
(e.g., enumerating the elements of an RDF collection in their given order requires a complex query, if the size of the collection is not assumed to be known),
and manipulating containers and collections in SPARQL is complex
(e.g., inserting a value into an RDF collection at a particular position is all but trivial).

## Rationale and Alternatives
By building upon the RDF literal mechanism,
the proposed approach is fully compatible with RDF,
which means that it enables storage and retrieval of composite values as "black box" entities in existing triple stores, without modifications.
Of course, systems that support the approach may leverage dedicated data structures to efficiently implement the proposed language extensions for SPARQL.

An alternative would have been to extend the RDF data model with a new kind of RDF term for each type of composite values,
similar to the notion of [quoted triples](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#dfn-quoted) in RDF-star.
However, this would have been a more substantial change.

As for the extensions to SPARQL,
the proposed approach makes two additions to the syntax of the language
(in addition to providing several functions that use the [extension functions](https://www.w3.org/TR/sparql11-query/#extensionFunctions) feature of the language),
namely a new aggregate called [FOLD](https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#fold)
and a new operator called [UNFOLD](https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#unfold).
While, to some extent, these additions could have been captured as [extension functions](https://www.w3.org/TR/sparql11-query/#extensionFunctions) and [magic properties](https://www.w3.org/wiki/SPARQL/Extensions/Computed_Properties)
(e.g., as done in the [JenaX project](https://scaseco.github.io/jenax/); see [the JenaX function definitions](http://jsa.aksw.org/fn/)),
doing so would not allow us to cover all the functionality of the proposed approach.
For a discussion see: <https://github.com/awslabs/SPARQL-CDTs/issues/7>.


## Evidence of consensus

None yet.

## Specification:
See: https://w3id.org/awslabs/neptune/SPARQL-CDTs/spec/latest.html

## Backwards Compatibility
Queries that do not use the SPARQL extensions of the proposed approach are unaffected by the extension.

## Tests and implementations
A comprehensive collection of tests is available at https://github.com/awslabs/SPARQL-CDTs/tree/main/tests

Currently, there are two open source implementations of the proposed approach:

* We have extended the Java-based RDF programming framework [Apache Jena](https://jena.apache.org/) with a complete implementation of all the features of the proposed approach.
Currently, the source code of this extension can be found in the [UnfoldAndFoldWithCompositeValues branch](https://github.com/hartig/jena/tree/UnfoldAndFoldWithCompositeValues) of the [hartig/jena fork](https://github.com/hartig/jena/) of the official [Jena Github repository](https://github.com/apache/jena/).
We have created a [pull request to contribute this extension](https://github.com/apache/jena/pull/2501) to the Jena project.
* We have also extended the Perl-based RDF programming framework [Attean](https://github.com/kasei/attean) with a complete implementation of the approach.
Currently, the source code of this extension can be found in the [mutli-value-exprs branch](https://github.com/kasei/attean/tree/mutli-value-exprs) directly within the [Attean Github repository](https://github.com/kasei/attean),
ready to be merged after discussion with the Attean community.

0 comments on commit fed9aed

Please sign in to comment.