From fed9aed05c0bd1811946e1feb9831b67f60fe558 Mon Sep 17 00:00:00 2001 From: Ora Lassila Date: Wed, 28 Aug 2024 15:19:52 -0400 Subject: [PATCH] adds the document for SEP 0009 (SPARQL CDTs) --- SEP/SEP-0009/sep-0009.md | 69 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) create mode 100644 SEP/SEP-0009/sep-0009.md diff --git a/SEP/SEP-0009/sep-0009.md b/SEP/SEP-0009/sep-0009.md new file mode 100644 index 0000000..2f257c1 --- /dev/null +++ b/SEP/SEP-0009/sep-0009.md @@ -0,0 +1,69 @@ +# SPARQL Extensions for Composite Datatypes (Lists and Maps) + +* Short name: SPARQL CDTs +* SEP Number: [SEP-0009](sep-0009.md) +* Author: [Olaf Hartig](https://github.com/hartig) (joint work with [Gregory Todd Williams](https://github.com/kasei) and other members of the Amazon Neptune team) + +Related repository: https://github.com/awslabs/SPARQL-CDTs + +## Abstract +This proposal introduces an approach to representing generic forms of composite values (lists and maps, in particular) as literals in RDF, +and corresponding extensions of the SPARQL language. +These extensions include an aggregation function to produce such composite values, +functions to operate on such composite values in expressions, +and a new operator to transform such composite values into their individual components. + +## Motivation +In contrast to many other popular data representation forms and their query languages, +RDF and SPARQL lack built-in support for generic types of composite values such as lists and maps. +Instead, RDF introduces so-called [containers](https://www.w3.org/TR/rdf-schema/#ch_containervocab) and [collections](https://www.w3.org/TR/rdf-schema/#ch_collectionvocab), +which allow users to model composite values through a dedicated vocabulary _on top_ of the core data model. +This approach has several drawbacks when compared to the alternative of representing composite values as compact, self-contained objects. +Especially, it can easily become verbose and bloat the storage footprint; +moreover, extracting information from such containers and collections in SPARQL queries is cumbersome and can even be tricky +(e.g., enumerating the elements of an RDF collection in their given order requires a complex query, if the size of the collection is not assumed to be known), +and manipulating containers and collections in SPARQL is complex +(e.g., inserting a value into an RDF collection at a particular position is all but trivial). + +## Rationale and Alternatives +By building upon the RDF literal mechanism, +the proposed approach is fully compatible with RDF, +which means that it enables storage and retrieval of composite values as "black box" entities in existing triple stores, without modifications. +Of course, systems that support the approach may leverage dedicated data structures to efficiently implement the proposed language extensions for SPARQL. + +An alternative would have been to extend the RDF data model with a new kind of RDF term for each type of composite values, +similar to the notion of [quoted triples](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#dfn-quoted) in RDF-star. +However, this would have been a more substantial change. + +As for the extensions to SPARQL, +the proposed approach makes two additions to the syntax of the language +(in addition to providing several functions that use the [extension functions](https://www.w3.org/TR/sparql11-query/#extensionFunctions) feature of the language), +namely a new aggregate called [FOLD](https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#fold) +and a new operator called [UNFOLD](https://awslabs.github.io/SPARQL-CDTs/spec/latest.html#unfold). +While, to some extent, these additions could have been captured as [extension functions](https://www.w3.org/TR/sparql11-query/#extensionFunctions) and [magic properties](https://www.w3.org/wiki/SPARQL/Extensions/Computed_Properties) +(e.g., as done in the [JenaX project](https://scaseco.github.io/jenax/); see [the JenaX function definitions](http://jsa.aksw.org/fn/)), +doing so would not allow us to cover all the functionality of the proposed approach. +For a discussion see: . + + +## Evidence of consensus + +None yet. + +## Specification: +See: https://w3id.org/awslabs/neptune/SPARQL-CDTs/spec/latest.html + +## Backwards Compatibility +Queries that do not use the SPARQL extensions of the proposed approach are unaffected by the extension. + +## Tests and implementations +A comprehensive collection of tests is available at https://github.com/awslabs/SPARQL-CDTs/tree/main/tests + +Currently, there are two open source implementations of the proposed approach: + +* We have extended the Java-based RDF programming framework [Apache Jena](https://jena.apache.org/) with a complete implementation of all the features of the proposed approach. +Currently, the source code of this extension can be found in the [UnfoldAndFoldWithCompositeValues branch](https://github.com/hartig/jena/tree/UnfoldAndFoldWithCompositeValues) of the [hartig/jena fork](https://github.com/hartig/jena/) of the official [Jena Github repository](https://github.com/apache/jena/). +We have created a [pull request to contribute this extension](https://github.com/apache/jena/pull/2501) to the Jena project. +* We have also extended the Perl-based RDF programming framework [Attean](https://github.com/kasei/attean) with a complete implementation of the approach. +Currently, the source code of this extension can be found in the [mutli-value-exprs branch](https://github.com/kasei/attean/tree/mutli-value-exprs) directly within the [Attean Github repository](https://github.com/kasei/attean), +ready to be merged after discussion with the Attean community.