SPARQL vs. Gremlin
SPARQL is a popular query language for RDF graphs. SPARQL is simple and intuitive, but it lacks some of the constructs needed to express an arbitrary graph query (e.g. looping and branching constructs). Gremlin, on the other hand, can express any graph query, but it lacks much of the intuitive and clean syntax of SPARQL. This section shows how to perform common SPARQL queries in Gremlin to give the user a sense of how to query an RDF graph with Gremlin. Finally, note that it is possible to execute SPARQL queries directly in Gremlin over SAIL-based graphs using the function sail:sparql(graph?,string). For more information on using SPARQL in Gremlin, please see Sesame SAIL Quad Store.
Here is a simple SPARQL query that will return all the vertices (i.e. resources) that tg:1 tg:knows.
SELECT ?x WHERE {
tg:1 tg:knows ?x
}
An RDF store can be seen as a three-column database table (with appropriate indices). Thus, each line of a SPARQL query is a pattern match on variables and/or constants, where each variable name (e.g. ?x) must bind to the same value in every line of the query in which it appears (see Prolog). In Gremlin, this is accomplished by a filtered path traversal out of vertex tg:1. First, some setup code to load the graph diagrammed at this location.
gremlin> $_g := sail:open()
==>sailgraph[memorystore]
gremlin> sail:add-ns('tg','http://tinkerpop.com#')
gremlin> sail:load('data/graph-example-1.ntriple','n-triples')
==>true
And now the traversal.
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
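To make the three-column-table view concrete, here is a minimal Python sketch of matching a single triple pattern. It is only an illustration: the `TRIPLES` list is a hand-typed subset inferred from the outputs shown on this page (it is not loaded from the n-triples file), and the `match` helper is a hypothetical name, not part of Gremlin or SAIL.

```python
# Toy triple table: a hand-typed subset of the example graph (an assumption,
# inferred from the outputs on this page, not loaded from the n-triples file).
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def match(s=None, p=None, o=None):
    """Return every triple that agrees with the given constants.

    A value of None plays the role of a SPARQL variable (matches anything).
    """
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# SELECT ?x WHERE { tg:1 tg:knows ?x }
known = [o for (_, _, o) in match(s="tg:1", p="tg:knows")]
print(known)
```

The subject and predicate constants here play the same role as the start vertex and the @label filter in the traversal above.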
A more complicated example is provided below where the names of the “known” resources are desired.
SELECT ?y WHERE {
tg:1 tg:knows ?x .
?x tg:name ?y
}
The corresponding Gremlin traversal is as follows.
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV/outE[@label=sail:ns('tg:name')]/inV
==>v["vadas"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["josh"^^<http://www.w3.org/2001/XMLSchema#string>]
Or, for only returning the string values:
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV/outE[@label=sail:ns('tg:name')]/inV/@value
==>vadas
==>josh
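The two-line query can be read as a join on the shared variable ?x. The following Python sketch illustrates that reading under the same assumptions as before: the `TRIPLES` data is a hand-typed subset inferred from the outputs on this page, and `objects` is a hypothetical helper, not a Gremlin function.

```python
# Toy triple table (assumed subset of the example graph).
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def objects(subject, predicate):
    """Follow outgoing edges with the given label, like outE[@label=...]/inV."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


# SELECT ?y WHERE { tg:1 tg:knows ?x . ?x tg:name ?y }
# Each value bound to ?x in the first step becomes the subject of the second.
names = [
    y
    for x in objects("tg:1", "tg:knows")
    for y in objects(x, "tg:name")
]
print(names)
```

Each nested loop corresponds to one outE/inV step of the traversal, which is why adding a pattern line to the query adds one step to the path.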
The general pattern for turning a SPARQL query into a Gremlin traversal is to find a constant in the query (for example, tg:1) and use that constant as the root from which to start the traversal. If there are multiple constants, then ground the traversal as follows.
SELECT ?y WHERE {
tg:1 tg:knows tg:2 .
tg:2 tg:name ?y
}
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV[@id=sail:ns('tg:2')]/outE[@label=sail:ns('tg:name')]/inV/@value
==>vadas
In this traversal, tg:2 serves as a ground (a mid-path constant).
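Grounding can be pictured as an extra equality filter in the middle of the path. The Python sketch below mirrors the Gremlin traversal above (follow tg:knows out of tg:1, keep only tg:2, then read tg:name). The data is an assumed, hand-typed subset of the example graph, and `objects` is a hypothetical helper.

```python
# Toy triple table (assumed subset of the example graph).
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def objects(subject, predicate):
    """Follow outgoing edges with the given label from one subject."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


# Start at the root constant tg:1, then ground the middle of the path on tg:2,
# as inV[@id=sail:ns('tg:2')] does in the Gremlin traversal.
names = [
    y
    for x in objects("tg:1", "tg:knows")
    if x == "tg:2"  # the mid-path constant
    for y in objects(x, "tg:name")
]
print(names)
```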
If the SPARQL query does not have a constant, then a full edge scan is required in Gremlin. For example,
SELECT ?z WHERE {
?x ?y ?z
}
has the corresponding Gremlin representation:
gremlin> $_g/E/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
==>v[http://tinkerpop.com#3]
==>v[http://tinkerpop.com#3]
==>v[http://tinkerpop.com#5]
==>v[http://tinkerpop.com#3]
==>v["marko"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["29"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["vadas"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["27"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["lop"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["java"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["josh"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["32"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["ripple"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["java"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["peter"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["35"^^<http://www.w3.org/2001/XMLSchema#int>]
If ?y, in the previous SPARQL query, is a constant, as in
SELECT ?z WHERE {
?x tg:knows ?z
}
then in Gremlin do the following.
gremlin> $_g/E[@label=sail:ns('tg:knows')]/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
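When no subject constant is available, every triple has to be visited. As a rough Python illustration (again over an assumed, hand-typed subset of the example graph, with `scan` as a hypothetical helper), a full scan with an optional predicate filter might look like this:

```python
# Toy triple table (assumed subset of the example graph).
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def scan(predicate=None):
    """Visit every triple and return its object, optionally filtering by label.

    With predicate=None this corresponds to an unfiltered edge scan;
    with a predicate it corresponds to filtering the scan by edge label.
    """
    return [o for (_, p, o) in TRIPLES if predicate is None or p == predicate]


# SELECT ?z WHERE { ?x ?y ?z }
all_objects = scan()

# SELECT ?z WHERE { ?x tg:knows ?z }
known = scan("tg:knows")

print(all_objects)
print(known)
```

In this sketch both calls touch every row of the table; the filter only reduces what is returned, not what is visited.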