SPARQL vs. Gremlin
SPARQL is a popular query language for RDF graphs. SPARQL is simple and intuitive, though it lacks constructs needed to express arbitrary graph queries (e.g. looping and branching constructs). Gremlin, on the other hand, can express any arbitrary graph query, but it lacks much of the intuitive and clean syntax of SPARQL. This section shows how to express common SPARQL queries in Gremlin, to give the user a sense of how to query an RDF graph with Gremlin. Note also that SPARQL queries can be executed directly in Gremlin over SAIL-based graphs using the function sail:sparql(graph?,string). For more information on using SPARQL in Gremlin, please see Sesame SAIL Quad Store.
Here is a simple SPARQL query that returns all the vertices (i.e. resources) that tg:1 tg:knows.
SELECT ?x WHERE {
tg:1 tg:knows ?x
}
An RDF store can be seen as a three-column database table (with appropriate indices). Thus, each line of a SPARQL query is a pattern match on three terms (variables or constants), where a variable (e.g. ?x) must be bound to the same value in every line of the query in which it appears (see Prolog). In Gremlin, this is accomplished by a filtered path traversal out of vertex tg:1. First, some setup code.
gremlin> $_g := sail:open()
==>sailgraph[memorystore]
gremlin> sail:add-ns('tg','http://tinkerpop.com#')
gremlin> sail:load('data/graph-example-1.ntriple','n-triples')
==>true
And now the query.
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
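The three-column-table view described above can be sketched in plain Python. This is a minimal illustration, not how a SAIL store is implemented: TRIPLES is a hypothetical subset of data/graph-example-1.ntriple containing only tg:knows and tg:name statements consistent with the outputs on this page, and match_pattern is an illustrative helper. Terms starting with ? are treated as variables; everything else is a constant.

```python
# Hypothetical subset of data/graph-example-1.ntriple, written as
# (subject, predicate, object) rows of a three-column table.
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:1", "tg:name", "marko"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def is_var(term):
    """A term such as '?x' is a variable; anything else is a constant."""
    return term.startswith("?")


def match_pattern(pattern, triples, binding=None):
    """Yield one binding (dict) per row that matches a single triple pattern.

    A constant must equal the column value; a variable either binds to the
    column value or, if it is already bound, must agree with it.
    """
    binding = binding or {}
    for row in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, row):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


# SELECT ?x WHERE { tg:1 tg:knows ?x }
result = [b["?x"] for b in match_pattern(("tg:1", "tg:knows", "?x"), TRIPLES)]
print(result)  # ['tg:2', 'tg:4']
```

The answer matches the two vertices returned by the Gremlin traversal above.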
A more complicated example, which returns the names of the known resources, is provided below.
SELECT ?y WHERE {
tg:1 tg:knows ?x .
?x tg:name ?y
}
The related Gremlin query is as follows.
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV/outE[@label=sail:ns('tg:name')]/inV
==>v["vadas"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["josh"^^<http://www.w3.org/2001/XMLSchema#string>]
Or, to return only the string values:
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV/outE[@label=sail:ns('tg:name')]/inV/@value
==>vadas
==>josh
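The second line of the SPARQL query joins on the shared variable ?x; the Gremlin path expresses the same join by simply continuing from the inV vertices. A minimal Python sketch of that equivalence, over a hypothetical subset of the example data: out() is an illustrative helper that loosely mimics outE[@label=...]/inV by following edges with a given predicate from a list of starting vertices.

```python
# Hypothetical subset of the example data as (subject, predicate, object) rows.
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:1", "tg:name", "marko"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def out(vertices, label, triples=TRIPLES):
    """Step from each vertex along edges with predicate `label` and
    return the objects reached, in order (like outE[@label=...]/inV)."""
    return [o for v in vertices for (s, p, o) in triples if s == v and p == label]


# Traversal form: start at the constant tg:1; the intermediate list plays
# the role of the bindings of ?x shared by both lines of the query.
x_values = out(["tg:1"], "tg:knows")   # ['tg:2', 'tg:4']
y_values = out(x_values, "tg:name")    # ['vadas', 'josh']

# Relational form: an explicit join of the two patterns on ?x.
joined = [
    o2
    for (s1, p1, o1) in TRIPLES if s1 == "tg:1" and p1 == "tg:knows"
    for (s2, p2, o2) in TRIPLES if s2 == o1 and p2 == "tg:name"
]
print(y_values, joined)  # ['vadas', 'josh'] ['vadas', 'josh']
```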
The general pattern for turning a SPARQL query into a Gremlin query is to find a constant in the query (for example, tg:1) and use that constant as the root from which to start the traversal. If there are multiple constants, then use the additional constants to ground the traversal, as follows.
SELECT ?y WHERE {
tg:1 tg:knows tg:2 .
tg:2 tg:name ?y
}
gremlin> g:id-v('tg:1')/outE[@label=sail:ns('tg:knows')]/inV[@id=sail:ns('tg:2')]/outE[@label=sail:ns('tg:name')]/inV/@value
==>vadas
In this traversal, tg:2 serves as a ground: the [@id=sail:ns('tg:2')] filter only lets the path continue through that vertex.
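Grounding can be sketched in Python as a filter inserted between two traversal steps. As before, TRIPLES is a hypothetical subset of the example data, and out() and has_id() are illustrative helpers loosely mirroring outE[@label=...]/inV and [@id=...]; they are not Gremlin functions.

```python
# Hypothetical subset of the example data as (subject, predicate, object) rows.
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def out(vertices, label, triples=TRIPLES):
    """Follow edges with predicate `label` from each vertex; return the objects."""
    return [o for v in vertices for (s, p, o) in triples if s == v and p == label]


def has_id(vertices, vid):
    """Keep only the vertices equal to the grounding constant `vid`."""
    return [v for v in vertices if v == vid]


# Root at tg:1, follow tg:knows, ground at tg:2, then follow tg:name.
names = out(has_id(out(["tg:1"], "tg:knows"), "tg:2"), "tg:name")
print(names)  # ['vadas']
```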
If the SPARQL query does not have a constant, then a full edge scan is required in Gremlin. For example,
SELECT ?z WHERE {
?x ?y ?z
}
has the corresponding Gremlin representation:
gremlin> $_g/E/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
==>v[http://tinkerpop.com#3]
==>v[http://tinkerpop.com#3]
==>v[http://tinkerpop.com#5]
==>v[http://tinkerpop.com#3]
==>v["marko"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["29"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["vadas"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["27"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["lop"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["java"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["josh"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["32"^^<http://www.w3.org/2001/XMLSchema#int>]
==>v["ripple"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["java"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["peter"^^<http://www.w3.org/2001/XMLSchema#string>]
==>v["35"^^<http://www.w3.org/2001/XMLSchema#int>]
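With no constant, there is nothing to root a traversal at, so every row of the table has to be visited. A minimal Python sketch over a hypothetical subset of the example data (so it returns fewer rows than the Gremlin output above). Note that, as in that output where v[http://tinkerpop.com#3] appears several times, projecting onto ?z yields one result per matching triple, so duplicates are expected unless the SPARQL query uses DISTINCT.

```python
# Hypothetical subset of data/graph-example-1.ntriple as (s, p, o) rows.
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:1", "tg:name", "marko"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]

# SELECT ?z WHERE { ?x ?y ?z }
# No constant to start from: visit every row and keep its object,
# roughly what $_g/E/inV does over all edges.
objects = [o for (_s, _p, o) in TRIPLES]
print(objects)  # ['tg:2', 'tg:4', 'marko', 'vadas', 'josh']
```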
If ?y is instead a constant, as in
SELECT ?z WHERE {
?x tg:knows ?z
}
then in Gremlin, filter the edge scan on that label:
gremlin> $_g/E[@label=sail:ns('tg:knows')]/inV
==>v[http://tinkerpop.com#2]
==>v[http://tinkerpop.com#4]
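A constant predicate still leaves a scan over all rows, now filtered on the predicate column. The sketch below shows that filtered scan in Python, and for contrast a lookup through a predicate index, one of the "appropriate indices" the three-column-table view alludes to. This page does not say whether a SAIL store answers the Gremlin query through such an index; by_predicate is purely illustrative, and TRIPLES is again a hypothetical subset of the example data.

```python
from collections import defaultdict

# Hypothetical subset of the example data as (subject, predicate, object) rows.
TRIPLES = [
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:1", "tg:name", "marko"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]

# Filtered full scan: the analogue of $_g/E[@label=sail:ns('tg:knows')]/inV.
scanned = [o for (_s, p, o) in TRIPLES if p == "tg:knows"]

# Alternative, assuming a predicate index: group rows by predicate once,
# then read only the rows for tg:knows instead of scanning the whole table.
by_predicate = defaultdict(list)
for s, p, o in TRIPLES:
    by_predicate[p].append((s, o))
indexed = [o for (_s, o) in by_predicate["tg:knows"]]

print(scanned, indexed)  # ['tg:2', 'tg:4'] ['tg:2', 'tg:4']
```

Both forms return the same answer as the Gremlin query; the index only changes how many rows are inspected.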