feat(interactive): Support PathExpand with PATH_OPT=TRAIL in GIE (#4184)

## What do these changes do?
This PR introduces a new `TRAIL` option for Path Expand, selected by setting
`PATH_OPT` to `TRAIL`. It guarantees that no edge is repeated within a path
(i.e., edges are unique) during traversal, e.g.,
 
<img width="612" alt="example"
src="https://github.com/user-attachments/assets/766d338e-86f7-4271-98ae-0b3ce8d53fc2">


### Configurations
Currently, we support `PATH_OPT` values `ARBITRARY`, `SIMPLE` and
`TRAIL`. The differences are as follows:

Parameter  | Supported Values | Description
-----------|------------------|---------------------------------------------
PATH_OPT   | ARBITRARY        | Allow vertices or edges to be duplicated.
PATH_OPT   | SIMPLE           | No duplicated nodes.
PATH_OPT   | TRAIL            | No duplicated edges.
RESULT_OPT | END_V            | Only keep the end vertex.
RESULT_OPT | ALL_V            | Keep all vertices along the path.
RESULT_OPT | ALL_V_E          | Keep all vertices and edges along the path.

### Example
From A to X:
- **Arbitrary path** (`PATH_OPT=ARBITRARY`): Infinitely many paths, because
the cycle C->E->F->C can be traversed any number of times.
- **Simple path** (`PATH_OPT=SIMPLE`): Exactly one path, A->B->C->D->X.
- **Trail** (unique edges, `PATH_OPT=TRAIL`): Two paths, A->B->C->D->X
and A->B->C->E->F->C->D->X.


![image](https://github.com/user-attachments/assets/e267c89b-1cce-45fb-a8fe-1b0692a030bb)

The term "trail" is defined in [1].
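
The three path options can be illustrated with a small, self-contained sketch. This is not the GIE implementation: the class `PathOptSketch`, the method `paths`, and the `maxHops` bound are hypothetical names introduced only for illustration, and the edge list is an assumption inferred from the paths listed in the example (A->B, B->C, C->D, D->X, C->E, E->F, F->C). The hop bound exists only so that the `ARBITRARY` case terminates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathOptSketch {
    enum PathOpt { ARBITRARY, SIMPLE, TRAIL }

    // Assumed edge list, inferred from the paths named in the example.
    static final Map<String, List<String>> GRAPH = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "C", List.of("D", "E"),
            "D", List.of("X"),
            "E", List.of("F"),
            "F", List.of("C"),
            "X", List.of());

    // Enumerates all paths from src to dst with at most maxHops edges.
    static List<String> paths(String src, String dst, PathOpt opt, int maxHops) {
        List<String> out = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(src);
        dfs(dst, opt, maxHops, path, new HashSet<>(), out);
        return out;
    }

    private static void dfs(String dst, PathOpt opt, int hopsLeft,
                            List<String> path, Set<String> usedEdges, List<String> out) {
        String cur = path.get(path.size() - 1);
        if (cur.equals(dst) && path.size() > 1) {
            out.add(String.join("->", path));
        }
        if (hopsLeft == 0) {
            return;
        }
        for (String next : GRAPH.get(cur)) {
            // The graph has no parallel edges, so "u->v" identifies an edge.
            String edge = cur + "->" + next;
            if (opt == PathOpt.SIMPLE && path.contains(next)) {
                continue; // SIMPLE: a vertex may appear only once
            }
            if (opt == PathOpt.TRAIL && usedEdges.contains(edge)) {
                continue; // TRAIL: an edge may appear only once
            }
            path.add(next);
            boolean added = usedEdges.add(edge);
            dfs(dst, opt, hopsLeft - 1, path, usedEdges, out);
            if (added) {
                usedEdges.remove(edge); // undo only what this frame added
            }
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        for (PathOpt opt : PathOpt.values()) {
            System.out.println(opt + ": " + paths("A", "X", opt, 10));
        }
    }
}
```

With a bound of 10 hops, `SIMPLE` yields the single path and `TRAIL` yields the two paths listed above, while `ARBITRARY` yields three paths (4, 7 and 10 hops) and keeps growing as the bound is raised.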

## Related issue number

Fixes [#4072](#4072)

[1] Deutsch, Alin, et al. "Graph pattern matching in GQL and SQL/PGQ."
Proceedings of the 2022 International Conference on Management of Data.
2022.

---------

Co-authored-by: BingqingLyu
Co-authored-by: xiaolei.zl
3 people authored Aug 26, 2024
1 parent 1159170 commit d934a83
Showing 16 changed files with 322 additions and 45 deletions.
15 changes: 15 additions & 0 deletions docs/interactive_engine/tinkerpop/supported_gremlin_steps.md
@@ -586,6 +586,18 @@ lengthRange - the lower and the upper bounds of the path length, </br> edgeLabel
Usages of the with()-step: </br>
keyValuePair - the options to configure the corresponding behaviors of the `PathExpand`-step.
Below are the supported values and descriptions for `PATH_OPT` and `RESULT_OPT` options.
Parameter | Supported Values | Description
-----------|-------------------|------------------------------------------------
PATH_OPT | ARBITRARY | Allow vertices or edges to be duplicated.
PATH_OPT | SIMPLE | No duplicated nodes.
PATH_OPT | TRAIL | No duplicated edges.
RESULT_OPT| END_V | Only keep the end vertex.
RESULT_OPT| ALL_V | Keep all vertices along the path.
RESULT_OPT| ALL_V_E | Keep all vertices and edges along the path.
```bash
# expand hops within the range of [1, 10) along the outgoing edges,
# vertices can be duplicated and only the end vertex should be kept
@@ -594,6 +606,9 @@ g.V().out("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
# vertices and edges can be duplicated, and all vertices and edges along the path should be kept
g.V().out("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'ALL_V_E')
# expand hops within the range of [1, 10) along the outgoing edges,
# edges cannot be duplicated, and all vertices and edges along the path should be kept
g.V().out("1..10").with('PATH_OPT', 'TRAIL').with('RESULT_OPT', 'ALL_V_E')
# expand hops within the range of [1, 10) along the outgoing edges,
# vertices cannot be duplicated and all vertices should be kept
g.V().out("1..10").with('PATH_OPT', 'SIMPLE').with('RESULT_OPT', 'ALL_V')
# = g.V().out("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
2 changes: 2 additions & 0 deletions interactive_engine/compiler/ir_experimental_ci.sh
@@ -37,6 +37,7 @@ fi

# Test3: run gremlin standard tests on distributed experimental store via calcite-based ir
# start distributed engine service and load modern graph
export DISTRIBUTED_ENV=true
cd ${base_dir}/../executor/ir/target/release &&
RUST_LOG=info DATA_PATH=/tmp/gstest/modern_graph_exp_bin PARTITION_ID=0 ./start_rpc_server --config ${base_dir}/../executor/ir/integrated/config/distributed/server_0 &
cd ${base_dir}/../executor/ir/target/release &&
@@ -54,6 +55,7 @@ if [ $exit_code -ne 0 ]; then
echo "ir\(calcite-based\) gremlin integration with proto physical test on distributed experimental store fail"
exit 1
fi
unset DISTRIBUTED_ENV

# Test4: run cypher movie tests on experimental store via ir-core
cd ${base_dir}/../executor/ir/target/release && DATA_PATH=/tmp/gstest/movie_graph_exp_bin RUST_LOG=info ./start_rpc_server --config ${base_dir}/../executor/ir/integrated/config &
4 changes: 4 additions & 0 deletions interactive_engine/compiler/pom.xml
@@ -234,6 +234,10 @@
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>

@@ -117,6 +117,8 @@ public static final PathOpt ffiPathOpt(GraphOpt.PathExpandPath opt) {
switch (opt) {
case ARBITRARY:
return PathOpt.Arbitrary;
case TRAIL:
return PathOpt.Trail;
case SIMPLE:
default:
return PathOpt.Simple;

@@ -562,6 +562,8 @@ public static final GraphAlgebraPhysical.PathExpand.PathOpt protoPathOpt(
return GraphAlgebraPhysical.PathExpand.PathOpt.ARBITRARY;
case SIMPLE:
return GraphAlgebraPhysical.PathExpand.PathOpt.SIMPLE;
case TRAIL:
return GraphAlgebraPhysical.PathExpand.PathOpt.TRAIL;
default:
throw new UnsupportedOperationException(
"opt " + opt + " in path is unsupported yet");

@@ -59,7 +59,8 @@ public enum Match {

public enum PathExpandPath {
ARBITRARY,
- SIMPLE
+ SIMPLE,
+ TRAIL
}

public enum PathExpandResult {

@@ -20,7 +20,8 @@

public enum PathOpt implements IntEnum<PathOpt> {
Arbitrary,
- Simple;
+ Simple,
+ Trail;

@Override
public int getInt() {

@@ -199,11 +199,36 @@ public abstract class IrGremlinQueryTest extends AbstractGremlinProcessTest {

public abstract Traversal<Vertex, Object> get_g_V_path_expand_until_age_gt_30_values_age();

// g.V().has("id",2).both("1..5").with("PATH_OPT","ARBITRARY").with("RESULT_OPT","END_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_endv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","SIMPLE").with("RESULT_OPT","END_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_simple_endv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","TRAIL").with("RESULT_OPT","END_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_trail_endv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","ARBITRARY").with("RESULT_OPT","ALL_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_allv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","SIMPLE").with("RESULT_OPT","ALL_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_simple_allv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","TRAIL").with("RESULT_OPT","ALL_V").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_trail_allv_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","ARBITRARY").with("RESULT_OPT","ALL_V_E").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_allve_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","SIMPLE").with("RESULT_OPT","ALL_V_E").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_simple_allve_count();

// g.V().has("id",2).both("1..5").with("PATH_OPT","TRAIL").with("RESULT_OPT","ALL_V_E").count()
public abstract Traversal<Vertex, Long> get_g_VX2X_both_with_trail_allve_count();

@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
@Test
public void g_V_path_expand_until_age_gt_30_values_age() {
// the until condition follows a sql-like expression syntax, which can only be opened when
// language type is antlr_gremlin_calcite
assumeTrue("antlr_gremlin_calcite".equals(System.getenv("GREMLIN_SCRIPT_LANGUAGE_NAME")));
final Traversal<Vertex, Object> traversal =
get_g_V_path_expand_until_age_gt_30_values_age();
@@ -251,6 +276,96 @@ public void g_V_where_expr_name_equal_marko_and_age_gt_20_or_age_lt_10_name() {
Assert.assertEquals("marko", traversal.next());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_arbitrary_endv_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_arbitrary_endv_count();
printTraversalForm(traversal);
Assert.assertEquals(28, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_simple_endv_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_simple_endv_count();
printTraversalForm(traversal);
Assert.assertEquals(9, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_trail_endv_count() {
// Skip this test in distributed settings because edge ids might differ
// across partitions in experimental store.
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
assumeFalse("true".equals(System.getenv("DISTRIBUTED_ENV")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_trail_endv_count();
printTraversalForm(traversal);
Assert.assertEquals(11, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_arbitrary_allv_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_arbitrary_allv_count();
printTraversalForm(traversal);
Assert.assertEquals(28, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_simple_allv_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_simple_allv_count();
printTraversalForm(traversal);
Assert.assertEquals(9, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_trail_allv_count() {
// Skip this test in distributed settings because edge ids might differ
// across partitions in experimental store.
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
assumeFalse("true".equals(System.getenv("DISTRIBUTED_ENV")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_trail_allv_count();
printTraversalForm(traversal);
Assert.assertEquals(11, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_arbitrary_allve_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_arbitrary_allve_count();
printTraversalForm(traversal);
Assert.assertEquals(28, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_simple_allve_count() {
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_simple_allve_count();
printTraversalForm(traversal);
Assert.assertEquals(9, traversal.next().intValue());
}

@Test
@LoadGraphWith(LoadGraphWith.GraphData.MODERN)
public void g_VX2X_both_with_trail_allve_count() {
// Skip this test in distributed settings because edge ids might differ
// across partitions in experimental store.
assumeFalse("hiactor".equals(System.getenv("ENGINE_TYPE")));
assumeFalse("true".equals(System.getenv("DISTRIBUTED_ENV")));
final Traversal<Vertex, Long> traversal = get_g_VX2X_both_with_trail_allve_count();
printTraversalForm(traversal);
Assert.assertEquals(11, traversal.next().intValue());
}

@Test
@LoadGraphWith(MODERN)
public void g_V_both_both_dedup_byXoutE_countX_name() {
@@ -1425,6 +1540,96 @@ public Traversal<Vertex, Object> get_g_V_select_expr_2_xor_3_mult_2_limit_1() {
.values("name");
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_endv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "ARBITRARY")
.with("RESULT_OPT", "END_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_simple_endv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "SIMPLE")
.with("RESULT_OPT", "END_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_trail_endv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "TRAIL")
.with("RESULT_OPT", "END_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_allv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "ARBITRARY")
.with("RESULT_OPT", "ALL_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_simple_allv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "SIMPLE")
.with("RESULT_OPT", "ALL_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_trail_allv_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "TRAIL")
.with("RESULT_OPT", "ALL_V")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_arbitrary_allve_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "ARBITRARY")
.with("RESULT_OPT", "ALL_V_E")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_simple_allve_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "SIMPLE")
.with("RESULT_OPT", "ALL_V_E")
.count());
}

@Override
public Traversal<Vertex, Long> get_g_VX2X_both_with_trail_allve_count() {
return ((IrCustomizedTraversal)
g.V().has("id", 2)
.both("1..5")
.with("PATH_OPT", "TRAIL")
.with("RESULT_OPT", "ALL_V_E")
.count());
}

@Override
public Traversal<Vertex, Object> get_g_V_path_expand_until_age_gt_30_values_age() {
return ((IrCustomizedTraversal)

@@ -87,13 +87,13 @@ public void configure(final Object... keyValues) {
String key = toCamelCaseInsensitive(originalKey);
String value = toCamelCaseInsensitive(originalVal);
if (key.equals("PathOpt")) {
- if (value.equals("Arbitrary") || value.equals("Simple")) {
+ if (value.equals("Arbitrary") || value.equals("Simple") || value.equals("Trail")) {
this.pathOpt = PathOpt.valueOf(value);
} else {
throw new ExtendGremlinStepException(
"value "
+ originalVal
- + " is invalid, use ARBITRARY or SIMPLE instead (case"
+ + " is invalid, use ARBITRARY, SIMPLE or TRAIL instead (case"
+ " insensitive)");
}
} else if (key.equals("ResultOpt")) {
1 change: 1 addition & 0 deletions interactive_engine/executor/ir/core/src/plan/ffi.rs
@@ -2312,6 +2312,7 @@ mod graph {
pub enum PathOpt {
Arbitrary = 0,
Simple = 1,
Trail = 2,
}

#[allow(dead_code)]