parent 6ab796e
author shirly121 <[email protected]> 1694167237 +0800
committer xiaolei.zl <[email protected]> 1695348300 +0800

parent 6ab796e
author shirly121 <[email protected]> 1694167237 +0800
committer xiaolei.zl <[email protected]> 1695348286 +0800

[GIE Compiler] fix bugs of columnId in schema

refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (#3154)

1. Replace the current ad hoc CSV reader with the Arrow CSV reader to
support more configurable options in `bulk_load.yaml`.
2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for
`MutablePropertyFragment`.

With this PR merged, `MutablePropertyFragment` supports loading a
fragment from CSV with the following options:
- delimiter: default '|'
- header_row: default true
- quoting: default false
- quoting_char: default '"'
- escaping: default false
- escaping_char: default '\\'
- batch_size: the batch size used when reading the file into memory,
default 1MB.
- batch_reader: default false. If set to true,
`arrow::csv::StreamingReader` is used to parse the input file.
Otherwise, `arrow::csv::TableReader` is used.
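
As a rough illustration of what the delimiter, quoting, and escaping options control, the sketch below splits a single line into fields. It is not the Arrow parser and not GraphScope code: the class `CsvOptionsSketch` and its `split` method are hypothetical, only the default values are taken from the list above, and doubled quote characters are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the options listed above. The field names mirror the
// option names, but this class is an illustration, not part of GraphScope or Arrow.
final class CsvOptionsSketch {
    char delimiter = '|';       // delimiter: default '|'
    boolean headerRow = true;   // header_row: default true (not used by split)
    boolean quoting = false;    // quoting: default false
    char quotingChar = '"';     // quoting_char: default '"'
    boolean escaping = false;   // escaping: default false
    char escapingChar = '\\';   // escaping_char: default backslash

    // Split one line into fields according to the current option values.
    List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escaping && c == escapingChar && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character literally
            } else if (quoting && c == quotingChar) {
                inQuotes = !inQuotes;             // toggle quoted state, drop the quote itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());   // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());           // last field
        return fields;
    }

    public static void main(String[] args) {
        CsvOptionsSketch options = new CsvOptionsSketch();
        System.out.println(options.split("1|Tom Hanks|1956"));
        options.quoting = true; // a quoted field may now contain the delimiter
        System.out.println(options.split("2|\"a|b\"|x"));
    }
}
```

With the defaults, a quote character is treated as ordinary data, so a delimiter inside quotes still ends the field. Turning `quoting` on keeps the quoted text together as one field.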

With this PR merged, graph loading is faster, as the tables below show.
"Adhoc Reader" denotes the previous hand-written CSV parser, and the
columns 1, 2, 4, and 8 denote the parallelism of graph loading, i.e. how
many vertex/edge labels are processed concurrently.
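
To make "parallelism" concrete, here is a minimal sketch in which each label is one task and a fixed-size thread pool bounds how many labels are processed at once. This is an assumption about the general pattern, not the loader's actual implementation: `LabelParallelismSketch`, `loadLabel`, and `loadAll` are hypothetical names, and the per-label work is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LabelParallelismSketch {
    // Placeholder for per-label work; a real loader would read and insert records here.
    static String loadLabel(String label) {
        return "loaded:" + label;
    }

    // Process every label, with at most `parallelism` labels in flight at a time.
    static List<String> loadAll(List<String> labels, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String label : labels) {
                futures.add(pool.submit(() -> loadLabel(label)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results are collected in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("label loading failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Parallelism 2: at most two of the three labels are loaded concurrently.
        System.out.println(loadAll(List.of("Person", "Movie", "ACTED_IN"), 2));
    }
}
```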

Note that TableReader is around 10x faster than StreamingReader in the
ReadFile phase. A possible reason is that TableReader can parse with
multiple threads.
See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for
details.

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile\+LoadGraph | 805s | 468s | 349s | 313s |
| Adhoc Reader | Serialization | 126s | 126s | 126s | 126s |
| Adhoc Reader | **Total** | 931s | 594s | 475s | 439s |
| Table Reader | ReadFile | 9s | 9s | 9s | 9s |
| Table Reader | LoadGraph | 455s | 280s | 211s | 182s |
| Table Reader | Serialization | 126s | 126s | 126s | 126s |
| Table Reader | **Total** | 600s | 415s | 346s | 317s |
| Streaming Reader | ReadFile | 91s | 91s | 91s | 91s |
| Streaming Reader | LoadGraph | 555s | 289s | 196s | 149s |
| Streaming Reader | Serialization | 126s | 126s | 126s | 126s |
| Streaming Reader | **Total** | 772s | 506s | 413s | 366s |

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile\+LoadGraph | 2720s | 1548s | 1176s | 948s |
| Adhoc Reader | Serialization | 409s | 409s | 409s | 409s |
| Adhoc Reader | **Total** | 3129s | 1957s | 1585s | 1357s |
| Table Reader | ReadFile | 24s | 24s | 24s | 24s |
| Table Reader | LoadGraph | 1576s | 949s | 728s | 602s |
| Table Reader | Serialization | 409s | 409s | 409s | 409s |
| Table Reader | **Total** | 2009s | 1382s | 1161s | 1035s |
| Streaming Reader | ReadFile | 300s | 300s | 300s | 300s |
| Streaming Reader | LoadGraph | 1740s | 965s | 669s | 497s |
| Streaming Reader | Serialization | 409s | 409s | 409s | 409s |
| Streaming Reader | **Total** | 2539s | 1674s | 1378s | 1206s |

| Reader | Phase | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- | --- |
| Adhoc Reader | ReadFile\+LoadGraph | 8260s | 4900s | 3603s | 2999s |
| Adhoc Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Adhoc Reader | **Total** | 9461s | 6101s | 4804s | 4200s |
| Table Reader | ReadFile | 73s | 73s | 96s | 96s |
| Table Reader | LoadGraph | 4650s | 2768s | 2155s | 1778s |
| Table Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Table Reader | **Total** | 5924s | 4042s | 3452s | 3075s |
| Streaming Reader | ReadFile | 889s | 889s | 889s | 889s |
| Streaming Reader | LoadGraph | 5589s | 3005s | 2200s | 1712s |
| Streaming Reader | Serialization | 1201s | 1201s | 1201s | 1201s |
| Streaming Reader | **Total** | 7679s | 5095s | 4290s | 3802s |
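
The "around 10x" figure can be checked against the ReadFile rows of the three tables above. The sketch below divides the StreamingReader time by the TableReader time at parallelism 1. The class and method names are made up for illustration; only the timings come from the tables.

```java
final class ReaderSpeedupSketch {
    // How many times faster the second timing is than the first (both in seconds).
    static double speedup(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    public static void main(String[] args) {
        // ReadFile timings at parallelism 1, one row per table:
        // {StreamingReader seconds, TableReader seconds}
        int[][] readFile = {{91, 9}, {300, 24}, {889, 73}};
        for (int[] row : readFile) {
            System.out.println("ReadFile speedup: " + speedup(row[0], row[1]));
        }
    }
}
```

The ratios come out at roughly 10x to 12x. This applies to the ReadFile phase only; the Total rows differ much less because LoadGraph and Serialization dominate the overall time.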

Fix #3116

minor fix and move modern graph

fix grin test

todo: do_start

fix

fix

stash

fix

fix

make rules unique

dockerfile stash

minor change

remove plugin-dir

fix

minor fix

debug

debug

fix

fix

fix bulk_load.yaml

bash format

some fix

fix format

fix grin test

some fi

check ci

fix ci

set

fix ci

fix

dd

f

disable tmate

fix some bug

fix

fix

refactor

fix

fix

fix

minor

some fix

fix

support default src_dst primarykey mapping in bulk load

fix

fix

fix

fix

Ci

rename

fix java and add get_person_name.cypher

[GIE Compiler] minor fix

use graphscope gstest

format

add movie queries

dd

debug

add movie test

format

format
shirly121 authored and zhanglei1949 committed Sep 24, 2023
1 parent fb27016 commit cbe8220
Showing 21 changed files with 352 additions and 3 deletions.
15 changes: 14 additions & 1 deletion .github/workflows/hqps-db-ci.yml
@@ -87,7 +87,8 @@ jobs:
INTERACTIVE_WORKSPACE: /tmp/interactive_workspace
run: |
# download dataset
git clone -b master --single-branch --depth=1 https://github.com/GraphScope/gstest.git ${GS_TEST_DIR}
#git clone -b master --single-branch --depth=1 https://github.com/GraphScope/gstest.git ${GS_TEST_DIR}
git clone -b master --single-branch --depth=1 https://github.com/zhanglei1949/gstest.git ${GS_TEST_DIR}
mkdir -p ${INTERACTIVE_WORKSPACE}/data/ldbc
GRAPH_SCHEMA_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
cp ${GRAPH_SCHEMA_YAML} ${INTERACTIVE_WORKSPACE}/data/ldbc/graph.yaml
@@ -131,6 +132,18 @@ jobs:
eval ${cmd}
done
# test movie graph, 8,9,10 are not supported now
cp ${GS_TEST_DIR}/flex/movies/movies_schema.yaml ${INTERACTIVE_WORKSPACE}/data/ldbc/graph.yaml
for i in 1 2 3 4 5 6 7 11 12 13 14 15;
do
cmd="./load_plan_and_gen.sh -e=hqps -i=../tests/hqps/queries/movie/query${i}.cypher -w=/tmp/codgen/"
cmd=${cmd}" -o=/tmp/plugin --ir_conf=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml "
cmd=${cmd}" --graph_schema_path=${GS_TEST_DIR}/flex/movies/movies_schema.yaml"
cmd=${cmd}" --gie_home=${GIE_HOME}"
echo $cmd
eval ${cmd}
done
- name: Run End-to-End cypher adhoc query test
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest
@@ -1 +1 @@
MATCH(p : person {id: $personId}) RETURN p.name;
MATCH(p : person {id: $personId}) RETURN p.name;
16 changes: 15 additions & 1 deletion flex/tests/hqps/hqps_cypher_test.sh
@@ -37,7 +37,6 @@ if [ ! -d ${INTERACTIVE_WORKSPACE} ]; then
fi

ENGINE_CONFIG_PATH=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/engine_config.yaml
ORI_GRAPH_SCHEMA_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
GRAPH_SCHEMA_YAML=${INTERACTIVE_WORKSPACE}/data/ldbc/graph.yaml
GRAPH_BULK_LOAD_YAML=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_bulk_load.yaml
COMPILER_GRAPH_SCHEMA=${GS_TEST_DIR}/flex/ldbc-sf01-long-date/audit_graph_schema.yaml
@@ -142,12 +141,27 @@ run_simple_test(){
popd
}

run_movie_test(){
echo "run movie test"
pushd ${GIE_HOME}/compiler
cmd="mvn test -Dtest=com.alibaba.graphscope.cypher.integration.movie.MovieTest"
echo "Start movie test: ${cmd}"
${cmd}
info "Finish movie test"
popd
}

kill_service
start_engine_service
start_compiler_service
run_ldbc_test
run_simple_test
kill_service
start_engine_service
start_compiler_service
# test on movie graph
run_movie_test
kill_service



1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query1.cypher
@@ -0,0 +1 @@
MATCH (tom:Person) WHERE tom.name = "Tom Hanks" RETURN tom.born AS bornYear,tom.name AS personName;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query10.cypher
@@ -0,0 +1,4 @@
MATCH p=shortestPath(
(bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
)
RETURN p;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query11.cypher
@@ -0,0 +1,2 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(movie:Movie)
RETURN tom.id AS personId, movie.title as movieTitle, movie.released as movieReleased;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query12.cypher
@@ -0,0 +1,2 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name;
4 changes: 4 additions & 0 deletions flex/tests/hqps/queries/movie/query13.cypher
@@ -0,0 +1,4 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name
6 changes: 6 additions & 0 deletions flex/tests/hqps/queries/movie/query14.cypher
@@ -0,0 +1,6 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(coCoActor) as frequency
ORDER BY frequency DESC
LIMIT 5;
3 changes: 3 additions & 0 deletions flex/tests/hqps/queries/movie/query15.cypher
@@ -0,0 +1,3 @@
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(cruise:Person {name: 'Tom Cruise'})
WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(cruise)
RETURN tom.name AS actorName, movie1.title AS actedInMovie, coActor.name AS coActorName, movie2.title AS coActorActivedInMovie, cruise.name AS coCoActorName;
1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query2.cypher
@@ -0,0 +1 @@
MATCH (cloudAtlas:Movie {title: "Cloud Atlas"}) RETURN cloudAtlas.tagline AS tagline, cloudAtlas.released AS releasedYear,cloudAtlas.title AS title;
1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query3.cypher
@@ -0,0 +1 @@
MATCH (people:Person) RETURN people.name LIMIT 10;
1 change: 1 addition & 0 deletions flex/tests/hqps/queries/movie/query4.cypher
@@ -0,0 +1 @@
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query5.cypher
@@ -0,0 +1,2 @@
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
RETURN tom.born AS bornYear,tomHanksMovies.tagline AS movieTagline, tomHanksMovies.title AS movieTitle, tomHanksMovies.released AS releaseYear;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query6.cypher
@@ -0,0 +1,2 @@
MATCH (cloudAtlas:Movie {title: "Cloud Atlas"})<-[:DIRECTED]-(directors)
RETURN directors.name;
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query7.cypher
@@ -0,0 +1,2 @@
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN m.title AS movieTitle, m.released AS releasedYear, coActors.name AS coActor
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query8.cypher
@@ -0,0 +1,2 @@
MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"})
RETURN people.name, type(relatedTo), relatedTo
2 changes: 2 additions & 0 deletions flex/tests/hqps/queries/movie/query9.cypher
@@ -0,0 +1,2 @@
MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..3]-(hollywood)
RETURN DISTINCT bacon, hollywood
1 change: 1 addition & 0 deletions interactive_engine/compiler/pom.xml
@@ -253,6 +253,7 @@
<exclude>**/IrLdbcTest.java</exclude>
<exclude>**/SimpleMatchTest.java</exclude>
<exclude>**/IrPatternTest.java</exclude>
<exclude>**/MovieTest.java</exclude>
</excludes>
</configuration>
</plugin>
@@ -0,0 +1,171 @@
package com.alibaba.graphscope.cypher.integration.suite.ldbc;

import java.util.Arrays;
import java.util.List;

public class MovieQueries {
public static QueryContext get_movie_query1_test() {
String query =
"MATCH (tom:Person) WHERE tom.name = \"Tom Hanks\" RETURN tom.born AS"
+ " bornYear,tom.name AS personName;";
List<String> expected =
Arrays.asList("Record<{bornYear: 1956, personName: \"Tom Hanks\"}>");
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query2_test() {
String query =
"MATCH (cloudAtlas:Movie {title: \"Cloud Atlas\"}) RETURN cloudAtlas.tagline AS"
+ " tagline, cloudAtlas.released AS releasedYear,cloudAtlas.title AS title;";
List<String> expected =
Arrays.asList(
"Record<{tagline: \"Everything is connected\", releasedYear: 2012, title:"
+ " \"Cloud Atlas\"}>");
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query3_test() {
String query = "MATCH (people:Person) RETURN people.name LIMIT 10;";
List<String> expected =
Arrays.asList(
"Record<{people.name: \"Keanu Reeves\"}>",
"Record<{people.name: \"Carrie-Anne Moss\"}>",
"Record<{people.name: \"Laurence Fishburne\"}>",
"Record<{people.name: \"Hugo Weaving\"}>",
"Record<{people.name: \"Lilly Wachowski\"}>",
"Record<{people.name: \"Lana Wachowski\"}>",
"Record<{people.name: \"Joel Silver\"}>",
"Record<{people.name: \"Emil Eifrem\"}>",
"Record<{people.name: \"Charlize Theron\"}>",
"Record<{people.name: \"Al Pacino\"}>");
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query4_test() {
String query =
"MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released <"
+ " 2000 RETURN nineties.title LIMIT 10;";
List<String> expected =
Arrays.asList(
"Record<{nineties.title: \"The Matrix\"}>",
"Record<{nineties.title: \"The Devil's Advocate\"}>",
"Record<{nineties.title: \"A Few Good Men\"}>",
"Record<{nineties.title: \"As Good as It Gets\"}>",
"Record<{nineties.title: \"What Dreams May Come\"}>",
"Record<{nineties.title: \"Snow Falling on Cedars\"}>",
"Record<{nineties.title: \"You've Got Mail\"}>",
"Record<{nineties.title: \"Sleepless in Seattle\"}>",
"Record<{nineties.title: \"Joe Versus the Volcano\"}>",
"Record<{nineties.title: \"When Harry Met Sally\"}>");
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query5_test() {
String query =
"MATCH (tom:Person {name: \"Tom Hanks\"})-[:ACTED_IN]->(tomHanksMovies)\n"
+ "RETURN tom.born AS bornYear, tomHanksMovies.tagline AS movieTagline,"
+ " tomHanksMovies.title AS movieTitle, tomHanksMovies.released AS"
+ " releaseYear;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query6_test() {
String query =
"MATCH (cloudAtlas:Movie {title: \"Cloud Atlas\"})<-[:DIRECTED]-(directors)\n"
+ "RETURN directors.name;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query7_test() {
String query =
"MATCH (tom:Person {name:\"Tom Hanks\"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)\n"
+ "RETURN m.title AS movieTitle, m.released AS releasedYear, coActors.name AS"
+ " coActor";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query8_test() {
String query =
"MATCH (people:Person)-[relatedTo]-(:Movie {title: \"Cloud Atlas\"})\n"
+ "RETURN people.name, type(relatedTo), relatedTo";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query9_test() {
String query =
"MATCH (bacon:Person {name:\"Kevin Bacon\"})-[*1..3]-(hollywood)\n"
+ "RETURN DISTINCT bacon, hollywood";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query10_test() {
String query =
"MATCH p=shortestPath(\n"
+ " (bacon:Person {name:\"Kevin Bacon\"})-[*]-(meg:Person {name:\"Meg"
+ " Ryan\"})\n"
+ ")\n"
+ "RETURN p;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query11_test() {
String query =
"MATCH (tom:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(movie:Movie)\n"
+ "RETURN tom.id AS personId, movie.title as movieTitle, movie.released as"
+ " movieReleased;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query12_test() {
String query =
"MATCH (tom:Person {name: 'Tom"
+ " Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)\n"
+ "RETURN coActor.name;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query13_test() {
String query =
"MATCH (tom:Person {name: 'Tom"
+ " Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)\n"
+ "WHERE tom <> coCoActor\n"
+ "AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)\n"
+ "RETURN coCoActor.name";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query14_test() {
String query =
"MATCH (tom:Person {name: 'Tom"
+ " Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)\n"
+ "WHERE tom <> coCoActor\n"
+ "AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)\n"
+ "RETURN coCoActor.name, count(coCoActor) as frequency\n"
+ "ORDER BY frequency DESC\n"
+ "LIMIT 5;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}

public static QueryContext get_movie_query15_test() {
String query =
"MATCH (tom:Person {name: 'Tom"
+ " Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(cruise:Person"
+ " {name: 'Tom Cruise'})\n"
+ "WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(cruise)\n"
+ "RETURN tom.name AS actorName, movie1.title AS actedInMovie, coActor.name AS"
+ " coActorName, movie2.title AS coActorActivedInMovie, cruise.name AS"
+ " coCoActorName;";
List<String> expected = Arrays.asList();
return new QueryContext(query, expected);
}
}