feat(interactive): Introduce a new benchmark tool for GIE (#4245)

## What do these changes do?


As titled. The main features of the new benchmark tool include:
* **Support for Multiple Query Languages**. The tool accommodates various graph query languages, including Gremlin and Cypher, allowing each system to be configured according to the query languages it supports.
* **Support for Different Graph Systems**. It supports comparison among
multiple graph systems, such as GraphScope GIE and KuzuDB. More systems
will be integrated in the future.
* **Support for Versatile Workloads**. The tool supports various workloads, including LDBC IC/BI, LSQB, and JOB.
* **Results Evaluation**. It enables correctness validation and
performance benchmarking for detailed comparisons.

The results of the output comparison are illustrated as follows:


![image](https://github.com/user-attachments/assets/94e42d11-26a7-47e2-9410-3585cb67d029)


## Related issue number


Fixes #3862, #4014

---------

Co-authored-by: Longbin Lai <[email protected]>
BingqingLyu and longbinlai authored Sep 24, 2024
1 parent 15c6a6c commit abca708
Showing 179 changed files with 4,036 additions and 404 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -74,6 +74,7 @@ and the vineyard store that offers efficient in-memory data transfers.
interactive_engine/tinkerpop_eco
interactive_engine/neo4j_eco
interactive_engine/gopt
interactive_engine/benchmark_tool
.. interactive_engine/guide_and_examples
interactive_engine/design_of_gie
.. interactive_engine/supported_gremlin_steps
161 changes: 161 additions & 0 deletions docs/interactive_engine/benchmark_tool.md
@@ -0,0 +1,161 @@
# A Generic Benchmark Tool

We provide a benchmarking tool to evaluate the performance of the Interactive Engine. This tool acts as multiple clients that send queries (Gremlin or Cypher) to the server through the corresponding endpoint exposed by the engine. It reports performance metrics such as latency, throughput, and query results.

Notably, the tool has recently been enhanced to support comprehensive comparisons of different systems and a variety of benchmark workloads, enabling thorough assessments and comparison of query correctness and performance.

## Benchmark Tool Overview

Here are some key features of the benchmark tool:

* **Multiple Query Languages**. The tool accommodates various graph query languages, including Gremlin and Cypher, allowing each system to be configured according to the query languages it supports.
* **Different Graph Systems**. It supports comparison among multiple graph systems, such as GraphScope GIE and KuzuDB. More systems will be integrated in the future.
* **Versatile Workloads**. The tool supports various workloads, including [LDBC IC](https://ldbcouncil.org/benchmarks/snb-interactive/) and [BI](https://ldbcouncil.org/benchmarks/snb-bi/), [LSQB](https://github.com/ldbc/lsqb), and [JOB](https://github.com/gregrahn/join-order-benchmark).
* **Results Evaluation**. It enables correctness validation and performance benchmarking for detailed comparisons.

## Benchmark Tool Usage

The benchmark tool is provided [here](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark).
The benchmark program sends mixed queries to the server by reading query templates from [queries](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark/queries) and filling in their parameters using [substitution_parameters](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark/data/substitution_parameters).
The program uses a round-robin strategy to iterate over all the **enabled** queries with the corresponding parameters.
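
As a rough illustration of how a template and its substitution parameters fit together (a hypothetical example; the placeholder syntax and parameter-file layout of the shipped templates may differ), an LDBC IC-style Cypher template and one parameter row could look like:

```
// ldbc_query_1.cypher -- hypothetical template; $personId and $firstName are filled in per run
MATCH (p:PERSON {id: $personId})-[:KNOWS*1..3]-(friend:PERSON {firstName: $firstName})
RETURN friend.id, friend.lastName
LIMIT 20;

// one parameter row (illustrative layout): personId|firstName
17592186223433|John
```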

### Repository contents

```
- bin
- bench.sh // script for running benchmark for queries
- collect.sh // script for collecting benchmark results
- config
- interactive-benchmark.properties // configurations for running benchmark
- data
- substitution_parameters // query parameter files used to fill the query templates
- expected_results // expected query results for the running queries
- queries // query templates including LDBC queries, LSQB queries, JOB queries, customized queries, etc.
- dbs // Other graph systems for comparison. Currently, KuzuDB is supported.
- example // an example to compare GraphScope GIE and Kuzu
- src // source code of benchmark program
```

_Note:_ queries with the prefix _ldbc_query_ are implementations of the official LDBC interactive complex reads,
queries with the prefix _bi_query_ are implementations of the official LDBC business intelligence workload,
queries with the prefix _lsqb_query_ are implementations of LDBC's labelled subgraph query benchmark,
and queries with the prefix _job_ are implementations of the JOB benchmark.
Gremlin queries should use the suffix _.gremlin_, and Cypher queries the suffix _.cypher_.
The corresponding parameters (scale factor 1) for the LDBC queries are generated by the [LDBC official tools](http://github.com/ldbc/ldbc_snb_datagen).
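
For example, template file names follow these conventions (the names below are illustrative, not an exhaustive listing of the shipped templates):

```
ldbc_query_1.cypher    # LDBC interactive complex read 1
bi_query_1.cypher      # LDBC BI query 1
lsqb_query_1.gremlin   # LSQB query 1, written in Gremlin
job_13a.cypher         # JOB query 13a
```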

### Building

Build benchmark program using Maven:

```bash
mvn clean package
```

All the binaries and queries will be packed into _target/benchmark-0.0.1-SNAPSHOT-dist.tar.gz_,
and you can deploy the package anywhere that can connect to the engine's endpoint (which should be provided in interactive-benchmark.properties).
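
If you prefer to run from the unpacked distribution directly, a typical sequence is as follows (note that the project's Makefile names the archive _gaia-benchmark-0.0.1-SNAPSHOT-dist.tar.gz_, so adjust the file name to whatever your build actually produces):

```bash
cd target
tar zxvf gaia-benchmark-0.0.1-SNAPSHOT-dist.tar.gz   # or the benchmark-* archive mentioned above
cd gaia-benchmark-0.0.1-SNAPSHOT
```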

### Running the benchmark

```bash
./bin/bench.sh # run the benchmark program with the provided properties
```

With the example configuration file ``example/job_benchmark.properties``, which compares GraphScope GIE and KuzuDB while executing the JOB benchmark, example results are as follows:

```
Start to benchmark system: GIE
QueryName[13a], Parameter[{}], ResultCount[1], ExecuteTimeMS[3638].
QueryName[32a], Parameter[{}], ResultCount[1], ExecuteTimeMS[266].
QueryName[9a], Parameter[{}], ResultCount[1], ExecuteTimeMS[3669].
QueryName[5c], Parameter[{}], ResultCount[1], ExecuteTimeMS[8603].
QueryName[3a], Parameter[{}], ResultCount[1], ExecuteTimeMS[613].
...
System: GIE; query count: 35; execute time(ms): xxx qps: xxx
Start to benchmark system: KuzuDb
QueryName[13a], Parameter[{}], ResultCount[1], ExecuteTimeMS[7068].
QueryName[32a], Parameter[{}], ResultCount[1], ExecuteTimeMS[253].
QueryName[9a], Parameter[{}], ResultCount[1], ExecuteTimeMS[5122].
QueryName[5c], Parameter[{}], ResultCount[1], ExecuteTimeMS[13623].
QueryName[3a], Parameter[{}], ResultCount[1], ExecuteTimeMS[4676].
...
System: KuzuDB; query count: 35; execute time(ms): xxx qps: xxx
```

### Collecting the results

```bash
./bin/collect.sh # run the result collection program to collect the results and generate a performance comparison table
```

Based on the benchmark results, the collected data and the final performance comparison table are as follows:


| QueryName | GIE Avg | GIE P50 | GIE P90 | GIE P95 | GIE P99 | GIE Count | KuzuDb Avg | KuzuDb P50 | KuzuDb P90 | KuzuDb P95 | KuzuDb P99 | KuzuDb Count |
| --------- | ------- | ------- | ------- | ------- | ------- | --------- | ---------- | ---------- | ---------- | ---------- | ---------- | ------------ |
| 3a | 613.00 | 613 | 613 | 613 | 613 | 1 | 4676.00 | 4676 | 4676 | 4676 | 4676 | 1 |
| 5c | 8603.00 | 8603 | 8603 | 8603 | 8603 | 1 | 13623.00 | 13623 | 13623 | 13623 | 13623 | 1 |
| 9a | 3669.00 | 3669 | 3669 | 3669 | 3669 | 1 | 5122.00 | 5122 | 5122 | 5122 | 5122 | 1 |
| 13a | 3638.00 | 3638 | 3638 | 3638 | 3638 | 1 | 7068.00 | 7068 | 7068 | 7068 | 7068 | 1 |
| 32a | 266.00 | 266 | 266 | 266 | 266 | 1 | 253.00 | 253 | 253 | 253 | 253 | 1 |

A more detailed end-to-end example is provided [here](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark/example).

## Configurations

All detailed configurations can be found in ``config/interactive-benchmark.properties``.

Below we highlight some key settings.

### Configure Compared Systems

We facilitate comparisons between various graph systems. For instance, to compare the GIE and Kuzu systems, the interactive-benchmark.properties file can be configured as follows; the benchmark tool will then send queries to both GIE and Kuzu and gather and analyze their results.

```
# The configuration for the compared systems.
# Currently, the supported systems include GIE and KuzuDb.
# For each system, from system.1 to system.n, the following configurations are needed:
# name: the name of the system, e.g., GIE, KuzuDb.
# client: the client of the system, e.g., for GIE, it can be cypher or gremlin; for KuzuDb, it should be kuzu.
# endpoint (optional): the endpoint of the system if the system provides a service endpoint, e.g., for GIE gremlin, it is 127.0.0.1:8182 by default.
# path (optional): the path of the system's database if the system is a local database accessed by path, e.g., for KuzuDb, it can be /path_to_db/example_db.
# Either endpoint or path needs to be provided, depending on the access method of the system.
system.1.name = GIE
system.1.client = cypher
system.1.endpoint = 127.0.0.1:7687
system.1.path =
system.2.name = KuzuDb
system.2.client = kuzu
system.2.endpoint =
system.2.path = ./job_db
```
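
As a variant, a hypothetical setup that benchmarks GIE through its Gremlin endpoint instead of Cypher (127.0.0.1:8182 is the default mentioned in the comments above) would only change the first system's entries:

```
system.1.name = GIE
system.1.client = gremlin
system.1.endpoint = 127.0.0.1:8182
system.1.path =
```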

### Configure Workloads

Currently, we provide commonly used benchmark workloads including ic, bi, lsqb, and job. Users can also add their own benchmarking queries to [queries](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark/queries) and add the corresponding substitution parameters to [substitution_parameters](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/benchmark/data/substitution_parameters). Note that the file names of user-defined query templates should start with the prefix _custom_query_ or _custom_constant_query_; the difference between the two is that the latter has no corresponding parameters (see the illustrative file names below).
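
As an illustration (the file names are hypothetical), a parameterized and a constant user-defined query could be laid out as follows:

```
queries/custom_query_friends.cypher            # parameterized; expects matching parameters under data/substitution_parameters
queries/custom_constant_query_count_all.cypher # constant; no parameter file needed
```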

Taking the JOB benchmark as an example, the related configuration is as follows:

```
# The configuration for the benchmarking workloads.
# the directory of query templates
query.dir = ./queries/cypher_queries/job
# the directory of query parameters. If the queries do not have parameters, leave it empty.
query.parameters.dir =
# query file suffix, e.g., cypher (ldbc_query.cypher), gremlin (ldbc_query.gremlin), txt (ldbc_query.txt), etc.
query.file.suffix=cypher
# specify which kind of queries are sent.
# if query.all.enable is true, the benchmark will send all the queries in the query.dir.
query.all.enable=true
```
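
As another sketch, a workload that uses substitution parameters, such as LDBC IC, would additionally point `query.parameters.dir` at the parameter files; the directory names below are assumptions modeled on the JOB example and may differ in your checkout:

```
query.dir = ./queries/cypher_queries/ic
query.parameters.dir = ./data/substitution_parameters
query.file.suffix=cypher
query.all.enable=true
```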

### Configure Results Collection

By default, benchmark results will be output to the `interactive-benchmark.log` and `interactive-benchmark-report.md` files, as exemplified in the sections "Running the benchmark" and "Collecting the results" above. In addition, if you want to further compare query correctness under the current workloads, you can provide the corresponding configuration:

```
# the path of the expected query results (optional). If provided, the benchmark results will be compared against the expected results.
query.expected.path = ./data/expected_results/job_expected.json
```

The benchmark tool will automatically execute the queries and compare the results for correctness.
35 changes: 35 additions & 0 deletions interactive_engine/benchmark/Makefile
@@ -0,0 +1,35 @@
OPT?=poc

CUR_DIR:=$(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))

ifeq ($(JAVA_HOME),)
java:=java
else
java:=$(JAVA_HOME)/bin/java
endif

UNAME_S := $(shell uname -s)
UNAME_M := $(shell uname -m)

config.path:=config/interactive-benchmark.properties
QUIET_OPT := --quiet

build:
cd $(CUR_DIR) && mvn clean package ${QUIET_OPT} && \
cd target && \
tar zxvf gaia-benchmark-0.0.1-SNAPSHOT-dist.tar.gz > /dev/null

clean:
cd $(CUR_DIR) && mvn clean

run:
cd $(CUR_DIR) && $(java) \
-cp "$(CUR_DIR)/target/gaia-benchmark-0.0.1-SNAPSHOT/lib/*" \
com.alibaba.graphscope.gaia.benchmark.InteractiveBenchmark ${config.path}

collect:
cd $(CUR_DIR) && $(java) \
-cp "$(CUR_DIR)/target/gaia-benchmark-0.0.1-SNAPSHOT/lib/*" \
com.alibaba.graphscope.gaia.benchmark.CollectResult ${config.path}

.PHONY: build clean run collect
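
For reference, the Makefile targets above can be driven roughly as follows; `config.path` is the variable the Makefile already defines, and the value shown is simply its default:

```bash
make build                                                         # package and unpack the distribution
make run config.path=config/interactive-benchmark.properties      # run the benchmark
make collect config.path=config/interactive-benchmark.properties  # collect results and build the comparison table
```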
82 changes: 50 additions & 32 deletions interactive_engine/benchmark/README.md
@@ -1,26 +1,30 @@
## Benchmark Tool Usage

In this directory is a tool that can be used to benchmark GAIA. It serves as multiple clients to send
queries to gremlin server through the gremlin endpoint exposed by the engine, and report the performance numbers
(e.g., latency, throughput, query results).
The benchmark program sends mixed queries to the server by reading query templates from [queries](queries) with filling the parameters in the query templates
using [substitution_parameters](data/substitution_parameters).
This directory contains a benchmarking tool for GraphScope GIE and other specified systems. It functions as multiple clients, sending queries through the engine's exposed endpoint or directly to the database, depending on the querying method for each system. The tool reports performance metrics such as latency, throughput, and query results.
The benchmark program sends mixed queries to the server by reading query templates from [queries](queries) and filling in their parameters using [substitution_parameters](data/substitution_parameters).
The program uses a round-robin strategy to iterate all the **enabled** queries with corresponding parameters.

### Repository contents
```
- bin
- bench.sh // script for running benchmark for queries
- collect.sh // script for collecting benchmark results
- config
- interactive-benchmark.properties // configurations for running benchmark
- data
- substitution_parameters // query parameter files used to fill the query templates
- queries // query templates including LDBC queries, K-hop queries and user-defined queries
- scripts
- benchmark.sh // script for running benchmark
- cal.py // script for calculating benchmark results
- expected_results // expected query results for the running queries
- queries // query templates including LDBC queries, LSQB queries, Job queries, customized queries, etc.
- dbs // Other graph systems for comparison. Currently, KuzuDB is supported.
- example // an example to compare GraphScope GIE and Kuzu
- src // source code of benchmark program
```
_Note:_ the queries here with the prefix _ldbc_query_ are implementations of LDBC official interactive complex reads,
and the corresponding parameters (factor 1) are generated by [LDBC official tools](http://github.com/ldbc/ldbc_snb_datagen).
the queries with the prefix _bi_query_ are implementations of the official LDBC business intelligence workload,
the queries with the prefix _lsqb_query_ are implementations of LDBC's labelled subgraph query benchmark,
and the queries with the prefix _job_ are implementations of the JOB benchmark.
Gremlin queries should use the suffix _.gremlin_, and Cypher queries the suffix _.cypher_.
The corresponding parameters (scale factor 1) for the LDBC queries are generated by the [LDBC official tools](http://github.com/ldbc/ldbc_snb_datagen).

### Building

@@ -29,36 +33,50 @@ Build benchmark program using Maven:
mvn clean package
```
All the binaries and queries will be packed into _target/benchmark-0.0.1-SNAPSHOT-dist.tar.gz_,
and you can use deploy the package to anywhere could connect to the gremlin endpoint.
and you can deploy the package anywhere that can connect to the endpoint (which should be provided in interactive-benchmark.properties).

### Running the benchmark

```bash
cd target
tar -xvf gaia-benchmark-0.0.1-SNAPSHOT-dist.tar.gz
cd gaia-benchmark-0.0.1-SNAPSHOT
vim config/interactive-benchmark.properties # specify the gremlin endpoint of your server and modify running configurations
chmod +x ./scripts/benchmark.sh
./scripts/benchmark.sh # run the benchmark program
./bin/bench.sh # run the benchmark program with the provided properties
```
With the example configuration file ``example/job_benchmark.properties``, which compares GraphScope-GIE and KuzuDB while executing the JOB Benchmark, the results are as follows:
```
Start to benchmark system: GIE
QueryName[13a], Parameter[{}], ResultCount[1], ExecuteTimeMS[3638].
QueryName[32a], Parameter[{}], ResultCount[1], ExecuteTimeMS[266].
QueryName[9a], Parameter[{}], ResultCount[1], ExecuteTimeMS[3669].
QueryName[5c], Parameter[{}], ResultCount[1], ExecuteTimeMS[8603].
QueryName[3a], Parameter[{}], ResultCount[1], ExecuteTimeMS[613].
...
System: GIE; query count: 35; execute time(ms): xxx qps: xxx
Benchmark reports numbers as following:
Start to benchmark system: KuzuDb
QueryName[13a], Parameter[{}], ResultCount[1], ExecuteTimeMS[7068].
QueryName[32a], Parameter[{}], ResultCount[1], ExecuteTimeMS[253].
QueryName[9a], Parameter[{}], ResultCount[1], ExecuteTimeMS[5122].
QueryName[5c], Parameter[{}], ResultCount[1], ExecuteTimeMS[13623].
QueryName[3a], Parameter[{}], ResultCount[1], ExecuteTimeMS[4676].
...
System: KuzuDB; query count: 35; execute time(ms): xxx qps: xxx
```
QueryName[LDBC_QUERY_1], Parameter[{firstName=John, personId=17592186223433}], ResultCount[87], ExecuteTimeMS[ 1266 ].
QueryName[LDBC_QUERY_12], Parameter[{tagClassName=Judge, personId=19791209469071}], ResultCount[0], ExecuteTimeMS[ 259 ].
QueryName[LDBC_QUERY_11], Parameter[{workFromYear=2001, personId=32985348901156, countryName=Bolivia}], ResultCount[0], ExecuteTimeMS[ 60 ].
QueryName[LDBC_QUERY_9], Parameter[{personId=10995116420051, maxDate=20121128080000000}], ResultCount[20], ExecuteTimeMS[ 55755 ].
QueryName[LDBC_QUERY_8], Parameter[{personId=67523}], ResultCount[20], ExecuteTimeMS[ 148 ].
QueryName[LDBC_QUERY_7], Parameter[{personId=26388279199350}], ResultCount[0], ExecuteTimeMS[ 10 ].
QueryName[LDBC_QUERY_6], Parameter[{personId=26388279148519, tagName=Vallabhbhai_Patel}], ResultCount[0], ExecuteTimeMS[ 12837 ].
QueryName[LDBC_QUERY_5], Parameter[{minDate=20120814080000000, personId=2199023436754}], ResultCount[0], ExecuteTimeMS[ 11268 ].
QueryName[LDBC_QUERY_3], Parameter[{durationDays=30, endDate=20110701080000000, countryXName=Mongolia, countryYName=Namibia, personId=8796093204429, startDate=20110601080000000}], ResultCount[20], ExecuteTimeMS[ 21474 ].
QueryName[LDBC_QUERY_2], Parameter[{personId=28587302394490, maxDate=20121128080000000}], ResultCount[20], ExecuteTimeMS[ 331 ].
query count: 10; execute time(ms): ...; qps: ...

### Collecting the results

```bash
./bin/collect.sh # run the result collection program to collect the results and generate a performance comparison table
```
Based on the benchmark results, the collected data and the final performance comparison table are as follows:

And the comparison result after collection is as follows:
| QueryName | GIE Avg | GIE P50 | GIE P90 | GIE P95 | GIE P99 | GIE Count | KuzuDb Avg | KuzuDb P50 | KuzuDb P90 | KuzuDb P95 | KuzuDb P99 | KuzuDb Count |
| --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- |
| 3a | 613.00 | 613 | 613 | 613 | 613 | 1 | 4676.00 | 4676 | 4676 | 4676 | 4676 | 1 |
| 5c | 8603.00 | 8603 | 8603 | 8603 | 8603 | 1 | 13623.00 | 13623 | 13623 | 13623 | 13623 | 1 |
| 9a | 3669.00 | 3669 | 3669 | 3669 | 3669 | 1 | 5122.00 | 5122 | 5122 | 5122 | 5122 | 1 |
| 13a | 3638.00 | 3638 | 3638 | 3638 | 3638 | 1 | 7068.00 | 7068 | 7068 | 7068 | 7068 | 1 |
| 32a | 266.00 | 266 | 266 | 266 | 266 | 1 | 253.00 | 253 | 253 | 253 | 253 | 1 |

### User-defined Benchmarking Queries
Users can add their own benchmarking queries to [queries](queries) and add substitution parameters of queries to [substitution_parameters](data/substitution_parameters).
Note that the file name of user-defined query templates should follow the prefix _custom_query_ or _custom_constant_query_. The difference between custom_query and
custom_constant_query is that the latter has no corresponding parameters.
Note that the file name of user-defined query templates should follow the prefix _custom_query_ or _custom_constant_query_. The difference between custom_query and custom_constant_query is that the latter has no corresponding parameters.
