Implementations for the BI workload of the LDBC Social Network Benchmark.
To get started with the LDBC SNB benchmarks, check out our introductory presentation: The LDBC Social Network Benchmark (PDF).
📜 If you wish to cite the LDBC SNB, please refer to the documentation repository (bib snippet).
The repository contains the following implementations:
cypher
: queries are expressed in the Cypher language and run in the Neo4j graph database management system

umbra
: queries are expressed in SQL and run in the Umbra JIT-compiled columnar relational database management system

tigergraph
: queries are expressed in the GSQL language and run in the TigerGraph graph database management system
All implementations use Docker containers for ease of setup and execution. However, the setups can be adjusted to use a non-containerized DBMS.
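As a quick sanity check before invoking any of the tool scripts, you can verify that Docker is installed and the daemon is reachable. This is only an illustrative prerequisite check; the Compose check is relevant only if the chosen setup uses Docker Compose.

```bash
# Prerequisite check (illustrative): confirm the Docker CLI and daemon are available.
docker --version
docker compose version   # only needed if the chosen setup uses Docker Compose
docker info > /dev/null && echo "Docker daemon is reachable"
```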
Running an SNB BI experiment requires the following steps.
- Pick a tool, e.g. Umbra. Make sure you have the required binaries and licenses available.
- Generate the data sets using the SNB Datagen according to the format described in the tool's README.
- Generate the substitution parameters using the `paramgen` tool.
- Load the data set: set the required environment variables and run the tool's `scripts/load-in-one-step.sh` script.
- Run the benchmark: set the required environment variables and run the tool's `scripts/benchmark.sh` script.
- Collect the results in the `output` directory of the tool (see the sketch after this list for an end-to-end example).
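For orientation, an end-to-end run might look like the sketch below. The environment variable names (`SNB_DATA_DIR`, `SNB_PARAMS_DIR`) are hypothetical placeholders, and the sketch assumes each implementation lives in a directory named after it (e.g. `umbra/`); the actual variable names and paths expected by each tool are documented in its README.

```bash
# Illustrative sketch only -- exact environment variable names and dataset
# paths are tool-specific; consult the chosen tool's README for the real ones.
cd umbra   # assumes the implementation lives in a directory named after the tool

# 1. Point the scripts at the generated data set and substitution parameters
#    (hypothetical variable names, for illustration).
export SNB_DATA_DIR=/data/social-network-sf100
export SNB_PARAMS_DIR=/data/parameters-sf100

# 2. Load the data set.
scripts/load-in-one-step.sh

# 3. Run the benchmark.
scripts/benchmark.sh

# 4. Inspect the results.
ls output/
```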
To cross-validate the results of two implementations, run the power test for both tools (e.g. Cypher and Umbra), then run:

```bash
scripts/cross-validate.sh cypher umbra
```

Note that the cross-validation uses the numdiff tool.
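numdiff is available in common package repositories (the package name may vary by platform). The sketch below assumes the power test has already been run for both implementations so that their outputs exist:

```bash
# Install numdiff (illustrative; package names may differ on your platform).
sudo apt-get install numdiff        # Debian/Ubuntu
# brew install numdiff              # macOS with Homebrew

# With both power test results in place, compare the two implementations' outputs.
scripts/cross-validate.sh cypher umbra
```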
See .circleci/config.yml for an up-to-date example of how to use the projects in this repository.
The BI data sets are being uploaded to the SURF CWI repository (see the download instructions).