Cypher implementation of the LDBC SNB benchmark. Note that some BI queries cannot be expressed (efficiently) in vanilla Cypher, so they make use of the APOC and Graph Data Science Neo4j libraries.
The Neo4j implementation expects the data to be in the composite-projected-fk CSV layout, without headers and with quoted fields. To generate data that conforms to this requirement, run Datagen with the --explode-edges and --format-options header=false,quoteAll=true options.
This implementation also supports compressed data sets, both for the initial load and for the batches. To generate compressed data sets, include compression=gzip in Datagen's --format-options. The scripts in this repository work with both compressed and uncompressed representations.
(Rationale: the files should not have headers, as these are provided separately in the headers/ directory, and quoting the fields in the CSV is required to preserve trailing spaces.)
In Datagen's directory (ldbc_snb_datagen_spark), issue the following commands. We assume that the Datagen project is built and that the ${PLATFORM_VERSION} and ${DATAGEN_VERSION} environment variables are set correctly.
export SF=desired_scale_factor
export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
rm -rf out-sf${SF}/
tools/run.py \
--cores $(nproc) \
--memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar \
-- \
--format csv \
--scale-factor ${SF} \
--explode-edges \
--mode bi \
--output-dir out-sf${SF}/ \
--generate-factors \
--format-options header=false,quoteAll=true,compression=gzip
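Once Datagen has finished, it can be worth checking that the output matches the expected layout before pointing the loader at it. The commands below are only an illustrative sketch: the Person part file path is an example, and the subdirectory names (initial_snapshot/, inserts/, deletes/) are the ones typically produced for the BI workload.
# List the generated BI data (subdirectories are typically initial_snapshot/, inserts/ and deletes/).
ls out-sf${SF}/graphs/csv/bi/composite-projected-fk/
# Part files are gzip-compressed and, per the format options above, contain quoted fields and no header row.
zcat out-sf${SF}/graphs/csv/bi/composite-projected-fk/initial_snapshot/dynamic/Person/part-*.csv.gz | head -n 2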
- Set the ${NEO4J_CSV_DIR} environment variable.
  - To use a locally generated data set, set the ${LDBC_SNB_DATAGEN_DIR} and ${SF} environment variables and run:
    export NEO4J_CSV_DIR=${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/graphs/csv/bi/composite-projected-fk/
    Or, simply run:
    . scripts/use-datagen-data-set.sh
  - To download and use the sample data set, run:
    wget -q https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
    unzip -q social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
    export NEO4J_CSV_DIR=`pwd`/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed/graphs/csv/bi/composite-projected-fk/
    Or, simply run:
    scripts/get-sample-data-set.sh
    . scripts/use-sample-data-set.sh
- Load the data (an optional sanity check is sketched after this list):
  scripts/load-in-one-step.sh
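Once the load finishes, a quick query can confirm that the graph is populated. This is only a sketch: the container name and credentials are placeholders, as they depend on how the repository's scripts configure the Neo4j Docker container (docker ps shows the actual container name).
# Placeholder container name and credentials; adjust to your local setup.
docker exec <neo4j-container> cypher-shell -u neo4j -p <password> \
  "MATCH (p:Person) RETURN count(p) AS personCount;"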
The substitution parameters should be generated using paramgen.
Test loading the microbatches:
scripts/batches.sh
This script uses the ${NEO4J_CSV_DIR} directory on the host machine but maps the paths relative to the /import directory in the Docker container (Neo4j's dedicated import directory, which it uses as the basis of the import paths in the LOAD CSV Cypher commands). For example, the ${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv path is translated to the deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv relative path.
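In other words, the translation is a prefix substitution. A minimal shell sketch of the idea, with an illustrative directory value and file name:
# Hypothetical values, for illustration only.
NEO4J_CSV_DIR=/data/out-sf1/graphs/csv/bi/composite-projected-fk
host_path="${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv"
# Strip the host prefix to obtain the path used inside the container, below /import.
container_path="${host_path#"${NEO4J_CSV_DIR}/"}"
echo "${container_path}"   # prints deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv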
To run the queries, issue:
scripts/queries.sh ${SF}
For a test run, use:
scripts/queries.sh ${SF} --test
To start a database that has already been loaded, run:
scripts/start.sh
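To check that the restarted database is reachable, a simple probe of Neo4j's default HTTP port can be used (this assumes the scripts publish the standard port 7474 on localhost; adjust if the container maps ports differently):
# Assumes the default Neo4j HTTP port (7474) is published on localhost.
curl -s http://localhost:7474/ > /dev/null && echo "Neo4j HTTP endpoint is up"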