LDBC SNB BI Neo4j/Cypher implementation

Cypher implementation of the LDBC SNB Business Intelligence (BI) benchmark. Note that some BI queries cannot be expressed (efficiently) in vanilla Cypher, so they make use of the APOC and Graph Data Science (GDS) Neo4j libraries.

Generating the data set

The Neo4j implementation expects the data to be in the composite-projected-fk CSV layout, without headers and with quoted fields. To generate data that conforms to these requirements, run Datagen with the --explode-edges and the --format-options header=false,quoteAll=true options. This implementation also supports compressed data sets, both for the initial load and for batches. To generate compressed data sets, include compression=gzip in Datagen's --format-options. The scripts in this repository handle both compressed and uncompressed representations.

(Rationale: Files should not have headers as these are provided separately in the headers/ directory and quoting the fields in the CSV is required to preserve trailing spaces.)
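As a toy illustration (not part of the benchmark tooling) of why quoteAll matters, the following sketch writes a quoted field with a trailing space and shows that the space survives a field-by-field parse; the brackets in the output make it visible:

```shell
# Hypothetical example file; the "Alice " field ends in a space.
printf '"Alice ","Berlin"\n' > /tmp/quoted-example.csv
# Extract the first field and strip the quotes; the trailing space is preserved.
field=$(cut -d',' -f1 /tmp/quoted-example.csv | tr -d '"')
printf '[%s]\n' "$field"
# → [Alice ]
```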

In Datagen's directory (ldbc_snb_datagen_spark), issue the following commands. We assume that the Datagen project is built and that the ${PLATFORM_VERSION} and ${DATAGEN_VERSION} environment variables are set correctly.

export SF=desired_scale_factor
export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
rm -rf out-sf${SF}/
tools/run.py \
    --cores $(nproc) \
    --memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
    ./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar \
    -- \
    --format csv \
    --scale-factor ${SF} \
    --explode-edges \
    --mode bi \
    --output-dir out-sf${SF}/ \
    --generate-factors \
    --format-options header=false,quoteAll=true,compression=gzip
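For reference, the variables used above might be set as follows. The version numbers here are placeholders (assumptions, not prescribed values); substitute the ones matching your own Datagen build:

```shell
# Placeholder values -- adjust to your Datagen build and machine.
export PLATFORM_VERSION=2.12_spark3.2
export DATAGEN_VERSION=0.5.0
export SF=1
export LDBC_SNB_DATAGEN_MAX_MEM=8G
# The resulting JAR path passed to tools/run.py:
echo "./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar"
# → ./target/ldbc_snb_datagen_2.12_spark3.2-0.5.0.jar
```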

Loading the data

  1. Set the ${NEO4J_CSV_DIR} environment variable.

    • To use a locally generated data set, set the ${LDBC_SNB_DATAGEN_DIR} and ${SF} environment variables and run:

      export NEO4J_CSV_DIR=${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/graphs/csv/bi/composite-projected-fk/

      Or, simply run:

      . scripts/use-datagen-data-set.sh
    • To download and use the sample data set, run:

      wget -q https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
      unzip -q social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
      export NEO4J_CSV_DIR=`pwd`/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed/graphs/csv/bi/composite-projected-fk/

      Or, simply run:

      scripts/get-sample-data-set.sh
      . scripts/use-sample-data-set.sh
  2. Load the data:

    scripts/load-in-one-step.sh
  3. The substitution parameters should be generated using the paramgen.
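Before invoking the loader, it can help to verify that ${NEO4J_CSV_DIR} actually points at an existing directory. The check below is a hypothetical pre-flight sketch, not one of the repository's scripts; it substitutes a throwaway temp directory so it is self-contained, whereas in practice the variable comes from step 1:

```shell
# Stand-in for a real data directory set in step 1 (assumption for this sketch).
NEO4J_CSV_DIR=$(mktemp -d)
if [ -d "${NEO4J_CSV_DIR}" ]; then
    echo "ok: ${NEO4J_CSV_DIR}"
else
    echo "NEO4J_CSV_DIR does not exist; rerun one of the use-*-data-set.sh scripts" >&2
    exit 1
fi
```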

Microbatches

Test loading the microbatches:

scripts/batches.sh

⚠️ Note that this script reads the data sets from the ${NEO4J_CSV_DIR} directory on the host machine but maps the paths relative to the /import directory in the Docker container (Neo4j's dedicated import directory, which serves as the base for the import paths in the LOAD CSV Cypher commands). For example, the host path ${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv is translated to the relative path deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv.
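The path translation described above amounts to stripping the ${NEO4J_CSV_DIR} prefix from each host path, which can be sketched in plain shell (the directory value below is an example, not a required location):

```shell
# Example host-side data directory (assumption for this sketch).
NEO4J_CSV_DIR=/data/out-sf1/graphs/csv/bi/composite-projected-fk
host_path="${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv"
# Strip the host prefix to obtain the path relative to the container's /import dir.
container_path="${host_path#"${NEO4J_CSV_DIR}"/}"
echo "${container_path}"
# → deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv
```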

Queries

To run the queries, issue:

scripts/queries.sh ${SF}

For a test run, use:

scripts/queries.sh ${SF} --test

Working with the database

To start a database that has already been loaded, run:

scripts/start.sh