LDBC SNB BI Neo4j/Cypher implementation

Cypher implementation of the LDBC SNB Business Intelligence (BI) benchmark. Note that some BI queries cannot be expressed (efficiently) in vanilla Cypher, so they make use of the APOC and Graph Data Science (GDS) Neo4j libraries.

Generating the data set

The Neo4j implementation expects the data to be in the composite-projected-fk CSV layout, without headers and with quoted fields. To generate data that conforms to these requirements, run Datagen with the --explode-edges and the --format-options header=false,quoteAll=true options. This implementation also supports compressed data sets, both for the initial load and for batches. To generate compressed data sets, include compression=gzip in Datagen's --format-options. The scripts in this repository handle both compressed and uncompressed representations.

(Rationale: Files should not have headers as these are provided separately in the headers/ directory and quoting the fields in the CSV is required to preserve trailing spaces.)
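As a toy illustration (not part of the benchmark tooling) of why quoteAll matters, the following sketch writes a quoted field with a trailing space and shows that the space survives a field-by-field parse; the brackets in the output make it visible:

```shell
# Hypothetical example file; the "Alice " field ends in a space.
printf '"Alice ","Berlin"\n' > /tmp/quoted-example.csv
# Extract the first field and strip the quotes; the trailing space is preserved.
field=$(cut -d',' -f1 /tmp/quoted-example.csv | tr -d '"')
printf '[%s]\n' "$field"
# → [Alice ]
```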

In Datagen's directory (ldbc_snb_datagen_spark), issue the following commands. We assume that the Datagen project is built and that the ${PLATFORM_VERSION} and ${DATAGEN_VERSION} environment variables are set correctly.

export SF=desired_scale_factor
export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
rm -rf out-sf${SF}/
tools/run.py \
    --cores $(nproc) \
    --memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
    ./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar \
    -- \
    --format csv \
    --scale-factor ${SF} \
    --explode-edges \
    --mode bi \
    --output-dir out-sf${SF}/ \
    --generate-factors \
    --format-options header=false,quoteAll=true,compression=gzip
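For reference, the variables used above might be set as follows. The version numbers here are placeholders (assumptions, not prescribed values); substitute the ones matching your own Datagen build:

```shell
# Placeholder values -- adjust to your Datagen build and machine.
export PLATFORM_VERSION=2.12_spark3.2
export DATAGEN_VERSION=0.5.0
export SF=1
export LDBC_SNB_DATAGEN_MAX_MEM=8G
# The resulting JAR path passed to tools/run.py:
echo "./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar"
# → ./target/ldbc_snb_datagen_2.12_spark3.2-0.5.0.jar
```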

Loading the data

  1. Set the ${NEO4J_CSV_DIR} environment variable.

    • To use a locally generated data set, set the ${LDBC_SNB_DATAGEN_DIR} and ${SF} environment variables and run:

      export NEO4J_CSV_DIR=${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/graphs/csv/bi/composite-projected-fk/

      Or, simply run:

      . scripts/use-datagen-data-set.sh
    • To download and use the sample data set, run:

      wget -q https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
      unzip -q social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed.zip
      export NEO4J_CSV_DIR=`pwd`/social-network-sf0.003-bi-composite-projected-fk-neo4j-compressed/graphs/csv/bi/composite-projected-fk/

      Or, simply run:

      scripts/get-sample-data-set.sh
      . scripts/use-sample-data-set.sh
  2. Load the data:

    scripts/load-in-one-step.sh
  3. The substitution parameters should be generated using the paramgen.
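Before invoking the loader, it can help to verify that ${NEO4J_CSV_DIR} actually points at an existing directory. The check below is a hypothetical pre-flight sketch, not one of the repository's scripts; it substitutes a throwaway temp directory so it is self-contained, whereas in practice the variable comes from step 1:

```shell
# Stand-in for a real data directory set in step 1 (assumption for this sketch).
NEO4J_CSV_DIR=$(mktemp -d)
if [ -d "${NEO4J_CSV_DIR}" ]; then
    echo "ok: ${NEO4J_CSV_DIR}"
else
    echo "NEO4J_CSV_DIR does not exist; rerun one of the use-*-data-set.sh scripts" >&2
    exit 1
fi
```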

Microbatches

Test loading the microbatches:

scripts/batches.sh

⚠️ Note that this script reads the data sets from the ${NEO4J_CSV_DIR} directory on the host machine but maps the paths relative to the /import directory in the Docker container (Neo4j's dedicated import directory, which serves as the base for the import paths in the LOAD CSV Cypher commands). For example, the host path ${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv is translated to the relative path deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv.
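The path translation described above amounts to stripping the ${NEO4J_CSV_DIR} prefix from each host path, which can be sketched in plain shell (the directory value below is an example, not a required location):

```shell
# Example host-side data directory (assumption for this sketch).
NEO4J_CSV_DIR=/data/out-sf1/graphs/csv/bi/composite-projected-fk
host_path="${NEO4J_CSV_DIR}/deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv"
# Strip the host prefix to obtain the path relative to the container's /import dir.
container_path="${host_path#"${NEO4J_CSV_DIR}"/}"
echo "${container_path}"
# → deletes/dynamic/Post/batch_id=2012-09-13/part-x.csv
```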

Queries

To run the queries, issue:

scripts/queries.sh ${SF}

For a test run, use:

scripts/queries.sh ${SF} --test

Working with the database

To start a database that has already been loaded, run:

scripts/start.sh