.
├── README.md # General info
├── FILES.md # This file, contents of repo
├── RUN.md # Instructions for running exps
├── Makefile # Makes things work
├── requirements.txt # Python Packages needed
│
├── collected # Experiment Results
│ ├── RAW
│ └── RESULTS
│
├── images # Docker Images
│ ├── Makefile # Build Docker Images
│ ├── init/ # Some GDBs need initialization
│ ├── extra/ # Files needed by Docker Images
│ ├── libs/ # Cached libraries
│ ├── gremlin-2to3.dockerfile # Uses Neo4j to convert Tp2 data
│ ├── gremlin-arangodb.dockerfile
│ ├── gremlin-blazegraph.dockerfile
│ ├── gremlin-neo4j-tp3.dockerfile
│ ├── gremlin-neo4j.dockerfile
│ ├── gremlin-orientdb.dockerfile
│ ├── gremlin-pg.dockerfile
│ ├── gremlin-sparksee.dockerfile
│ ├── gremlin-titan-tp3.dockerfile
│ └── gremlin-titan.dockerfile
│
├── runtime # This dir is mounted inside each
│ │ # image and scripts are called
│ ├── converter.groovy # Used for Tp2 to Tp3 conversion of GraphSON data
│ ├── data/ # The datasets to be imported
│ ├── confs/ # Conf files for GDBs
│ ├── meta/ # Parameters for the queries
│ ├── presampled/ # Sampled nodes/edges/labels
│ ├── tp2/ # Queries for Tinkerpop2
│ └── tp3/ # Queries for Tinkerpop3
│
├── collect.sh # Used by make to collect exp
│ # results into 'collected'
├── settings/ # Settings folder
└── test.py # The test runner:
# Spawns the Docker containers
# Runs the queries
## Makefile

Usually used to clean up before running an experiment; it also has a target to collect results.
## test.py

The main script. It manages the Docker containers, parses the metadata, and supervises the queries.

Usage: `python test.py -i [image name] [options]`

Example: `python test.py -d -i dbtrento/gremlin-neo4j -v /dfs/vol3/ -e JAVA_OPTS="-Xms1G -Xmn128M -Xmx120G"`
## settings.json

This file is read by `test.py`. It contains the names of the dataset and of the queries we are testing: it is used both to point to the actual dataset files and to infer the list of datasets, and likewise for the queries. Queries are listed here by name only, without specifying Tp2 or Tp3.
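The exact schema of `settings.json` is not spelled out here; a hypothetical sketch under these assumptions (all field names are illustrative, not taken from the repository):

```json
{
  "dataset": "example-dataset",
  "datasets": ["example-dataset", "another-dataset"],
  "queries": ["load", "bfs", "degree"]
}
```

Note that the query names carry no Tp2/Tp3 suffix: the runner resolves them against the `runtime/tp2/` or `runtime/tp3/` folder depending on the image.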
## images/

Contains the dockerfiles for each database we are going to test. The `gremlin-*.dockerfile` files define the images to be used; each one either runs the GDB with Gremlin in embedded mode (same VM) or starts a server internally and passes queries through a client. The `Makefile` builds the images, naming them with the conventions used throughout the project. `init/` contains the `init.sh` scripts for specific images, since some databases need to start services before a query can be processed. `extra/` holds the extra files some images require during installation, e.g., the `arangodb_converter.go` file for graph format conversion for ArangoDB.
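As an illustration of the naming convention, a build rule in `images/Makefile` might look roughly like this (the `dbtrento/` prefix is taken from the `test.py` example above; the actual rules may differ):

```make
# Illustrative only: builds one image, tagging it with the
# project convention dbtrento/gremlin-<database>.
gremlin-neo4j:
	docker build -t dbtrento/gremlin-neo4j -f gremlin-neo4j.dockerfile .
```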
## runtime/

This folder is mounted inside every Docker container. It contains configurations, queries, and execution scripts.
### converter.groovy

A Groovy script with Gremlin commands for Tp2 that loads a dataset and exports it in a format readable by Tp3.
### confs/

Any configuration that is needed and can be changed without rebuilding the Docker image should live here.
### meta/

Queries run based on some parameters (e.g., number of iterations, max BFS hops); those parameters are stored here.
### presampled/

Queries require some node IDs, edge IDs, or labels from the dataset; those are stored here. There is a `.json` file for every dataset, plus a file for every graph database storing local IDs (LIDs), both generated by `sampler.groovy`. The files are JSON serializations of the arrays of randomly selected nodes. This provides consistency between experiments (different runs) and comparable tests when running against different databases. The file named `samples_[DatasetName]` contains the set of nodes, edges, and labels chosen from the specific dataset. We then store a file named `lids_[DatasetName]_[DBName]_[MD5]` containing the internal reference (internal ID) of each chosen node/edge; these are unique internal identifiers for the same nodes/edges, but assigned by each database.
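The precise layout of a sample file is defined by `sampler.groovy`; conceptually, `samples_[DatasetName]` is a JSON serialization along these lines (keys and values are illustrative):

```json
{
  "nodes": [4521, 87, 10934],
  "edges": [662, 90210],
  "labels": ["knows", "created"]
}
```

The corresponding `lids_*` file would map each of these dataset-level IDs to the database's own internal identifier for the same element.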
### data/

Loaders expect datasets to be here, in `GraphSONMode.EXTENDED` format, which is readable by both Tp2 and Tp3 systems.
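For reference, a tiny graph in Tp2 `GraphSONMode.EXTENDED` looks roughly like this (from recollection of the TinkerPop 2 format, not this repository's data; the extended mode wraps each property value with explicit type information):

```json
{
  "graph": {
    "mode": "EXTENDED",
    "vertices": [
      { "_id": "1", "_type": "vertex",
        "name": { "type": "string", "value": "marko" } }
    ],
    "edges": [
      { "_id": "7", "_type": "edge", "_label": "knows",
        "_outV": "1", "_inV": "2" }
    ]
  }
}
```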
### tp2/

Tinkerpop 2 queries; almost all databases support this.

### tp3/

Tinkerpop 3 queries. The different Groovy version requires some functions to be implemented differently, and some method names/signatures have changed in this version.
## Execution flow

Once the container is created, the database engine is started (if required) by an init script in the Docker image (see `images/init`), and eventually the main script (`execute.sh`) is invoked.

The main script is responsible for query creation and execution:
first it creates an empty file inside the image at `/tmp/query`, then it executes `header.groovy.sh`, which, according to the current environment variables, injects the proper headers (imports, functions, etc.).
Then the content of the query file, located at `$QUERY`, is appended to the file.
The only exception is the loading query: when working in native loading mode, the `loader.groovy` file is not appended.
Furthermore, the content of the `sampler.groovy` file is always appended to allow the `ID -> LID` mapping.
Finally, `gremlin.sh` is invoked with the `/tmp/query` file as argument, and the command output is filtered according to the `$DEBUG` variable.
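The assembly steps above can be sketched in shell. Everything except the `/tmp/query` path is a stand-in created here for illustration; this is not the actual `execute.sh`:

```shell
set -e
workdir=$(mktemp -d)

# Stand-ins for the real inputs:
printf '// imports and helpers (output of header.groovy.sh)\n' > "$workdir/header.groovy"
printf 'g.V().count()\n' > "$workdir/query.groovy"   # hypothetical query
printf '// ID -> LID helpers (sampler.groovy)\n' > "$workdir/sampler.groovy"
QUERY="$workdir/query.groovy"

: > /tmp/query                               # 1. start from an empty file
cat "$workdir/header.groovy" >> /tmp/query   # 2. inject headers
cat "$QUERY" >> /tmp/query                   # 3. append the query (skipped for native loading)
cat "$workdir/sampler.groovy" >> /tmp/query  # 4. the sampler is always appended

# The assembled file would then be passed to: gremlin.sh /tmp/query
cat /tmp/query
```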