.
├── README.md # General info
├── FILES.md # This file, contents of repo
├── RUN.md # Instructions for running exps
├── Makefile # Makes things work
├── requirements.txt # Python Packages needed
│
├── collected # Experiment Results
│ ├── RAW
│ └── RESULTS
│
├── images # Docker Images
│ ├── Makefile # Build Docker Images
│ ├── init/ # Some GDBs need initialization
│ ├── extra/ # Files needed by Docker Images
│ ├── libs/ # Cached libraries
│ ├── gremlin-2to3.dockerfile # Uses Neo4j to convert Tp2 data
│ ├── gremlin-arangodb.dockerfile
│ ├── gremlin-blazegraph.dockerfile
│ ├── gremlin-neo4j-tp3.dockerfile
│ ├── gremlin-neo4j.dockerfile
│ ├── gremlin-orientdb.dockerfile
│ ├── gremlin-pg.dockerfile
│ ├── gremlin-sparksee.dockerfile
│ ├── gremlin-titan-tp3.dockerfile
│ └── gremlin-titan.dockerfile
│
├── runtime # This dir is mounted inside each
│ │ # image and scripts are called
│ ├── converter.groovy # Used for Tp2 to Tp3 conversion of GraphSON data
│ ├── data/ # The datasets to be imported
│ ├── confs/ # Conf files for GDBs
│ ├── meta/ # Parameters for the queries
│ ├── presampled/ # Sampled nodes/edges/labels
│ ├── tp2/ # Queries for Tinkerpop2
│ └── tp3/ # Queries for Tinkerpop3
│
├── collect.sh # Used by make to collect exp
│ # results into 'collected'
├── settings/ # Settings folder
└── test.py # The test runner:
# Spawns the Docker containers
# Runs the queries
## Makefile

Usually used to clean up before running an experiment; it also has a target to collect results.
## test.py

The main script. It manages the Docker containers, parses the metadata, and supervises the queries.

Usage: `python test.py -i [image name] [options]`

Example: `python test.py -d -i dbtrento/gremlin-neo4j -v /dfs/vol3/ -e JAVA_OPTS="-Xms1G -Xmn128M -Xmx120G"`
## settings.json

This file is read by `test.py`. It contains the names of the dataset and of the queries we are testing: it is used both to point to the actual dataset files and to infer the list of datasets, and likewise for the queries. Queries are listed here by name only, without specifying Tp2 or Tp3.
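The exact schema of `settings.json` is not spelled out here; a hypothetical sketch under these assumptions (all field names are illustrative, not taken from the repository):

```json
{
  "dataset": "example-dataset",
  "datasets": ["example-dataset", "another-dataset"],
  "queries": ["load", "bfs", "degree"]
}
```

Note that the query names carry no Tp2/Tp3 suffix: the runner resolves them against the `runtime/tp2/` or `runtime/tp3/` folder depending on the image.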
## images/

Contains the dockerfiles for each database we are going to test. The `gremlin-*.dockerfile` files define the images to be used; each one either runs the GDB with Gremlin in embedded mode (same VM) or starts a server internally and passes queries through a client. The `Makefile` builds the images, naming them with the conventions used throughout the project. `init/` contains the `init.sh` scripts for specific images, since some databases need to start services before a query can be processed. `extra/` holds the extra files some images require during installation, e.g., the `arangodb_converter.go` file for graph format conversion for ArangoDB.
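As an illustration of the naming convention, a build rule in `images/Makefile` might look roughly like this (the `dbtrento/` prefix is taken from the `test.py` example above; the actual rules may differ):

```make
# Illustrative only: builds one image, tagging it with the
# project convention dbtrento/gremlin-<database>.
gremlin-neo4j:
	docker build -t dbtrento/gremlin-neo4j -f gremlin-neo4j.dockerfile .
```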
## runtime/

This folder is mounted inside every Docker container. It contains configurations, queries, and execution scripts.
### converter.groovy

A Groovy script with Gremlin commands for Tp2 that loads a dataset and exports it in a format readable by Tp3.
### confs/

Any configuration that is needed and can be changed without rebuilding the Docker image should live here.
### meta/

Queries run based on some parameters (e.g., number of iterations, max BFS hops); those parameters are stored here.
### presampled/

Queries require some node IDs, edge IDs, or labels from the dataset; those are stored here. There is a `.json` file for every dataset, plus a file for every graph database storing local IDs (LIDs), both generated by `sampler.groovy`. The files are JSON serializations of the arrays of randomly selected nodes. This provides consistency between experiments (different runs) and comparable tests when running against different databases. The file named `samples_[DatasetName]` contains the set of nodes, edges, and labels chosen from the specific dataset. We then store a file named `lids_[DatasetName]_[DBName]_[MD5]` containing the internal reference (internal ID) of each chosen node/edge; these are unique internal identifiers for the same nodes/edges, but assigned by each database.
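The precise layout of a sample file is defined by `sampler.groovy`; conceptually, `samples_[DatasetName]` is a JSON serialization along these lines (keys and values are illustrative):

```json
{
  "nodes": [4521, 87, 10934],
  "edges": [662, 90210],
  "labels": ["knows", "created"]
}
```

The corresponding `lids_*` file would map each of these dataset-level IDs to the database's own internal identifier for the same element.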
### data/

Loaders expect datasets to be here, in `GraphSONMode.EXTENDED` format, which is readable by both Tp2 and Tp3 systems.
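For reference, a tiny graph in Tp2 `GraphSONMode.EXTENDED` looks roughly like this (from recollection of the TinkerPop 2 format, not this repository's data; the extended mode wraps each property value with explicit type information):

```json
{
  "graph": {
    "mode": "EXTENDED",
    "vertices": [
      { "_id": "1", "_type": "vertex",
        "name": { "type": "string", "value": "marko" } }
    ],
    "edges": [
      { "_id": "7", "_type": "edge", "_label": "knows",
        "_outV": "1", "_inV": "2" }
    ]
  }
}
```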
### tp2/

Tinkerpop 2 queries; almost all databases support this.

### tp3/

Tinkerpop 3 queries. The different Groovy version requires some functions to be implemented differently, and some method names/signatures have changed in this version.
## Execution flow

Once the container is created, the database engine is started (if required) by an init script in the Docker image (see `images/init`), and eventually the main script (`execute.sh`) is invoked.

The main script is responsible for query creation and execution:
first it creates an empty file inside the image at `/tmp/query`, then it executes `header.groovy.sh`, which, according to the current environment variables, injects the proper headers (imports, functions, etc.).
Then the content of the query file, located at `$QUERY`, is appended to the file.
The only exception is the loading query: when working in native loading mode, the `loader.groovy` file is not appended.
Furthermore, the content of the `sampler.groovy` file is always appended to allow the `ID -> LID` mapping.
Finally, `gremlin.sh` is invoked with the `/tmp/query` file as argument, and the command output is filtered according to the `$DEBUG` variable.
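The assembly steps above can be sketched in shell. Everything except the `/tmp/query` path is a stand-in created here for illustration; this is not the actual `execute.sh`:

```shell
set -e
workdir=$(mktemp -d)

# Stand-ins for the real inputs:
printf '// imports and helpers (output of header.groovy.sh)\n' > "$workdir/header.groovy"
printf 'g.V().count()\n' > "$workdir/query.groovy"   # hypothetical query
printf '// ID -> LID helpers (sampler.groovy)\n' > "$workdir/sampler.groovy"
QUERY="$workdir/query.groovy"

: > /tmp/query                               # 1. start from an empty file
cat "$workdir/header.groovy" >> /tmp/query   # 2. inject headers
cat "$QUERY" >> /tmp/query                   # 3. append the query (skipped for native loading)
cat "$workdir/sampler.groovy" >> /tmp/query  # 4. the sampler is always appended

# The assembled file would then be passed to: gremlin.sh /tmp/query
cat /tmp/query
```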