Aerospike-based implementation of a JanusGraph storage backend. When to use: if you need a horizontally scalable graph DB backed by Aerospike.
The main difference from other traditional backends (Cassandra, Berkeley) is that Aerospike does not support transactions.
On each commit JanusGraph writes a batch of updates that must be applied to the storage backend all together.
Otherwise the graph may become inconsistent.
So we need to emulate transactional behaviour and, not surprisingly, we do it with a Write Ahead Log.
We use the No Sql Batch Updater library to achieve this.
Prior to applying updates, we save the whole batch as one record in a separate Aerospike namespace and remove this record after all updates have been applied.
This allows the WriteAheadLogCompleter, which runs on each node in a separate thread,
to finish (after a configured delay) any pending updates if a node dies in the middle of a batch.
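The write-ahead-log idea above can be sketched with plain in-memory maps. This is only an illustration under stated assumptions, not the backend's actual code: the class `WalSketch`, its methods and the `failAfter` crash simulation are hypothetical names, the two maps stand in for the data and WAL namespaces, and the configured delay of the real completer is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory sketch of emulating an atomic batch with a write-ahead log. */
class WalSketch {
    // Stand-in for the data namespace: key -> value.
    final Map<String, String> data = new ConcurrentHashMap<>();
    // Stand-in for the separate WAL namespace: batch id -> whole batch as one record.
    final Map<String, Map<String, String>> wal = new ConcurrentHashMap<>();

    /**
     * Logs the whole batch, applies it, then removes the log record.
     * failAfter simulates a node dying after that many updates (negative = no failure).
     */
    String commit(Map<String, String> batch, int failAfter) {
        String batchId = UUID.randomUUID().toString();
        wal.put(batchId, new LinkedHashMap<>(batch)); // 1. save the batch as one record
        int applied = 0;
        for (Map.Entry<String, String> update : batch.entrySet()) {
            if (failAfter >= 0 && applied == failAfter) {
                return batchId; // simulated crash: the WAL record stays behind
            }
            data.put(update.getKey(), update.getValue()); // 2. apply each update
            applied++;
        }
        wal.remove(batchId); // 3. batch fully applied, drop the WAL record
        return batchId;
    }

    /** Replays every batch still present in the WAL; safe because the updates are idempotent puts. */
    int completeStaleBatches() {
        int completed = 0;
        for (String batchId : new ArrayList<>(wal.keySet())) {
            Map<String, String> batch = wal.get(batchId);
            if (batch == null) {
                continue; // already completed by someone else
            }
            data.putAll(batch);
            wal.remove(batchId);
            completed++;
        }
        return completed;
    }
}
```

Calling `commit(batch, 1)` leaves a half-applied batch plus its WAL record; a later `completeStaleBatches()` call finishes the remaining updates and clears the record.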
The backend collects all locks that a transaction needs and acquires them only in the commit phase. This allows all lock acquisitions to run in parallel. Because of this approach, the Aerospike storage backend is classified as optimisticLocking in JanusGraph terms.
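A minimal sketch of the deferred locking idea, assuming an in-memory map in place of lock records in Aerospike. The names `DeferredLockingSketch`, `Tx`, `requestLock` and `release` are hypothetical and are not part of the backend's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: locks are only recorded during the transaction and acquired at commit. */
class DeferredLockingSketch {
    // Stand-in for lock records in the storage: lock key -> id of the owning transaction.
    static final ConcurrentHashMap<String, String> LOCKS = new ConcurrentHashMap<>();

    static class Tx {
        final String id;
        final Set<String> pendingLocks = new LinkedHashSet<>();

        Tx(String id) {
            this.id = id;
        }

        /** During the transaction a lock request is only remembered, not acquired. */
        void requestLock(String key) {
            pendingLocks.add(key);
        }

        /** At commit, tries to acquire all remembered locks in parallel; releases them if any attempt fails. */
        boolean commit() {
            List<CompletableFuture<Boolean>> attempts = new ArrayList<>();
            for (String key : pendingLocks) {
                attempts.add(CompletableFuture.supplyAsync(
                        () -> LOCKS.putIfAbsent(key, id) == null));
            }
            boolean allAcquired = true;
            for (CompletableFuture<Boolean> attempt : attempts) {
                allAcquired &= attempt.join(); // wait for every attempt before deciding
            }
            if (!allAcquired) {
                release();
            }
            return allAcquired;
        }

        /** Removes only the locks owned by this transaction. */
        void release() {
            for (String key : pendingLocks) {
                LOCKS.remove(key, id);
            }
        }
    }
}
```

A transaction whose `commit()` returns `false` holds no locks afterwards and can be retried, which is the optimistic part of the scheme.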
JanusGraph keeps a vertex and all its adjacent edges in one record.
That makes it sensitive to the maximum record size of the key-value storage.
Aerospike record size is limited to 1 MB by default and can be increased up to 8 MB in the
namespace configuration. It makes sense to configure the WAL namespace to use the maximum value (8 MB).
Because batch updates are only emulated and are eventually consistent, dirty reads are still possible and may lead to unwanted side effects such as ghost vertices. You should try to avoid concurrent deletion and update of the same vertex. The best option is to use some external synchronization when performing such operations.
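One possible shape for such synchronization is a lock keyed by vertex id. The sketch below is an assumption for illustration only: `VertexMutex` and `withVertexLock` are hypothetical names, and the lock only protects callers inside a single JVM. Services running on several nodes would need a distributed lock instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper that serializes operations on the same vertex within one JVM. */
class VertexMutex {
    // One lock per vertex id; entries are never evicted in this simplified sketch.
    private final ConcurrentHashMap<Object, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the action while holding the lock that belongs to the given vertex id. */
    <T> T withVertexLock(Object vertexId, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(vertexId, id -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Both the code that deletes a vertex and the code that updates it would wrap their graph transaction in `withVertexLock(vertexId, ...)`, so the two operations never overlap for the same vertex.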
In our microservice architecture we run JanusGraph in embedded mode. In this mode JanusGraph and the Aerospike storage backend are used simply as libraries to correctly access and persist graphs in Aerospike.
It allows our services to:
- communicate with JanusGraph in the same JVM with minimal overhead
- scale JanusGraph up and down together with the service
- Add the Aerospike storage backend dependency to your project
<dependency>
<groupId>com.playtika.janusgraph</groupId>
<artifactId>aerospike-storage-backend</artifactId>
</dependency>
- Instantiate JanusGraph
ModifiableConfiguration config = buildGraphConfiguration();
config.set(STORAGE_HOSTS, new String[]{aerospikeHost}); //Aerospike host
config.set(STORAGE_PORT, container.getMappedPort(aerospikePort));
config.set(STORAGE_BACKEND, "com.playtika.janusgraph.aerospike.AerospikeStoreManager");
config.set(NAMESPACE, aerospikeNamespace);
config.set(WAL_NAMESPACE, walNamespace); //Aerospike namespace to use for the Write Ahead Log
config.set(GRAPH_PREFIX, "test"); //used as a prefix for Aerospike sets; allows running several graphs in one Aerospike namespace
//!!! needed to prevent small batch mutations, as we use the deferred locking approach !!!
config.set(BUFFER_SIZE, AEROSPIKE_BUFFER_SIZE);
config.set(TEST_ENVIRONMENT, true); //controls whether durable deletes are used (they are not available in the community version of Aerospike)
config.set(ConfigOptions.SCAN_PARALLELISM, 1); //allow scans to run in a single thread only
JanusGraph graph = JanusGraphFactory.open(config);
- Run Gremlin queries
graph.traversal().V().has("name", "jupiter")
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| aerospike | thrpt | 30 | 0.106 | ± 0.004 | ops/s |
| cassandra | thrpt | 30 | 0.008 | ± 0.001 | ops/s |
This benchmark was run using the standard 'cassandra:3.11' Docker image and a custom Aerospike image that doesn't keep any data in memory: https://github.com/kptfh/aerospike-server.docker
To run the benchmarks and tests on your local machine, you only need to have Docker installed.