CoreCache is a distributed key-value store designed for high performance and scalability.
The CoreCache MVP includes the following features:
- Leader Election and Coordination: Managed through ZooKeeper.
- Data Handling: Reads and writes are processed through a leader node.
- Data Storage: Utilizes a Log-Structured Merge Tree (LSM Tree) for efficient data storage.
- API Support: Provides GET, PUT, and DELETE operations.
- Data Management: Memtables hold data until it is flushed to SSTables, and deletions are applied during compaction.
- Replicas: Data replication is not implemented in the MVP.
- Write-Ahead Logs (WAL): WAL is not included.
- Read/Write Path: Data is first read from and written to the Memtable.
- Flushing: Memtables are flushed to SSTables and cleared once the flush completes.
- DELETE Operations: Data is marked for deletion and collected during compaction.
- Compaction Role: Handles updates and deletions by rewriting index and data files.
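The write, flush, delete, and compaction flow described above can be illustrated with a toy LSM store. This is a minimal sketch under stated assumptions, not CoreCache's implementation. The class `MiniLSM`, the `flush_threshold` parameter, the JSON file format, and the entry shape `{"value": ..., "deleted": ...}` are all hypothetical. CoreCache rewrites separate index and data files, but here a single sorted JSON file stands in for both.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class MiniLSM:
    """Hypothetical toy LSM store: a dict Memtable flushed to sorted, immutable JSON files."""

    def __init__(self, data_dir: Path, flush_threshold: int = 3) -> None:
        self.data_dir = data_dir
        self.flush_threshold = flush_threshold  # Flush once the Memtable holds this many keys.
        self.memtable: dict[str, dict] = {}
        self.sstables: list[Path] = []  # Ordered oldest to newest.
        self._next_id = 0

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = {"value": value, "deleted": False}
        self._maybe_flush()

    def delete(self, key: str) -> None:
        # A DELETE only writes a tombstone; the key is physically removed during compaction.
        self.memtable[key] = {"value": None, "deleted": True}
        self._maybe_flush()

    def get(self, key: str) -> Optional[str]:
        # Check the Memtable first, then SSTables from newest to oldest; the newest entry wins.
        entry = self.memtable.get(key)
        if entry is None:
            for path in reversed(self.sstables):
                table = json.loads(path.read_text(encoding="utf-8"))
                if key in table:
                    entry = table[key]
                    break
        if entry is None or entry["deleted"]:
            return None
        return entry["value"]

    def _maybe_flush(self) -> None:
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def _write_sstable(self, entries: dict[str, dict]) -> Path:
        path = self.data_dir / f"sstable_{self._next_id:05d}.json"
        self._next_id += 1
        path.write_text(json.dumps(entries, sort_keys=True), encoding="utf-8")  # Sorted by key.
        return path

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(self._write_sstable(self.memtable))
            self.memtable.clear()  # The Memtable is cleared once its contents are on disk.

    def compact(self) -> None:
        """Merge every SSTable into one, keeping the newest entry per key and dropping tombstones."""
        merged: dict[str, dict] = {}
        for path in self.sstables:  # Oldest to newest, so newer entries overwrite older ones.
            merged.update(json.loads(path.read_text(encoding="utf-8")))
        live = {k: v for k, v in merged.items() if not v["deleted"]}
        old_tables, self.sstables = self.sstables, []
        if live:
            self.sstables.append(self._write_sstable(live))
        for path in old_tables:
            path.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = MiniLSM(Path(tmp), flush_threshold=2)
        store.put("a", "1")
        store.put("b", "2")  # Threshold reached: the Memtable is flushed to an SSTable.
        store.delete("a")    # The tombstone in the Memtable shadows the flushed value.
        print(store.get("a"), store.get("b"))
        store.flush()
        store.compact()      # "a" is physically removed here.
        print(store.get("a"), store.get("b"))
```

Dropping tombstones is safe here only because `compact` merges every SSTable. A partial compaction that discarded a tombstone could resurrect an older value still sitting in an uncompacted file.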
CoreCache requires the following dependencies:
- Kazoo: A library for interacting with ZooKeeper.
- dynaconf: Manages application configuration and settings.
- Colima: Recommended for local development as an alternative to Docker Desktop; it provides a Docker-compatible container runtime.
To run CoreCache locally:
- Start ZooKeeper: Use the following Docker command to run ZooKeeper in a container.
```shell
docker run --name some-zookeeper -p 2181:2181 --restart always -d zookeeper
```
- Connect to ZooKeeper: Use the following command to connect to the ZooKeeper container.
```shell
docker run -it --rm --link some-zookeeper:zookeeper zookeeper zkCli.sh -server zookeeper
```
Follow these steps to deploy CoreCache:
- Ensure that Python and pip are installed on your system.
- Download the CoreCache release.
```shell
curl -L -o corecache-0.11.tar.gz https://github.com/gtinside/distributed-key-value-store/archive/refs/tags/0.11.tar.gz
```
- Extract the downloaded file.
```shell
tar -xvzf corecache-0.11.tar.gz
```
- Run ZooKeeper: Make sure a ZooKeeper instance is running and reachable (for example, the container started in the previous section).
- Navigate to the scripts directory.
```shell
cd distributed-key-value-store-0.11/scripts
```
- Start the CoreCache server.
```shell
./start_server.sh --zooKeeperHost localhost --zooKeeperPort 2181
```
Here are some performance benchmarks:
Date | CoreCache Version | Number of Nodes | Configuration | Operation | Total Requests | Max Throughput | Avg Latency | p95 Latency | Detailed Report |
---|---|---|---|---|---|---|---|---|---|
09/16/2024 | v0.15 | 3 | AWS t2.micro | POST | 10K | 31.2 requests/sec | 3.53 ms | 12.2 ms | More Details |
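The table does not state how its metrics were computed. The sketch below shows one common way to derive them from a run, assuming per-request latency samples in milliseconds and the nearest-rank definition of the 95th percentile. Benchmarking tools differ, and some interpolate between samples. "Max Throughput" presumably reports a peak rate, whereas this computes the average rate over the whole run. The function names `p95_nearest_rank` and `summarize` are hypothetical.

```python
from statistics import mean


def p95_nearest_rank(samples_ms: list[float]) -> float:
    """Return the 95th-percentile latency using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based.
    return ordered[rank - 1]


def summarize(samples_ms: list[float], elapsed_s: float) -> dict[str, float]:
    """Average throughput (requests/sec), mean latency, and p95 latency for one run."""
    return {
        "throughput_rps": len(samples_ms) / elapsed_s,
        "avg_latency_ms": mean(samples_ms),
        "p95_latency_ms": p95_nearest_rank(samples_ms),
    }


if __name__ == "__main__":
    # Synthetic latencies of 1..100 ms over a 10-second run (illustrative data only).
    print(summarize([float(ms) for ms in range(1, 101)], elapsed_s=10.0))
    # {'throughput_rps': 10.0, 'avg_latency_ms': 50.5, 'p95_latency_ms': 95.0}
```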
CoreCache has a few limitations that are being actively addressed:
- Race Conditions: Data inserted while the Memtable is being flushed to an SSTable may be affected by a race condition.
- Configuration Management: Configuration items such as data directory, port range, and flush conditions should be managed via a properties file.
- Data Retrieval: When data is retrieved from an SSTable, only the requested key is loaded into the in-memory cache.
- Single Leader: Only the leader node can insert data into the cache.
- MemTable Flush: Flushing is a stop-the-world operation, so data insertion may be blocked while a flush is in progress.
- Index File Scanning: When the MemTable is empty, locating data requires scanning every index file; this could be optimized.
- Timestamp Accuracy: The timestamp stored with each entry should reflect when the key-value pair was first inserted.
- Dependency Management: Consider migrating to Poetry for improved dependency management.
- Error Handling: Implement proper error handling across all APIs.
- Pathlib Migration: Migrate file manipulations to `pathlib`.
- Data Integrity: If a key marked as deleted (`deleted=true`) is not flushed before a node crash, the key remains undeleted. This can be mitigated with a Write-Ahead Log (WAL).
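The Data Integrity limitation above is the standard motivation for a write-ahead log. This is a minimal sketch of one, assuming a JSON-lines log file and an fsync on every append. The names `WriteAheadLog` and `replay_wal` are hypothetical, and nothing here exists in the CoreCache MVP. Each PUT or DELETE is forced to disk before the Memtable is updated. A `deleted=true` marker that was never flushed can then be rebuilt by replaying the log on restart.

```python
import json
import os
from pathlib import Path
from typing import Optional


class WriteAheadLog:
    """Hypothetical append-only log: each operation is persisted before it touches the Memtable."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def append(self, op: str, key: str, value: Optional[str] = None) -> None:
        """Durably record a PUT or DELETE; call this before updating the Memtable."""
        if op not in ("PUT", "DELETE"):
            raise ValueError(f"unsupported operation: {op}")
        self._fh.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")
        self._fh.flush()
        os.fsync(self._fh.fileno())  # Force the record to disk before acknowledging the write.

    def truncate(self) -> None:
        """Discard logged operations once the Memtable has been flushed to an SSTable."""
        self._fh.close()
        self.path.write_text("", encoding="utf-8")
        self._fh = open(self.path, "a", encoding="utf-8")

    def close(self) -> None:
        self._fh.close()


def replay_wal(path: Path) -> dict[str, dict]:
    """Rebuild Memtable state after a crash; DELETEs come back as deleted=True entries."""
    memtable: dict[str, dict] = {}
    if not path.exists():
        return memtable
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # A torn final line from a crash mid-write; stop replaying here.
            deleted = record["op"] == "DELETE"
            memtable[record["key"]] = {
                "value": None if deleted else record["value"],
                "deleted": deleted,
            }
    return memtable


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        wal_path = Path(tmp) / "wal.log"
        wal = WriteAheadLog(wal_path)
        wal.append("PUT", "user:1", "alice")
        wal.append("DELETE", "user:1")  # The node "crashes" before any Memtable flush...
        wal.close()
        print(replay_wal(wal_path))  # ...but the deletion marker is recovered from the log.
```

Calling `os.fsync` on every append is the simplest durable choice, but it adds a disk sync to each write. Batching several records per sync (group commit) is a common trade-off. The log should be truncated only after the corresponding Memtable has been safely written to an SSTable.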