CAVE is a graph processing engine for storing, accessing, and performing graph analytics on SSDs. CAVE considers SSD’s supported concurrency through its internal parallelism as a key property to exploit and it does so via issuing carefully tuned concurrent I/Os to the graphs stored on a single SSD. CAVE adopts a natural blocked file format based on adjacency lists and uses a concurrent cache pool for data blocks to provide ease of implementation of different algorithms.
The paper: https://dl.acm.org/doi/pdf/10.1145/3654928.
Goto ./cpp
folder, then run make
for building.
cd cpp
make
Clang compiler is specified by default for its better support and performance of multi-threading. Using GCC is okay but we won't guarantee the performance.
We tested in the following configurations:
- RedHat Linux 4.9, Clang 11, with EXT4 file system.
- Windows 11, GCC 13 and 14 toolchain using MinGW-w64, with NTFS file system.
We developed a parser
to read common graph data into our binary file structure. It provides simple support for standard adjacent list, edge list files in plain text format, as well as binary adjacency and edge list for compact storage and faster parsing.
Please check /scripts/graph_convert.py
, install and read documentation of NetworKit package about how to convert other data formats and make a binary file from plain texts. We recommend first convert the input graph to binadj
or binedge
format for the best parsing performance.
Then run parser to pre-process the data file.
./bin/parser <input_data_path> -format (adjlist/edgelist/binedge/binadj)
Data files with suffix .adjlist
, .edgelist
, .binadj
and .binedge
will be automatically detected. Otherwise please indicate the file format by the -format
argument.
After compilation, /bin
will include executables of algorithms to be tested. We provide BFS, DFS, WCC, PageRank, and Random Walk algorithms out of the box. You can edit and implement your algorithms following similar codes.
Our executables supports the following arguments for benchmark usages.
./bin/[algo] <parsed_data_path> (cache/thread) [args]
The following are parameters for the benchmark.
-
For cache tests, there's only one argument in [0,1,2,3] states that which cache size list you want to run:
- 0: Only 1024MB. For sanity check.
- 1: [1,2,3,4,5,10,25,50]. Suggest use for small datasets < 50MB.
- 2: [20,40,60,80,100,200,500,1000]. For dataset like soc-LiveJounal1 sized ~1GB.
- 3: [128,256,...,16384]. For very large dataset like com-Friendster.
-
For thread tests, 3 arguments are available.
- The first one is the minimum number of threads, the second one is the maximum. It tests from the minimum by the power of 2 to the maximum.
- The third one is optional, for specifying cache size (in MB) for all tests. By default it is 1024MB.
Test results will be put in log
folder in csv
format.
-
Benchmark BFS algorithm on CA-GrQc dataset.
# Parse data ./bin/parser ../data/CA-GrQc.txt -format edgelist # Benchmark ./bin/bfs ../data/CA-GrQc.bin thread 1 256 ./bin/bfs ../data/CA-GrQc.bin cache 0
-
Benchmark WCC algorithm on soc-LiveJournal1 dataset.
# Parse data ./bin/parser ../data/soc-LiveJournal1.binadj # Benchmark ./bin/wcc ../data/soc-LiveJournal1.bin cache 1
Stanford Large Network Dataset Collection for Friendster, RoadNet, LiveJournal, and YouTube dataset.
LDBC Graph Analytics Benchmark for the Twitter-mpi dataset.
We also provide a simple random graph generator in /scripts/graph_gen.py
also based on NetworKit.
The benchmark output will be stored in /log
folder with naming scheme [data]_[algo]_[testcase].csv
. The columns are
- algo_name: Name of the algorithm.
- thread: Number of threads used.
- cache_mb: Size of cache pool used (in megabytes)
- time: Running time (in microsecond).
- res: Output of algorithms. 0/1 for searching algorithms and a number for WCC or PageRank.
Then you can pick your favorite way to process, compare and plot the results. We use the popular Matplotlib Python library to create figures in our paper. A Python notebook /scripts/example_plot.ipynb
can be used a start.
We implemented parallel version of breadth-first search (BFS), PageRank, weakly connected components (WCC), and random walk algorithms. We also tested to implement a parallel pseudo DFS algorithm with introductions below. All the codes of algorithms are in /algorithm
, and can be executed and benchmarked by running corresponding executables.
While DFS is inherently a serialized algorithm, it is possible to enhance its performance by introducing parallelism through unordered or pseudo depth-first search technique.
We take inspiration from this idea and we incorporate a mechanism to monitor the size of the vertex stack for each thread in our implementation. After visiting the neighbors of a vertex, we check if the size of the stack exceeds a predefined threshold. If it does, the stack is evenly divided into two smaller stacks, and one of these stacks is assigned to a new thread for further exploration.
-
GraphChi: Large-Scale Graph Computation on Just a PC.
- Code: https://github.com/GraphChi/graphchi-cpp
- See wiki here for command parameters. Use
membudget_mb
to limit cache size andexecthreads
for number of threads.
-
GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning.
- Code: https://github.com/thu-pacman/GridGraph
- The compiled binary file takes
[memory budget]
parameter for setting cache size. The number of threads can be controlled by changing value ofparallelism
in the source file.
-
Mosaic: Processing a Trillion-Edge Graph on a Single Machine.
- Code: https://github.com/sslab-gatech/mosaic
- Edit the config file.
SG_RB_SIZE_HOST_TILES
for cache size andSG_NPROCESSOR
for threads.
Thread pool library comes from BS::thread-pool, a fast, lightweight, and easy-to-use C++17 thread pool library.
To support parallel hashmap used in cache pool, we use The Parallel Hashmap, a set of excellent hash map implementations, as well as a btree alternative to std::map and std::set.
Unordered parallel DFS refers to ideas in A work-efficient algorithm for parallel unordered depth-first search.