Source code for the paper "Random Sampling over Joins Revisited" (SIGMOD '18).
The main-memory implementation is located in mmJoin. Generate the makefiles with cmake and build with make. The code requires a C++11 compiler.
- install cmake, make, gcc, g++ (>= 4.8)
- cd mmJoin
- mkdir build
- cd build
- cmake -DCMAKE_BUILD_TYPE=Release .. (use Debug instead of Release if you want to debug with gdb)
- make
Suppose ${CWD} is the working directory and ${SF} is the scale factor of the dataset (TPC-H, TPC-DS). Symbolic links to the data files should also work.
- TPC-H. Place generated *.tbl in ${CWD}/${SF}x/. Rename lineitem.tbl to lineitem_skew.tbl (or change the file path in mmJoin/src/generic_sample_test.cpp).
- Twitter graph data from here. // TODO upload the data preparation tools
- TPC-DS. Place generated *.dat in ${CWD}/tpcds_${SF}x/.
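The TPC-H layout above can be set up with symbolic links instead of copies. The following is a minimal sketch, not part of the repository: link_tpch is a hypothetical helper, and its three arguments (the directory holding the generated *.tbl files, ${CWD}, and ${SF}) are placeholders to be adapted.

```shell
# Hypothetical helper (not shipped with the repository): link generated
# TPC-H tables into ${CWD}/${SF}x/, renaming lineitem.tbl on the way.
link_tpch() {
    # Resolve to an absolute path so the symlinks stay valid from anywhere.
    gen_dir=$(cd "$1" && pwd) || return 1   # directory with generated *.tbl
    cwd=$2                                  # work directory (${CWD})
    sf=$3                                   # scale factor (${SF})
    target="$cwd/${sf}x"
    mkdir -p "$target" || return 1
    for f in "$gen_dir"/*.tbl; do
        [ -e "$f" ] || continue             # no *.tbl files present
        name=$(basename "$f")
        # The test program expects lineitem_skew.tbl, not lineitem.tbl.
        if [ "$name" = "lineitem.tbl" ]; then
            name="lineitem_skew.tbl"
        fi
        ln -sf "$f" "$target/$name"
    done
}
```

For example, link_tpch /path/to/generated "$PWD" 1 would populate ./1x/ with links; the source path here is only an illustration.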
- cd mmJoin/build/src
- ./generic_sample_test [args]
  Use --help and refer to mmJoin/src/generic_sample_test.cpp for help. Note: TPC-H queries are named TCPX (X='3', 'X', 'Y') in the source file due to a misspelling.
- For TPC-DS, run ./tpcds_sample_test [args]. It requires __int128 support from GCC >= 4.6.
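To check ahead of the build whether a compiler accepts __int128, a quick probe such as the one below may help. This is a sketch only: has_int128 is a hypothetical helper that is not part of the repository, and it assumes a GCC-compatible driver that understands -x c++ and -fsyntax-only.

```shell
# Hypothetical probe (not shipped with the repository): succeeds (exit 0)
# if the given compiler parses a program that uses __int128.
has_int128() {
    cxx=${1:-g++}   # compiler to test; defaults to g++
    printf 'int main() { __int128 x = 1; return (int)(x - 1); }\n' |
        "$cxx" -x c++ -fsyntax-only - >/dev/null 2>&1
}
```

Usage: has_int128 g++ && echo "__int128 is available". A missing compiler and a compiler without __int128 both yield a non-zero status.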
This repository is shared for academic purposes only. Anyone can use, copy, modify and distribute the software, provided this notice is also included. We disclaim any warranty for any particular purpose, nor shall we be liable to anyone for using the code.
8/26/20: Added two sample general acyclic queries on the TPC-DS dataset.