This README describes how to reproduce the results of the paper "PreVision: An Out-of-Core Matrix Computation System with Optimal Buffer Replacement". If you have any questions, please contact us; we will be happy to respond.
This project contains three major parts: the PreVision source code, evaluation scripts for all systems, and a data generator. The source code and scripts are tuned for the current directory structure, so please do not change it. The project structure is as follows.
.
├── README.md # This File
├── plot # Scripts for Plotting
├── buffertile # PreVision Buffer Manager
├── lam_executor # PreVision Executor
├── linear_algebra_module # PreVision Linear Algebra Operators
├── evaluation # Evaluation Scripts
├── makeall.sh # Build PreVision. See `./evaluation/prevision/exec_eval`.
├── slab-benchmark # Data Generator (from the SLAB Benchmark: https://adalabucsd.github.io/slab.html)
├── tilechunk # PreVision Buffer Wrapper
└── tilestore # PreVision I/O Manager
Reproducing the experiments consists of the following steps.
- Install prerequisites
- Data generation
- Data loading for some systems and evaluation
We recommend using Ubuntu as the operating system. We have tested on Ubuntu 18.04 and 20.04.
Since some systems throw an out-of-memory error when preparing datasets, we recommend using a machine with a large amount of memory. Also, since the datasets generated for the evaluation are very large, please make sure that you have at least 3 terabytes of free disk space.
Before installing the comparison systems, please install OpenBLAS 0.3.0.
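As a quick sanity check for the disk-space requirement, the following sketch compares the free space of the current directory against 3 terabytes. It assumes the datasets will be stored on the same file system as the current directory; the variable names are ours, not part of the project.

```shell
# 3 TB (3 * 10^12 bytes) expressed in 1024-byte blocks, the unit used by `df -k`.
required_kb=$((3 * 1000 * 1000 * 1000 * 1000 / 1024))

# `df -Pk .` prints one POSIX-format line for the file system that holds
# the current directory; the fourth column is the available space.
available_kb=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$available_kb" -ge "$required_kb" ]; then
  echo "OK: at least 3 TB of free disk space"
else
  echo "WARNING: less than 3 TB of free disk space in $(pwd)"
fi
```

If the datasets or `DB_PATH` live on another disk, run the check from that location instead.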
Download Spark 3.3.2 and uncompress it to the current directory.
Please make sure to add the absolute path of `./spark-3.3.2-bin-hadoop3/bin/` to the `PATH` environment variable.
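A minimal sketch of the `PATH` setup, assuming the commands are run from the repository root where Spark was uncompressed. `SPARK_BIN` is a helper variable introduced here only for illustration.

```shell
# Absolute path of the Spark bin directory (assumes the current directory
# is the repository root).
SPARK_BIN="$(pwd)/spark-3.3.2-bin-hadoop3/bin"

# Prepend it to PATH for the current shell session.
export PATH="$SPARK_BIN:$PATH"
```

The setting lasts only for the current shell session; to make it permanent, add an equivalent `export` line with the absolute path to your shell startup file (for example `~/.bashrc`).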
Please install Java 11 before installing SystemDS because the system requires it.
Download SystemDS 3.1.0 and uncompress it to the current directory (the root directory of the above directory structure). Please make sure of the following.
- Add the absolute path of `./systemds-3.1.0-bin/bin/` to the `PATH` environment variable.
- Set the `SYSTEMDS_ROOT` environment variable to the absolute path of `./systemds-3.1.0-bin/`.
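The two SystemDS settings can be sketched as follows, assuming the commands are run from the repository root where SystemDS was uncompressed.

```shell
# Root of the SystemDS installation (assumes the current directory is the
# repository root).
export SYSTEMDS_ROOT="$(pwd)/systemds-3.1.0-bin"

# Make the SystemDS launcher scripts available on PATH.
export PATH="$SYSTEMDS_ROOT/bin:$PATH"
```

As with any `export`, this affects only the current shell session unless it is also added to a shell startup file.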
MADlib is a machine learning extension of PostgreSQL. Thus, you need to install PostgreSQL first, and then install MADlib.
Install PostgreSQL 12.14 with PL/Python support (`--with-python`) on your computer.
After that, following the MADlib installation guide, install MADlib 1.21.0 with the schema `madlib`.
If you install MADlib from source, please make sure that the PostgreSQL installation is detected when initializing the CMake build.
Since SciDB became closed-source software, the latest version we can use is 19.11. We provide a Docker image with SciDB 19.11 and recommend using it.
Please set `PREVISION_PATH` and `DB_PATH` to your own paths.
Since `DB_PATH` will be used as the SciDB database path, make sure that it has enough disk space.
After configuration, please run the following commands.
PREVISION_PATH=<PREVISION_REPOSITORY_PATH_IN_HOST>
DB_PATH=<DB_PATH_FOR_SCIDB>
sudo docker run --name prevision-scidb-exp -dit --shm-size=30gb -v $PREVISION_PATH:/prevision -v $DB_PATH:/dbpath grammaright/scidb:19.11-xenial
If you need to interact with SciDB manually, you must use the `scidb` user (not the root user). If you run SciDB under the root account, SciDB raises an MPI error. The Docker image provides the `scidb` user, whose password is `qwer1234`.
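One way to avoid running commands as root by accident is a small wrapper around `docker exec`. The sketch below assumes the container was started with the name `prevision-scidb-exp` as in the command above; `scidb_exec` is a hypothetical helper name, not something shipped with the image or this repository.

```shell
# Hypothetical helper: run a command inside the SciDB container as the
# scidb user instead of root.
scidb_exec() {
  sudo docker exec -u scidb -i prevision-scidb-exp "$@"
}
```

For example, `scidb_exec whoami` should print `scidb`. For an interactive session, `sudo docker exec -u scidb -it prevision-scidb-exp bash` opens a shell as the same user.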
Next, generate the matrices for the experiments.
Please change to the `./slab-benchmark/prevision/` directory and open the `README.md` there.
Implementations and scripts for evaluation are stored in the `evaluation` directory.
Please refer to `./evaluation/README.md`.
Scripts for plotting are stored in the `plot` directory.
Please refer to `./plot/README.md`.