Initial release of SkyBench software suite.
0 parents
commit 6c93b92
Showing 35 changed files with 715,950 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
**The MIT License (MIT)** | ||
|
||
Copyright (c) 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy of this | ||
software and associated documentation files (the "Software"), to deal in the Software | ||
without restriction, including without limitation the rights to use, copy, modify, | ||
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to | ||
permit persons to whom the Software is furnished to do so, subject to the following | ||
conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all copies | ||
or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, | ||
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A | ||
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT | ||
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION | ||
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE | ||
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,202 @@ | ||
## SkyBench | ||
|
||
Version 1.1 | ||
|
||
© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh | ||
|
||
------------------------------------------- | ||
### Table of Contents | ||
|
||
* [Introduction](#introduction) | ||
* [Algorithms](#algorithms) | ||
* [Datasets](#datasets) | ||
* [Requirements](#requirements) | ||
* [Usage](#usage) | ||
* [License](#license) | ||
* [Contact](#contact) | ||
* [References](#references) | ||
|
||
|
||
------------------------------------ | ||
### Introduction | ||
|
||
The *SkyBench* software suite contains software for efficient main-memory | ||
computation of skylines. The state-of-the-art sequential (i.e., single-threaded) and | ||
multi-core (i.e., multi-threaded) algorithms are included. | ||
|
||
[The skyline operator](https://en.wikipedia.org/wiki/Skyline_operator) [1] identifies | ||
so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the | ||
problem is often presented as | ||
[finding the silhouette of Manhattan](http://stackoverflow.com/q/1066234/2769271): | ||
if one knows the position of the corner points of every building, what parts of | ||
which buildings are visible from across the river? | ||
The two-dimensional case is trivial to solve and not the focus of *SkyBench*. | ||
|
||
In higher dimensions, the problem is formalised with the concept of _dominance_: a point | ||
_p_ is _dominated by_ another point _q_ if _q_ has better or equal values for every | ||
attribute and the points are distinct (i.e., _q_ is strictly better on at least one | ||
attribute). All points that are not dominated are part of the skyline. For example, if | ||
the points correspond to hotels, then any hotel that is more expensive, farther from | ||
anything of interest, and lower-rated than another choice would _not_ be in the skyline. | ||
In the table below, _Marge's Motel_ is dominated by _Happy Hostel_, because it is more | ||
expensive, farther from Central Station, and lower-rated, so it is not in the skyline. | ||
On the other hand, _The Grand_ has the best rating and _Happy Hostel_ has the best price. | ||
_Lovely Lodge_ does not have the best value for any one attribute, but neither | ||
_The Grand_ nor _Happy Hostel_ outperforms it on every attribute, so it too is in the | ||
skyline and represents a good _balance_ of the attributes. | ||
|
||
|
||
|Name |Price per Night|Rating|Distance to Central Station|In skyline?| | ||
|:------------|--------------:|:----:|:-------------------------:|:---------:| | ||
|The Grand | $325| ⋆⋆⋆⋆⋆| 1.2km| ✓| | ||
|Marge's Motel| $55| ⋆⋆| 3.6km| | | ||
|Happy Hostel | $25| ⋆⋆⋆| 0.4km| ✓| | ||
|Lovely Lodge | $100| ⋆⋆⋆⋆| 8.2km| ✓| | ||
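
The dominance definition can be checked against the hotel table with a few lines of C++. The following is only a minimal sketch, not code from *SkyBench*: the names `Hotel`, `dominates`, `naive_skyline`, and `example_hotels` are hypothetical, and it assumes that smaller values are better, so the rating is negated before comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the example table (hypothetical type, for illustration only).
// Smaller is better for every attribute, so the rating is stored negated.
struct Hotel {
  std::string name;
  std::vector<double> attrs;  // {price, -rating, distance_km}
};

// Returns true if q dominates p: q is no worse on every attribute and
// strictly better on at least one.
bool dominates(const std::vector<double>& q, const std::vector<double>& p) {
  assert(q.size() == p.size());
  bool strictly_better = false;
  for (std::size_t i = 0; i < q.size(); ++i) {
    if (q[i] > p[i]) return false;           // q is worse on attribute i
    if (q[i] < p[i]) strictly_better = true;  // q is better on attribute i
  }
  return strictly_better;
}

// Quadratic-time skyline: keep every hotel that no other hotel dominates.
std::vector<std::string> naive_skyline(const std::vector<Hotel>& hotels) {
  std::vector<std::string> result;
  for (const Hotel& p : hotels) {
    bool dominated = false;
    for (const Hotel& q : hotels) {
      if (dominates(q.attrs, p.attrs)) {
        dominated = true;
        break;
      }
    }
    if (!dominated) result.push_back(p.name);
  }
  return result;
}

// The four hotels from the table.
std::vector<Hotel> example_hotels() {
  return {
      {"The Grand", {325.0, -5.0, 1.2}},
      {"Marge's Motel", {55.0, -2.0, 3.6}},
      {"Happy Hostel", {25.0, -3.0, 0.4}},
      {"Lovely Lodge", {100.0, -4.0, 8.2}},
  };
}
```

Calling `naive_skyline(example_hotels())` keeps _The Grand_, _Happy Hostel_, and _Lovely Lodge_ and drops _Marge's Motel_, matching the last column of the table. This pairwise comparison is quadratic in the number of points, which is the cost the algorithms in this suite are designed to reduce.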
|
||
|
||
As the number of dimensions/attributes increases, so too do the size of the skyline and | ||
the difficulty of computing it. Parallel algorithms, such as those implemented here, | ||
quickly become necessary. | ||
|
||
*SkyBench* is released in conjunction with our recent ICDE paper [2]. All of the | ||
code and scripts necessary to repeat experiments from that paper are available in | ||
this software suite. To the best of our knowledge, this is also the first publicly | ||
released C++ skyline software, which will hopefully be a useful resource for the | ||
academic and industry research communities. | ||
|
||
|
||
------------------------------------ | ||
### Algorithms | ||
|
||
The following algorithms have been implemented in SkyBench: | ||
|
||
* **Hybrid** [2]: Located in [src/hybrid](src/hybrid). | ||
It is the state-of-the-art multi-core algorithm, based on two-level | ||
quad-tree partitioning of the data and memoisation of point-to-point | ||
relationships. | ||
|
||
* **Q-Flow** [2]: Located in [src/qflow](src/qflow). | ||
It is a simplification of Hybrid to demonstrate control flow. | ||
|
||
* **PSkyline** [3]: Located in [src/pskyline](src/pskyline). | ||
It was the previous state-of-the-art multi-core algorithm, based | ||
on a divide-and-conquer paradigm. | ||
|
||
* **BSkyTree** [4]: Located in [src/bskytree](src/bskytree). | ||
It is the state-of-the-art sequential algorithm, based on a | ||
quad-tree partitioning of the data and memoisation of point-to-point | ||
relationships. | ||
|
||
All four algorithms are implementations of the common interface defined in | ||
[common/skyline_i.h](common/skyline_i.h) and use common dominance tests from | ||
[common/common.h](common/common.h) and [common/dt_avx.h](common/dt_avx.h) | ||
(the latter when vectorisation is enabled). | ||
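
To illustrate what sharing one interface buys a benchmark driver, here is a hedged sketch. It does not reproduce the contents of `skyline_i.h`: the class names `SkylineAlgorithm` and `NaiveSkyline`, the `run` method, and the helper `skyline_size` are all assumptions made for this example, as is the convention that smaller values are better.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<float>;

// Hypothetical stand-in for a common algorithm interface (not the real one).
class SkylineAlgorithm {
 public:
  virtual ~SkylineAlgorithm() = default;
  // Returns the indices of the skyline points in `data`.
  virtual std::vector<std::size_t> run(const std::vector<Point>& data) = 0;
};

// Shared dominance test: q dominates p if q is no worse everywhere and
// strictly better somewhere (smaller is better).
inline bool dominates(const Point& q, const Point& p) {
  assert(q.size() == p.size());
  bool strictly_better = false;
  for (std::size_t d = 0; d < q.size(); ++d) {
    if (q[d] > p[d]) return false;
    if (q[d] < p[d]) strictly_better = true;
  }
  return strictly_better;
}

// A deliberately simple implementation that compares all pairs of points.
class NaiveSkyline : public SkylineAlgorithm {
 public:
  std::vector<std::size_t> run(const std::vector<Point>& data) override {
    std::vector<std::size_t> skyline;
    for (std::size_t i = 0; i < data.size(); ++i) {
      bool dominated = false;
      for (std::size_t j = 0; j < data.size() && !dominated; ++j) {
        dominated = dominates(data[j], data[i]);
      }
      if (!dominated) skyline.push_back(i);
    }
    return skyline;
  }
};

// Driver code depends only on the interface, so any implementation can be
// timed or validated through the same call.
std::size_t skyline_size(SkylineAlgorithm& algo, const std::vector<Point>& data) {
  return algo.run(data).size();
}
```

With this shape, a driver can loop over several implementations and check that they all return the same skyline for the same input, which is one plausible reason for keeping the dominance tests in shared headers.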
|
||
------------------------------------ | ||
### Datasets | ||
|
||
For reproducibility of the experiments in [2], we include three datasets. | ||
The [WEATHER](workloads/elv_weather-U-15-566268.csv) dataset was originally obtained from | ||
[The University of East Anglia Climatic Research Unit](http://www.cru.uea.ac.uk/cru/data/hrg/tmc) | ||
and preprocessed for skyline computation. | ||
We also include two classic skyline datasets, exactly as used in [2]: | ||
[NBA](workloads/nba-U-8-17264.csv) and | ||
[HOUSE](workloads/house-U-6-127931.csv). | ||
|
||
The synthetic workloads can be generated with the standard benchmark skyline | ||
data generator [1] hosted on | ||
[pgfoundry](http://pgfoundry.org/projects/randdataset). | ||
|
||
|
||
------------------------------------ | ||
### Requirements | ||
|
||
*SkyBench* depends on the following applications: | ||
|
||
* A C++ compiler that supports C++11 and OpenMP (e.g., the newest | ||
[GNU compiler](https://gcc.gnu.org/)) | ||
|
||
* The GNU `make` program | ||
|
||
* AVX or AVX2 if vectorised dominance tests are to be used | ||
|
||
|
||
------------------------------------ | ||
### Usage | ||
|
||
To run, the code must first be compiled for the number of dimensions of the dataset | ||
(the `DIMS` variable, which defaults to 6 in the Makefile).^ | ||
For example, to compute the skyline of the 8-dimensional NBA data set located | ||
in `workloads/nba-U-8-17264.csv`, do: | ||
|
||
> make all DIMS=8 | ||
> | ||
> ./bin/SkyBench -f workloads/nba-U-8-17264.csv | ||
By default, it will compute the skyline with all algorithms. Running `./bin/SkyBench` | ||
without parameters will provide more details about the supported options. | ||
|
||
You can make use of the provided shell script (`./script/runExp.sh`) that does all of | ||
the above automatically. For details, execute: | ||
> ./script/runExp.sh | ||
To reproduce the experiment with real datasets (Table II in [2]), do (assuming | ||
a 16-core machine): | ||
> ./script/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid" | ||
^For performance reasons, skyline implementations that we obtained from other | ||
authors compile their code for a specific number of dimensions. For a fair | ||
comparison, we adopted the same approach. | ||
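
The effect of fixing the dimensionality at compile time can be sketched as follows. The Makefile in this commit passes `-DNUM_DIMS=$(DIMS)` to the compiler; everything else here (the `Point` struct, the `dominates` function, and the smaller-is-better convention) is an assumption made for illustration rather than the suite's actual code.

```cpp
#include <cstddef>

// NUM_DIMS is normally supplied by the build (-DNUM_DIMS=...). The fallback
// of 6 mirrors the default DIMS value in the Makefile.
#ifndef NUM_DIMS
#define NUM_DIMS 6
#endif

// Hypothetical fixed-width point: the array length is a compile-time constant,
// so points are stored inline without per-point heap allocation.
struct Point {
  float attrs[NUM_DIMS];
};

static_assert(sizeof(Point) == NUM_DIMS * sizeof(float),
              "points are densely packed");

// Dominance test whose loop bound is known at compile time, which lets the
// compiler unroll or vectorise the loop (smaller values are assumed better).
inline bool dominates(const Point& q, const Point& p) {
  bool strictly_better = false;
  for (std::size_t d = 0; d < NUM_DIMS; ++d) {
    if (q.attrs[d] > p.attrs[d]) return false;
    strictly_better |= (q.attrs[d] < p.attrs[d]);
  }
  return strictly_better;
}
```

The trade-off is that the binary only works for one dimensionality, which is why `make all DIMS=8` has to be rerun before switching to a dataset with a different number of attributes.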
|
||
|
||
------------------------------------ | ||
### License | ||
|
||
This software is subject to the terms of | ||
[The MIT License](http://opensource.org/licenses/MIT), | ||
which [has been included in this repository](LICENSE.md). | ||
|
||
|
||
------------------------------------ | ||
### Contact | ||
|
||
This software suite will be expanded soon with new algorithms; so, you are | ||
encouraged to ensure that this is still the latest version. Please do not | ||
hesitate to contact the authors if you have comments, questions, or bugs to report. | ||
>[SkyBench on GitHub](https://github.com/sean-chester/SkyBench) | ||
|
||
------------------------------------ | ||
### References | ||
|
||
1. | ||
S. Börzsönyi, D. Kossmann, and K. Stocker. | ||
(2001) | ||
"The Skyline Operator." | ||
In _Proceedings of the 17th International Conference on Data Engineering (ICDE 2001)_, | ||
421--432. | ||
http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf | ||
|
||
2. | ||
S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh. | ||
(2015) | ||
"Scalable parallelization of skyline computation for multi-core processors." | ||
In _Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)_, | ||
1083--1094. | ||
http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf | ||
|
||
3. | ||
H. Im, J. Park, and S. Park. | ||
(2011) | ||
"Parallel skyline computation on multicore architectures." | ||
_Information Systems_ 36(4): | ||
808--823. | ||
http://dx.doi.org/10.1016/j.is.2010.10.005 | ||
|
||
4. | ||
J. Lee and S. Hwang. | ||
(2014) | ||
"Scalable skyline computation using a balanced pivot selection technique." | ||
_Information Systems_ 39: | ||
1--21. | ||
http://dx.doi.org/10.1016/j.is.2013.05.005 | ||
|
||
------------------------------------ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
############################################################ | ||
# Makefile for Benchmarking Skyline Algorithms # | ||
# Darius Sidlauskas ([email protected]) # | ||
# Sean Chester ([email protected]) # | ||
# Copyright (c) 2014 Aarhus University # | ||
############################################################ | ||
|
||
RM = rm -rf | ||
MV = mv | ||
CP = cp -rf | ||
CC = g++ | ||
|
||
TARGET = $(OUT)/SkyBench | ||
|
||
SRC = $(wildcard src/util/*.cpp) \ | ||
$(wildcard src/common/*.cpp) \ | ||
$(wildcard src/bskytree/*.cpp) \ | ||
$(wildcard src/pskyline/*.cpp) \ | ||
$(wildcard src/qflow/*.cpp) \ | ||
$(wildcard src/hybrid/*.cpp) \ | ||
$(wildcard src/*.cpp) | ||
|
||
OBJ = $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.o))) | ||
|
||
OUT = bin | ||
|
||
LIB_DIR = # used as -L$(LIB_DIR) | ||
INCLUDES = -I ./src/ | ||
|
||
LIB = | ||
|
||
# Forces make to look in these directories | ||
VPATH = src:src/util:src/bskytree:src/pskyline:src/qflow:src/hybrid:src/common | ||
|
||
DIMS=6 | ||
V=VERBOSE | ||
DT=0 | ||
PROFILER=0 | ||
|
||
# By default compiling for performance (optimal) | ||
CXXFLAGS = -O3 -m64 -DNDEBUG\ | ||
-DNUM_DIMS=$(DIMS) -D$(V) -DCOUNT_DT=$(DT) -DPROFILER=$(PROFILER)\ | ||
-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \ | ||
-Wcast-qual -Wcast-align \ | ||
-std=c++0x -fopenmp -mavx | ||
|
||
LDFLAGS=-m64 -lrt -fopenmp | ||
|
||
# Target-specific Variable values: | ||
# Compile for debugging (works with valgrind) | ||
dbg : CXXFLAGS = -O0 -g3 -m64\ | ||
-DNUM_DIMS=$(DIMS) -DVERBOSE -DCOUNT_DT=0 -DPROFILER=1\ | ||
-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \ | ||
-Wcast-qual -Wcast-align -std=c++0x | ||
dbg : all | ||
|
||
# All Target | ||
all: $(TARGET) | ||
|
||
# Tool invocations | ||
$(TARGET): $(OBJ) $(LIB_DIR)$(LIB) | ||
@echo 'Building target: $@ (GCC C++ Linker)' | ||
$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS) | ||
@echo 'Finished building target: $@' | ||
@echo ' ' | ||
|
||
$(OUT)/%.o: %.cpp | ||
@echo 'Building file: $< (GCC C++ Compiler)' | ||
$(CC) $(CXXFLAGS) $(INCLUDES) -c -o"$@" "$<" | ||
@echo 'Finished building: $<' | ||
@echo ' ' | ||
|
||
clean: | ||
-$(RM) $(OBJ) $(TARGET) $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.d))) | ||
-@echo ' ' | ||
|
||
deepclean: | ||
-$(RM) bin/* | ||
-@echo ' ' | ||
|
||
|
||
.PHONY: all clean deepclean dbg tests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Note that these scripts are set up to be run from the parent directory and correspond to | ||
the scripts used to generate the results in our ICDE 2015 paper. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/bin/bash | ||
|
||
./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -d 12\ | ||
-c "500000 1000000 2000000 4000000 8000000"\ | ||
-s "bskytree hybrid" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
#!/bin/bash | ||
|
||
./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -c 1000000\ | ||
-d "2 4 6 8 10 12 14 16 18 20 22 24"\ | ||
-s "bskytree hybrid" | ||
|