Initial release of SkyBench software suite.
sean-chester committed Jan 21, 2016
0 parents commit 6c93b92
Showing 35 changed files with 715,950 additions and 0 deletions.
20 changes: 20 additions & 0 deletions LICENSE.md
@@ -0,0 +1,20 @@
**The MIT License (MIT)**

Copyright (c) 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be included in all copies
or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
202 changes: 202 additions & 0 deletions README.md
@@ -0,0 +1,202 @@
## SkyBench

Version 1.1

© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh

-------------------------------------------
### Table of Contents

* [Introduction](#introduction)
* [Algorithms](#algorithms)
* [Datasets](#datasets)
* [Requirements](#requirements)
* [Usage](#usage)
* [License](#license)
* [Contact](#contact)
* [References](#references)


------------------------------------
### Introduction

The *SkyBench* suite provides efficient main-memory implementations of skyline
algorithms, including the state-of-the-art sequential (i.e., single-threaded) and
multi-core (i.e., multi-threaded) algorithms.

[The skyline operator](https://en.wikipedia.org/wiki/Skyline_operator) [1] identifies
so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the
problem is often presented as
[finding the silhouette of Manhattan](http://stackoverflow.com/q/1066234/2769271):
if one knows the positions of the corner points of every building, which parts of
which buildings are visible from across the river?
The two-dimensional case is trivial to solve and not the focus of *SkyBench*.
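As a rough illustration of why the two-dimensional case is easy: once the points are sorted by their first attribute, a single sweep finds the 2-D skyline (in the dominance sense defined below, which is not the same as the building-silhouette problem). The C++ sketch below assumes that smaller values are better on both axes; `Point2D` and `skyline2d` are hypothetical names and are not part of SkyBench.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// A 2-D point; in this sketch, smaller values are better on both axes.
using Point2D = std::pair<double, double>;

// Sort-and-sweep 2-D skyline. After sorting by x (ties broken by y), a
// point is undominated iff its y is strictly smaller than every y seen so
// far. Exact duplicates do not dominate each other, so a point equal to the
// last skyline point is kept as well.
std::vector<Point2D> skyline2d(std::vector<Point2D> pts) {
    std::sort(pts.begin(), pts.end());  // lexicographic: x first, then y
    std::vector<Point2D> result;
    double best_y = std::numeric_limits<double>::infinity();
    for (const Point2D& p : pts) {
        if (p.second < best_y || (!result.empty() && p == result.back())) {
            result.push_back(p);
            best_y = p.second;
        }
    }
    return result;
}
```

The sort dominates the cost, so the whole computation runs in O(n log n) time.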

In higher dimensions, the problem is formalised with the concept of _dominance_: a point
_p_ is _dominated by_ another point _q_ if _q_ has better or equal values for every
attribute and the points are distinct (equivalently, _q_ is at least as good as _p_ on
every attribute and strictly better on at least one). All points that are not dominated
are part of the skyline. For example, if the points correspond to hotels, then any hotel
that is more expensive, farther from anything of interest, and lower-rated than another
choice would _not_ be in the skyline. In the table below, _Marge's Motel_ is dominated by
_Happy Hostel_, because it is more expensive, farther from Central Station, and
lower-rated, so it is not in the skyline. On the other hand, _The Grand_ has the best
rating, and _Happy Hostel_ has both the best price and the shortest distance. _Lovely
Lodge_ does not have the best value for any one attribute, but neither _The Grand_ nor
_Happy Hostel_ outperforms it on every attribute, so it too is in the skyline and
represents a good _balance_ of the attributes.


|Name |Price per Night|Rating|Distance to Central Station|In skyline?|
|:------------|--------------:|:----:|:-------------------------:|:---------:|
|The Grand | $325| ⋆⋆⋆⋆⋆| 1.2km| Yes|
|Marge's Motel| $55| ⋆⋆| 3.6km| No|
|Happy Hostel | $25| ⋆⋆⋆| 0.4km| Yes|
|Lovely Lodge | $100| ⋆⋆⋆⋆| 8.2km| Yes|
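The dominance definition above translates almost directly into code. Below is a minimal C++ sketch of a dominance test and a naive quadratic skyline, applied to the hotel table. It assumes every attribute is stored so that smaller is better (the star rating is negated); the names `Hotel`, `dominates`, and `skyline` are illustrative only and are not SkyBench's own dominance tests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for the hotel example. All attributes are stored so
// that smaller is better: {price, -stars, distance_km}.
struct Hotel {
    std::string name;
    std::vector<double> attrs;
};

// Returns true iff q dominates p: q is no worse than p on every attribute
// and strictly better on at least one (so a point never dominates itself).
bool dominates(const std::vector<double>& q, const std::vector<double>& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;  // q is worse on attribute i
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Naive O(n^2) skyline: keep every hotel that no other hotel dominates.
std::vector<std::string> skyline(const std::vector<Hotel>& hotels) {
    std::vector<std::string> result;
    for (const Hotel& p : hotels) {
        bool dominated = false;
        for (const Hotel& q : hotels) {
            if (dominates(q.attrs, p.attrs)) {
                dominated = true;
                break;
            }
        }
        if (!dominated) result.push_back(p.name);
    }
    return result;
}
```

With the four hotels from the table, this returns The Grand, Happy Hostel, and Lovely Lodge, matching the discussion above.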


As the number of dimensions (attributes) increases, so do both the size of the skyline
and the difficulty of computing it. Parallel algorithms, such as those implemented here,
quickly become necessary.

*SkyBench* is released in conjunction with our recent ICDE paper [2]. All of the
code and scripts necessary to repeat experiments from that paper are available in
this software suite. To the best of our knowledge, this is also the first publicly
released C++ skyline software, which will hopefully be a useful resource for the
academic and industry research communities.


------------------------------------
### Algorithms

The following algorithms have been implemented in SkyBench:

* **Hybrid** [2]: Located in [src/hybrid](src/hybrid).
It is the state-of-the-art multi-core algorithm, based on two-level
quad-tree partitioning of the data and memoisation of point-to-point
relationships.

* **Q-Flow** [2]: Located in [src/qflow](src/qflow).
It is a simplification of Hybrid to demonstrate control flow.

* **PSkyline** [3]: Located in [src/pskyline](src/pskyline).
It was the previous state-of-the-art multi-core algorithm, based
on a divide-and-conquer paradigm.

* **BSkyTree** [4]: Located in [src/bskytree](src/bskytree).
It is the state-of-the-art sequential algorithm, based on a
quad-tree partitioning of the data and memoisation of point-to-point
relationships.
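Both Hybrid and BSkyTree rely on quad-tree style partitioning. A common building block for this kind of partitioning, shown here as a hedged sketch rather than SkyBench's actual code, is to encode each point's region relative to a pivot point as a bitmask: if _p_ dominates _q_, then every bit set in _p_'s mask is also set in _q_'s, so many pairs can be ruled out with a single bitwise test. Pivot selection (the subject of [4]) is left out, smaller values are assumed to be better, and `region_mask` and `may_dominate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i is set when p is worse (larger) than the pivot on dimension i.
// This sketch supports up to 32 dimensions.
std::uint32_t region_mask(const std::vector<double>& p,
                          const std::vector<double>& pivot) {
    assert(p.size() == pivot.size() && p.size() <= 32);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] > pivot[i]) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// If p dominates q, then p[i] <= q[i] for all i, so whenever p is worse
// than the pivot on dimension i, q is too: mask(p) is a subset of mask(q).
// When this returns false, the full dominance test can be skipped.
bool may_dominate(std::uint32_t mask_p, std::uint32_t mask_q) {
    return (mask_p & mask_q) == mask_p;
}
```

A `true` result only means dominance is still possible; the full attribute-by-attribute test is needed to confirm it.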

All four algorithms implement the common interface defined in
[src/common/skyline_i.h](src/common/skyline_i.h) and use the common dominance tests from
[src/common/common.h](src/common/common.h) and [src/common/dt_avx.h](src/common/dt_avx.h)
(the latter when vectorisation is enabled).
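To give a sense of how a multi-core algorithm can share one dominance test across threads, here is the simplest possible parallel strategy: every point is checked against all others, with the outer loop split across threads by OpenMP. This is a brute-force baseline sketched under our own assumptions (smaller is better, points stored as `std::vector<float>`); it is not Hybrid, Q-Flow, or PSkyline, and `dominates_pt` and `parallel_skyline` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true iff q dominates p (smaller is better): q is no worse on
// every attribute and strictly better on at least one.
static bool dominates_pt(const std::vector<float>& q,
                         const std::vector<float>& p) {
    assert(q.size() == p.size());
    bool strictly_better = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}

// Brute-force parallel skyline: returns the indices of all points that no
// other point dominates. OpenMP splits the outer loop across threads; each
// iteration writes only its own flag, so no locking is needed.
std::vector<std::size_t> parallel_skyline(
        const std::vector<std::vector<float>>& pts) {
    const long n = static_cast<long>(pts.size());
    // char rather than bool: std::vector<bool> packs bits, which would race.
    std::vector<char> dominated(pts.size(), 0);
#pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < n; ++j) {
            if (j != i && dominates_pt(pts[j], pts[i])) {
                dominated[i] = 1;
                break;  // leaves only the inner, non-OpenMP loop
            }
        }
    }
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (!dominated[i]) result.push_back(i);
    }
    return result;
}
```

Compile with `-fopenmp` to enable the threads; without it, the pragma is ignored and the loop runs sequentially with the same result. Its O(n²) dominance tests are exactly the cost that partitioning-based algorithms such as Hybrid aim to reduce.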

------------------------------------
### Datasets

For reproducibility of the experiments in [2], we include three datasets.
The [WEATHER](workloads/elv_weather-U-15-566268.csv) dataset was originally obtained from
[The University of East Anglia Climatic Research Unit](http://www.cru.uea.ac.uk/cru/data/hrg/tmc)
and preprocessed for skyline computation.
We also include two classic skyline datasets, exactly as used in [2]:
[NBA](workloads/nba-U-8-17264.csv) and
[HOUSE](workloads/house-U-6-127931.csv).

The synthetic workloads can be generated with the standard benchmark skyline
data generator [1] hosted on
[pgfoundry](http://pgfoundry.org/projects/randdataset).
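The standard generator produces correlated, independent, and anti-correlated data [1]. Purely as an illustration of the independent case, the C++ sketch below draws every attribute uniformly at random. It is not the randdataset generator, and the output format (one comma-separated point per line) is an assumption rather than the documented format of the files in `workloads/`; `write_independent` and `independent_csv` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <random>
#include <sstream>
#include <string>

// Writes n points with d attributes, each drawn independently and
// uniformly from [0, 1), as one comma-separated point per line.
void write_independent(std::ostream& out, std::size_t n, std::size_t d,
                       unsigned seed) {
    assert(d > 0);
    // With a given standard library, the same seed reproduces the same data.
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            if (j > 0) out << ',';
            out << unif(gen);
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the generated rows as a string.
std::string independent_csv(std::size_t n, std::size_t d, unsigned seed) {
    std::ostringstream os;
    write_independent(os, n, d, seed);
    return os.str();
}
```

Passing a `std::ofstream` to `write_independent` writes the rows directly to a file instead.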


------------------------------------
### Requirements

*SkyBench* has the following requirements:

* A C++ compiler that supports C++11 and OpenMP (e.g., a recent version of the
[GNU compiler](https://gcc.gnu.org/))

* The GNU `make` program

* A CPU with AVX or AVX2 support, if vectorised dominance tests are to be used
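One way to see what a given compiler invocation provides is to inspect predefined macros: `__cplusplus` for the language version, `_OPENMP` (defined when OpenMP is enabled, e.g. with `-fopenmp`), and `__AVX__` / `__AVX2__` (defined by GCC and Clang when compiling with `-mavx` or `-mavx2`). The sketch below reports these at compile time. Note that it describes the build flags, not whether the CPU running the binary actually supports AVX; `feature_report` is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Summarises the build features relevant to SkyBench, as seen by the
// preprocessor for the current compilation.
std::string feature_report() {
    std::ostringstream os;
#if __cplusplus >= 201103L
    os << "C++11: yes\n";
#else
    os << "C++11: no\n";
#endif
#ifdef _OPENMP
    // _OPENMP expands to the supported specification date (yyyymm).
    os << "OpenMP: yes (" << _OPENMP << ")\n";
#else
    os << "OpenMP: no (try -fopenmp)\n";
#endif
#if defined(__AVX2__)
    os << "AVX: AVX2 enabled\n";
#elif defined(__AVX__)
    os << "AVX: AVX enabled\n";
#else
    os << "AVX: not enabled (try -mavx)\n";
#endif
    return os.str();
}
```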


------------------------------------
### Usage

The code must be compiled for the number of dimensions of the input data.^
For example, to compute the skyline of the 8-dimensional NBA dataset located
in `workloads/nba-U-8-17264.csv`, do:

> make all DIMS=8
>
> ./bin/SkyBench -f workloads/nba-U-8-17264.csv

By default, it will compute the skyline with all algorithms. Running `./bin/SkyBench`
without parameters will provide more details about the supported options.

You can use the provided shell script (`scripts/runExp.sh`), which does all of
the above automatically. For details, execute:

> ./scripts/runExp.sh

To reproduce the experiment with real datasets (Table II in [2]) on a 16-core
machine, do:

> ./scripts/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid"

^For performance reasons, skyline implementations that we obtained from other
authors compile their code for a specific number of dimensions. For a fair
comparison, we adopted the same approach.
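To illustrate why a fixed dimensionality helps: the makefile passes `-DNUM_DIMS=$(DIMS)` to the compiler, so a point can be a fixed-size array and every dominance loop has a compile-time trip count that the compiler can unroll or vectorise. The sketch below is a hypothetical example of that pattern (`Point` and `dominates` are illustrative names, not the code in `common.h`); it falls back to 6 dimensions, the makefile's default `DIMS`, so that it also compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Normally supplied by the makefile as -DNUM_DIMS=$(DIMS).
#ifndef NUM_DIMS
#define NUM_DIMS 6
#endif

// A point with a compile-time number of attributes (smaller is better).
using Point = std::array<float, NUM_DIMS>;

// The loop bound is a constant, so the compiler is free to unroll or
// vectorise it. Returns true iff q dominates p.
inline bool dominates(const Point& q, const Point& p) {
    bool strictly_better = false;
    for (std::size_t i = 0; i < NUM_DIMS; ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictly_better = true;
    }
    return strictly_better;
}
```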


------------------------------------
### License

This software is subject to the terms of
[The MIT License](http://opensource.org/licenses/MIT),
which [has been included in this repository](LICENSE.md).


------------------------------------
### Contact

This software suite will soon be expanded with new algorithms, so please check that
you are using the latest version. Do not hesitate to contact the authors with
comments, questions, or bug reports.
>[SkyBench on GitHub](https://github.com/sean-chester/SkyBench)

------------------------------------
### References

1.
S. Börzsönyi, D. Kossmann, and K. Stocker.
(2001)
"The Skyline Operator."
In _Proceedings of the 17th International Conference on Data Engineering (ICDE 2001)_,
421--432.
http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

2.
S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh.
(2015)
"Scalable parallelization of skyline computation for multi-core processors."
In _Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)_,
1083--1094.
http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf

3.
H. Im, J. Park, and S. Park.
(2011)
"Parallel skyline computation on multicore architectures."
_Information Systems_ 36(4):
808--823.
http://dx.doi.org/10.1016/j.is.2010.10.005

4.
J. Lee and S. Hwang.
(2014)
"Scalable skyline computation using a balanced pivot selection technique."
_Information Systems_ 39:
1--21.
http://dx.doi.org/10.1016/j.is.2013.05.005

------------------------------------
82 changes: 82 additions & 0 deletions makefile
@@ -0,0 +1,82 @@
############################################################
# Makefile for Benchmarking Skyline Algorithms #
# Darius Sidlauskas ([email protected]) #
# Sean Chester ([email protected]) #
# Copyright (c) 2014 Aarhus University #
############################################################

RM = rm -rf
MV = mv
CP = cp -rf
CC = g++

TARGET = $(OUT)/SkyBench

SRC = $(wildcard src/util/*.cpp) \
$(wildcard src/common/*.cpp) \
$(wildcard src/bskytree/*.cpp) \
$(wildcard src/pskyline/*.cpp) \
$(wildcard src/qflow/*.cpp) \
$(wildcard src/hybrid/*.cpp) \
$(wildcard src/*.cpp)

OBJ = $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.o)))

OUT = bin

LIB_DIR = # used as -L$(LIB_DIR)
INCLUDES = -I ./src/

LIB =

# Forces make to look in these directories for source files
VPATH = src:src/util:src/bskytree:src/pskyline:src/qflow:src/hybrid:src/common

DIMS=6
V=VERBOSE
DT=0
PROFILER=0

# By default compiling for performance (optimal)
CXXFLAGS = -O3 -m64 -DNDEBUG\
-DNUM_DIMS=$(DIMS) -D$(V) -DCOUNT_DT=$(DT) -DPROFILER=$(PROFILER)\
-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \
-Wcast-qual -Wcast-align \
-std=c++0x -fopenmp -mavx

LDFLAGS=-m64 -lrt -fopenmp

# Target-specific Variable values:
# Compile for debugging (works with valgrind)
dbg : CXXFLAGS = -O0 -g3 -m64\
-DNUM_DIMS=$(DIMS) -DVERBOSE -DCOUNT_DT=0 -DPROFILER=1\
-Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \
-Wcast-qual -Wcast-align -std=c++0x
dbg : all

# All Target
all: $(TARGET)

# Tool invocations
$(TARGET): $(OBJ) $(LIB_DIR)$(LIB)
	@echo 'Building target: $@ (GCC C++ Linker)'
	$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS)
	@echo 'Finished building target: $@'
	@echo ' '

$(OUT)/%.o: %.cpp
	@echo 'Building file: $< (GCC C++ Compiler)'
	$(CC) $(CXXFLAGS) $(INCLUDES) -c -o"$@" "$<"
	@echo 'Finished building: $<'
	@echo ' '

clean:
	-$(RM) $(OBJ) $(TARGET) $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.d)))
	-@echo ' '

deepclean:
	-$(RM) bin/*
	-@echo ' '


.PHONY: all clean deepclean dbg tests
2 changes: 2 additions & 0 deletions scripts/README
@@ -0,0 +1,2 @@
Note that these scripts are set up to be run from the parent directory and correspond to
the scripts used to generate the results in our ICDE 2015 paper.
5 changes: 5 additions & 0 deletions scripts/cardTest.sh
@@ -0,0 +1,5 @@
#!/bin/bash

./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -d 12 \
-c "500000 1000000 2000000 4000000 8000000" \
-s "bskytree hybrid"
6 changes: 6 additions & 0 deletions scripts/dimTest.sh
@@ -0,0 +1,6 @@
#!/bin/bash

./runExp.sh -p -i ./workloads -t "1 2 4 8" -x "C E A" -c 1000000 \
-d "2 4 6 8 10 12 14 16 18 20 22 24" \
-s "bskytree hybrid"
