Project work for the course Advanced Systems Lab, FS2021 @ETHZ
- Mark Frey
- Beat Hubmann
- Martin Tschechne
- Robin Worreby
Create the build directory and run cmake and then make. The optimization level can be set from the command line using
cmake -D CMAKE_CXX_FLAGS="-O3"
This will generate two files, unit_tests
and hdbscan
. For testing just execute unit_tests
.
hdbscan
takes 6 required and 6 optional command line arguments. The minimal command to run is
./hdbscan -points_file=../data/data_set_medium.csv -hierarchy_file=output/hierarchy.csv -tree_file=output/tree.csv -visualization_file=output/results.vis -partition_file=output/partition.csv -outlier_score_file=output/outlier_scores.csv -num_points 1000 -num_dimension 2
The optional arguments are
constraints
path to the constraints csvmin_pts
minimum number of points (default: 8)min_cl_size
minimal cluster size (default: 8)compact
Whether or not to compact the output (default: true)dist_function
Which distance function to use (default: euclidean)optimization_level
Which optimization level to use (default: no_optimization)
This information can also be accessed by executing
hdbscan -help
Different build variants (compiler optimizations) can be added in CMakeLists. To create a new build variant add
build_hdbscan_benchmarking("-O3")
to CMakeLists and change -O3
to whatever flag you want. All builds will be written to build/benchmarking
with name hdbscan<OPTIMIZATION FLAG>
(without <>). These executables can then e.g. be used by a python script to create the plots.
From the plotting folder execute
python3 plot_mst_benchmark.py --points_file=../data/unit_tests/example_high_dim_10000_32.csv --num_points=10000 --num_dimension=32 --recompute --output_name=results_10k --title_heatmap="Relative speed-up 10k points, 32 dimensions" --title_bar="Cycles by optimization, 10k points, 32 dimensions" --disable_nobitset --num_runs=2
Adapt the num_runs parameter to taste
In order to benchmark with LIKWID you need to checkout out the 82-benchmark-with-likwid-v2
branch.
Depending on your setup you might have to adjust the MSR rights on your machine, which can be done the following way:
sudo modprobe msr
An example on how to run LIKWID is given in run_files/run_likwid.sh
.
For further information on LIKWID you can check the official documentation here.
Raw distance benchmarking results
Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection
Accelerated Hierarchical Density Clustering