neighbor_hash_map

This repo contains codes and steps necessary to reproduce the artifacts for research paper titled CIKM'24: An Enhanced Batch Query Architecture in Real-time Recommendation.

Setting up the hardware

We conducted tests in the paper using Intel(R) Xeon(R) Gold 5218 CPUs. In fact, any hardware that supports AVX512 can be used for reproduction, but the results may not necessarily match the results presented in the paper.

Software Requirements

Linux x86_64 kernel >= 4.9.0
Hugepage (2 MiB) supported and has been configured, for example: echo 8192 > /proc/sys/vm/nr_hugepages
Clang >= 16.0.6 or GCC >= 11.4.0
Bazel >= 6.3.0
Tested on Ubuntu 22.04

Build benchmark

git clone https://github.com/slow-steppers/NeighborHash
cd NeighborHash

When using clang

bazel build --config=clang --config=neighbor_hugepage --config=neighbor_simd //testing:all

Or, when using gcc

bazel build --config=neighbor_hugepage --config=neighbor_simd //testing:all

Run benchmarks

Our benchmark implementation utilizes the google benchmark framework, which allows selecting benchmarks to run using regular expressions. The regular expression for running random access is: BM_RandomAccess<.*HashMapType.*>/DS/SQR/LF. For instance, to run benchmarks for all scalar hashmaps with a 1M DatasetSize, SQR=90, and LF=80, execute the following code:

numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<.*Scalar.*uniform>/1048576/90/79"

We have integrated the following components into our benchmark:

NeighborHash
std::unordered_map
LinearProbing
BucketingSIMD_16x16
ankerl::unordered_dense::map
absl::flat_hash_map
ska::bytell_hash_map
emhash7::HashMap
tsl::hopscotch_map
Native array (Random Access)

For ease of comparison, we have categorized each hashmap into the following classes: Scalar, IntraVec, Vec, MultiThreading. The candidate values for each component of the expression are as follows:

SQR: 30/50/90/100
LF: 79
DS: 1024/16384/131072/1048576/2097152/16777216/67108864/134217728
HashMapType: Scalar/Vec/IntraVec/MultiThreading/QFGO

The following presents the code for reproducing the results from the paper:

# all scalar hashmaps under all dataset size with SQR=90,LF=80
numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<.*Scalar.*>/.*/90/79"

# Neighbor/LinearProbing/RandomAccess with AMAC under all dataset size with SQR=90,LF=80
numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<(Neighbor|LinearProbing|Array).*uniform.*AMAC.*>/.*/90/79$"

# all intra-vectorization hashmaps under all dataset size with SQR=90,LF=80
numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<.*IntraVec.*>/.*/90/79"

# all vectorization hashmaps under all dataset size with SQR=90,LF=80
numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<.*Vec.*>/.*/90/79"

# NeighborHash with QFGO under all dataset size with SQR=90,LF=80
numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/hash_map_benchmark --benchmark_filter="BM_RandomAccess<.*QFGO.*>/.*/90/79"

To perform Mixed insertions and lookups, use the following code:

numactl --membind=0 --cpunodebind=0 ./bazel-bin/testing/mixed_multi_threading

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
.distdir		.distdir
.github/workflows		.github/workflows
neighbor_hash		neighbor_hash
testing		testing
third_party		third_party
.bazelrc		.bazelrc
.clang-format		.clang-format
.gitignore		.gitignore
BUILD		BUILD
LICENSE		LICENSE
README.md		README.md
WORKSPACE		WORKSPACE
build.sh		build.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

neighbor_hash_map

Setting up the hardware

Software Requirements

Build benchmark

Run benchmarks

About

Releases 2

Packages

Contributors 5

Languages

License

slow-steppers/NeighborHash

Folders and files

Latest commit

History

Repository files navigation

neighbor_hash_map

Setting up the hardware

Software Requirements

Build benchmark

Run benchmarks

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 5

Languages

Packages