Improve readme #23

Merged
merged 19 commits into from
Nov 23, 2023
53 changes: 31 additions & 22 deletions .github/workflows/bench.yml
@@ -11,52 +11,61 @@ jobs:
name: Benchmark X86
runs-on: buildjet-2vcpu-ubuntu-2204

permissions:
contents: write

steps:
- uses: actions/checkout@v4

- name: Benchmark
run: cargo bench --bench throughput --features bench-plot
run: cargo bench --bench throughput --features 'bench-plot'

- name: Benchmark (AVX2)
run: cargo bench --bench throughput --features bench-plot avx2
- name: Switch to nightly rust
run: rustup default nightly

- name: Commit & Push Plots
uses: stefanzweifel/git-auto-commit-action@v5
- name: Benchmark AVX2 (nightly)
run: cargo bench --bench throughput --features 'bench-plot avx2'

- uses: actions/upload-artifact@v3
with:
commit_message: Update Benchmark Plots
file_pattern: '*.svg'
commit_user_name: Benchmark Bot
commit_user_email: [email protected]
commit_author: Author <[email protected]>
name: benches
path: benches/throughput/*.svg

benchmark-arm:
name: Benchmark ARM
needs: benchmark-x86
runs-on: buildjet-2vcpu-ubuntu-2204-arm

steps:
- uses: actions/checkout@v4

- name: Benchmark
run: cargo bench --bench throughput --features 'bench-plot'

- uses: actions/upload-artifact@v3
with:
name: benches
path: benches/throughput/*.svg

commit:
name: Commit & Push
runs-on: ubuntu-latest
needs: [benchmark-x86, benchmark-arm]

permissions:
contents: write

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 2

- name: Benchmark
run: cargo bench --bench throughput --features bench-plot
- name: Download Benchmark Results
uses: actions/download-artifact@v3
with:
name: benches
path: benches/throughput

- name: Commit & Push Plots
uses: stefanzweifel/git-auto-commit-action@v5
with:
commit_message: Update Benchmark Plots
commit_options: '--amend --no-edit'
push_options: '--force'
skip_fetch: true
file_pattern: '*.svg'
commit_user_name: Benchmark Bot
commit_user_email: [email protected]
commit_author: Author <actions@github.com>
commit_author: Benchmark Bot <benchmark-bot@noreply.com>

134 changes: 72 additions & 62 deletions README.md
@@ -1,85 +1,95 @@
# GxHash
![CI](https://github.com/ogxd/gxhash-rust/actions/workflows/rust.yml/badge.svg)
[![Build & Test](https://github.com/ogxd/gxhash/actions/workflows/build_test.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/build_test.yml)

Up to this date, the fastest non-cryptographic hashing algorithm 🚀 (see benchmarks)
Passes all [SMHasher](https://github.com/rurban/smhasher) quality tests ✅

#### What makes it so fast?
Here are the principal reasons:
- SIMD all the way (and usage of SIMD AES for efficient bit mixing)
- High ILP processing for large inputs
- Small bytecode for greater inlining opportunities
Checkout the [article](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf) for more details.
GxHash is a [**blazingly fast**](#performance) and [**robust**](#robustness) non-cryptographic hashing algorithm.

## Usage
```
```bash
cargo add gxhash
```

Used directly as a hash function:
```rust
use gxhash::*;

// Used as a hashing function
let bytes = [42u8; 1000];
let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;
println!("Hash is {:x}!", gxhash::gxhash64(&bytes, seed));

// Used as an Hasher for faster HashSet/HashMap
let mut hashset = GxHashSet::default();
println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));
```
Used in `HashMap`/`HashSet`:
```rust
// Type alias for HashSet::<String, GxBuildHasher>
let mut hashset = gxhash::GxHashSet::default();
hashset.insert("hello world");
```
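
The comment above describes `GxHashSet` as a type alias over the standard `HashSet` with a custom build hasher. As a minimal sketch of that wiring using only the standard library, the block below plugs a stand-in hasher into `HashSet`. The names `Fnv1aHasher`, `FnvHashSet` and `demo_set` are hypothetical, and the hash used is 64-bit FNV-1a, not GxHash.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Stand-in hasher (64-bit FNV-1a). This is NOT GxHash; it only
/// illustrates how a custom hasher is wired into `HashSet`.
pub struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1aHasher(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1aHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias, analogous in shape to the alias described for `GxHashSet`.
pub type FnvHashSet<T> = HashSet<T, BuildHasherDefault<Fnv1aHasher>>;

/// Builds a small set to show that the alias behaves like a regular `HashSet`.
pub fn demo_set() -> FnvHashSet<&'static str> {
    let mut set = FnvHashSet::default();
    set.insert("hello world");
    set.insert("hello world"); // duplicate, ignored by the set
    set
}
```

Because the alias only swaps the third-party hasher for the default one, the rest of the `HashSet` API is unchanged, which is why the swap can be a drop-in replacement.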

## Compatibility
- ARM 64-bit using `NEON` intrinsics.
- x86-64 bit using `SSE2` + `AES` intrinsics.
- (optional and only on unstable toolchain) with `avx2` feature enabled, gxhash will use `AVX2` intrinsics, for up to twice as much performance for large inputs. Only compatible on `AVX2` enabled x86-64 platforms.
## Features

### Blazingly Fast 🚀
To date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance comes mostly from heavy use of SIMD intrinsics, a high-ILP (instruction-level parallelism) construction and a small bytecode (easily inlined and cached).
See the [benchmarks](#benchmarks).

### Highly Robust 🗿
GxHash uses several rounds of hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all [SMHasher](https://github.com/rurban/smhasher) tests. SMHasher is the de facto quality benchmark for non-cryptographic hash functions and covers most existing algorithms. GxHash has low collisions, uniform distribution and high avalanche properties.

Check out the [paper](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf) for more technical details.

## Portability

### Architecture Compatibility
GxHash is compatible with:
- X86 processors with `AES-NI` intrinsics
- ARM processors with `NEON` intrinsics
> **Warning**
> Other platforms are currently not supported (there is no fallback)
> Other platforms are currently not supported (there is no fallback). The behavior on these platforms is undefined.

### Hashes Stability
All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms. An exception to this is the AVX2 version of GxHash (nightly).

## Benchmarks

To run the benchmarks locally use one of the following:
```bash
# Benchmark throughput
cargo bench --bench throughput
# Benchmark performance of GxHash's Hasher when used in a HashSet
cargo bench --bench hashset
# Benchmark throughput and get output as a markdown table
cargo bench --bench throughput --features bench-md
# Benchmark throughput and get output as .svg plots
cargo bench --bench throughput --features bench-plot
```

GxHash is continuously benchmarked on X86 and ARM GitHub runners.
[![Benchmark](https://github.com/ogxd/gxhash/actions/workflows/bench.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/bench.yml)

**Latest Benchmark Results:**
![aarch64](./benches/throughput/aarch64.svg)
![x86_64](./benches/throughput/x86_64.svg)
![x86_64-avx2](./benches/throughput/x86_64-avx2.svg)

## Security

### DoS Resistance
GxHash is a seeded hashing algorithm: different seeds produce completely different hashes. The default `BuildHasher` (`GxBuildHasher::default()`) randomizes the seed, which makes any `HashMap`/`HashSet` more DoS resistant, because an attacker who does not know the seed will find it much harder to predict which keys collide. This does not mean that GxHash is completely DoS resistant; this has yet to be analyzed further.
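
To illustrate the seed randomization idea described above, here is a minimal sketch using only the standard library. It is not GxHash's implementation: `SeededHasher`, `SeededBuildHasher` and `hash_bytes` are hypothetical names, the mixing function is a toy FNV-1a variant, and drawing the seed from `RandomState` is an assumption made for the example.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Toy seeded hasher (FNV-1a with the seed mixed into the initial state).
/// This is NOT GxHash; it only illustrates the role of the seed.
pub struct SeededHasher(u64);

impl Hasher for SeededHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical builder that hands the same seed to every hasher it creates.
#[derive(Clone)]
pub struct SeededBuildHasher {
    seed: u64,
}

impl SeededBuildHasher {
    /// Fixed seed: hashes are reproducible across runs.
    pub fn with_seed(seed: u64) -> Self {
        Self { seed }
    }

    /// Random seed: borrows the randomly keyed `RandomState` from std
    /// as a source of an unpredictable 64-bit value.
    pub fn random() -> Self {
        let seed = RandomState::new().build_hasher().finish();
        Self { seed }
    }
}

impl BuildHasher for SeededBuildHasher {
    type Hasher = SeededHasher;

    fn build_hasher(&self) -> SeededHasher {
        // The seed perturbs the initial state, so every hash depends on it.
        SeededHasher(0xcbf29ce484222325 ^ self.seed)
    }
}

/// Hashes a byte slice with a hasher created by the given builder.
pub fn hash_bytes(builder: &SeededBuildHasher, bytes: &[u8]) -> u64 {
    let mut hasher = builder.build_hasher();
    hasher.write(bytes);
    hasher.finish()
}
```

A set built with `HashSet::with_hasher(SeededBuildHasher::random())` places the same keys in seed-dependent buckets. An attacker who cannot observe the seed cannot precompute colliding keys as easily.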

### Multicollisions Resistance
GxHash uses a 128-bit internal state (and even 256-bit with the `avx2` feature). This makes GxHash [a widepipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of size 64-bit or smaller, which has, among other properties, the benefit of being inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
### Cryptographic Properties ❌
GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is not a replacement for SHA). It has not been assessed if GxHash is preimage resistant and how likely it is to be reversed.

## Benchmarks
Displayed numbers are throughput in mebibytes (MiB) of data hashed per second. Higher is better.
To run the benchmarks: `cargo bench --bench throughput`.
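
For readers unfamiliar with the unit, the sketch below shows how a throughput figure in MiB/s can be derived from a timed hashing loop. It is not the repository's benchmark harness: `mib_per_second` and `measure_throughput` are hypothetical helpers, and the standard library's `DefaultHasher` stands in for the hash functions listed in the tables.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::hint::black_box;
use std::time::Instant;

/// Converts a byte count and a duration in seconds into MiB/s
/// (1 MiB = 1024 * 1024 bytes).
pub fn mib_per_second(bytes: u64, secs: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / secs
}

/// Hashes an `input_len`-byte buffer `iterations` times and returns
/// the measured throughput in MiB/s.
pub fn measure_throughput(input_len: usize, iterations: u32) -> f64 {
    let data = vec![42u8; input_len];
    let start = Instant::now();
    for _ in 0..iterations {
        let mut hasher = DefaultHasher::new();
        // `black_box` keeps the optimizer from eliding the work.
        hasher.write(black_box(data.as_slice()));
        black_box(hasher.finish());
    }
    let secs = start.elapsed().as_secs_f64();
    mib_per_second(input_len as u64 * iterations as u64, secs)
}
```

A real benchmark would also warm up, repeat the measurement and report variance, as dedicated harnesses do. This sketch only shows where the unit comes from.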

### Intel Ice Lake (x86 64-bit) (GCP n2-standard-2)

https://github.com/ogxd/gxhash/blob/af980cb313f3d16efc6e63956eb9ca4ddd70ee30/src/lib.rs#L4C1-L8C1

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|-------:|-------:|
| gxhash-avx2 | 4189 | 16734 | 46142 | 72679 | 96109 | 102202 | 100845 |
| gxhash | 6069 | 24283 | 29465 | 49542 | 58164 | 62511 | 64281 |
| xxhash | 915 | 4266 | 10339 | 10116 | 17164 | 20135 | 22834 |
| ahash | 1838 | 8712 | 22473 | 25958 | 35090 | 38440 | 39308 |
| t1ha0 | 740 | 2707 | 8572 | 28659 | 51202 | 59918 | 65902 |
| seahash | 213 | 620 | 1762 | 2473 | 2761 | 2837 | 2860 |
| metrohash | 754 | 2556 | 5983 | 10395 | 12738 | 13492 | 13624 |
| highwayhash | 122 | 490 | 3278 | 7057 | 9726 | 10743 | 11036 |
| fnv-1a | 1169 | 3062 | 1602 | 933 | 833 | 811 | 808 |

### Macbook M1 Pro (ARM 64-bit)

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|------:|-------:|
| gxhash | 6192 | 24901 | 31770 | 59465 | 72476 | 74723 | 76746 |
| xxhash | 1407 | 5638 | 11432 | 8380 | 16289 | 18690 | 19310 |
| ahash | 1471 | 5920 | 15597 | 22280 | 28672 | 29631 | 31174 |
| t1ha0 | 1181 | 4254 | 10277 | 15459 | 14120 | 13741 | 13743 |
| seahash | 1130 | 4428 | 8756 | 9248 | 8357 | 8085 | 8056 |
| metrohash | 1094 | 3389 | 9709 | 14431 | 17470 | 17679 | 17931 |
| highwayhash | 182 | 743 | 2696 | 5196 | 6573 | 7061 | 7170 |
| fnv-1a | 1988 | 2627 | 1407 | 896 | 777 | 753 | 745 |

## Debugging
The algorithm is mostly inlined, making most profilers fail at providing useful insights. The best I could achieve is profiling at assembly level. [cargo-asm](https://github.com/gnzlbg/cargo-asm) is an easy way to view the actual generated assembly code (`cargo asm gxhash::gxhash::gxhash`). [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives some useful insights on time spent per instruction.
### Cryptographic Properties
GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is not a replacement for SHA). It has not been assessed if GxHash is preimage resistant and how difficult it is to be reversed.

## Contributing

- Feel free to submit PRs
- Repository is entirely usable via `cargo` commands
- Versioning is as follows:
  - Major for stability-breaking changes (output hashes for the same input differ after the change)
  - Minor for API changes/removal
  - Patch for new APIs, bug fixes and performance improvements

> ℹ️ [cargo-asm](https://github.com/gnzlbg/cargo-asm) is an easy way to view the actual generated assembly code (`cargo asm gxhash::gxhash::gxhash64`) (method `#[inline]` should be removed otherwise it won't be seen by the tool)
> ℹ️ [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives some useful insights on time spent per instruction.

## Publication
> Author note:
2 changes: 1 addition & 1 deletion benches/hashset.rs
@@ -29,7 +29,7 @@ fn benchmark_for_string(c: &mut Criterion, string: &str) {
iterate(b, string, &mut set);
});

let mut set = GxHashSet::<String>::default();
let mut set: HashSet::<String, GxBuildHasher> = GxHashSet::<String>::default();
group.bench_function("GxHash", |b| {
iterate(b, string, &mut set);
});