Finish
ogxd committed Nov 22, 2023
1 parent 60511dd commit 3cf0a85
Showing 1 changed file with 53 additions and 97 deletions.
150 changes: 53 additions & 97 deletions README.md
@@ -7,130 +7,86 @@ GxHash is a [**blazingly fast**](#performance) and [**robust**](#robustness) non
```
cargo add gxhash
```
Used directly as a hash function:
```rust
let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;
println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));
```
Used in `HashMap`/`HashSet`:
```rust
// `GxHashSet<T>` is a type alias for `HashSet<T, GxBuildHasher>`
let mut hashset = gxhash::GxHashSet::default();
hashset.insert("hello world");
```

## Features

### Blazingly Fast 🚀
To date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance comes mostly from heavy use of SIMD intrinsics, a high-ILP (instruction-level parallelism) construction, and small compiled code that is easily inlined and cached.
See the [benchmarks](#benchmarks).

### Highly Robust 🗿
GxHash uses several rounds of the hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all [SMHasher](https://github.com/rurban/smhasher) tests. SMHasher is the de facto quality benchmark for non-cryptographic hash functions and covers most existing algorithms. GxHash has a low collision rate, uniform distribution, and strong avalanche properties.

Check out the [paper](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf) for more technical details.

## Portability

### Architecture Compatibility
GxHash is compatible with:
- X86 processors with `AES-NI` intrinsics
- ARM processors with `NEON` intrinsics

> **Warning**
> Other platforms are currently not supported (there is no fallback). On these platforms, the behavior is currently undefined.
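
Because there is no fallback, an application that may run on mixed hardware might want to check for the required CPU features before calling into GxHash. The following is a minimal sketch that relies only on the standard library's feature-detection macros. The function name `gxhash_supported` is hypothetical and not part of the gxhash API, and the exact set of features checked (in particular `aes` on ARM) is an assumption based on the AES-based mixing described above.

```rust
/// Hypothetical helper (not part of gxhash): reports whether the current CPU
/// exposes the instruction sets GxHash is documented to rely on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn gxhash_supported() -> bool {
    // AES-NI for bit mixing; SSE2 is assumed for the 128-bit vector operations.
    std::is_x86_feature_detected!("aes") && std::is_x86_feature_detected!("sse2")
}

/// Same hypothetical helper for 64-bit ARM.
#[cfg(target_arch = "aarch64")]
pub fn gxhash_supported() -> bool {
    // NEON per the list above; the `aes` check is an assumption.
    std::arch::is_aarch64_feature_detected!("neon")
        && std::arch::is_aarch64_feature_detected!("aes")
}

/// Any other architecture: no fallback exists, so report "unsupported".
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
pub fn gxhash_supported() -> bool {
    false
}
```

A caller could branch on `gxhash_supported()` once at startup and select a different hasher when it returns `false`.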

### Hashes Stability
All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms. An exception to this is the AVX2 version of GxHash (nightly).

## Benchmarks

To run the benchmarks locally use one of the following:
```sh
# Benchmark throughput
cargo bench --bench throughput
# Benchmark performance of GxHash's Hasher when used in a HashSet
cargo bench --bench hashset
# Benchmark throughput and get output as a markdown table
cargo bench --bench throughput --features bench-md
# Benchmark throughput and get output as .svg plots
cargo bench --bench throughput --features bench-plot
```

GxHash is continuously benchmarked on X86 and ARM GitHub runners.
[![Benchmark](https://github.com/ogxd/gxhash/actions/workflows/bench.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/bench.yml)

**Latest Benchmark Results:**
![aarch64](./benches/throughput/aarch64.svg)
![x86_64](./benches/throughput/x86_64.svg)
![x86_64-avx2](./benches/throughput/x86_64-avx2.svg)

## Security

### DoS Resistance
GxHash is a seeded hashing algorithm: different seeds produce completely different hashes. The default `BuildHasher` (`GxBuildHasher::default()`) randomizes the seed, which makes any `HashMap`/`HashSet` more DoS resistant, because attackers cannot easily predict which keys will collide without knowing the seed. This does not mean that GxHash is completely DoS resistant; this has yet to be analyzed further.

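To illustrate how a randomly seeded `BuildHasher` makes collisions harder to predict, here is a minimal, standard-library-only sketch. It uses a toy seeded FNV-1a hasher as a stand-in; it is not the GxHash algorithm, and the names `SeededFnv`, `SeededBuildHasher`, `with_seed` and `hash_bytes` are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Toy seeded hasher: FNV-1a with the seed folded into the initial state.
/// Stand-in for illustration only; this is NOT the GxHash algorithm.
pub struct SeededFnv(u64);

impl Hasher for SeededFnv {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical builder that holds one seed per map/set instance.
#[derive(Clone)]
pub struct SeededBuildHasher {
    seed: u64,
}

impl SeededBuildHasher {
    /// Fixed seed: reproducible hashes, but predictable for an attacker.
    pub fn with_seed(seed: u64) -> Self {
        Self { seed }
    }
}

impl Default for SeededBuildHasher {
    /// Random seed: borrows the randomness std uses for its own `RandomState`.
    fn default() -> Self {
        let seed = RandomState::new().build_hasher().finish();
        Self { seed }
    }
}

impl BuildHasher for SeededBuildHasher {
    type Hasher = SeededFnv;

    fn build_hasher(&self) -> SeededFnv {
        // FNV-1a 64-bit offset basis, perturbed by the seed.
        SeededFnv(0xcbf2_9ce4_8422_2325 ^ self.seed)
    }
}

/// Hashes `bytes` with a hasher created by `builder`.
pub fn hash_bytes(builder: &SeededBuildHasher, bytes: &[u8]) -> u64 {
    let mut hasher = builder.build_hasher();
    hasher.write(bytes);
    hasher.finish()
}
```

With this sketch, `std::collections::HashSet::<&str, SeededBuildHasher>::default()` gets a fresh random seed per set, while `SeededBuildHasher::with_seed(1234)` trades that unpredictability for reproducible hashes.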
### Multicollisions Resistance
GxHash uses a 128-bit internal state (and even a 256-bit one with the `avx2` feature). This makes GxHash [a wide-pipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of 64 bits or fewer, which, among other properties, makes it inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
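
The wide-pipe idea can be sketched in a few lines: keep an internal state that is wider than the output, and compress to the output width only at the very end. The code below is a toy under stated assumptions; its multiply-and-rotate `mix` step is a placeholder and not the AES-based mixing GxHash uses, and all function names are hypothetical.

```rust
/// Toy mixing step (a bijection on the 128-bit state for a fixed block).
/// Placeholder only; GxHash uses AES rounds instead.
fn mix(state: u128, block: u128) -> u128 {
    (state ^ block)
        .wrapping_mul(0x9e37_79b9_7f4a_7c15_f39c_c060_5ced_c835) // arbitrary odd constant
        .rotate_left(61)
}

/// Absorbs `data` in 16-byte blocks into a 128-bit internal state.
pub fn wide_pipe_state(data: &[u8], seed: u64) -> u128 {
    let mut state = u128::from(seed);
    for chunk in data.chunks(16) {
        // Zero-pad the final partial block.
        let mut block = [0u8; 16];
        block[..chunk.len()].copy_from_slice(chunk);
        state = mix(state, u128::from_le_bytes(block));
    }
    // Fold in the length so zero-padded inputs of different lengths differ.
    mix(state, data.len() as u128)
}

/// Compresses the 128-bit state to a 64-bit hash only at the very end.
pub fn wide_pipe_hash64(data: &[u8], seed: u64) -> u64 {
    let state = wide_pipe_state(data, seed);
    (state ^ (state >> 64)) as u64
}
```

Because intermediate states are 128 bits wide, an attacker looking for internal collisions to chain together has to work against the wider state rather than the 64-bit output.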

### Cryptographic Properties
GxHash is a non-cryptographic hashing algorithm and should not be used as a cryptographic one (it is not a replacement for SHA). Whether GxHash is preimage resistant, and how difficult it is to reverse, has not been assessed.

## Compatibility
- ARM 64-bit, using `NEON` intrinsics.
- x86-64, using `SSE2` + `AES` intrinsics.
- (Optional, unstable toolchain only) With the `avx2` feature enabled, gxhash uses `AVX2` intrinsics, for up to twice the performance on large inputs. Only available on `AVX2`-enabled x86-64 platforms.

> **Warning**
> Other platforms are currently not supported (there is no fallback)

## Contributing

- Feel free to submit PRs
- The repository is entirely usable via `cargo` commands

## Benchmarks
Displayed numbers are throughput in mebibytes (MiB) of data hashed per second. Higher is better.
To run the benchmarks: `cargo bench --bench throughput`.
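
For reference, a throughput figure in MiB/s can be derived from a byte count and an elapsed time as sketched below. This only illustrates the unit; the function name `throughput_mib_per_sec` is hypothetical, and how the benchmark harness itself measures time is not shown here.

```rust
use std::time::Duration;

/// Throughput in MiB/s for `bytes_hashed` bytes processed in `elapsed`.
/// One mebibyte (MiB) is 1024 * 1024 bytes.
pub fn throughput_mib_per_sec(bytes_hashed: u64, elapsed: Duration) -> f64 {
    const MIB: f64 = 1024.0 * 1024.0;
    (bytes_hashed as f64 / MIB) / elapsed.as_secs_f64()
}
```

For example, hashing 64 MiB of data in one second corresponds to a throughput of 64 MiB/s.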

### Intel Ice Lake (x86 64-bit) (GCP n2-standard-2)

https://github.com/ogxd/gxhash/blob/af980cb313f3d16efc6e63956eb9ca4ddd70ee30/src/lib.rs#L4C1-L8C1

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|-------:|-------:|
| gxhash-avx2 | 4189 | 16734 | 46142 | 72679 | 96109 | 102202 | 100845 |
| gxhash | 6069 | 24283 | 29465 | 49542 | 58164 | 62511 | 64281 |
| xxhash | 915 | 4266 | 10339 | 10116 | 17164 | 20135 | 22834 |
| ahash | 1838 | 8712 | 22473 | 25958 | 35090 | 38440 | 39308 |
| t1ha0 | 740 | 2707 | 8572 | 28659 | 51202 | 59918 | 65902 |
| seahash | 213 | 620 | 1762 | 2473 | 2761 | 2837 | 2860 |
| metrohash | 754 | 2556 | 5983 | 10395 | 12738 | 13492 | 13624 |
| highwayhash | 122 | 490 | 3278 | 7057 | 9726 | 10743 | 11036 |
| fnv-1a | 1169 | 3062 | 1602 | 933 | 833 | 811 | 808 |

### MacBook M1 Pro (ARM 64-bit)

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|------:|-------:|
| gxhash | 6192 | 24901 | 31770 | 59465 | 72476 | 74723 | 76746 |
| xxhash | 1407 | 5638 | 11432 | 8380 | 16289 | 18690 | 19310 |
| ahash | 1471 | 5920 | 15597 | 22280 | 28672 | 29631 | 31174 |
| t1ha0 | 1181 | 4254 | 10277 | 15459 | 14120 | 13741 | 13743 |
| seahash | 1130 | 4428 | 8756 | 9248 | 8357 | 8085 | 8056 |
| metrohash | 1094 | 3389 | 9709 | 14431 | 17470 | 17679 | 17931 |
| highwayhash | 182 | 743 | 2696 | 5196 | 6573 | 7061 | 7170 |
| fnv-1a | 1988 | 2627 | 1407 | 896 | 777 | 753 | 745 |

## Debugging
The algorithm is mostly inlined, so most profilers fail to provide useful information. The best I could achieve is profiling at the assembly level.

> ℹ️ [cargo-asm](https://github.com/gnzlbg/cargo-asm) is an easy way to view the generated assembly code (`cargo asm gxhash::gxhash::gxhash64`). The `#[inline]` attribute on the method must be removed first, otherwise the tool will not see it.
> ℹ️ [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives useful insight into the time spent per instruction.
## Publication
> Author note: