Commit cade9f8: Finish

ogxd committed Nov 22, 2023 (1 parent: 60511dd)

1 changed file, `README.md`, with 50 additions and 64 deletions.
```
cargo add gxhash
```
Used directly as a hash function:
```rust
let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;
println!(" 32-bit hash: {:x}", gxhash::gxhash32(bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(bytes, seed));
```
Used in `HashMap`/`HashSet`:
```rust
// Type alias for HashSet::<T, GxBuildHasher>; here T is inferred as &str
let mut hashset = gxhash::GxHashSet::default();
hashset.insert("hello world");
```
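The `GxHashSet` alias above plugs a custom `BuildHasher` into the standard collection. As a minimal sketch of that pattern that compiles without the gxhash crate, the hypothetical alias `MyHashSet` below uses std's `BuildHasherDefault<DefaultHasher>` in place of `GxBuildHasher`. Unlike the randomized default described under Security, `BuildHasherDefault<DefaultHasher>` always uses the same fixed keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::BuildHasherDefault;

// Hypothetical alias following the same pattern as `GxHashSet`
// (`HashSet<T, GxBuildHasher>`), but built on std's `DefaultHasher`
// so that this sketch compiles without the gxhash crate.
type MyHashSet<T> = HashSet<T, BuildHasherDefault<DefaultHasher>>;

fn main() {
    // `HashSet::new()` only exists for the default `RandomState` hasher,
    // so a set with a custom hasher is created with `default()`.
    let mut set = MyHashSet::<&str>::default();
    set.insert("hello world");
    println!("contains \"hello world\": {}", set.contains("hello world"));
}
```

Because `BuildHasherDefault<H>` implements `Default`, `MyHashSet::default()` works for any `H: Default + Hasher`, which is why custom-hasher collections are usually constructed this way.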

## Features

- **Blazingly Fast** 🚀
To date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance comes mostly from heavy use of SIMD intrinsics, a construction with high instruction-level parallelism (ILP), and small generated code that is easily inlined and cached.
See the [benchmarks](#benchmarks).

- **Highly Robust** 🗿
GxHash uses several rounds of the hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all [SMHasher](https://github.com/rurban/smhasher) tests. SMHasher is the de facto quality benchmark for non-cryptographic hash functions and covers most existing algorithms. GxHash shows few collisions, a uniform distribution, and strong avalanche behavior.
Check out the [paper](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf) for more technical details.

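To illustrate what a "high ILP construction" can mean, here is a toy hash (the hypothetical `toy_hash_4_lanes`, which is not GxHash and not a quality hash function) that keeps four independent accumulators. Each lane depends only on its own previous value, so an out-of-order CPU can execute the four multiply chains in parallel instead of waiting on one long dependency chain. GxHash itself works on SIMD registers and mixes with AES; this scalar sketch shows only the independent-lanes idea.

```rust
// Toy illustration of a multi-lane (high-ILP) loop. NOT GxHash and NOT a
// quality hash function: the constant and mixing step are arbitrary.
const PRIME: u64 = 0x9E37_79B9_7F4A_7C15;

/// One mixing step; each lane applies it only to its own accumulator.
fn mix(acc: u64, word: u64) -> u64 {
    (acc ^ word).wrapping_mul(PRIME).rotate_left(31)
}

/// Reads 8 bytes as a little-endian u64 (caller guarantees b.len() >= 8).
fn read_u64_le(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Hypothetical toy hash with four independent accumulator lanes.
fn toy_hash_4_lanes(input: &[u8]) -> u64 {
    let mut lanes = [1u64, 2, 3, 4];
    let mut chunks = input.chunks_exact(32);
    for chunk in &mut chunks {
        // These four updates do not depend on each other, so the CPU
        // can overlap them instead of serializing one long chain.
        lanes[0] = mix(lanes[0], read_u64_le(&chunk[0..]));
        lanes[1] = mix(lanes[1], read_u64_le(&chunk[8..]));
        lanes[2] = mix(lanes[2], read_u64_le(&chunk[16..]));
        lanes[3] = mix(lanes[3], read_u64_le(&chunk[24..]));
    }
    // Fold the lanes into one value (a short serial chain, done once).
    let mut acc = lanes
        .iter()
        .fold(input.len() as u64, |acc, &lane| mix(acc, lane));
    // Process the remaining 0..=31 tail bytes one at a time.
    for &byte in chunks.remainder() {
        acc = mix(acc, u64::from(byte));
    }
    acc
}

fn main() {
    let input = b"hello world, this input is longer than 32 bytes";
    println!("{:016x}", toy_hash_4_lanes(input));
}
```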
## Portability

### Architecture Compatibility
GxHash is compatible with:
- X86 processors with AES-NI
- ARM processors with AES

On other platforms, the behavior is currently undefined.
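One way to check whether a machine's CPU reports the AES support listed above is std's runtime feature detection. This is a minimal sketch (the function name `cpu_has_aes` is hypothetical); it only reports what the CPU supports and says nothing about how the gxhash crate itself was compiled or which target features it expects.

```rust
// Runtime check for the AES instructions listed as requirements above.
// Only reports CPU capability; it does not show how gxhash was compiled.

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_aes() -> bool {
    // AES-NI on x86 / x86_64.
    is_x86_feature_detected!("aes")
}

#[cfg(target_arch = "aarch64")]
fn cpu_has_aes() -> bool {
    // AES extension on 64-bit ARM.
    std::arch::is_aarch64_feature_detected!("aes")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn cpu_has_aes() -> bool {
    // Other architectures are not listed as supported.
    false
}

fn main() {
    println!("AES support detected: {}", cpu_has_aes());
}
```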

### Hash Stability
All generated hashes for a given version of GxHash are stable, meaning that for a given input, the output hash is the same across all supported platforms. The AVX2 version of GxHash (nightly) is an exception.

## Benchmarks

To run the benchmarks locally, use one of the following commands:
```sh
# Benchmark throughput
cargo bench --bench throughput
# Benchmark performance of GxHash's Hasher when used in a HashSet
cargo bench --bench hashset
# Benchmark throughput and get output as a markdown table
cargo bench --bench throughput --features bench-md
# Benchmark throughput and get output as .svg plots
cargo bench --bench throughput --features bench-plot
```

GxHash is continuously benchmarked on x86 and ARM GitHub runners.
[![Benchmark](https://github.com/ogxd/gxhash/actions/workflows/bench.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/bench.yml)

**Latest Benchmark Results:**
![aarch64](./benches/throughput/aarch64.svg)
![x86_64](./benches/throughput/x86_64.svg)
![x86_64-avx2](./benches/throughput/x86_64-avx2.svg)

## Security

### DoS Resistance
GxHash is a seeded hashing algorithm: different seeds produce completely different hashes. The default `BuildHasher` (`GxBuildHasher::default()`) uses a randomized seed, which makes any `HashMap`/`HashSet` more resistant to DoS attacks, because without knowing the seed, attackers have a much harder time predicting which keys will collide. This does not mean, however, that it is completely DoS resistant; this needs further analysis.
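The effect of seed randomization can be seen with std's own `RandomState`, which plays the analogous role for `HashMap`'s default SipHash hasher. In this sketch (the helper `hash_with` is hypothetical, and the hasher is SipHash, not GxHash), two `RandomState` instances almost certainly hash the same key to different values, so someone who cannot observe the keys cannot precompute colliding inputs.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes `value` with a hasher produced by `state` (hypothetical helper).
fn hash_with<S: BuildHasher, T: Hash + ?Sized>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Each RandomState carries its own random SipHash keys (std's analogue
    // of a seed), so the same key hashes differently per instance.
    let a = RandomState::new();
    let b = RandomState::new();
    println!("state a: {:016x}", hash_with(&a, "hello world"));
    println!("state b: {:016x}", hash_with(&b, "hello world"));
}
```

Within one `RandomState`, hashing stays deterministic, which is what a `HashMap` needs; only the mapping between different instances (or program runs) is unpredictable.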

### Multicollision Resistance
GxHash uses a 128-bit internal state (and even 256-bit with the `avx2` feature). This makes GxHash [a wide-pipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of 64 bits or fewer, which, among other properties, makes it inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
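The wide-pipe idea can be sketched with a toy function (the hypothetical `wide_pipe_hash64`, which is not GxHash and not a secure hash) whose chaining state is 128 bits wide while its output is 64 bits. For an ideal compression function, a generic birthday search for internal-state collisions, which the multicollision attack in the linked paper builds on, costs about 2^64 evaluations instead of 2^32 for a 64-bit state. The toy below only shows the shape of the construction; its compression step is easily invertible and offers no such guarantee.

```rust
// Toy wide-pipe construction: 128-bit chaining state, 64-bit output.
// NOT GxHash and NOT collision resistant: `compress` is easily invertible.
const MULT: u128 = 0x2d35_8dcc_aa6c_78a5_8bb8_4b93_962e_acc9; // arbitrary odd constant

/// Mixes one 64-bit block into the 128-bit state.
fn compress(state: u128, block: u64) -> u128 {
    let x = (state ^ u128::from(block)).wrapping_mul(MULT);
    x ^ (x >> 64)
}

/// Hypothetical toy hash: the state stays 128 bits wide until the final fold.
fn wide_pipe_hash64(input: &[u8]) -> u64 {
    // Starting from the length distinguishes inputs that differ only by
    // trailing zero padding.
    let mut state = input.len() as u128;
    for chunk in input.chunks(8) {
        let mut buf = [0u8; 8];
        buf[..chunk.len()].copy_from_slice(chunk);
        state = compress(state, u64::from_le_bytes(buf));
    }
    // Truncate to the 64-bit output only at the very end.
    (state ^ (state >> 64)) as u64
}

fn main() {
    println!("{:016x}", wide_pipe_hash64(b"hello world"));
}
```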

### Cryptographic Properties
GxHash is a non-cryptographic hashing algorithm, so it should not be used for cryptographic purposes (it is not a replacement for SHA). It has not been assessed whether GxHash is preimage resistant, or how difficult it would be to reverse.

## Contributing

![CI](https://github.com/ogxd/gxhash-rust/actions/workflows/rust.yml/badge.svg)

> **Warning**
> Other platforms are currently not supported (there is no fallback)

## Benchmarks
Displayed numbers are throughput in mebibytes (MiB) of data hashed per second, for each input size in bytes (column headers). Higher is better.
To run the benchmarks: `cargo bench --bench throughput`.
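These throughput figures are, in effect, bytes hashed divided by elapsed time, scaled to MiB (2^20 bytes). A minimal sketch of that arithmetic with a rough timing loop follows; it uses std's `DefaultHasher` as a stand-in for GxHash, and the function name `throughput_mib_per_s` is hypothetical. It is not how the crate's `throughput` bench is implemented, and a single timing run like this is much noisier than a proper benchmark harness.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Throughput in MiB/s: total bytes hashed / elapsed seconds / 2^20.
fn throughput_mib_per_s(total_bytes: u64, elapsed: Duration) -> f64 {
    total_bytes as f64 / elapsed.as_secs_f64() / (1024.0 * 1024.0)
}

fn main() {
    // Rough timing loop; std's DefaultHasher stands in for GxHash here.
    let input = vec![42u8; 4096];
    let iterations: u64 = 10_000;
    let start = Instant::now();
    for _ in 0..iterations {
        let mut hasher = DefaultHasher::new();
        // black_box keeps the optimizer from hoisting or removing the work.
        hasher.write(black_box(&input));
        black_box(hasher.finish());
    }
    let elapsed = start.elapsed();
    let total = iterations * input.len() as u64;
    println!("{:.0} MiB/s", throughput_mib_per_s(total, elapsed));
}
```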

### Intel Ice Lake (x86 64-bit) (GCP n2-standard-2)

https://github.com/ogxd/gxhash/blob/af980cb313f3d16efc6e63956eb9ca4ddd70ee30/src/lib.rs#L4C1-L8C1

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|-------:|-------:|
| gxhash-avx2 | 4189 | 16734 | 46142 | 72679 | 96109 | 102202 | 100845 |
| gxhash | 6069 | 24283 | 29465 | 49542 | 58164 | 62511 | 64281 |
| xxhash | 915 | 4266 | 10339 | 10116 | 17164 | 20135 | 22834 |
| ahash | 1838 | 8712 | 22473 | 25958 | 35090 | 38440 | 39308 |
| t1ha0 | 740 | 2707 | 8572 | 28659 | 51202 | 59918 | 65902 |
| seahash | 213 | 620 | 1762 | 2473 | 2761 | 2837 | 2860 |
| metrohash | 754 | 2556 | 5983 | 10395 | 12738 | 13492 | 13624 |
| highwayhash | 122 | 490 | 3278 | 7057 | 9726 | 10743 | 11036 |
| fnv-1a | 1169 | 3062 | 1602 | 933 | 833 | 811 | 808 |

### Macbook M1 Pro (ARM 64-bit)

| Method | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 |
|-------------|-----:|------:|------:|------:|------:|------:|-------:|
| gxhash | 6192 | 24901 | 31770 | 59465 | 72476 | 74723 | 76746 |
| xxhash | 1407 | 5638 | 11432 | 8380 | 16289 | 18690 | 19310 |
| ahash | 1471 | 5920 | 15597 | 22280 | 28672 | 29631 | 31174 |
| t1ha0 | 1181 | 4254 | 10277 | 15459 | 14120 | 13741 | 13743 |
| seahash | 1130 | 4428 | 8756 | 9248 | 8357 | 8085 | 8056 |
| metrohash | 1094 | 3389 | 9709 | 14431 | 17470 | 17679 | 17931 |
| highwayhash | 182 | 743 | 2696 | 5196 | 6573 | 7061 | 7170 |
| fnv-1a | 1988 | 2627 | 1407 | 896 | 777 | 753 | 745 |

## Debugging
The algorithm is mostly inlined, which prevents most profilers from providing useful insights. The best approach I found is profiling at the assembly level. [cargo-asm](https://github.com/gnzlbg/cargo-asm) is an easy way to view the actual generated assembly code (`cargo asm gxhash::gxhash::gxhash`). [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives useful insights into the time spent per instruction.
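When inlining hides a hash function from a profiler, a common workaround is to wrap the call in a function marked `#[inline(never)]`, which keeps a dedicated symbol that cargo-asm and sampling profilers can attribute instructions and time to. A minimal sketch, with a hypothetical `hash_for_profiling` wrapper and std's `DefaultHasher` standing in for a GxHash call:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical profiling shim: `#[inline(never)]` keeps this function as
/// its own symbol, so tools can attribute time and instructions to it.
#[inline(never)]
pub fn hash_for_profiling(input: &[u8]) -> u64 {
    // std's DefaultHasher stands in for the real hash call here.
    let mut hasher = DefaultHasher::new();
    hasher.write(input);
    hasher.finish()
}

fn main() {
    let data = vec![7u8; 1 << 16];
    let mut acc = 0u64;
    for _ in 0..1_000 {
        // black_box stops the optimizer from collapsing the repeated calls.
        acc = acc.wrapping_add(hash_for_profiling(std::hint::black_box(&data)));
    }
    println!("{:016x}", acc);
}
```

Only the wrapper is kept out of line; the hash code inside it can still be inlined into the wrapper, which is usually the code you want to inspect.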
