From 81874b420cddb02af212eccdf00996a2a49ea6da Mon Sep 17 00:00:00 2001
From: Alisa Sireneva
Date: Thu, 12 Dec 2024 16:24:22 +0300
Subject: [PATCH] Update post

---
 blog/feed.rss                            |  4 +-
 blog/index.html                          |  2 +-
 blog/thoughts-on-rust-hashing/index.html | 18 +++++--
 blog/thoughts-on-rust-hashing/index.md   | 67 +++++++++++++++++++++---
 4 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/blog/feed.rss b/blog/feed.rss
index 2b4f71e..48ae596 100644
--- a/blog/feed.rss
+++ b/blog/feed.rss
@@ -7,7 +7,7 @@
 Alisa Sireneva, CC BY
 me@purplesyringa.moe (Alisa Sireneva)
 me@purplesyringa.moe (Alisa Sireneva)
- Wed, 11 Dec 2024 19:27:31 GMT
+ Thu, 12 Dec 2024 13:24:05 GMT
 https://www.rssboard.org/rss-specification
 60
@@ -20,7 +20,7 @@ How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on has
 me@purplesyringa.moe (Alisa Sireneva)
 https://purplesyringa.moe/blog/./thoughts-on-rust-hashing/
- Wed, 11 Dec 2024 00:00:00 GMT
+ Thu, 12 Dec 2024 00:00:00 GMT
diff --git a/blog/index.html b/blog/index.html
index a26ebbd..55f2104 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -1,4 +1,4 @@
purplesyringa's blog

Subscribe to RSS

Thoughts on Rust hashing

In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-size hash is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:

How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out on performance.

Keep reading

Any Python program fits in 24 characters*

* If you don’t take whitespace into account.

My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode any Python program in 24 bytes, ignoring whitespace.

Keep reading

The Rust Trademark Policy is still harmful

Reddit

Four days ago, the Rust Foundation released a new draft of the Rust Language Trademark Policy. The previous draft caused division within the community several years ago, prompting its retraction with the aim of creating a new, milder version.

Well, that failed. While certain issues were addressed (thank you, we appreciate it!), the new version remains excessively restrictive and, in my opinion, will harm both the Rust community as a whole and compiler and crate developers. While I expect the stricter rules to not be enforced in practice, I don’t want to constantly feel like I’m under threat while contributing to the Rust ecosystem, and this is exactly what it would feel like if this draft is finalized.

Below are some of my core objections to the draft.

Keep reading

Bringing faster exceptions to Rust

Reddit

Three months ago, I wrote about why you might want to use panics for error handling. Even though it’s a catchy title, panics are hardly suited for this goal, even if you try to hack around with macros and libraries. The real star is the unwinding mechanism, which powers panics. This post is the first in a series exploring what unwinding is, how to speed it up, and how it can benefit Rust and C++ programmers.

Keep reading

We built the best "Bad Apple!!" in Minecraft

Hacker News

Demoscene is the art of pushing computers to perform tasks they weren’t designed to handle. One recurring theme in demoscene is the shadow-art animation “Bad Apple!!”. We’ve played it on the Commodore 64, Vectrex (a unique game console utilizing only vector graphics), Impulse Tracker, and even exploited Super Mario Bros. to play it.

But how about Bad Apple!!.. in Minecraft?

Keep reading

Minecraft compares arrays in cubic time

Telegram

Games detect collisions with heavyweight algorithms. For example, try to imagine how hard this is for just two arbitrarily rotated cubes in space. They can touch edge to edge, vertex to face, or in some even more complicated way.

In Minecraft, all hitbox geometry is parallel to the coordinate axes, i.e. nothing is ever tilted. This greatly simplifies collision detection.

I would write this simply. Since a block's hitbox is a union of several boxes, it can be stored exactly like that: as a list of 6-element tuples. In the vast majority of cases this list will be very short. For regular cubes its length is 1, for glass panes it can reach 2, the anvil, oh gods, consists of 3 elements, and walls can have as many as 4. To check two hitboxes for intersection, it's enough to iterate over pairs of boxes from the two hitboxes (I think there can be at most 16 such pairs). For axis-aligned boxes, that check is trivial.

But I'm not the one who wrote Minecraft JE, so the implementation there is different.
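Here is a minimal Rust sketch of the simple approach described above. It assumes a hitbox is stored as a list of `(min_x, min_y, min_z, max_x, max_y, max_z)` tuples. `Aabb`, `boxes_intersect`, and `hitboxes_intersect` are hypothetical names. Treating boxes that only touch as non-intersecting is my choice here, not necessarily what Minecraft does.

```rust
/// Hypothetical axis-aligned box: (min_x, min_y, min_z, max_x, max_y, max_z).
type Aabb = (f64, f64, f64, f64, f64, f64);

/// Axis-aligned boxes overlap iff their extents overlap on every axis.
/// Strict comparisons: boxes that merely share a face do not intersect.
fn boxes_intersect(a: &Aabb, b: &Aabb) -> bool {
    (a.0 < b.3 && b.0 < a.3) // x
        && (a.1 < b.4 && b.1 < a.4) // y
        && (a.2 < b.5 && b.2 < a.5) // z
}

/// A hitbox is a short list of boxes, so just test every pair
/// (4 × 4 = 16 pairs for the largest shapes mentioned above).
fn hitboxes_intersect(a: &[Aabb], b: &[Aabb]) -> bool {
    a.iter().any(|x| b.iter().any(|y| boxes_intersect(x, y)))
}

fn main() {
    let full_block = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)];
    let offset_block = [(0.5, 0.5, 0.5, 1.5, 1.5, 1.5)];
    let neighbor = [(1.0, 0.0, 0.0, 2.0, 1.0, 1.0)];
    println!("{}", hitboxes_intersect(&full_block, &offset_block)); // true
    println!("{}", hitboxes_intersect(&full_block, &neighbor)); // false: they only touch
}
```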

Keep reading

WebP: The WebPage compression format

Hacker News Reddit Lobsters Russian

I want to provide a smooth experience to my site visitors, so I work on accessibility and ensure it works without JavaScript enabled. I care about page load time because some pages contain large illustrations, so I minify my HTML.

But one thing makes turning my blog light as a feather a pain in the ass.

Keep reading

Division is hard, but it doesn't have to be

Reddit

Developers don’t usually divide numbers all the time, but hashmaps often need to compute remainders modulo a prime. Hashmaps are really common, so fast division is useful.

For instance, rolling hashes might compute u128 % u64 with a fixed divisor. Compilers just drop the ball here:

fn modulo(n: u128) -> u64 {
     (n % 0xffffffffffffffc5) as u64
 }
 
modulo:
diff --git a/blog/thoughts-on-rust-hashing/index.html b/blog/thoughts-on-rust-hashing/index.html
index 83fd4c4..27ce535 100644
--- a/blog/thoughts-on-rust-hashing/index.html
+++ b/blog/thoughts-on-rust-hashing/index.html
@@ -1,5 +1,5 @@
 Thoughts on Rust hashing | purplesyringa's blog

Thoughts on Rust hashing

Intro

In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-size hash is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:

How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out on performance.

How do you mix hashes? You can:

  • Leave that to the users. Everyone will then invent their own terrible mixers, like x ^ y. Indeed, both arguments are pseudo-random, what could possibly go wrong?
  • Provide a good-enough mixer for most use cases, like a * x + y. Cue CVEs because people used mix(x, mix(y, z)) instead of mix(mix(x, y), z).
  • Provide a quality mixer, missing out on performance in common simple cases.
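This small Rust sketch makes the first two bullets concrete. `xor_mix`, `lin_mix`, and the multiplier `A` are hypothetical stand-ins for the “terrible” and “good-enough” mixers. They are not any particular library's functions.

```rust
/// The "terrible" mixer from the first bullet.
fn xor_mix(x: u64, y: u64) -> u64 {
    x ^ y
}

/// Arbitrary odd multiplier for the "good-enough" linear mixer a * x + y.
const A: u64 = 0x9e37_79b9_7f4a_7c15;

fn lin_mix(x: u64, y: u64) -> u64 {
    A.wrapping_mul(x).wrapping_add(y)
}

fn main() {
    // XOR is symmetric, so (x, y) and (y, x) always collide...
    assert_eq!(xor_mix(5, 9), xor_mix(9, 5));
    // ...and equal fields cancel out: every (x, x) hashes to 0.
    assert_eq!(xor_mix(42, 42), 0);

    // Right-nested: mix(x, mix(y, z)) = A*x + A*y + z, which is symmetric in x and y.
    assert_eq!(lin_mix(1, lin_mix(2, 3)), lin_mix(2, lin_mix(1, 3)));
    // Left-nested: mix(mix(x, y), z) = A*A*x + A*y + z keeps x and y apart.
    assert_ne!(lin_mix(lin_mix(1, 2), 3), lin_mix(lin_mix(2, 1), 3));
}
```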

What if the input data is already random? Then you’re just wasting cycles.

What guarantees do you provide regarding the hash values?

  • Do you require the avalanche effect? Your hash is suboptimal even for simple power-of-two-sized hash tables.
  • Do you require a half-avalanche effect instead? Congrats, you broke either those or prime-sized hash tables.
  • Do you require the hash table to perform finalization manually? Using strings as keys is now suboptimal, because computing a non-finalized hash of a string is of good enough quality already.
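Here is one way to see why power-of-two and prime-sized tables want different guarantees. A power-of-two table reduces the hash with a mask and looks only at the low bits. A prime-sized table takes a remainder that depends on every bit. In this sketch (the function names are hypothetical), hash values that differ only in their high bits all land in one bucket of a 16-slot table but spread out across a 17-slot one.

```rust
/// Power-of-two table: only the low bits of the hash pick the bucket.
fn bucket_pow2(hash: u64, size: u64) -> u64 {
    debug_assert!(size.is_power_of_two());
    hash & (size - 1)
}

/// Prime-sized table: the remainder depends on every bit of the hash.
fn bucket_prime(hash: u64, size: u64) -> u64 {
    hash % size
}

/// Counts how many distinct buckets a set of hashes occupies.
fn distinct_buckets(hashes: &[u64], bucket: impl Fn(u64) -> u64) -> usize {
    let mut buckets: Vec<u64> = hashes.iter().map(|&h| bucket(h)).collect();
    buckets.sort_unstable();
    buckets.dedup();
    buckets.len()
}

fn main() {
    // Hash values that differ only in their high 32 bits, as a weak hash might produce.
    let hashes: Vec<u64> = (0..16u64).map(|i| i << 32).collect();
    println!("{}", distinct_buckets(&hashes, |h| bucket_pow2(h, 16))); // 1
    // 2^32 ≡ 1 (mod 17), so hash i << 32 lands in bucket i.
    println!("{}", distinct_buckets(&hashes, |h| bucket_prime(h, 17))); // 16
}
```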

Is your hash function seeded?

  • If not, hi DoS.
  • If yes, but you reuse the same seed between different hash tables, your tables are now quadratic.
  • If the seed is explicitly passed to each hasher, how do you ensure different hashers don’t accidentally cancel out?

In Rust

Rust learnt from these mistakes by splitting the responsibilities:

  • Objects implement the Hash trait, allowing them to write underlying data into a Hasher.
  • Hashers implement the Hasher trait, which hashes the data written by Hash objects.

Objects turn the structured data into a stream of integers; hashers turn the stream into a numeric hash.
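Here is a minimal sketch of that split using the real `std::hash` traits. A hypothetical `Point` type decides what to feed into the hasher and leaves out a cached field. The same `Hash` impl then works unchanged with std's `DefaultHasher` and with a toy FNV-1a hasher written here purely for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical key type with a cached field that must not affect the hash.
struct Point {
    x: i32,
    y: i32,
    norm_cache: u64,
}

impl Hash for Point {
    // The object only decides *what* goes into the stream: two integers, in order.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.x.hash(state);
        self.y.hash(state);
    }
}

/// Toy FNV-1a hasher: decides *how* the byte stream becomes a u64.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// Works for any `Hash` value and any `Hasher`; neither knows about the other.
fn hash_with<H: Hasher>(value: &impl Hash, mut hasher: H) -> u64 {
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = Point { x: 3, y: 4, norm_cache: 5 };
    let b = Point { x: 3, y: 4, norm_cache: 0 };
    assert_ne!(a.norm_cache, b.norm_cache);
    // The cache never reaches either hasher, so a and b hash identically under both.
    assert_eq!(hash_with(&a, DefaultHasher::new()), hash_with(&b, DefaultHasher::new()));
    assert_eq!(hash_with(&a, Fnv1a::new()), hash_with(&b, Fnv1a::new()));
}
```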

On paper, this is a good solution:

  • Hashing an integer is as simple as sending the integer to the hasher. Consumers can choose hashers that provide the necessary guarantees.
  • Users don’t have to mix hashes. Hashers can do that optimally.
  • If the data is known to be random, a fast simple hasher can be used without changing the Hash implementation.
  • Different hash tables can use different hashers, efficiently providing only as much avalanche as necessary.
  • The hasher can be seeded per-table. Only the hasher has access to the seed, so safely using the seed during mixing is easy.
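In std, this per-table seed lives in the table's `BuildHasher`. With the default `RandomState`, each instance carries its own keys. Here is a minimal sketch using only std APIs (`BuildHasher::hash_one` needs Rust 1.71 or later).

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

fn main() {
    // Each RandomState holds its own keys: one "seed" per table.
    let seed_a = RandomState::new();
    let seed_b = RandomState::new();

    // The same builder always hashes a key the same way...
    assert_eq!(seed_a.hash_one(42u64), seed_a.hash_one(42u64));
    // ...while a differently keyed builder will almost certainly disagree.
    println!("{:016x} vs {:016x}", seed_a.hash_one(42u64), seed_b.hash_one(42u64));

    // The table owns its builder; the key type (`u64` here) never sees the seed.
    let mut table: HashMap<u64, &str, RandomState> = HashMap::with_hasher(seed_a);
    table.insert(42, "answer");
    assert_eq!(table.get(&42), Some(&"answer"));
}
```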

Surely this enables optimal and performant hashing in practice, right?

No

Let’s take a look at the Hasher API:

pub trait Hasher {
     // Required methods
     fn finish(&self) -> u64;
     fn write(&mut self, bytes: &[u8]);
@@ -24,7 +24,7 @@
     let block = u64::from_ne_bytes(*block);
     *state = state.wrapping_mul(K).wrapping_add(block);
 }

This is just a multiplicative hash, not unlike FNV-1, but consuming 8 bytes at a time instead of 1.

Now what happens if you try to hash two 32-bit integers with this hash? With padding, that will compile to two multiplications even though one would work. This halves throughput and increases latency.
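Only part of the kernel above is visible here, because the diff hides its signature and the value of `K`. The sketch below makes two assumptions: `K` is some odd 64-bit constant, and each `write` call is zero-padded to whole 8-byte blocks. Under those assumptions it shows the cost. Two `write_u32` calls absorb two blocks even though their 8 bytes would fit into one. `PaddingHasher` and the value of `K` are hypothetical.

```rust
use std::hash::Hasher;

/// Hypothetical multiplier; the post's actual `K` is not visible in this excerpt.
const K: u64 = 0x9e37_79b9_7f4a_7c15;

/// Multiplicative block hasher that zero-pads every write to whole 8-byte blocks.
struct PaddingHasher {
    state: u64,
    blocks_absorbed: usize, // number of multiplications performed, for illustration
}

impl PaddingHasher {
    fn new() -> Self {
        PaddingHasher { state: 0, blocks_absorbed: 0 }
    }

    fn absorb(&mut self, block: [u8; 8]) {
        let block = u64::from_ne_bytes(block);
        self.state = self.state.wrapping_mul(K).wrapping_add(block);
        self.blocks_absorbed += 1;
    }
}

impl Hasher for PaddingHasher {
    fn write(&mut self, bytes: &[u8]) {
        for chunk in bytes.chunks(8) {
            let mut block = [0u8; 8];
            block[..chunk.len()].copy_from_slice(chunk); // pad the tail with zeros
            self.absorb(block);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

fn main() {
    let mut padded = PaddingHasher::new();
    padded.write_u32(1); // the default `write_u32` forwards 4 bytes to `write`
    padded.write_u32(2);

    let mut packed = PaddingHasher::new();
    packed.write(&[1, 0, 0, 0, 2, 0, 0, 0]); // 8 bytes in a single call

    println!("{} vs {}", padded.blocks_absorbed, packed.blocks_absorbed); // 2 vs 1
}
```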

Practical hashes use much larger blocks. rapidhash has a 24-byte state and can absorb 48 bytes at once. ahash has a 48-byte state and absorbs 64-byte blocks. meowhash has a 128-byte state and absorbs 256 bytes. (I only selected these particular hashes because I’m familiar with their kernels; others have similar designs.)

These are some of the fastest non-cryptographic hashes in the world. Do you really want to nuke their performance by padding 8-byte inputs to 48, 64, or 256 bytes? Probably not.

Chains

Okay, but what if we cheated and modified the hash functions to absorb small data somewhat more efficiently than by absorbing a full block?

Say, the rapidhash kernel is effectively this:

fn absorb(state: &mut [u64; 3], seed: &[u64; 3], block: &[u64; 6]) {
     for i in 0..3 {
         state[i] = mix(block[i] ^ state[i], block[i + 3] ^ seed[i]);
     }
@@ -32,7 +32,7 @@
 

That’s three independent iterations, so surely we can absorb a smaller 64-bit block like this instead:

fn absorb_64bit(state: &mut [u64; 3], seed: &[u64; 3], block: u64) {
     state[0] = mix(block ^ state[0], seed[0]);
 }
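Neither snippet defines `mix`. Assuming it is a folded multiply, both kernels become runnable. A folded multiply computes the full 128-bit product and XORs its two halves, which matches the `mix` formula given later in the post.

```rust
/// Folded multiply: XOR of the low and high halves of the 128-bit product.
fn mix(x: u64, y: u64) -> u64 {
    let product = u128::from(x) * u128::from(y);
    (product as u64) ^ ((product >> 64) as u64)
}

/// Full-block kernel: three independent chains, each fed 16 bytes.
fn absorb(state: &mut [u64; 3], seed: &[u64; 3], block: &[u64; 6]) {
    for i in 0..3 {
        state[i] = mix(block[i] ^ state[i], block[i + 3] ^ seed[i]);
    }
}

/// Small-block kernel: only chain 0 is touched.
fn absorb_64bit(state: &mut [u64; 3], seed: &[u64; 3], block: u64) {
    state[0] = mix(block ^ state[0], seed[0]);
}

fn main() {
    let seed = [1, 2, 3];
    let mut state = [10, 20, 30];
    absorb_64bit(&mut state, &seed, 5);
    println!("{:?}", state); // [15, 20, 30]: mix(5 ^ 10, 1) = 15, other chains untouched

    absorb(&mut state, &seed, &[1, 2, 3, 4, 5, 6]);
    println!("{:?}", state); // [70, 154, 145]
}
```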

Surely this is going to reduce the 6× slowdown to at least something like 2×, right?

Why does rapidhash even use three independent chains in the first place? That’s right, latency!

mix has a 5-tick latency on modern x86 processors but a throughput of 1 per tick. Chain independence allows a 16-byte block to be consumed without waiting for the previous 16 bytes to be mixed in. We just threw this optimization out.
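This sketch shows the difference, using the same assumed folded-multiply `mix` and hypothetical function names. In `one_chain`, every `mix` needs the previous result. In `three_chains`, the three `mix` calls in each iteration are independent, so an out-of-order CPU can overlap their latencies. Both functions ignore trailing words that don't fill a block. The final XOR of the chains is a placeholder, not a real finalizer.

```rust
fn mix(x: u64, y: u64) -> u64 {
    let product = u128::from(x) * u128::from(y);
    (product as u64) ^ ((product >> 64) as u64)
}

/// One chain: iteration n cannot start until iteration n - 1 has finished its mix.
fn one_chain(words: &[u64], seed: u64) -> u64 {
    let mut state = seed;
    for pair in words.chunks_exact(2) {
        state = mix(pair[0] ^ state, pair[1] ^ seed);
    }
    state
}

/// Three chains: the mixes inside one iteration do not depend on each other.
fn three_chains(words: &[u64], seed: [u64; 3]) -> u64 {
    let mut state = seed;
    for block in words.chunks_exact(6) {
        for i in 0..3 {
            state[i] = mix(block[i] ^ state[i], block[i + 3] ^ seed[i]);
        }
    }
    state[0] ^ state[1] ^ state[2] // placeholder combine, not a real finalizer
}

fn main() {
    let words: Vec<u64> = (1..=6).collect();
    println!("{}", one_chain(&words, 0)); // 6
    println!("{}", three_chains(&words, [0, 0, 0])); // 28
}
```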

Accumulation

Okay, so padding is a terrible idea. Can we accumulate a buffer instead? How many hashes I had to scroll through in SMHasher before I found a single Rust implementation that took this approach is a warning sign.

The implementation I found, of course, stores a Vec<u8> and passes it to the underlying hasher in finish. I believe I don’t need to explain why allocating in a hash function is not the brightest idea.

Let’s consider another implementation that stores a fixed-size buffer instead. Huh, that’s a lot of ifs and fors. I wonder what Godbolt will say about this. Let’s try something very simple:

struct StreamingHasher {
    block_hasher: BlockHasher,
    buffer: [u8; 8],
    length: usize,
@@ -252,7 +252,17 @@ alloc::vec::Vec<ruined_portal::NewType>: 177.900032ms (-> 4ae6133ab0e0fe9f)

highway:

alloc::vec::Vec<i32>: 53.843217ms (-> f2e68b031ff10c02)
 alloc::vec::Vec<ruined_portal::NewType>: 547.520541ms (-> f2e68b031ff10c02)

That’s not good. Note that all hashers have about the same performance on Vec<i32>. That’s about the speed of RAM. For small arrays that fit in cache, the difference is even more prominent. (I didn’t verify this, but I am the smartest person in the room and thus am obviously right.)
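One plausible contributor to that gap shows up in how std feeds the two types to a hasher. This assumes the benchmark's `NewType` is a `#[derive(Hash)]` wrapper around `i32`; its definition isn't part of this excerpt. Integer types override `Hash::hash_slice` to pass a whole slice to `write` in one call. A derived `Hash` keeps the default `hash_slice`, which hashes element by element. A hypothetical counting hasher makes the difference visible.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that only counts how it is called.
#[derive(Default)]
struct CallCounter {
    write_calls: usize,
    bytes_written: usize,
    write_i32_calls: usize,
    write_usize_calls: usize,
}

impl Hasher for CallCounter {
    fn write(&mut self, bytes: &[u8]) {
        self.write_calls += 1;
        self.bytes_written += bytes.len();
    }
    // Overridden so these calls are counted separately instead of falling through to `write`.
    fn write_i32(&mut self, _: i32) {
        self.write_i32_calls += 1;
    }
    fn write_usize(&mut self, _: usize) {
        self.write_usize_calls += 1; // the slice length prefix ends up here
    }
    fn finish(&self) -> u64 {
        0
    }
}

#[derive(Hash)]
struct NewType(i32);

fn count<T: Hash>(value: &T) -> CallCounter {
    let mut counter = CallCounter::default();
    value.hash(&mut counter);
    counter
}

fn main() {
    let ints: Vec<i32> = (0..1000).collect();
    let wrapped: Vec<NewType> = (0..1000).map(NewType).collect();

    let a = count(&ints);
    let b = count(&wrapped);
    // Vec<i32>: one length prefix, then all 4000 bytes in a single `write`.
    println!("{} write, {} bytes, {} write_i32", a.write_calls, a.bytes_written, a.write_i32_calls);
    // Vec<NewType>: one length prefix, then one `write_i32` per element.
    println!("{} write, {} write_i32", b.write_calls, b.write_i32_calls);
}
```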

My goal

(Kinda)

What I really want is a general-purpose hash that’s good for most practical purposes and kinda DoS-resistant but not necessarily cryptographic. It needs to perform fast on short inputs, so it can’t be a “real” block hash, but rather something close to rapidhash.

We want:

$$\operatorname{consume}(a, x, y) = \operatorname{mix}(x \oplus a,\ y \oplus C).$$
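Read literally, the formula absorbs two 64-bit words with a single multiplication. The sketch below assumes `a` is the running state, `C` is some fixed constant (the value used here is arbitrary), and `mix` is the folded multiply defined later in the post. `consume` and `hash_four` are hypothetical names.

```rust
fn mix(x: u64, y: u64) -> u64 {
    let product = u128::from(x) * u128::from(y);
    (product as u64) ^ ((product >> 64) as u64)
}

/// Arbitrary constant standing in for C.
const C: u64 = 0x2d35_8dcc_aa6c_78a5;

/// consume(a, x, y) = mix(x ⊕ a, y ⊕ C): two words, one multiplication.
fn consume(a: u64, x: u64, y: u64) -> u64 {
    mix(x ^ a, y ^ C)
}

/// Four words take two multiplications, but only if the hasher receives them in pairs;
/// `Hasher::write_u64` hands them over one at a time.
fn hash_four(seed: u64, w: [u64; 4]) -> u64 {
    let a = consume(seed, w[0], w[1]);
    consume(a, w[2], w[3])
}

fn main() {
    println!("{:016x}", hash_four(0, [1, 2, 3, 4]));
}
```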

Right, Rust doesn’t support this. Okay, let’s try another relatively well-known scheme that might be easier to implement. It’s parallel, surely that’ll help?

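To hash a 64-bit word sequence $(x_1, \ldots, x_{2n})$, we compute

$$\operatorname{mix}(x_1 \oplus a_1,\ x_2 \oplus a_2) + \cdots + \operatorname{mix}(x_{2n-1} \oplus a_{2n-1},\ x_{2n} \oplus a_{2n}),$$

where $(a_1, \ldots, a_{2n})$ is random data (possibly generated from the seed once), and

$$\operatorname{mix}(x, y) = (x y \bmod 2^{64}) \oplus (x y \operatorname{div} 2^{64}).$$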
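Here is a sketch of this scheme in Rust. The post only says the $a_i$ are random data, possibly generated from the seed once. Deriving them with SplitMix64 is my own choice for illustration, and `expand_keys` and `keyed_sum` are hypothetical names. The sum wraps modulo $2^{64}$.

```rust
fn mix(x: u64, y: u64) -> u64 {
    let product = u128::from(x) * u128::from(y);
    (product as u64) ^ ((product >> 64) as u64)
}

/// Hypothetical key expansion: derive (a_1, ..., a_count) from a seed with SplitMix64.
fn expand_keys(seed: u64, count: usize) -> Vec<u64> {
    let mut s = seed;
    (0..count)
        .map(|_| {
            s = s.wrapping_add(0x9e37_79b9_7f4a_7c15);
            let mut z = s;
            z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
            z ^ (z >> 31)
        })
        .collect()
}

/// mix(x_1 ⊕ a_1, x_2 ⊕ a_2) + ... + mix(x_{2n-1} ⊕ a_{2n-1}, x_{2n} ⊕ a_{2n}), mod 2^64.
/// Every term is independent, so the multiplications can run in parallel.
fn keyed_sum(words: &[u64], keys: &[u64]) -> u64 {
    // The keys must cover the whole input, which is why this suits fixed-length keys.
    assert!(words.len() % 2 == 0 && keys.len() >= words.len());
    words
        .chunks_exact(2)
        .zip(keys.chunks_exact(2))
        .map(|(x, a)| mix(x[0] ^ a[0], x[1] ^ a[1]))
        .fold(0, u64::wrapping_add)
}

fn main() {
    // A struct of four u64s needs exactly four keys, generated once per seed.
    let keys = expand_keys(12345, 4);
    println!("{:016x}", keyed_sum(&[1, 2, 3, 4], &keys));
}
```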

This is a combination of certain well-known primitives. The problem here is that $a_i$ needs to be precomputed beforehand. This is not a problem for fixed-length keys, like structs of integers – something often used in, say, rustc.

Unfortunately, Rust forces each hasher to handle all possible inputs, including inputs of different lengths, so this scheme can’t work. The hasher isn’t even parametrized by the type of the hashed object. Four well-laid-out 64-bit integers that can easily be mixed together with just two full-width multiplications? Nah, write_u64 goes brrrrrrrrrrrr-
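To see what the hasher actually receives, this sketch feeds a `#[derive(Hash)]` struct of four `u64`s to a recording `Hasher` (hypothetical, for illustration only). The hasher gets four separate `write_u64` calls. Nothing in them, no type parameter and no length hint, lets it pair the words up in advance.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every call it receives.
#[derive(Default)]
struct Recorder {
    calls: Vec<String>,
}

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.calls.push(format!("write({} bytes)", bytes.len()));
    }
    fn write_u64(&mut self, i: u64) {
        self.calls.push(format!("write_u64({i})"));
    }
    fn finish(&self) -> u64 {
        0 // a real hasher would finalize its state here
    }
}

/// Four well-laid-out 64-bit integers.
#[derive(Hash)]
struct Quad {
    a: u64,
    b: u64,
    c: u64,
    d: u64,
}

fn main() {
    let mut recorder = Recorder::default();
    Quad { a: 1, b: 2, c: 3, d: 4 }.hash(&mut recorder);
    println!("{:?}", recorder.calls);
    // ["write_u64(1)", "write_u64(2)", "write_u64(3)", "write_u64(4)"]
}
```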

Stop bitching

I’d been designing fast hash-based data structures for several months before realizing they were almost unusable because of these design decisions. Surely something that isn’t a problem in C++ and Python won’t be a problem in Rust, I thought. I deserve a little bitching, okay?

Actually how

The obvious way forward is to bring the structure of the data back into the picture. If the hasher knew it’s hashing fixed-size data, it could use the $a_i$ approach. If the hasher knew it’s hashing an array, it could vectorize the computation of individual hashes. If the hasher knew the types of the fields in the structure it’s hashing, it could prevent tearing, or perhaps merge small fields into 64-bit blocks efficiently. Alas, the hasher is clueless…

In my opinion, Hasher and Hash are the wrong abstraction. Instead of the Hash driving the Hasher insane, it should be the other way round: Hash providing introspection facilities and Hasher navigating the hashed objects recursively. As a bonus, this could enable (opt-in) portable hashers.

What this API should look like and whether it can be shoehorned into the existing interfaces remains to be seen. I have not started work on the design yet, and perhaps this article is a bit premature, but I’d love to hear your thoughts on how I missed something really obvious (or, indeed, on how Rust is fast enough and no one cares).

Made with my own bare hands (why.)