From 81874b420cddb02af212eccdf00996a2a49ea6da Mon Sep 17 00:00:00 2001
From: Alisa Sireneva <me@purplesyringa.moe>
Date: Thu, 12 Dec 2024 16:24:22 +0300
Subject: [PATCH] Update post

---
 blog/feed.rss                            |  4 +-
 blog/index.html                          |  2 +-
 blog/thoughts-on-rust-hashing/index.html | 18 +++++--
 blog/thoughts-on-rust-hashing/index.md   | 67 +++++++++++++++++++++---
 4 files changed, 78 insertions(+), 13 deletions(-)
diff --git a/blog/feed.rss b/blog/feed.rss
index 2b4f71e..48ae596 100644
--- a/blog/feed.rss
+++ b/blog/feed.rss
@@ -7,7 +7,7 @@
 		<copyright>Alisa Sireneva, CC BY</copyright>
 		<managingEditor>me@purplesyringa.moe (Alisa Sireneva)</managingEditor>
 		<webMaster>me@purplesyringa.moe (Alisa Sireneva)</webMaster>
-		<lastBuildDate>Wed, 11 Dec 2024 19:27:31 GMT</lastBuildDate>
+		<lastBuildDate>Thu, 12 Dec 2024 13:24:05 GMT</lastBuildDate>
 		<docs>https://www.rssboard.org/rss-specification</docs>
 		<ttl>60</ttl>
 		<atom:link href="https://purplesyringa.moe/blog/feed.rss" rel="self" type="application/rss+xml" />
@@ -20,7 +20,7 @@ How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on has
 				<author>me@purplesyringa.moe (Alisa Sireneva)</author>
 				
 				<guid>https://purplesyringa.moe/blog/./thoughts-on-rust-hashing/</guid>
-				<pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate>
+				<pubDate>Thu, 12 Dec 2024 00:00:00 GMT</pubDate>
 			</item>
 		
 			<item>
diff --git a/blog/index.html b/blog/index.html
index a26ebbd..55f2104 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -1,4 +1,4 @@
-<!doctypehtml><meta charset=utf-8><meta content=width=device-width,initial-scale=1 name=viewport><title>purplesyringa's blog</title><link href="{{ root }}/favicon.ico?v=2"rel=icon><link href=../all.css rel=stylesheet><link href=../blog.css rel=stylesheet><link href=../vendor/Temml-Local.css rel=stylesheet><link crossorigin href=https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&family=Roboto+Mono:ital,wght@0,100..700;1,100..700&family=Roboto:ital,wght@0,400;0,700;1,400;1,700&family=Slabo+27px&display=swap rel=stylesheet><link href=../fonts/webfont.css rel=stylesheet><link media="screen and (prefers-color-scheme: dark"href=../vendor/atom-one-dark.min.css rel=stylesheet><link media="screen and (prefers-color-scheme: light"href=../vendor/a11y-light.min.css rel=stylesheet><link title="Blog posts"href=/blog/feed.rss rel=alternate type=application/rss+xml><script data-website-id=0da1961d-43f2-45cc-a8e2-75679eefbb69 defer src=https://zond.tei.su/script.js></script><body><header><div class=viewport-container><div class=media><a href=https://github.com/purplesyringa><img alt=GitHub src=../images/github-mark-white.svg></a></div><h1><a href=/>purplesyringa</a></h1><nav><a href=..>about</a><a class=current href>blog</a><a href=../sink/>kitchen sink</a></nav></div></header><section><div class=viewport-container><p><a href=/blog/feed.rss><i class="nf nf-fa-rss_square"title=RSS></i> Subscribe to RSS</a><div class=post-entry><h2><a href=./thoughts-on-rust-hashing/>Thoughts on Rust hashing</a></h2><time>December 11, 2024</time><p>In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-hash size is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:<p>How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. 
If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out of performance.<p><a href=./thoughts-on-rust-hashing/>Keep reading</a></div><div class=post-entry><h2><a href=./any-python-program-fits-in-24-characters/>Any Python program fits in 24 characters*</a></h2><time>November 17, 2024</time><p><em>* If you don’t take whitespace into account.</em><p>My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode <em>any</em> Python program in <eq><math><mn>24</mn></math></eq> bytes, ignoring whitespace.<p><a href=./any-python-program-fits-in-24-characters/>Keep reading</a></div><div class=post-entry><h2><a href=./the-rust-trademark-policy-is-still-harmful/>The Rust Trademark Policy is still harmful</a></h2><time>November 10, 2024</time><a class=discussion href=https://www.reddit.com/r/rust/comments/1gnz5sm/the_rust_trademark_policy_is_still_harmful/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Four days ago, the Rust Foundation <a href=https://blog.rust-lang.org/2024/11/06/trademark-update.html>released</a> a new <a href=https://drive.google.com/file/d/1hjTx11Fb-4W7RQLmp3R8BLDACc7zxIpG/view>draft</a> of the Rust Language Trademark Policy. The previous draft caused division within the community several years ago, prompting its retraction with the aim of creating a new, milder version.<p>Well, that failed. While certain issues were addressed (thank you, we appreciate it!), the new version remains excessively restrictive and, in my opinion, will harm both the Rust community as a whole <em>and</em> compiler and crate developers. 
While I expect the stricter rules to not be enforced in practice, I don’t want to constantly feel like I’m under threat while contributing to the Rust ecosystem, and this is exactly what it would feel like if this draft is finalized.<p>Below are some of my core objections to the draft.<p><a href=./the-rust-trademark-policy-is-still-harmful/>Keep reading</a></div><div class=post-entry><h2><a href=./bringing-faster-exceptions-to-rust/>Bringing faster exceptions to Rust</a></h2><time>November 6, 2024</time><a class=discussion href=https://www.reddit.com/r/rust/comments/1gl050z/bringing_faster_exceptions_to_rust/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Three months ago, I wrote about why <a href=../you-might-want-to-use-panics-for-error-handling/>you might want to use panics for error handling</a>. Even though it’s a catchy title, panics are hardly suited for this goal, even if you try to hack around with macros and libraries. The real star is <em>the unwinding mechanism</em>, which powers panics. This post is the first in a series exploring what unwinding is, how to speed it up, and how it can benefit Rust and C++ programmers.<p><a href=./bringing-faster-exceptions-to-rust/>Keep reading</a></div><div class=post-entry><h2><a href=./we-built-the-best-bad-apple-in-minecraft/>We built the best "Bad Apple!!" in Minecraft</a></h2><time>October 10, 2024</time><a class=discussion href=https://news.ycombinator.com/item?id=41798369><i class="nf nf-md-comment"title=Comment></i> Hacker News</a><p>Demoscene is the art of pushing computers to perform tasks they weren’t designed to handle. One recurring theme in demoscene is the shadow-art animation “Bad Apple!!”. 
We’ve played it on the Commodore 64, <a href=https://en.wikipedia.org/wiki/Vectrex>Vectrex</a> (a unique game console utilizing only vector graphics), <a href=https://www.youtube.com/watch?v=SDvk3aL78fI>Impulse Tracker</a>, and even <a href=https://tasvideos.org/6012M>exploited Super Mario Bros.</a> to play it.<p>But how about Bad Apple!!.. in Minecraft?<p><a href=./we-built-the-best-bad-apple-in-minecraft/>Keep reading</a></div><div class=post-entry><h2><a href=ru/minecraft-compares-arrays-in-cubic-time/>Minecraft сравнивает массивы за куб</a></h2><time>September 14, 2024</time><a class=discussion href=https://t.me/alisa_rummages/156><i class="nf nf-md-comment"title=Comment></i> Telegram</a><p>Коллизии в играх обнаруживаются тяжелыми алгоритмами. Для примера попробуйте представить себе, насколько сложно это для просто двух произвольно повернутых кубов в пространстве. Они могут контактировать двумя ребрами, вершиной и гранью или еще как-то более сложно.<p>В майнкрафте вся геометрия хитбоксов параллельна осям координат, т.е. наклона не бывает. Это сильно упрощает поиск коллизий.<p>Я бы такое писала просто. Раз хитбокс блока — это объединение нескольких параллелепипедов, то можно его так и хранить: как список 6-элементных тьюплов. В подавляющем большинстве случаев этот список будет очень коротким. Для обычных кубов его длина — 1, для стеклопаналей может достигать 2, наковальня, о боги, состоит из 3 элементов, а стены могут иметь их аж целых 4. Для проверки хитбоксов на пересечение достаточно перебрать пары параллелепипедов двух хитбоксов (кажется, их может быть максимум 16). 
Для параллелепипедов с параллельными осями задача решается тривиально.<p>Но Minecraft JE писала не я, поэтому там реализация иная.<p><a href=ru/minecraft-compares-arrays-in-cubic-time/>Keep reading</a></div><div class=post-entry><h2><a href=./webp-the-webpage-compression-format/>WebP: The WebPage compression format</a></h2><time>September 7, 2024</time><a class=discussion href=https://news.ycombinator.com/item?id=41475124><i class="nf nf-md-comment"title=Comment></i> Hacker News</a><a class=discussion href=https://www.reddit.com/r/programming/comments/1fb5pzh/webp_the_webpage_compression_format/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><a class=discussion href=https://lobste.rs/s/t81n2g/webp_webpage_compression_format><i class="nf nf-md-comment"title=Comment></i> Lobsters</a><a class=discussion href=https://habr.com/ru/articles/841754/><i class="nf nf-md-translate"title=Translation></i> Russian</a><p>I want to provide a smooth experience to my site visitors, so I work on accessibility and ensure it works without JavaScript enabled. I care about page load time because some pages contain large illustrations, so I minify my HTML.<p>But one <em>thing</em> makes turning my blog light as a feather a pain in the ass.<p><a href=./webp-the-webpage-compression-format/>Keep reading</a></div><div class=post-entry><h2><a href=./division-is-hard-but-it-does-not-have-to-be/>Division is hard, but it doesn't have to be</a></h2><time>August 24, 2024</time><a class=discussion href=https://www.reddit.com/r/programming/comments/1f0n8sk/division_is_hard_but_it_doesnt_have_to_be/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Developers don’t usually divide numbers all the time, but hashmaps often need to compute <a href=https://en.wikipedia.org/wiki/Remainder>remainders</a> modulo a prime. Hashmaps are really common, so fast division is useful.<p>For instance, rolling hashes might compute <code>u128 % u64</code> with a fixed divisor. 
Compilers just drop the ball here:<pre><code class=language-rust><span class=hljs-keyword>fn</span> <span class="hljs-title function_">modulo</span>(n: <span class=hljs-type>u128</span>) <span class=hljs-punctuation>-></span> <span class=hljs-type>u64</span> {
+<!doctypehtml><meta charset=utf-8><meta content=width=device-width,initial-scale=1 name=viewport><title>purplesyringa's blog</title><link href="{{ root }}/favicon.ico?v=2"rel=icon><link href=../all.css rel=stylesheet><link href=../blog.css rel=stylesheet><link href=../vendor/Temml-Local.css rel=stylesheet><link crossorigin href=https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&family=Roboto+Mono:ital,wght@0,100..700;1,100..700&family=Roboto:ital,wght@0,400;0,700;1,400;1,700&family=Slabo+27px&display=swap rel=stylesheet><link href=../fonts/webfont.css rel=stylesheet><link media="screen and (prefers-color-scheme: dark"href=../vendor/atom-one-dark.min.css rel=stylesheet><link media="screen and (prefers-color-scheme: light"href=../vendor/a11y-light.min.css rel=stylesheet><link title="Blog posts"href=/blog/feed.rss rel=alternate type=application/rss+xml><script data-website-id=0da1961d-43f2-45cc-a8e2-75679eefbb69 defer src=https://zond.tei.su/script.js></script><body><header><div class=viewport-container><div class=media><a href=https://github.com/purplesyringa><img alt=GitHub src=../images/github-mark-white.svg></a></div><h1><a href=/>purplesyringa</a></h1><nav><a href=..>about</a><a class=current href>blog</a><a href=../sink/>kitchen sink</a></nav></div></header><section><div class=viewport-container><p><a href=/blog/feed.rss><i class="nf nf-fa-rss_square"title=RSS></i> Subscribe to RSS</a><div class=post-entry><h2><a href=./thoughts-on-rust-hashing/>Thoughts on Rust hashing</a></h2><time>December 12, 2024</time><p>In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-size hash is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:<p>How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. 
If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out on performance.<p><a href=./thoughts-on-rust-hashing/>Keep reading</a></div><div class=post-entry><h2><a href=./any-python-program-fits-in-24-characters/>Any Python program fits in 24 characters*</a></h2><time>November 17, 2024</time><p><em>* If you don’t take whitespace into account.</em><p>My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode <em>any</em> Python program in <eq><math><mn>24</mn></math></eq> bytes, ignoring whitespace.<p><a href=./any-python-program-fits-in-24-characters/>Keep reading</a></div><div class=post-entry><h2><a href=./the-rust-trademark-policy-is-still-harmful/>The Rust Trademark Policy is still harmful</a></h2><time>November 10, 2024</time><a class=discussion href=https://www.reddit.com/r/rust/comments/1gnz5sm/the_rust_trademark_policy_is_still_harmful/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Four days ago, the Rust Foundation <a href=https://blog.rust-lang.org/2024/11/06/trademark-update.html>released</a> a new <a href=https://drive.google.com/file/d/1hjTx11Fb-4W7RQLmp3R8BLDACc7zxIpG/view>draft</a> of the Rust Language Trademark Policy. The previous draft caused division within the community several years ago, prompting its retraction with the aim of creating a new, milder version.<p>Well, that failed. While certain issues were addressed (thank you, we appreciate it!), the new version remains excessively restrictive and, in my opinion, will harm both the Rust community as a whole <em>and</em> compiler and crate developers. 
While I expect the stricter rules to not be enforced in practice, I don’t want to constantly feel like I’m under threat while contributing to the Rust ecosystem, and this is exactly what it would feel like if this draft is finalized.<p>Below are some of my core objections to the draft.<p><a href=./the-rust-trademark-policy-is-still-harmful/>Keep reading</a></div><div class=post-entry><h2><a href=./bringing-faster-exceptions-to-rust/>Bringing faster exceptions to Rust</a></h2><time>November 6, 2024</time><a class=discussion href=https://www.reddit.com/r/rust/comments/1gl050z/bringing_faster_exceptions_to_rust/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Three months ago, I wrote about why <a href=../you-might-want-to-use-panics-for-error-handling/>you might want to use panics for error handling</a>. Even though it’s a catchy title, panics are hardly suited for this goal, even if you try to hack around with macros and libraries. The real star is <em>the unwinding mechanism</em>, which powers panics. This post is the first in a series exploring what unwinding is, how to speed it up, and how it can benefit Rust and C++ programmers.<p><a href=./bringing-faster-exceptions-to-rust/>Keep reading</a></div><div class=post-entry><h2><a href=./we-built-the-best-bad-apple-in-minecraft/>We built the best "Bad Apple!!" in Minecraft</a></h2><time>October 10, 2024</time><a class=discussion href=https://news.ycombinator.com/item?id=41798369><i class="nf nf-md-comment"title=Comment></i> Hacker News</a><p>Demoscene is the art of pushing computers to perform tasks they weren’t designed to handle. One recurring theme in demoscene is the shadow-art animation “Bad Apple!!”. 
We’ve played it on the Commodore 64, <a href=https://en.wikipedia.org/wiki/Vectrex>Vectrex</a> (a unique game console utilizing only vector graphics), <a href=https://www.youtube.com/watch?v=SDvk3aL78fI>Impulse Tracker</a>, and even <a href=https://tasvideos.org/6012M>exploited Super Mario Bros.</a> to play it.<p>But how about Bad Apple!!.. in Minecraft?<p><a href=./we-built-the-best-bad-apple-in-minecraft/>Keep reading</a></div><div class=post-entry><h2><a href=ru/minecraft-compares-arrays-in-cubic-time/>Minecraft сравнивает массивы за куб</a></h2><time>September 14, 2024</time><a class=discussion href=https://t.me/alisa_rummages/156><i class="nf nf-md-comment"title=Comment></i> Telegram</a><p>Коллизии в играх обнаруживаются тяжелыми алгоритмами. Для примера попробуйте представить себе, насколько сложно это для просто двух произвольно повернутых кубов в пространстве. Они могут контактировать двумя ребрами, вершиной и гранью или еще как-то более сложно.<p>В майнкрафте вся геометрия хитбоксов параллельна осям координат, т.е. наклона не бывает. Это сильно упрощает поиск коллизий.<p>Я бы такое писала просто. Раз хитбокс блока — это объединение нескольких параллелепипедов, то можно его так и хранить: как список 6-элементных тьюплов. В подавляющем большинстве случаев этот список будет очень коротким. Для обычных кубов его длина — 1, для стеклопаналей может достигать 2, наковальня, о боги, состоит из 3 элементов, а стены могут иметь их аж целых 4. Для проверки хитбоксов на пересечение достаточно перебрать пары параллелепипедов двух хитбоксов (кажется, их может быть максимум 16). 
Для параллелепипедов с параллельными осями задача решается тривиально.<p>Но Minecraft JE писала не я, поэтому там реализация иная.<p><a href=ru/minecraft-compares-arrays-in-cubic-time/>Keep reading</a></div><div class=post-entry><h2><a href=./webp-the-webpage-compression-format/>WebP: The WebPage compression format</a></h2><time>September 7, 2024</time><a class=discussion href=https://news.ycombinator.com/item?id=41475124><i class="nf nf-md-comment"title=Comment></i> Hacker News</a><a class=discussion href=https://www.reddit.com/r/programming/comments/1fb5pzh/webp_the_webpage_compression_format/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><a class=discussion href=https://lobste.rs/s/t81n2g/webp_webpage_compression_format><i class="nf nf-md-comment"title=Comment></i> Lobsters</a><a class=discussion href=https://habr.com/ru/articles/841754/><i class="nf nf-md-translate"title=Translation></i> Russian</a><p>I want to provide a smooth experience to my site visitors, so I work on accessibility and ensure it works without JavaScript enabled. I care about page load time because some pages contain large illustrations, so I minify my HTML.<p>But one <em>thing</em> makes turning my blog light as a feather a pain in the ass.<p><a href=./webp-the-webpage-compression-format/>Keep reading</a></div><div class=post-entry><h2><a href=./division-is-hard-but-it-does-not-have-to-be/>Division is hard, but it doesn't have to be</a></h2><time>August 24, 2024</time><a class=discussion href=https://www.reddit.com/r/programming/comments/1f0n8sk/division_is_hard_but_it_doesnt_have_to_be/><i class="nf nf-md-comment"title=Comment></i> Reddit</a><p>Developers don’t usually divide numbers all the time, but hashmaps often need to compute <a href=https://en.wikipedia.org/wiki/Remainder>remainders</a> modulo a prime. Hashmaps are really common, so fast division is useful.<p>For instance, rolling hashes might compute <code>u128 % u64</code> with a fixed divisor. 
Compilers just drop the ball here:<pre><code class=language-rust><span class=hljs-keyword>fn</span> <span class="hljs-title function_">modulo</span>(n: <span class=hljs-type>u128</span>) <span class=hljs-punctuation>-></span> <span class=hljs-type>u64</span> {
     (n % <span class=hljs-number>0xffffffffffffffc5</span>) <span class=hljs-keyword>as</span> <span class=hljs-type>u64</span>
 }
 </code></pre><pre><code class=language-x86asm><span class=hljs-symbol>modulo:</span>
diff --git a/blog/thoughts-on-rust-hashing/index.html b/blog/thoughts-on-rust-hashing/index.html
index 83fd4c4..27ce535 100644
--- a/blog/thoughts-on-rust-hashing/index.html
+++ b/blog/thoughts-on-rust-hashing/index.html
@@ -1,5 +1,5 @@
 <!doctypehtml><html prefix="og: http://ogp.me/ns#"lang=en_US><meta charset=utf-8><meta content=width=device-width,initial-scale=1 name=viewport><title>Thoughts on Rust hashing | purplesyringa's blog</title><link href=../../favicon.ico?v=2 rel=icon><link href=../../all.css rel=stylesheet><link href=../../blog.css rel=stylesheet><link href=../../vendor/Temml-Local.css rel=stylesheet><link crossorigin href=https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&family=Roboto+Mono:ital,wght@0,100..700;1,100..700&family=Roboto:ital,wght@0,400;0,700;1,400;1,700&family=Slabo+27px&display=swap rel=stylesheet><link href=../../fonts/webfont.css rel=stylesheet><link media="screen and (prefers-color-scheme: dark"href=../../vendor/atom-one-dark.min.css rel=stylesheet><link media="screen and (prefers-color-scheme: light"href=../../vendor/a11y-light.min.css rel=stylesheet><link title="Blog posts"href=../../blog/feed.rss rel=alternate type=application/rss+xml><meta content="Thoughts on Rust hashing"property=og:title><meta content=article property=og:type><meta content=https://purplesyringa.moe/blog/thoughts-on-rust-hashing/og.png property=og:image><meta content=https://purplesyringa.moe/blog/thoughts-on-rust-hashing/ property=og:url><meta content="In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-hash size is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:
-How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out of performance."property=og:description><meta content=en_US property=og:locale><meta content="purplesyringa's blog"property=og:site_name><meta content=summary_large_image name=twitter:card><meta content=https://purplesyringa.moe/blog/thoughts-on-rust-hashing/og.png name=twitter:image><script data-website-id=0da1961d-43f2-45cc-a8e2-75679eefbb69 defer src=https://zond.tei.su/script.js></script><body><header><div class=viewport-container><div class=media><a href=https://github.com/purplesyringa><img alt=GitHub src=../../images/github-mark-white.svg></a></div><h1><a href=/>purplesyringa</a></h1><nav><a href=../..>about</a><a class=current href=../../blog/>blog</a><a href=../../sink/>kitchen sink</a></nav></div></header><section><div class=viewport-container><h2>Thoughts on Rust hashing</h2><time>December 11, 2024</time><p class=next-group><span aria-level=3 class=side-header role=heading><span>Intro</span></span>In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-hash size is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:<p>How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out of performance.<p>How do you mix hashes? You can:<ul><li>Leave that to the users. Everyone will then invent their own terrible mixers, like <code>x ^ y</code>. Indeed, both arguments are pseudo-random, what could possibly go wrong?<li>Provide a good-enough mixer for most use cases, like <code>a * x + y</code>. 
Cue CVEs because people used <code>mix(x, mix(y, z))</code> instead of <code>mix(mix(x, y), z)</code>.<li>Provide a quality mixer, missing out on performance in common simple cases.</ul><p>What if the input data is already random? Then you’re just wasting cycles.<p>What guarantees do you provide regarding the hash values?<ul><li>Do you require the avalanche effect? Your hash is suboptimal even for simple power-of-two-sized hash tables.<li>Do you require a half-avalanche effect instead? Congrats, you broke either those or prime-sized hash tables.<li>Do you require the hash table to perform finalization manually? Using strings as keys is now suboptimal, because computing a non-finalized hash of a string is of good enough quality already.</ul><p>Is your hash function seeded?<ul><li>If not, hi DoS.<li>If yes, but you reuse the same seed between different hash tables, <a href=https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion>your tables are now quadratic</a>.<li>If the seed is explicitly passed to each hasher, how do you ensure different hashers don’t accidentally cancel out?</ul><p class=next-group><span aria-level=3 class=side-header role=heading><span>In Rust</span></span>Rust learnt from these mistakes by splitting the responsibilities:<ul><li>Objects implement the <code>Hash</code> trait, allowing them to write underlying data into a <code>Hasher</code>.<li>Hashers implement the <code>Hasher</code> trait, which hashes the data written by <code>Hash</code> objects.</ul><p>Objects turn the structured data into a stream of integers; hashers turn the stream into a numeric hash.<p>On paper, this is a good solution:<ul><li>Hashing an integer is as simple as sending the integer to the hasher. Consumers can choose hashers that provide the necessary guarantees.<li>Users don’t have to mix hashes. 
Hashers can do that optimally.<li>If the data is known to be random, a fast simple hasher can be used without changing the <code>Hash</code> implementation.<li>Different hash tables can use different hashers, efficiently providing only as much avalanche as necessary.<li>The hasher can be seeded per-table. Only the hasher has access to the seed, so safely using the seed during mixing is easy.</ul><p>Surely this enables optimal and performant hashing in practice, right?<p class=next-group><span aria-level=3 class=side-header role=heading><span>No</span></span>Let’s take a look at the <code>Hasher</code> API:<pre><code class=language-rust><span class=hljs-keyword>pub</span> <span class=hljs-keyword>trait</span> <span class="hljs-title class_">Hasher</span> {
+How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out on performance."property=og:description><meta content=en_US property=og:locale><meta content="purplesyringa's blog"property=og:site_name><meta content=summary_large_image name=twitter:card><meta content=https://purplesyringa.moe/blog/thoughts-on-rust-hashing/og.png name=twitter:image><script data-website-id=0da1961d-43f2-45cc-a8e2-75679eefbb69 defer src=https://zond.tei.su/script.js></script><body><header><div class=viewport-container><div class=media><a href=https://github.com/purplesyringa><img alt=GitHub src=../../images/github-mark-white.svg></a></div><h1><a href=/>purplesyringa</a></h1><nav><a href=../..>about</a><a class=current href=../../blog/>blog</a><a href=../../sink/>kitchen sink</a></nav></div></header><section><div class=viewport-container><h2>Thoughts on Rust hashing</h2><time>December 12, 2024</time><p class=next-group><span aria-level=3 class=side-header role=heading><span>Intro</span></span>In languages like Python, Java, or C++, values are hashed by calling a “hash me” method on them, implemented by the type author. This fixed-size hash is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:<p>How do you hash an integer? If you use a no-op hasher (booo), DoS attacks on hash tables are inevitable. If you hash it thoroughly, consumers that only cache hashes to optimize equality checks lose out on performance.<p>How do you mix hashes? You can:<ul><li>Leave that to the users. Everyone will then invent their own terrible mixers, like <code>x ^ y</code>. Indeed, both arguments are pseudo-random, what could possibly go wrong?<li>Provide a good-enough mixer for most use cases, like <code>a * x + y</code>. 
Cue CVEs because people used <code>mix(x, mix(y, z))</code> instead of <code>mix(mix(x, y), z)</code>.<li>Provide a quality mixer, missing out on performance in common simple cases.</ul><p>What if the input data is already random? Then you’re just wasting cycles.<p>What guarantees do you provide regarding the hash values?<ul><li>Do you require the avalanche effect? Your hash is suboptimal even for simple power-of-two-sized hash tables.<li>Do you require a half-avalanche effect instead? Congrats, you broke either those or prime-sized hash tables.<li>Do you require the hash table to perform finalization manually? Using strings as keys is now suboptimal, because computing a non-finalized hash of a string is of good enough quality already.</ul><p>Is your hash function seeded?<ul><li>If not, hi DoS.<li>If yes, but you reuse the same seed between different hash tables, <a href=https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion>your tables are now quadratic</a>.<li>If the seed is explicitly passed to each hasher, how do you ensure different hashers don’t accidentally cancel out?</ul><p class=next-group><span aria-level=3 class=side-header role=heading><span>In Rust</span></span>Rust learnt from these mistakes by splitting the responsibilities:<ul><li>Objects implement the <code>Hash</code> trait, allowing them to write underlying data into a <code>Hasher</code>.<li>Hashers implement the <code>Hasher</code> trait, which hashes the data written by <code>Hash</code> objects.</ul><p>Objects turn the structured data into a stream of integers; hashers turn the stream into a numeric hash.<p>On paper, this is a good solution:<ul><li>Hashing an integer is as simple as sending the integer to the hasher. Consumers can choose hashers that provide the necessary guarantees.<li>Users don’t have to mix hashes. 
Hashers can do that optimally.<li>If the data is known to be random, a fast simple hasher can be used without changing the <code>Hash</code> implementation.<li>Different hash tables can use different hashers, efficiently providing only as much avalanche as necessary.<li>The hasher can be seeded per-table. Only the hasher has access to the seed, so safely using the seed during mixing is easy.</ul><p>Surely this enables optimal and performant hashing in practice, right?<p class=next-group><span aria-level=3 class=side-header role=heading><span>No</span></span>Let’s take a look at the <code>Hasher</code> API:<pre><code class=language-rust><span class=hljs-keyword>pub</span> <span class=hljs-keyword>trait</span> <span class="hljs-title class_">Hasher</span> {
     <span class=hljs-comment>// Required methods</span>
     <span class=hljs-keyword>fn</span> <span class="hljs-title function_">finish</span>(&<span class=hljs-keyword>self</span>) <span class=hljs-punctuation>-></span> <span class=hljs-type>u64</span>;
     <span class=hljs-keyword>fn</span> <span class="hljs-title function_">write</span>(&<span class=hljs-keyword>mut</span> <span class=hljs-keyword>self</span>, bytes: &[<span class=hljs-type>u8</span>]);
@@ -24,7 +24,7 @@
     <span class=hljs-keyword>let</span> <span class=hljs-variable>block</span> = <span class=hljs-type>u64</span>::<span class="hljs-title function_ invoke__">from_ne_bytes</span>(*block);
     *state = state.<span class="hljs-title function_ invoke__">wrapping_mul</span>(K).<span class="hljs-title function_ invoke__">wrapping_add</span>(block);
 }
-</code></pre><p>This is just a multiplicative hash, not unlike FNV-1, but consuming <eq><math><mn>8</mn></math></eq> bytes at a time instead of <eq><math><mn>1</mn></math></eq>.<p>Now what happens if you try to hash two 32-bit integers with this hash? With padding, that will compile to two multiplications even though one would work. This halves throughput and increases latency.<p>Practical hashes uses much larger blocks. <code>rapidhash</code> has a <eq><math><mn>24</mn></math></eq>-byte state and can absorb <eq><math><mn>48</mn></math></eq> bytes at once. <code>ahash</code> has a <eq><math><mn>48</mn></math></eq>-byte state and absorbs <eq><math><mn>64</mn></math></eq>-byte blocks. <code>meowhash</code> has a <eq><math><mn>128</mn></math></eq>-byte state and absorbs <eq><math><mn>256</mn></math></eq> bytes. (I only selected these particular hashes because I’m familiar with their kernels; others have similar designs.)<p>These are some of the fastest non-cryptographic hashes in the world. Do you really want to nuke their performance by padding <eq><math><mn>8</mn></math></eq>-byte inputs to <eq><math><mn>48</mn></math></eq>, <eq><math><mn>64</mn></math></eq>, or <eq><math><mn>256</mn></math></eq> bytes? Probably not.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Chains</span></span>Okay, but what if we cheated and modified the hash functions to absorb small data somewhat more efficiently than absorbing a full block?<p>Say, the <code>rapidhash</code> kernel is effectively <em>this</em>:<pre><code class=language-rust><span class=hljs-keyword>fn</span> <span class="hljs-title function_">absorb</span>(state: &<span class=hljs-keyword>mut</span> [<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], seed: &[<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], block: &[<span class=hljs-type>u64</span>; <span class=hljs-number>6</span>]) {
+</code></pre><p>This is just a multiplicative hash, not unlike FNV-1, but consuming <eq><math><mn>8</mn></math></eq> bytes at a time instead of <eq><math><mn>1</mn></math></eq>.<p>Now what happens if you try to hash two 32-bit integers with this hash? With padding, that will compile to two multiplications even though one would work. This halves throughput and increases latency.<p>Practical hashes use much larger blocks. <code>rapidhash</code> has a <eq><math><mn>24</mn></math></eq>-byte state and can absorb <eq><math><mn>48</mn></math></eq> bytes at once. <code>ahash</code> has a <eq><math><mn>48</mn></math></eq>-byte state and absorbs <eq><math><mn>64</mn></math></eq>-byte blocks. <code>meowhash</code> has a <eq><math><mn>128</mn></math></eq>-byte state and absorbs <eq><math><mn>256</mn></math></eq> bytes. (I only selected these particular hashes because I’m familiar with their kernels; others have similar designs.)<p>These are some of the fastest non-cryptographic hashes in the world. Do you really want to nuke their performance by padding <eq><math><mn>8</mn></math></eq>-byte inputs to <eq><math><mn>48</mn></math></eq>, <eq><math><mn>64</mn></math></eq>, or <eq><math><mn>256</mn></math></eq> bytes? Probably not.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Chains</span></span>Okay, but what if we cheated and modified the hash functions to absorb small data somewhat more efficiently than by absorbing a full block?<p>Say, the <code>rapidhash</code> kernel is effectively <em>this</em>:<pre><code class=language-rust><span class=hljs-keyword>fn</span> <span class="hljs-title function_">absorb</span>(state: &<span class=hljs-keyword>mut</span> [<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], seed: &[<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], block: &[<span class=hljs-type>u64</span>; <span class=hljs-number>6</span>]) {
     <span class=hljs-keyword>for</span> <span class=hljs-variable>i</span> <span class=hljs-keyword>in</span> <span class=hljs-number>0</span>..<span class=hljs-number>3</span> {
         state[i] = <span class="hljs-title function_ invoke__">mix</span>(block[i] ^ state[i], block[i + <span class=hljs-number>3</span>] ^ seed[i]);
     }
@@ -32,7 +32,7 @@
 </code></pre><p>That’s three independent iterations, so <em>surely</em> we can absorb a smaller 64-bit block like this instead:<pre><code class=language-rust><span class=hljs-keyword>fn</span> <span class="hljs-title function_">absorb_64bit</span>(state: &<span class=hljs-keyword>mut</span> [<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], seed: &[<span class=hljs-type>u64</span>; <span class=hljs-number>3</span>], block: <span class=hljs-type>u64</span>) {
     state[<span class=hljs-number>0</span>] = <span class="hljs-title function_ invoke__">mix</span>(block ^ state[<span class=hljs-number>0</span>], seed[<span class=hljs-number>0</span>]);
 }
-</code></pre><p>Surely this is going to reduce the <eq><math><mrow><mn>6</mn><mo>×</mo></mrow></math></eq> slowdown to at least something like <eq><math><mrow><mn>2</mn><mo>×</mo></mrow></math></eq>, right?<p>Why does <code>rapidhash</code> even use three independent chains in the first place? That’s right, latency!<p><code>mix</code> has a <eq><math><mn>5</mn></math></eq> tick latency on modern x86 processors, but a throughput of <eq><math><mn>1</mn></math></eq>. Chain independence allows a <eq><math><mn>16</mn></math></eq>-byte block to be consumed without waiting for the previous <eq><math><mn>16</mn></math></eq> bytes to be mixed in. We just threw this optimization out.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Accumulation</span></span>Okay, so padding is a terrible idea. Can we accumulate a buffer instead? How much hashes I had to scroll through in SMHasher before I found <em>one</em> Rust implementation that took this approach is a warning bell.<p><a href=https://docs.rs/farmhash/1.1.5/src/farmhash/lib.rs.html#92-110>The implementation I found</a>, of course, stores a <code>Vec&LTu8></code> and passes it to the underlying hasher in <code>finish</code>. I believe I don’t need to explain why allocating during hash function is not the brightest idea.<p>Let’s consider <a href=https://docs.rs/highway/1.2.0/src/highway/portable.rs.html#272-288>another implementation</a> that stores a fixed-size buffer instead. Huh, that’s a lot of <code>if</code>s and <code>for</code>s. I wonder what Godbolt will say about this. Let’s try something very simple:<pre><code class=language-rust><div class=expansible-code><input id=expansible1 type=checkbox><div class=highlighted><span class=hljs-keyword>struct</span> <span class="hljs-title class_">StreamingHasher</span> {
+</code></pre><p>Surely this is going to reduce the <eq><math><mrow><mn>6</mn><mo>×</mo></mrow></math></eq> slowdown to at least something like <eq><math><mrow><mn>2</mn><mo>×</mo></mrow></math></eq>, right?<p>Why does <code>rapidhash</code> even use three independent chains in the first place? That’s right, latency!<p><code>mix</code> has a <eq><math><mn>5</mn></math></eq> tick latency on modern x86 processors, but a throughput of <eq><math><mn>1</mn></math></eq>. Chain independence allows a <eq><math><mn>16</mn></math></eq>-byte block to be consumed without waiting for the previous <eq><math><mn>16</mn></math></eq> bytes to be mixed in. We just threw this optimization out.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Accumulation</span></span>Okay, so padding is a terrible idea. Can we accumulate a buffer instead? How many hashes I had to scroll through in SMHasher before I found <em>one</em> Rust implementation that took this approach is a warning bell.<p><a href=https://docs.rs/farmhash/1.1.5/src/farmhash/lib.rs.html#92-110>The implementation I found</a>, of course, stores a <code>Vec&LTu8></code> and passes it to the underlying hasher in <code>finish</code>. I believe I don’t need to explain why allocating in a hash function is not the brightest idea.<p>Let’s consider <a href=https://docs.rs/highway/1.2.0/src/highway/portable.rs.html#272-288>another implementation</a> that stores a fixed-size buffer instead. Huh, that’s a lot of <code>if</code>s and <code>for</code>s. I wonder what Godbolt will say about this. Let’s try something very simple:<pre><code class=language-rust><div class=expansible-code><input id=expansible1 type=checkbox><div class=highlighted><span class=hljs-keyword>struct</span> <span class="hljs-title class_">StreamingHasher</span> {
     block_hasher: BlockHasher,
     buffer: [<span class=hljs-type>u8</span>; <span class=hljs-number>8</span>],
     length: <span class=hljs-type>usize</span>,
@@ -252,7 +252,17 @@
 alloc::vec::Vec&LTruined_portal::NewType>: 177.900032ms (-> 4ae6133ab0e0fe9f)
 </code></pre><p><code>highway</code>:<pre><code>alloc::vec::Vec&LTi32>: 53.843217ms (-> f2e68b031ff10c02)
 alloc::vec::Vec&LTruined_portal::NewType>: 547.520541ms (-> f2e68b031ff10c02)
-</code></pre><p>That’s not good. Note that all hashers have about the same performance on <code>Vec&LTi32></code>. That’s about the speed of RAM. For small arrays that fits in cache, the difference is even more prominent. (I didn’t verify this, but I am the smartest person in the room and thus am obviously right.)<h2>My goal</h2><p class=next-group><span aria-level=3 class=side-header role=heading><span>(Kinda)</span></span>What I really want is a general-purpose hash that’s good for most practical purposes and kinda DoS-resistant but not necessarily cryptographic. It needs to perform fast on short inputs, so it can’t be a “real” block hash, but rather something close to <code>rapidhash</code>.<p>We want:<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>consume</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>x</mi><mo separator=true>,</mo><mi>y</mi><mo form=postfix stretchy=false>)</mo><mo>=</mo><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⊕︎</mo><mi>a</mi><mo separator=true>,</mo><mi>y</mi><mo>⊕︎</mo><mi>C</mi><mo form=postfix stretchy=false>)</mo><mi>.</mi></mrow></math></eqn></section><p>Right, Rust doesn’t support this. Okay, let’s try another relatively well-known scheme that might be easier to implement. 
It’s parallel, surely that’ll help?<p>To hash a <eq><math><mn>64</mn></math></eq>-bit word sequence <eq><math><mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator=true>,</mo><mo>…</mo><mo separator=true>,</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo></mrow></math></eq>, we compute<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>⊕︎</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator=true>,</mo><msub><mi>x</mi><mn>2</mn></msub><mo>⊕︎</mo><msub><mi>a</mi><mn>2</mn></msub><mo form=postfix stretchy=false>)</mo><mo>+</mo><mo>⋯</mo><mo>+</mo><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>⊕︎</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub><mo separator=true>,</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo>⊕︎</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo><mo separator=true>,</mo></mrow></math></eqn></section><p>where <eq><math><mrow><mo form=prefix stretchy=false>(</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator=true>,</mo><mo>…</mo><mo separator=true>,</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo></mrow></math></eq> is random data (possibly generated from the seed once), and<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo separator=true>,</mo><mi>y</mi><mo form=postfix stretchy=false>)</mo><mo>=</mo><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⋅</mo><mi>y</mi><mo lspace=0.2222em rspace=0.2222em>mod</mo><msup><mn>2</mn><mn>64</mn></msup><mo 
form=postfix stretchy=false>)</mo><mo>⊕︎</mo><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⋅</mo><mi>y</mi><mo lspace=0.1667em rspace=0.1667em><mi>d</mi><mi>i</mi><mi>v</mi></mo><msup><mn>2</mn><mn>64</mn></msup><mo form=postfix stretchy=false>)</mo><mi>.</mi></mrow></math></eqn></section><p>This is a combination of certain well-known primitives. The problem here is that <eq><math><msub><mi>a</mi><mi>i</mi></msub></math></eq> needs to be precomputed beforehand. This is not a problem for fixed-length keys, like structs of integers – something often used in, say, <code>rustc</code>.<p>Unfortunately, Rust forces each hasher to handle <em>all</em> possible inputs, including of different lengths, so this scheme can’t work. The hasher isn’t even parametrized by the type of the hashed object. Four well-layouted 64-bit integers that can easily be mixed together with just two full-width multiplications? Nah, <code>write_u64</code> goes brrrrrrrrrrrr-<p class=next-group><span aria-level=3 class=side-header role=heading><span>Stop bitching</span></span>I’ve been designing fast hash-based data structures for several months before realizing they are almost unusable because of these design decisions. <em>Surely</em> something that isn’t a problem in C++ and Python won’t be a problem in Rust, I thought. I deserve a little bitching, okay?<p class=next-group><span aria-level=3 class=side-header role=heading><span>Actually how</span></span>The obvious way forward is to bring the structure of the data back into the picture. If the hasher knew it’s hashing fixed-size data, it could use the <eq><math><msub><mi>a</mi><mi>i</mi></msub></math></eq> approach. If the hasher knew it’s hashing an array, it could vectorize the computation of individual hashes. If the hasher knew the types of the fields in the structure it’s hashing, it could prevent tearing, or perhaps merge small fields into 64-bit blocks efficiently. 
Alas, the hasher is clueless…<p>In my opinion, <code>Hasher</code> and <code>Hash</code> are a wrong abstraction. Instead of the <code>Hash</code> driving the <code>Hasher</code> <s>insane</s>, it should be the other way round: <code>Hash</code> providing introspection facilities and <code>Hasher</code> navigating the hashed objects recursively. As a bonus, this could enable (opt-in) portable hashers.<p>How this API should look like and whether it can be shoehorned into the existing interfaces remains to be seen. I have not started work on the design yet, and perhaps this article might be a bit premature, but I’d love to hear your thoughts on how I missed something really obvious (or, indeed, on how Rust is fast enough and no one cares).</div></section><footer><div class=viewport-container><h2>Made with my own bare hands (why.)</h2></div></footer><script>window.addEventListener("keydown", e => {
+</code></pre><p>That’s not good. Note that all hashers have about the same performance on <code>Vec&LTi32></code>. That’s about the speed of RAM. For small arrays that fits in cache, the difference is even more prominent. (I didn’t verify this, but I am the smartest person in the room and thus am obviously right.)<h2>My goal</h2><p class=next-group><span aria-level=3 class=side-header role=heading><span>(Kinda)</span></span>What I really want is a general-purpose hash that’s good for most practical purposes and kinda DoS-resistant but not necessarily cryptographic. It needs to perform fast on short inputs, so it can’t be a “real” block hash, but rather something close to <code>rapidhash</code>.<p>We want:<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>consume</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>x</mi><mo separator=true>,</mo><mi>y</mi><mo form=postfix stretchy=false>)</mo><mo>=</mo><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⊕︎</mo><mi>a</mi><mo separator=true>,</mo><mi>y</mi><mo>⊕︎</mo><mi>C</mi><mo form=postfix stretchy=false>)</mo><mi>.</mi></mrow></math></eqn></section><p>Right, Rust doesn’t support this. Okay, let’s try another relatively well-known scheme that might be easier to implement. 
It’s parallel, surely that’ll help?<p>To hash a <eq><math><mn>64</mn></math></eq>-bit word sequence <eq><math><mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator=true>,</mo><mo>…</mo><mo separator=true>,</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo></mrow></math></eq>, we compute<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>⊕︎</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator=true>,</mo><msub><mi>x</mi><mn>2</mn></msub><mo>⊕︎</mo><msub><mi>a</mi><mn>2</mn></msub><mo form=postfix stretchy=false>)</mo><mo>+</mo><mo>⋯</mo><mo>+</mo><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>⊕︎</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub><mo separator=true>,</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo>⊕︎</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo><mo separator=true>,</mo></mrow></math></eqn></section><p>where <eq><math><mrow><mo form=prefix stretchy=false>(</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator=true>,</mo><mo>…</mo><mo separator=true>,</mo><msub><mi>a</mi><mrow><mn>2</mn><mi>n</mi></mrow></msub><mo form=postfix stretchy=false>)</mo></mrow></math></eq> is random data (possibly generated from the seed once), and<section><eqn><math style="display:block math;"class=tml-display display=block><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo separator=true>,</mo><mi>y</mi><mo form=postfix stretchy=false>)</mo><mo>=</mo><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⋅</mo><mi>y</mi><mo lspace=0.2222em rspace=0.2222em>mod</mo><msup><mn>2</mn><mn>64</mn></msup><mo 
form=postfix stretchy=false>)</mo><mo>⊕︎</mo><mo form=prefix stretchy=false>(</mo><mi>x</mi><mo>⋅</mo><mi>y</mi><mo lspace=0.1667em rspace=0.1667em><mi>d</mi><mi>i</mi><mi>v</mi></mo><msup><mn>2</mn><mn>64</mn></msup><mo form=postfix stretchy=false>)</mo><mi>.</mi></mrow></math></eqn></section><p>This is a UMAC-style combination of certain well-known primitives. The problem here is that <eq><math><msub><mi>a</mi><mi>i</mi></msub></math></eq> needs to be precomputed beforehand. This is not a problem for fixed-length keys, like structs of integers – something often used in, say, <code>rustc</code>.<p>Unfortunately, Rust forces each hasher to handle <em>all</em> possible inputs, including of different lengths, so this scheme can’t work. The hasher isn’t even parametrized by the type of the hashed object. Four well-layouted 64-bit integers that can easily be mixed together with just two full-width multiplications? Nah, <code>write_u64</code> goes brrrrrrrrrrrr-<p class=next-group><span aria-level=3 class=side-header role=heading><span>Stop bitching</span></span>I’ve been designing fast hash-based data structures for several months before realizing they are, in fact, not fast, purely because of the hashing performance. <em>Surely</em> something that isn’t a problem in C++ and Python won’t be a problem in Rust, I thought.<p>But yeah, sorry for whining.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Actually how</span></span>The obvious way forward is to bring the structure of the data back into the picture. If the hasher knew it’s hashing fixed-size data, it could use the <eq><math><msub><mi>a</mi><mi>i</mi></msub></math></eq> approach. If the hasher knew it’s hashing an array, it could vectorize the computation of individual hashes. If the hasher knew the types of the fields in the structure it’s hashing, it could prevent tearing, or perhaps merge small fields into 64-bit blocks efficiently. 
Alas, the hasher is clueless…<p>In my opinion, <code>Hasher</code> and <code>Hash</code> are a wrong abstraction. Instead of the <code>Hash</code> driving the <code>Hasher</code> <s>insane</s>, it should be the other way round: <code>Hash</code> providing introspection facilities and <code>Hasher</code> navigating the hashed objects recursively. As a bonus, this could enable (opt-in) portable hashers.<p>What this API should look like and whether it can be shoehorned into the existing interfaces remains to be seen. I have not started work on the design yet, and perhaps this article might be a bit premature, but I’d love to hear your thoughts on how I missed something really obvious (or, indeed, on how Rust is fast enough and no one cares).<h2>Non-solutions</h2><p class=next-group><span aria-level=3 class=side-header role=heading><span>Like C++</span></span>I’d like to note that the way Java, C++, and Python take is not without its own share of problems. The good news is that fields in a product type are hashed the same way regardless of the values of other fields. For example, hashing <code>(Vec&LTT>, U)</code> always applies the same hash to <code>U</code> and then mixes it with the hash of <code>Vec&LTT></code>, unlike Rust.<p>However, this approach is suboptimal in the general case. Let’s get back to the UMAC example. 
Hashing <eq><math><mrow><mo form=prefix stretchy=false>(</mo><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>b</mi><mo form=postfix stretchy=false>)</mo><mo separator=true>,</mo><mi>c</mi><mo form=postfix stretchy=false>)</mo></mrow></math></eq> as <eq><math><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>b</mi><mo form=postfix stretchy=false>)</mo><mo separator=true>,</mo><mi>c</mi><mo form=postfix stretchy=false>)</mo></mrow></math></eq> has a higher latency than necessary: computing <eq><math><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>b</mi><mo form=postfix stretchy=false>)</mo><mo>⊕︎</mo></mrow><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>c</mi><mo separator=true>,</mo><mn>0</mn><mo form=postfix stretchy=false>)</mo></mrow></math></eq> would suffice. 
But, again, applying this rule generally as <eq><math><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mn>0</mn><mo form=postfix stretchy=false>)</mo><mo>⊕︎</mo></mrow><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>b</mi><mo separator=true>,</mo><mn>0</mn><mo form=postfix stretchy=false>)</mo><mo>⊕︎</mo></mrow><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>c</mi><mo separator=true>,</mo><mn>0</mn><mo form=postfix stretchy=false>)</mo></mrow></math></eq> is suboptimal too.<p>This odd <eq><math><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mi>b</mi><mo form=postfix stretchy=false>)</mo></mrow></math></eq>/<eq><math><mrow><mrow><mtext></mtext><mi>mix</mi></mrow><mo form=prefix stretchy=false>(</mo><mi>a</mi><mo separator=true>,</mo><mn>0</mn><mo form=postfix stretchy=false>)</mo></mrow></math></eq> duality arises because the block size of the UMAC-style approach is, at its minimum, two <eq><math><mn>64</mn></math></eq>-bit words, while hashes take <eq><math><mn>64</mn></math></eq> bits. This distinction gets much worse for larger block sizes.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Specialization</span></span>After this article was published, several people advised me to look into specialization. I’d like to comment a bit on why this does not solve the problem either.<p>Specialization does not support efficient hashing of nested objects. Although <code>(u8, u8, u8, u8)</code> can be specialized to be hashed with <code>write_u32</code>, this gets complicated with types like:<pre><code class=language-rust><span class=hljs-keyword>struct</span> <span class="hljs-title class_">S</span> {
+    a: (<span class=hljs-type>u8</span>, <span class=hljs-type>u16</span>),
+    b: <span class=hljs-type>u8</span>,
+}
+</code></pre><p>The best way to serialize this type is to fit <code>b</code> into the padding byte of <code>a</code>. We can’t do that during layouting, but we can when hashing. This is very hard to do automatically just with specialization, and next to impossible if people implement <code>Hash</code> manually.<p class=next-group><span aria-level=3 class=side-header role=heading><span>Rule of thumb</span></span>The bottom line is: hashing a product type can only be efficient if it’s linearized. Hashing a structure composed of structures <em>needs</em> to consider the nested fields. Each such field <em>needs</em> to be associated with a static index, so that it can be associated with a constant from a pool, an offset in the block, or what have you.<p>Fields that are stored in the structure after variable-length fields like <code>&[T]</code>/<code>Vec&LTT></code> need to have static indices regardless.<p>This applies to arrays: hashing <code>[T; 2]</code> by performing two calls into <code>T::hash</code> is suboptimal, because that leads to reuse of constants, which in turn requires more thorough mixing for acceptable hash quality.<p>It also applies to <em>slices</em>: hashing <code>[T]</code> needs to split the slice into fixed-size chunks, where each chunk is hashed as a single block. Extending the API to emit start/end annotations for <code>[T]</code> slices does not help <em>either</em>, because the indices of fields inside each <code>T</code> need to be predictable, too. If <code>Hash for T</code> emits <eq><math><mn>3</mn></math></eq> words and the block size is <eq><math><mn>8</mn></math></eq> words, this will vectorize <em>badly</em> due to the misalignment.<p>As much as these rules apply to product types, they apply to sum types. 
Hashing a <code>Result&LTT, E></code> needs to either produce <eq><math><mrow><msub><mi>h</mi><mn>1</mn></msub><mo form=prefix stretchy=false>(</mo><mrow><mtext></mtext><mi>ok</mi></mrow><mo form=postfix stretchy=false>)</mo></mrow></math></eq> or <eq><math><mrow><msub><mi>h</mi><mn>2</mn></msub><mo form=prefix stretchy=false>(</mo><mrow><mtext></mtext><mi>err</mi></mrow><mo form=postfix stretchy=false>)</mo></mrow></math></eq>, where <eq><math><msub><mi>h</mi><mi>i</mi></msub></math></eq> are different elements of a hash family. This can be <em>simulated</em> by prepending the discriminant, but this is suboptimal. Perhaps more clearly, <code>Option&LTT></code> should either hash its element or return a random (but static) constant for <code>None</code>.<p>These rules apply to objects that contain non-primitives too. Hashing<pre><code class=language-rust><span class=hljs-keyword>struct</span> <span class="hljs-title class_">Key</span> {
+    top: <span class=hljs-type>u64</span>,
+    middle: <span class=hljs-type>u64</span>,
+    low: <span class=hljs-type>u64</span>,
+    meta: <span class=hljs-type>Option</span><<span class=hljs-type>String</span>>,
+}
+</code></pre><p>shouldn’t be slower than hashing <code>[u64; 3]</code> in the cases where <code>meta</code> is <code>None</code>, and should be barely slower than that if it’s <code>Some</code>, as long as the string is short. This isn’t magic – Rust just can’t represent the solution in the type system.</div></section><footer><div class=viewport-container><h2>Made with my own bare hands (why.)</h2></div></footer><script>window.addEventListener("keydown", e => {
 				if (e.key === "Enter") {
 					if (e.ctrlKey) {
 						window.open("https://github.com/purplesyringa/site/edit/master/blog/thoughts-on-rust-hashing/index.md", "_blank");
diff --git a/blog/thoughts-on-rust-hashing/index.md b/blog/thoughts-on-rust-hashing/index.md
index f5f70b6..4144a96 100644
--- a/blog/thoughts-on-rust-hashing/index.md
+++ b/blog/thoughts-on-rust-hashing/index.md
@@ -1,6 +1,6 @@
 ---
 title: Thoughts on Rust hashing
-time: December 11, 2024
+time: December 12, 2024
 intro: |
     In languages like Python, Java, or C++, values are hashed by calling a "hash me" method on them, implemented by the type author. This fixed-hash size is then immediately used by the hash table or what have you. This design suffers from some obvious problems, like:
 
@@ -118,14 +118,14 @@ This is just a multiplicative hash, not unlike FNV-1, but consuming $8$ bytes at
 
 Now what happens if you try to hash two 32-bit integers with this hash? With padding, that will compile to two multiplications even though one would work. This halves throughput and increases latency.
 
-Practical hashes uses much larger blocks. `rapidhash` has a $24$-byte state and can absorb $48$ bytes at once. `ahash` has a $48$-byte state and absorbs $64$-byte blocks. `meowhash` has a $128$-byte state and absorbs $256$ bytes. (I only selected these particular hashes because I'm familiar with their kernels; others have similar designs.)
+Practical hashes use much larger blocks. `rapidhash` has a $24$-byte state and can absorb $48$ bytes at once. `ahash` has a $48$-byte state and absorbs $64$-byte blocks. `meowhash` has a $128$-byte state and absorbs $256$ bytes. (I only selected these particular hashes because I'm familiar with their kernels; others have similar designs.)
 
 These are some of the fastest non-cryptographic hashes in the world. Do you really want to nuke their performance by padding $8$-byte inputs to $48$, $64$, or $256$ bytes? Probably not.
 
 
 ### Chains
 
-Okay, but what if we cheated and modified the hash functions to absorb small data somewhat more efficiently than absorbing a full block?
+Okay, but what if we cheated and modified the hash functions to absorb small data somewhat more efficiently than by absorbing a full block?
 
 Say, the `rapidhash` kernel is effectively *this*:
 
@@ -156,7 +156,7 @@ Why does `rapidhash` even use three independent chains in the first place? That'
 
 Okay, so padding is a terrible idea. Can we accumulate a buffer instead? How much hashes I had to scroll through in SMHasher before I found *one* Rust implementation that took this approach is a warning bell.
 
-[The implementation I found](https://docs.rs/farmhash/1.1.5/src/farmhash/lib.rs.html#92-110), of course, stores a `Vec<u8>` and passes it to the underlying hasher in `finish`. I believe I don't need to explain why allocating during hash function is not the brightest idea.
+[The implementation I found](https://docs.rs/farmhash/1.1.5/src/farmhash/lib.rs.html#92-110), of course, stores a `Vec<u8>` and passes it to the underlying hasher in `finish`. I believe I don't need to explain why allocating in a hash function is not the brightest idea.
 
 Let's consider [another implementation](https://docs.rs/highway/1.2.0/src/highway/portable.rs.html#272-288) that stores a fixed-size buffer instead. Huh, that's a lot of `if`s and `for`s. I wonder what Godbolt will say about this. Let's try something very simple:
 
@@ -512,14 +512,16 @@ $$
 \mathrm{mix}(x, y) = (x \cdot y \bmod 2^{64}) \oplus (x \cdot y \mathop{div} 2^{64}).
 $$
 
-This is a combination of certain well-known primitives. The problem here is that $a_i$ needs to be precomputed beforehand. This is not a problem for fixed-length keys, like structs of integers -- something often used in, say, `rustc`.
+This is a UMAC-style combination of certain well-known primitives. The problem here is that $a_i$ needs to be precomputed beforehand. This is not a problem for fixed-length keys, like structs of integers -- something often used in, say, `rustc`.
 
 Unfortunately, Rust forces each hasher to handle *all* possible inputs, including of different lengths, so this scheme can't work. The hasher isn't even parametrized by the type of the hashed object. Four well-layouted 64-bit integers that can easily be mixed together with just two full-width multiplications? Nah, `write_u64` goes brrrrrrrrrrrr-
 
 
 ### Stop bitching
 
-I've been designing fast hash-based data structures for several months before realizing they are almost unusable because of these design decisions. *Surely* something that isn't a problem in C++ and Python won't be a problem in Rust, I thought. I deserve a little bitching, okay?
+I'd been designing fast hash-based data structures for several months before realizing they were, in fact, not fast, purely because of the hashing performance. *Surely* something that isn't a problem in C++ and Python won't be a problem in Rust, I thought.
+
+But yeah, sorry for whining.
 
 
 ### Actually how
@@ -529,3 +531,56 @@ The obvious way forward is to bring the structure of the data back into the pict
 In my opinion, `Hasher` and `Hash` are a wrong abstraction. Instead of the `Hash` driving the `Hasher` ~~insane~~, it should be the other way round: `Hash` providing introspection facilities and `Hasher` navigating the hashed objects recursively. As a bonus, this could enable (opt-in) portable hashers.
 
 How this API should look like and whether it can be shoehorned into the existing interfaces remains to be seen. I have not started work on the design yet, and perhaps this article might be a bit premature, but I'd love to hear your thoughts on how I missed something really obvious (or, indeed, on how Rust is fast enough and no one cares).
+
+
+## Non-solutions
+
+### Like C++
+
+I'd like to note that the approach Java, C++, and Python take is not without its own share of problems. The good news is that fields in a product type are hashed the same way regardless of the values of other fields. For example, hashing `(Vec<T>, U)` always applies the same hash to `U` and then mixes it with the hash of `Vec<T>`, unlike Rust.
+
+However, this approach is suboptimal in the general case. Let's get back to the UMAC example. Hashing $((a, b), c)$ as $\mathrm{mix}(\mathrm{mix}(a, b), c)$ has a higher latency than necessary: computing $\mathrm{mix}(a, b) \oplus \mathrm{mix}(c, 0)$ would suffice. But, again, applying this rule generally as $\mathrm{mix}(a, 0) \oplus \mathrm{mix}(b, 0) \oplus \mathrm{mix}(c, 0)$ is suboptimal too.
+
+This odd $\mathrm{mix}(a, b)$/$\mathrm{mix}(a, 0)$ duality arises because the block size of the UMAC-style approach is, at its minimum, two $64$-bit words, while hashes take $64$ bits. This distinction gets much worse for larger block sizes.
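+
+To make the latency argument concrete, here is a minimal sketch of both variants. The key array `K` and the names `hash_nested` and `hash_flat` are made up for illustration, and the inputs are XORed with keys because the bare `mix` returns zero whenever one argument is zero. This is not a vetted hash function.
+
+```rust
+/// Folded 64x64 -> 128-bit multiplication, following the `mix` formula above.
+fn mix(x: u64, y: u64) -> u64 {
+    let p = u128::from(x) * u128::from(y);
+    (p as u64) ^ ((p >> 64) as u64)
+}
+
+// Hypothetical per-position keys; a real hasher would derive them from a seed.
+const K: [u64; 4] = [
+    0x9e37_79b9_7f4a_7c15,
+    0xbf58_476d_1ce4_e5b9,
+    0x94d0_49bb_1331_11eb,
+    0x2545_f491_4f6c_dd1d,
+];
+
+/// Nested, C++-style: the outer multiplication has to wait for the inner one.
+fn hash_nested(a: u64, b: u64, c: u64) -> u64 {
+    let inner = mix(a ^ K[0], b ^ K[1]);
+    mix(inner ^ K[2], c ^ K[3])
+}
+
+/// Flat: two independent multiplications that the CPU can overlap,
+/// combined with a single XOR.
+fn hash_flat(a: u64, b: u64, c: u64) -> u64 {
+    mix(a ^ K[0], b ^ K[1]) ^ mix(c ^ K[2], K[3])
+}
+```
+
+Both functions perform two multiplications, but only in `hash_nested` do they form a dependency chain.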
+
+
+### Specialization
+
+After this article was published, several people advised me to look into specialization. I'd like to comment a bit on why this does not solve the problem either.
+
+Specialization does not support efficient hashing of nested objects. Although `(u8, u8, u8, u8)` can be specialized to be hashed with `write_u32`, this gets complicated with types like:
+
+```rust
+struct S {
+    a: (u8, u16),
+    b: u8,
+}
+```
+
+The best way to serialize this type is to fit `b` into the padding byte of `a`. We can't do that when laying out the type, but we can when hashing. This is very hard to do automatically just with specialization, and next to impossible if people implement `Hash` manually.
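+
+For reference, this is roughly what the hand-written version looks like. It is only a sketch: the `packed` helper and the byte order it picks are my own choices for illustration, and the point is that nothing in `Hash` lets a hasher derive this automatically.
+
+```rust
+use std::hash::{Hash, Hasher};
+
+struct S {
+    a: (u8, u16),
+    b: u8,
+}
+
+impl S {
+    /// Hypothetical helper: squeezes all three fields into one 32-bit word,
+    /// putting `b` where the padding byte of `a` would otherwise be.
+    fn packed(&self) -> u32 {
+        u32::from(self.a.0) | (u32::from(self.b) << 8) | (u32::from(self.a.1) << 16)
+    }
+}
+
+impl Hash for S {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        // One `write_u32` call instead of three separate writes.
+        state.write_u32(self.packed());
+    }
+}
+```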
+
+
+### Rule of thumb
+
+The bottom line is: hashing a product type can only be efficient if it's linearized. Hashing a structure composed of structures *needs* to consider the nested fields. Each such field *needs* to be associated with a static index, so that it can be associated with a constant from a pool, an offset in the block, or what have you.
+
+Fields that are stored in the structure after variable-length fields like `&[T]`/`Vec<T>` need to have static indices regardless.
+
+This applies to arrays: hashing `[T; 2]` by performing two calls into `T::hash` is suboptimal, because that leads to reuse of constants, which in turn requires more thorough mixing for acceptable hash quality.
+
+It also applies to *slices*: hashing `[T]` needs to split the slice into fixed-size chunks, where each chunk is hashed as a single block. Extending the API to emit start/end annotations for `[T]` slices does not help *either*, because the indices of fields inside each `T` need to be predictable, too. If `Hash for T` emits $3$ words and the block size is $8$ words, this will vectorize *badly* due to the misalignment.
+
+As much as these rules apply to product types, they apply to sum types. Hashing a `Result<T, E>` needs to produce either $h_1(\mathrm{ok})$ or $h_2(\mathrm{err})$, where $h_i$ are different elements of a hash family. This can be *simulated* by prepending the discriminant, but this is suboptimal. Perhaps more clearly, `Option<T>` should either hash its element or return a random (but static) constant for `None`.
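+
+A sketch of the `Option` case for a `u64` payload, with made-up constants (`SOME_KEYS` and `NONE_HASH` are illustrative, not taken from any real hasher):
+
+```rust
+/// Folded 64x64 -> 128-bit multiplication, same as `mix` earlier in the post.
+fn mix(x: u64, y: u64) -> u64 {
+    let p = u128::from(x) * u128::from(y);
+    (p as u64) ^ ((p >> 64) as u64)
+}
+
+// Hypothetical keys for the `Some` branch; a real hasher would seed these.
+const SOME_KEYS: [u64; 2] = [0x9e37_79b9_7f4a_7c15, 0xbf58_476d_1ce4_e5b9];
+// Hypothetical static constant standing in for the hash of `None`.
+const NONE_HASH: u64 = 0x94d0_49bb_1331_11eb;
+
+fn hash_option(value: Option<u64>) -> u64 {
+    match value {
+        // The payload is hashed directly; no discriminant is fed to the hasher.
+        Some(x) => mix(x ^ SOME_KEYS[0], SOME_KEYS[1]),
+        // `None` costs nothing beyond loading a constant.
+        None => NONE_HASH,
+    }
+}
+```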
+
+These rules apply to objects that contain non-primitives too. Hashing
+
+```rust
+struct Key {
+    top: u64,
+    middle: u64,
+    low: u64,
+    meta: Option<String>,
+}
+```
+
+shouldn't be slower than hashing `[u64; 3]` in the cases where `meta` is `None`, and should be barely slower than that if it's `Some`, as long as the string is short. This isn't magic -- Rust just can't represent the solution in the type system.