diff --git a/.bleep b/.bleep
index 285c673..f9d13ee 100644
--- a/.bleep
+++ b/.bleep
@@ -1 +1 @@
-46cdb8138867aa29ff1fd9d672c1c4bdd63914f7
\ No newline at end of file
+6f6a59de57389578cd13e173b6f8cf2069ea83e1
\ No newline at end of file
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 8008978..83a1939 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -6,7 +6,7 @@ jobs:
   trie-hard:
     strategy:
       matrix:
-        toolchain: [nightly, 1.72, 1.80.0]
+        toolchain: [nightly, 1.74, 1.80.0]
     runs-on: ubuntu-latest
     # Only run on "pull_request" event for external PRs. This is to avoid
     # duplicate builds for PRs created from internal branches.
diff --git a/Cargo.toml b/Cargo.toml
index 789682b..a6e4374 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -11,8 +11,8 @@ Fast implementation of a trie data structure
 """
 
 [dev-dependencies]
-rstest = "0.21.0"
-criterion = "0.3"
+rstest = "0.22.0"
+criterion = "0.5.1"
 radix_trie = "0.2.1"
 paste = "1.0.15"
 once_cell = "1.19.0"
diff --git a/README.md b/README.md
index 8afcc33..9655461 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ This crate is an implementation of the [trie](https://en.wikipedia.org/wiki/Trie
 
 ## Performance
 
-There are several other trie implementations for rust that are more full featured, so it you are looking for a more robust tool, you will probably want to check out [`radix_trie`](https://crates.io/crates/radix_trie) which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!
+There are several other trie implementations for rust that are more full-featured, so if you are looking for a more robust tool, you will probably want to check out [`radix_trie`](https://crates.io/crates/radix_trie) which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!
 
 Here is a chart showing the time taken to read 10k entries from a map that consists of 119 entries containing only lower-case characters, numbers, and `-`. As you can see, when miss rate gets above 50% the performance of trie-hard surpasses `std::HashMap` and improves as miss rates get higher.
 
@@ -69,7 +69,7 @@ let root = Node {
 
 This tells us that if a byte other than `a` or `d` appears in the first position, the key being tested does not appear in the trie. This ability to make an exclusion decision at every step is what makes tries more appealing than even hashmaps in some cases. Searching for a string in a hashmap requires hashing the entire string whereas a trie can potentially determine that a string is not part of a set within a single byte.
 
-If the byte is `a` or `d` we still need to know which node to go to next. All nodes in the graph are stored in contiguous a vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with `a` and 2 is the node for keys starting with `d`. It is important that these child nodes are ordered by their corresponding byte.
+If the byte is `a` or `d` we still need to know which node to go to next. All nodes in the graph are stored in a contiguous vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with `a` and 2 is the node for keys starting with `d`. It is important that these child nodes are ordered by their corresponding byte.
 
 ```rust
 let root = Node {
@@ -89,7 +89,7 @@ At this point we can visualize the conceptual trie and trie-hard like this
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | ![First Layer Conceptual Trie](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerVanilla.png?raw=true "Header Read vs HashMap Benchmark") | ![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerTrieHard.png?raw=true "Header Read vs HashMap Benchmark") |
 
-Because of the recursive nature of a trie, we can repeat the same process of creating a mask based on allowed bytes at each node and preparing a set of children for each node. When we reach a complete word that appears in the initial set, we need to signify that the node is a valid word. Visually we will mark them with greed, but in rust they just appear as a different enum variant of `TrieNode`.
+Because of the recursive nature of a trie, we can repeat the same process of creating a mask based on allowed bytes at each node and preparing a set of children for each node. When we reach a complete word that appears in the initial set, we need to signify that the node is a valid word. Visually we will mark them with green, but in rust they just appear as a different enum variant of `TrieNode`.
 
 After repeating for one more layer, we can visualize the trie like the this.
 
@@ -99,7 +99,7 @@ After repeating for one more layer, we can visualize the trie like the this.
 
 Notice that `do` shows up as green because it is a complete word found in the original collection.
 
-Finally we add the last layer and complete this small trie.
+Finally, we add the last layer and complete this small trie.
 
 | Conceptual | Trie-Hard |