diff --git a/blog/the-ram-myth/index.html b/blog/the-ram-myth/index.html index 3250d79..718627b 100644 --- a/blog/the-ram-myth/index.html +++ b/blog/the-ram-myth/index.html @@ -8,7 +8,7 @@ In reality, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster on large input. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data."property=og:description>
The RAM myth is a belief that modern computer memory resembles perfect random-access memory. Cache is seen as an optimization for small data: if it fits in L2, it’s going to be processed faster; if it doesn’t, there’s nothing we can do.
Most likely, you believe that code like this is the fastest way to shard data (I’m using Python as pseudocode; pretend I used your favorite low-level language):
groups = [[] for _ in range(n_groups)]
for element in elements:
groups[element.group].append(element)
-
Indeed, it’s linear (i.e. asymptotically optimal), and we have to access random indices anyway, so cache isn’t going to help us in any case.
In reality, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster on large input. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data.
SolutionThe algorithm from above has group
, that’s great. If you can’t, you can still sort the accesses before the for
loop:
elements.sort(key = lambda element: element.group)
+
Indeed, it’s linear (i.e. asymptotically optimal), and we have to access random indices anyway, so cache isn’t going to help us in any case.
In reality, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster on large input. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data.
SolutionThe algorithm from above has group
, that’s great. If you can’t, you can still sort the accesses before the for
loop:
elements.sort(key = lambda element: element.group)
Sorting costs some time, but in return, removes cache misses from the for
loop entirely. If the data is large enough that it doesn’t fit in cache, this is a net win. As a bonus, creating individual lists can be replaced with a group-by operation:
elements.sort(key = lambda element: element.group)
groups = [
group_elements
diff --git a/blog/the-ram-myth/index.md b/blog/the-ram-myth/index.md
index e1dc920..d834f4e 100644
--- a/blog/the-ram-myth/index.md
+++ b/blog/the-ram-myth/index.md
@@ -35,7 +35,7 @@ In reality, this is leaving a lot of performance on the table, and certain *asym
### Solution
-The algorithm from above has $\Theta(n)$ cache misses on random input (where $n$ is the number of elements). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by `group`, that's great. If you can't, you can still sort the accesses before the `for` loop:
+The algorithm from above has $\Theta(n)$ cache misses on random input (where $n$ is the number of groups). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by `group`, that's great. If you can't, you can still sort the accesses before the `for` loop:
```python
elements.sort(key = lambda element: element.group)