diff --git a/blog/the-ram-myth/index.html b/blog/the-ram-myth/index.html
index 3250d79..718627b 100644
--- a/blog/the-ram-myth/index.html
+++ b/blog/the-ram-myth/index.html
@@ -8,7 +8,7 @@ In reality, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster on large input. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data."property=og:description>
The RAM myth is a belief that modern computer memory resembles perfect random-access memory. Cache is seen as an optimization for small data: if it fits in L2, it’s going to be processed faster; if it doesn’t, there’s nothing we can do.

Most likely, you believe that code like this is the fastest way to shard data (I’m using Python as pseudocode; pretend I used your favorite low-level language):

# One list per group; each element is appended to its group's list,
# touching a random list tail on every iteration.
groups = [[] for _ in range(n_groups)]
for element in elements:
    groups[element.group].append(element)

Indeed, it’s linear (i.e. asymptotically optimal), and we have to access random indices anyway, so cache isn’t going to help us in any case.

In reality, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster on large input. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data.

Solution

-The algorithm from above has Θ(n) cache misses on random input (where n is the number of elements). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by group, that's great. If you can't, you can still sort the accesses before the for loop:
+The algorithm from above has Θ(n) cache misses on random input (where n is the number of groups). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by group, that's great. If you can't, you can still sort the accesses before the for loop:

elements.sort(key = lambda element: element.group)
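
Putting the two snippets together (a sketch, not part of the diff; elements and n_groups are assumed to be defined as in the first snippet), the sorted order means consecutive iterations append to the same group's list, so the loop no longer jumps across memory:

```python
# Sketch: sort first, then run the same sharding loop as above.
# After sorting, equal groups are adjacent, so groups[element.group]
# stays hot in cache across consecutive appends.
elements.sort(key = lambda element: element.group)

groups = [[] for _ in range(n_groups)]
for element in elements:
    groups[element.group].append(element)
```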
 

Sorting costs some time, but in return, it removes cache misses from the for loop entirely. If the data is large enough that it doesn't fit in cache, this is a net win. As a bonus, creating individual lists can be replaced with a group-by operation:

elements.sort(key = lambda element: element.group)
groups = [
    group_elements
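
The hunk ends before the snippet does; one plausible completion (a sketch, not taken from the diff) uses the standard library's itertools.groupby, which yields runs of consecutive elements with equal keys and therefore relies on the sorted order established above:

```python
import itertools

elements.sort(key = lambda element: element.group)
groups = [
    # Materialize each run before groupby advances past it, since all
    # runs share the same underlying iterator.
    list(group_elements)
    for _, group_elements in itertools.groupby(elements, key = lambda element: element.group)
]
```

This builds equivalent per-group lists in a single sequential pass over the sorted data.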
diff --git a/blog/the-ram-myth/index.md b/blog/the-ram-myth/index.md
index e1dc920..d834f4e 100644
--- a/blog/the-ram-myth/index.md
+++ b/blog/the-ram-myth/index.md
@@ -35,7 +35,7 @@ In reality, this is leaving a lot of performance on the table, and certain *asym
 
 ### Solution
 
-The algorithm from above has $\Theta(n)$ cache misses on random input (where $n$ is the number of elements). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by `group`, that's great. If you can't, you can still sort the accesses before the `for` loop:
+The algorithm from above has $\Theta(n)$ cache misses on random input (where $n$ is the number of groups). The only way to reduce this number is to make the memory accesses more ordered. If you can ensure the elements are ordered by `group`, that's great. If you can't, you can still sort the accesses before the `for` loop:
 
 ```python
 elements.sort(key = lambda element: element.group)