
Cache profiling (WIP) #188

Draft · wants to merge 9 commits into master
Conversation

@mrflip (Contributor) commented Aug 7, 2022

Still working on this, but here are some benchmarks for the cache classes.

Results so far: LRUCache is about 1.5-2x as fast as LRUMap across all initial tries, and the difference between LRUCache and LRUCacheWithDelete (or LRUMap and LRUMapWithDelete) is small.

flat writes, 1x gentle read spread LRUCache x 27.93 ops/sec ±1.28% (50 runs sampled)
flat writes, 1x gentle read spread LRUCacheWithDelete x 24.34 ops/sec ±2.58% (44 runs sampled)
flat writes, 1x gentle read spread LRUMap x 12.61 ops/sec ±1.83% (36 runs sampled)
flat writes, 1x gentle read spread LRUMapWithDelete x 11.42 ops/sec ±4.76% (33 runs sampled)
flat writes, 1x gentle read spread LRUCache x 21.32 ops/sec ±1.21% (39 runs sampled)

Random-order accesses (each a get followed by a set) of 60,000 distinct values.
The top 30k of values occur ~70% of the time and the top 10k values 33% of the time.

individual get then set, sharp spread LRUCache x 2,192,917 ops/sec ±1.47% (92 runs sampled)
individual get then set, sharp spread LRUCacheWithDelete x 2,348,333 ops/sec ±1.19% (89 runs sampled)
individual get then set, sharp spread LRUMap x 1,560,643 ops/sec ±2.33% (80 runs sampled)
individual get then set, sharp spread LRUMapWithDelete x 1,408,341 ops/sec ±6.98% (70 runs sampled)
individual get then set, sharp spread LRUCache x 2,331,573 ops/sec ±1.57% (86 runs sampled)

individual get then set, flat spread LRUCache x 3,931,479 ops/sec ±1.92% (85 runs sampled)
individual get then set, flat spread LRUCacheWithDelete x 4,175,905 ops/sec ±1.28% (86 runs sampled)
individual get then set, flat spread LRUMap x 2,242,931 ops/sec ±3.71% (76 runs sampled)
individual get then set, flat spread LRUMapWithDelete x 2,541,944 ops/sec ±3.40% (79 runs sampled)
individual get then set, flat spread LRUCache x 3,748,182 ops/sec ±2.24% (84 runs sampled)

Pre-loaded 30k capacity caches, random-order reads (no writes) of 42,000 distinct values.
The top 30k of values occur ~97% of the time and the top 10k values 75% of the time.

read-only sharp spread LRUCache x 112 ops/sec ±1.28% (72 runs sampled)
read-only sharp spread LRUCacheWithDelete x 100 ops/sec ±1.62% (73 runs sampled)
read-only sharp spread LRUMap x 59.71 ops/sec ±2.01% (62 runs sampled)
read-only sharp spread LRUMapWithDelete x 59.43 ops/sec ±4.16% (61 runs sampled)
read-only sharp spread LRUCache x 102 ops/sec ±3.10% (75 runs sampled)

Pre-loaded 30k capacity caches, random-order reads (no writes) of 60,000 distinct values.
The top 30k of values occur ~70% of the time and the top 10k values 33% of the time.

read-only gentle spread LRUCache x 90.15 ops/sec ±2.23% (73 runs sampled)
read-only gentle spread LRUCacheWithDelete x 86.24 ops/sec ±1.25% (74 runs sampled)
read-only gentle spread LRUMap x 57.07 ops/sec ±1.77% (60 runs sampled)
read-only gentle spread LRUMapWithDelete x 62.46 ops/sec ±1.45% (65 runs sampled)
read-only gentle spread LRUCache x 94.03 ops/sec ±1.14% (76 runs sampled)

I have added the ubiquitous benchmark.js library to power this, if that's OK.

@Yomguithereal I fumbled together a method that gives back a Pareto-distributed random integer, or at least fakes it well enough to serve this benchmark, but I suspect you will have good advice on how to do it right. If this isn't built into pandemonium, it would be nice to have a few such distributions in the toolbag.
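For reference, a fake-Pareto integer sampler of the kind described can be sketched in a few lines (the names and the shaping exponent are illustrative only, not the PR's actual code):

```javascript
// Hypothetical sketch, not the PR's code: fake a Pareto-like spread of
// integer keys by inverse-transform sampling. Raising a uniform draw to
// a power alpha > 1 concentrates mass on the low-numbered keys.
function makeSkewedIntSampler(rng, alpha, max) {
  return function sample() {
    const u = rng(); // uniform in [0, 1)
    return Math.min(Math.floor(max * Math.pow(u, alpha)), max - 1);
  };
}

// With alpha = 3 over 60,000 keys, a bit over half the draws land in
// the lowest 10,000 keys, similar to the "sharp spread" used above.
const sampleKey = makeSkewedIntSampler(Math.random, 3, 60000);
```

Tuning alpha trades off how "sharp" the spread is: alpha = 1 is flat, larger values pile more reads onto the hottest keys.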

@Yomguithereal (Owner)

Hello @mrflip,

Have you checked this: https://github.com/dominictarr/bench-lru? Are you only trying to assess whether the cache is faster than the map and what is the impact of deletion here?

Note that Map is sometimes faster on some Node versions when handling specific keys, such as long strings.

@Yomguithereal (Owner)

> If this isn't built into pandemonium it would be nice to have a few such distributions in the toolbag.

Why not. What would the API look like? Something taking a rng and some alpha params and returning a distributed rng? At some point it might become too statistically involved for pandemonium, which is mostly about algorithms, and better suit another lib such as simple-statistics.

@mrflip (Contributor, Author) commented Aug 8, 2022

This is primarily in service of the later PR for TTL-expiring the cache, and of knowing the tradeoffs of Cache vs Map, not so much of comparisons to other libraries (I found out about this lib via that suite). Mixed in with that PR (I'll pull it back to here) is a shell script that runs any of the benchmarks with the Node profiler turned on.
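A wrapper script of the kind described might look roughly like this (a sketch only; the filenames and layout are assumptions, not the PR's actual script):

```shell
#!/bin/sh
# Hypothetical sketch, not the PR's script: run one benchmark file
# under V8's sampling profiler, then turn the isolate log into a
# human-readable report.
node --prof "$1"                                    # e.g. bench/lru-cache.js
node --prof-process isolate-*-v8.log > profile.txt  # summarize ticks
rm -f isolate-*-v8.log                              # clean up the raw log
```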

I'll make the test exercise long strings, shortish strings and number keys.

Do you have any concerns about adding benchmark.js as a dev dependency? I'd also recommend adding the chai library -- it makes tests more explanatory and beautiful to read. I was surprised by how many more, and deeper, tests we wrote after adopting it. For just one example, it becomes very pleasant to add type guards like expect(result).to.include.keys([...]) or expect(arr).to.be.an('array').with.length(5), which give much clearer errors than the runtime exceptions you hit when you access that wrong-typed return.

mrflip added 7 commits on August 8, 2022 18:14
* LRUCache and family: .inspect limits its output, showing the youngest items, an ellipsis, and the oldest item. Options allow dumping the raw object or controlling the size of the output (and the number of items a console.log will mindlessly iterate over).
* LRUCache and family all have inspect wired up to the magic 'nodejs.util.inspect.custom' symbol property that drives console.log output.
* LRUCache and family all have a summaryString method returning e.g. 'LRUCache[8/200]' for a cache with size 8 and capacity 200, wired to the magic Symbol.toStringTag property that drives string interpolation (partially addresses Yomguithereal#129).
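The Symbol.toStringTag wiring from the last bullet can be illustrated with a minimal stand-in class (a sketch, not the PR's implementation):

```javascript
// Minimal stand-in, not the PR's code: a cache-like object whose string
// tag reports size and capacity, as the commit notes describe.
class DemoCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.size = 0;
  }
  summaryString() {
    return `LRUCache[${this.size}/${this.capacity}]`;
  }
  // Object.prototype.toString consults this symbol property.
  get [Symbol.toStringTag]() {
    return this.summaryString();
  }
}

const cache = new DemoCache(200);
cache.size = 8;
// String(cache) -> '[object LRUCache[8/200]]'
```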
@mrflip force-pushed the CacheProfiling branch 4 times, most recently from 22cea6b to a2d5172 on August 9, 2022 02:29.