Cache entries can grow unbounded #112
Comments
I think relying on the cache store is the way to go. Just as we do a hash for the URL, we can do one for all the vary headers too, so the cache entry is smaller and serialization/deserialization will be faster.
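A minimal sketch of that idea, assuming request header names are already lowercased; `vary_digest` is a hypothetical helper, not part of the gem:

```ruby
require "digest"

# Hypothetical helper: digest the request headers named by the response's
# Vary header, so the stored variant key stays small and fixed-size.
def vary_digest(vary_header, request_headers)
  fields = vary_header.to_s.split(",").map { |f| f.strip.downcase }.sort
  Digest::SHA1.hexdigest(fields.map { |f| "#{f}:#{request_headers[f]}" }.join("\n"))
end

vary_digest("Accept, Authorization", "accept" => "application/json", "authorization" => "token abc")
# => a 40-character hex digest, regardless of how long the header values are
```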
It's not as simple though, the …
I'd start with the simple solution. If that doesn't work well, we can increase the complexity.
Hi, did you end up solving this? I'm curious whether you found a good solution for it.
We didn't implement a solution but I think …
We ran into this same problem and went with limiting the number of entries stored per cache key. Take a look at #121 and let me know what you think!
Initially reported in Shopify/shipit-engine#935 by @josacar
Context
Shipit is a deploy tool; as such it frequently hits the same endpoint to grab updates, e.g. `GET https://api.github.com/v3/repos/my-org/my-repo/commits?branch=master`. Additionally, it uses the GitHub App API for authentication, so its `Authorization` header changes every hour. And finally, GitHub's API responses specify the following `Vary` header: `Accept, Authorization, Cookie, X-GitHub-OTP`.
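For illustration, the client setup looks roughly like this (a hedged sketch: `my_cache_store` and `current_installation_token` are hypothetical stand-ins; the `:http_cache` middleware registration is from faraday-http-cache's README):

```ruby
require "faraday"
require "faraday-http-cache"

client = Faraday.new("https://api.github.com") do |builder|
  builder.use :http_cache, store: my_cache_store # hypothetical: any compatible cache store
  builder.adapter Faraday.default_adapter
end

# The URL below never changes, but the Authorization header rotates every hour,
# so each rotation adds another variant under the same cache entry.
client.get("/v3/repos/my-org/my-repo/commits?branch=master") do |req|
  req.headers["Authorization"] = "token #{current_installation_token}" # hypothetical helper
end
```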
Problem
Since the URL never changes, the same cache entry is reused all the time. However, since the `Authorization` token changes every hour, new entries are regularly prepended until the cache entry becomes huge and starts taking a lot of time to deserialize, or even exceeds the cache store's size limit (memcached limits values to 1MB).
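To make the failure mode concrete, here is a small simulation of that growth (illustrative only; it mimics the one-key, prepend-only shape described above rather than the gem's exact serialization):

```ruby
cache = {}
url_key = "https://api.github.com/v3/repos/my-org/my-repo/commits?branch=master"

24.times do |hour|
  headers = { "authorization" => "token rotated-#{hour}" } # token rotates hourly
  entry = cache.fetch(url_key, [])
  cache[url_key] = [[headers, "response body"], *entry]    # prepend, never evict
end

cache[url_key].size # => 24 variants piled under a single key after one day
```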
Solutions
I can see several solutions to this, with very different tradeoffs.
Very simple: set a cap on the number of entries per key
Self-explanatory: we could simply set a configurable upper limit on the number of entries and drop the oldest entry whenever we go over.
It's an easy fix, not quite ideal as it might surprise people, but IMHO it's better to be surprised by cache evictions than by cache OOM / corruption (in case of memcache truncation).
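A minimal sketch of this cap, where `MAX_ENTRIES` is a hypothetical configuration knob:

```ruby
MAX_ENTRIES = 20 # hypothetical, configurable upper limit

# Prepend the fresh variant, then drop the oldest ones beyond the cap.
def prepend_with_cap(entries, new_entry, limit: MAX_ENTRIES)
  [new_entry, *entries].first(limit)
end

entries = []
25.times { |i| entries = prepend_with_cap(entries, "variant-#{i}") }
entries.size  # => 20; the five oldest variants were dropped
entries.first # => "variant-24", the newest entry stays at the front
```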
Somewhat simple: evict stale entries
When prepending a new entry, we could evict all the stale ones. It's easy if `max-age` / `Expires` is set; if it isn't, then we have to rely on heuristics as specified in the RFC.
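A hedged sketch of the easy case, assuming each stored entry is a `[request_headers, response_headers]` pair with lowercased header names (the RFC's heuristic fallback is left out):

```ruby
require "time"

# Keep only entries that are still fresh according to max-age or Expires.
def evict_stale(entries, now: Time.now)
  entries.select do |_request_headers, response_headers|
    max_age = response_headers["cache-control"].to_s[/max-age=(\d+)/, 1]
    date    = (Time.httpdate(response_headers["date"]) rescue now)

    if max_age
      (now - date) < max_age.to_i
    elsif response_headers["expires"]
      now < (Time.httpdate(response_headers["expires"]) rescue now)
    else
      true # no explicit freshness info: would need the RFC's heuristics instead
    end
  end
end

fresh = evict_stale([
  [{}, { "cache-control" => "max-age=60", "date" => Time.now.httpdate }],
  [{}, { "cache-control" => "max-age=60", "date" => (Time.now - 3600).httpdate }],
])
fresh.size # => 1; the hour-old entry is evicted
```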
Harder, and potentially slower: store varying responses under different keys
Since `Faraday::HttpCache` implicitly relies on the underlying cache store for garbage collection, the idea here would be to store individual responses under different keys, so that rarely accessed ones can be garbage collected. The very tricky part is that the `Vary` header is in the response, so it means maintaining some kind of non-trivial index of which `Vary` values were returned for that URL, and then using that index to select a good candidate for revalidation. Additionally, it will at least double the number of cache lookups, which is not a small downside.
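A hedged sketch of that layout, with hypothetical `index:`/`response:` key prefixes and a plain Hash standing in for the real store. Note the two reads per hit (index, then variant), which is the doubled-lookup cost; the sketch also keeps only the latest `Vary` value per URL, glossing over responses that disagree:

```ruby
require "digest"

# Store each response variant under its own key, plus a small per-URL index of
# which Vary values have been seen, so the underlying store can garbage-collect
# rarely used variants independently.
class VariantStore
  def initialize(store)
    @store = store # a Hash here; a real cache store would use read/write instead
  end

  def write(url, request_headers, response_headers, body)
    vary = response_headers["vary"].to_s
    index = @store[index_key(url)] || { "vary" => vary, "digests" => [] }
    digest = variant_digest(url, vary, request_headers)

    index["digests"] |= [digest]                             # lookup 1: maintain the index
    @store[index_key(url)] = index
    @store["response:#{digest}"] = [response_headers, body]  # lookup 2: store the variant
  end

  def read(url, request_headers)
    index = @store[index_key(url)] or return nil             # lookup 1
    digest = variant_digest(url, index["vary"], request_headers)
    @store["response:#{digest}"]                             # lookup 2; nil if it was GC'd
  end

  private

  def index_key(url)
    "index:#{Digest::SHA1.hexdigest(url)}"
  end

  def variant_digest(url, vary, request_headers)
    fields = vary.split(",").map { |f| f.strip.downcase }.sort
    Digest::SHA1.hexdigest([url, *fields.map { |f| "#{f}:#{request_headers[f]}" }].join("\n"))
  end
end
```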
cc @rafaelfranca @Edouard-chin @josacar