Cache entries can grow unbounded #112

Open
casperisfine opened this issue Aug 29, 2019 · 6 comments

@casperisfine

Initially reported in Shopify/shipit-engine#935 by @josacar

Context

Shipit is a deploy tool; as such, it frequently hits the same endpoint to grab updates, e.g. GET https://api.github.com/v3/repos/my-org/my-repo/commits?branch=master.

Additionally, it uses the GitHub App API for authentication, so its Authorization header changes every hour.

And finally GitHub's API responses specify the following Vary header: Accept, Authorization, Cookie, X-GitHub-OTP.
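
For concreteness, here is a rough sketch of that setup, following the gem's README-style wiring; the in-memory store, the token value, and the hourly rotation logic are placeholders, not Shipit's actual configuration:

```ruby
require 'faraday'
require 'faraday-http-cache'
require 'active_support/cache'

# Placeholder for the GitHub App installation token, which is rotated hourly.
current_installation_token = 'ghs_example_token'

client = Faraday.new(url: 'https://api.github.com') do |builder|
  builder.use :http_cache, store: ActiveSupport::Cache::MemoryStore.new
  builder.adapter Faraday.default_adapter
end

# Same URL on every poll, but the Authorization value changes every hour, and
# GitHub answers with "Vary: Accept, Authorization, Cookie, X-GitHub-OTP".
client.get('/v3/repos/my-org/my-repo/commits?branch=master') do |req|
  req.headers['Authorization'] = "token #{current_installation_token}"
end
```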

Problem

Since the URL never changes, the same cache entry is reused all the time. However, since the Authorization token changes every hour, new entries are regularly prepended until the cache entry becomes huge and starts taking a lot of time to deserialize, or even exceeds the cache store's value size limit (memcached caps values at 1MB).
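
A toy model of that growth (this is not the middleware's actual code, just the storage pattern described above: every variant of one URL is serialized into a single array under a single URL-derived key):

```ruby
require 'digest/sha1'
require 'json'

store = {} # stands in for memcached / Rails.cache

url = 'https://api.github.com/v3/repos/my-org/my-repo/commits?branch=master'
key = Digest::SHA1.hexdigest(url)

# One poll per hour: every rotation brings a token no stored request matches,
# so a new [request, response] pair is prepended instead of replacing one.
720.times do |hour|
  entries  = JSON.parse(store[key] || '[]')
  request  = { 'Authorization' => "token-#{hour}" }
  response = { 'vary' => 'Accept, Authorization, Cookie, X-GitHub-OTP', 'body' => '...' }
  entries.unshift([request, response])
  store[key] = JSON.dump(entries)
end

puts JSON.parse(store[key]).size # => 720 pairs under a single, ever-growing key
puts store[key].bytesize         # grows on every rotation; memcached rejects values over 1MB
```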

Solutions

I can see several solutions to this, with very different tradeoffs.

Very simple, set a cap on the number of entries per key

Self-explanatory: we could simply set a configurable upper limit on the number of entries per key and drop the oldest entry whenever we go over.

It's an easy fix. It's not quite ideal, as it might surprise people, but IMHO it's better to be surprised by cache evictions than by cache OOM / corruption (in the case of memcached truncation).
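
A minimal sketch of that cap, assuming entries stay newest-first the way they do today (MAX_ENTRIES and the helper name are made up):

```ruby
MAX_ENTRIES = 25 # hypothetical configurable cap

# Prepend the new [request, response] pair, then drop anything past the cap;
# the oldest entries sit at the end of the array, so they are the ones dropped.
def prepend_with_cap(entries, new_pair, limit: MAX_ENTRIES)
  [new_pair, *entries].first(limit)
end

entries = []
100.times do |i|
  entries = prepend_with_cap(entries, [{ 'Authorization' => "token-#{i}" }, { 'status' => 200 }])
end
entries.size # => 25, no matter how many tokens have rotated through
```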

Somewhat simple, evict stale entries

When prepending a new entry, we could evict all the stale ones. That's easy if max-age / Expires is set; if it isn't, we have to rely on heuristics as specified in the RFC.
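
A sketch of that eviction pass, covering only the easy case where max-age or Expires is present (anything without explicit freshness info is kept here; handling it would need the RFC's heuristic freshness rules):

```ruby
require 'time'

# True when the stored response is past its explicit freshness lifetime.
def stale?(response, now: Time.now)
  date = (Time.httpdate(response['date']) rescue nil)
  return false unless date

  if response['cache-control'].to_s =~ /max-age=(\d+)/
    now - date > Regexp.last_match(1).to_i
  elsif response['expires']
    now > (Time.httpdate(response['expires']) rescue now)
  else
    false # no explicit freshness info: keep it, heuristics would go here
  end
end

def prepend_and_evict(entries, new_pair)
  [new_pair, *entries.reject { |_request, response| stale?(response) }]
end

fresh = { 'date' => Time.now.httpdate, 'cache-control' => 'max-age=60' }
old   = { 'date' => (Time.now - 3600).httpdate, 'cache-control' => 'max-age=60' }
prepend_and_evict([[{}, old]], [{}, fresh]).size # => 1, the stale pair was dropped
```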

Harder, and potentially slower, store varying responses under different keys

Since Faraday::HttpCache implicitly relies on the underlying cache store for garbage collection, the idea here would be to store individual responses under different keys, so that rarely accessed ones can be garbage collected.

The very tricky part is that Vary is in the response, so it means maintaining some kind of non-trivial index of which Vary values were returned for that URL, and then using that to select a good candidate for revalidation.

Additionally, it will at least double the number of cache lookups, which is not a small downside.
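
A sketch of the two-level layout this implies (the key formats and helper names are made up; the point is the index plus the doubled lookups):

```ruby
require 'digest/sha1'

STORE = {} # stands in for the underlying cache store

def index_key(url)
  "vary-index:#{Digest::SHA1.hexdigest(url)}"
end

# Build a per-variant key from the request's values for the headers the
# response varies on.
def variant_key(url, vary_headers, request_headers)
  fingerprint = vary_headers.sort.map { |h| "#{h}=#{request_headers[h]}" }.join("\n")
  "response:#{Digest::SHA1.hexdigest("#{url}\n#{fingerprint}")}"
end

def write(url, request_headers, response)
  vary = response['vary'].to_s.split(',').map(&:strip)
  STORE[index_key(url)] = vary                               # write 1: remember which headers Vary named
  STORE[variant_key(url, vary, request_headers)] = response  # write 2: store this variant under its own key
end

def read(url, request_headers)
  vary = STORE[index_key(url)] or return nil                 # lookup 1: which headers to hash?
  STORE[variant_key(url, vary, request_headers)]             # lookup 2: fetch that variant, if present
end
```

Rarely used variants then live under their own keys and can be evicted by the store independently, at the cost of the extra index read on every request.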

cc @rafaelfranca @Edouard-chin @josacar

@josacar

josacar commented Aug 30, 2019

I think relying on the cache store is the way to go. Just as we hash the URL, we can hash all the vary headers too, so the cache entry is smaller and the time to serialize or deserialize will be shorter.
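
A sketch of that digest, assuming we already know which headers the response varies on (as the next comment points out, that assumption is the tricky part):

```ruby
require 'digest/sha1'

url  = 'https://api.github.com/v3/repos/my-org/my-repo/commits?branch=master'
vary = %w[Accept Authorization Cookie X-GitHub-OTP]
request_headers = {
  'Accept'        => 'application/vnd.github.v3+json',
  'Authorization' => 'token ghs_example'
}

# Digest the URL together with the request's values for the varying headers,
# so each hourly token maps to its own small cache entry instead of piling
# everything under the URL digest alone.
material  = [url, *vary.map { |h| "#{h}:#{request_headers[h]}" }].join("\n")
cache_key = Digest::SHA1.hexdigest(material)
```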

@byroot
Contributor

byroot commented Aug 30, 2019

> I think the same as we do a hash for the URL, we can do it for all the vary headers too

It's not as simple though: the Vary header is in the response, not the request, so if you do GET /foo the middleware doesn't know which headers to hash, hence which entry to look for. So you need to either look at all / some entries to consider historical values of the Vary header, or maintain some kind of index.

@rafaelfranca
Collaborator

I'd start with the simple solution. If that doesn't work well, we can increase the complexity.

@georgeguimaraes
Member

georgeguimaraes commented Apr 6, 2020

Hi, did you end up solving this? I'm curious whether you found a good solution for it.

@rafaelfranca
Collaborator

We didn't implement a solution, but I think "Very simple, set a cap on the number of entries per key" is a good one.

tjwallace pushed a commit to ZenHubHQ/faraday-http-cache that referenced this issue Oct 1, 2020
@tjwallace

We ran into this same problem and went with limiting the number of entries stored per cache key. Take a look at #121 and let me know what you think!
