All caching server caching matters #524

Open
jeroen opened this issue Nov 10, 2024 · 1 comment


jeroen commented Nov 10, 2024

The front-end uses

Cache-control: public, max-age=60

This means a response may be cached for 60 seconds, by either proxy or client, until it must be revalidated. If this works well we could use the same for back-end (though maybe 60 is too much for PACKAGES files).
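As a rough illustration of what `max-age=60` means for a cache, here is a minimal sketch of the freshness check. The function names are hypothetical, and the sketch ignores heuristic freshness, `s-maxage` and the `Age` header; it only shows when a stored response may be reused without revalidation.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into a {directive: argument} mapping."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Return True while a stored response may be reused without revalidation."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None:
        return False  # no explicit lifetime; this sketch does not guess one
    # A response is fresh only while its age is below the freshness lifetime.
    return age_seconds < int(max_age)
```

With the header above, a response that is 59 seconds old is still served from cache, and one that is 60 seconds old has to be revalidated first.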

Revalidation

For every response, the server sets ETag and Last-Modified to the id and timestamp of the most recent database record within the scope of the page. That is, for universe-wide pages this is the last file uploaded by that universe, and for package-specific pages this is the last file for that package. For global pages it is the last upload anywhere in the db.

The front-end can revalidate using If-None-Match and If-Modified-Since headers. If the content has not changed, the server can respond with HTTP 304 instead of doing the full page render.
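The revalidation decision could be sketched roughly as follows. This is not the actual r-universe code: the function names, the plain-dict representation of request headers, and the quoting of the record id as the ETag are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict


def validators(record_id: str, uploaded_at: datetime) -> Dict[str, str]:
    """Build ETag / Last-Modified from the newest record in scope.

    `uploaded_at` must be timezone-aware and in UTC.
    """
    return {
        "ETag": '"%s"' % record_id,
        "Last-Modified": format_datetime(uploaded_at, usegmt=True),
    }


def _opaque(tag: str) -> str:
    # Weak comparison: ignore an optional W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: Dict[str, str], etag: str,
                    last_modified: datetime) -> bool:
    """Return True if the request can be answered with HTTP 304."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        return "*" in candidates or _opaque(etag) in candidates
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: fall back to a full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since
    return False
```

The point of checking the validators first is that they only require looking up the most recent record, which is much cheaper than rendering the page.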

We do not set must-revalidate because validation is enabled by default, and this directive just forbids the use of stale caches when the upstream server is offline.

Nginx

I think we need to remove the expires statements which override cache-control set by node.

Should figure out how to properly set proxy_cache_valid and proxy_cache_path statements. I think we just need to remove proxy_cache_valid and test if nginx still caches 400 responses. Also make sure no Set-Cookie header exists.

We might need several caches in nginx such that html pages can be kept in cache for longer, and don't get evicted due to a few big files entering the cache. We probably want to use different caches for frequently accessed html pages and vignettes, and a separate one for datasets and PACKAGES files. I think cdn files should not be cached by nginx, they are too big and we probably do not gain much speedup from reading them in mongo.

CDN

URLs under https://cdn.r-universe.dev are content-addressable, so these do not need revalidation:

Cache-Control: public, max-age=31557600, immutable

This should hopefully encourage cloudflare to cache these things as much as possible. Again, we do not want to cache cdn files in nginx: it is expensive and there is no benefit.
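To illustrate why content-addressable URLs never need revalidation, here is a small sketch. The use of SHA-256 and the function names are assumptions for illustration; the issue does not say how the CDN keys are derived. The max-age value works out to one year of 365.25 days.

```python
import hashlib
from typing import Dict

# 365.25 days expressed in seconds, matching the max-age used above.
SECONDS_PER_YEAR = int(365.25 * 24 * 60 * 60)


def content_address(data: bytes) -> str:
    """Derive the key from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def cdn_headers() -> Dict[str, str]:
    """Headers for a resource whose URL changes whenever its bytes change."""
    return {
        "Cache-Control": "public, max-age=%d, immutable" % SECONDS_PER_YEAR,
    }
```

Because the key is a function of the bytes, a given URL can only ever refer to one version of a file, so a cached copy cannot become stale.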

Manually bypass the cache

The easiest way to manually bypass the cache is by adding some random parameter e.g. ?nocache=123 to the url.
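A minimal sketch of the same trick in code, using only the standard library. The helper name and the example URL are hypothetical; the parameter name `nocache` is taken from the example above, and any otherwise unused parameter should behave the same way as long as the cache key includes the query string.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bypass_cache(url: str, param: str = "nocache") -> str:
    """Append a random query parameter so the URL misses existing cache entries."""
    parts = urlsplit(url)
    # Preserve any query parameters that are already present.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, for illustration only.
print(bypass_cache("https://example.org/some/page?x=1"))
```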

jeroen changed the title from "All caching issues" to "All caching server caching matters" on Nov 10, 2024

jeroen commented Nov 12, 2024

I have now enabled 'cache-everything' on cloudflare such that it also caches unknown file extensions and non-200 statuses. Note that I had to manually change the setting to respect the origin TTL, otherwise cloudflare would override max-age to be an hour.

[Three screenshots, 2024-11-12]

Serve stale while updating

I have also made a separate rule for the global domain that allows serving stale content while updating.

[Screenshot, 2024-11-12]

Therefore we can remove these directives from nginx:

proxy_cache_use_stale updating;
proxy_cache_background_update on;
proxy_cache_lock on;
