-
Notifications
You must be signed in to change notification settings - Fork 11
Use public-read for S3-enabled storage of thumbnails by default #217
base: develop
Are you sure you want to change the base?
Conversation
When the `ENABLE_S3_MEDIA` setting is applied, continue to use "private" S3 storage for media assets by default, but switch to using "public" S3 storage for thumbnails generated by the thumbnailing system. Here "private" storage means that images are only accessible through time-limited signed S3 URLs, while "public" means that thumbnails will be accessible through non-time-limited standard S3 URLs and the thumbnail images will be stored with the `public-read` ACL behind the scenes. As part of this change, the `S3DefaultStorage` storage class is renamed to `S3DefaultPrivateStorage` to be explicit (the original class name remains available for backwards-compatibility) and the new `S3DefaultPublicStorage` class is added.
@james you might be right that S3's default behaviour is a bit of a distraction, but we might also have tunnel vision by focusing on thumbnails (in the context of ICEkit, at least)... the default storage backend stores files, images, and thumbnails (repurposed images)... without S3, we have typically configured the media directory (where the file storage saves all files) as public, served by nginx, and we've only either used a private storage class (attached to model fields in a project, not in a lib like ICEkit) that saves files outside this directory and is protected with a Django view (in the project) or nginx internal redirects... since we started moving everything to S3, I wanted to keep the config with the project (and not require S3 buckets to be manually configured or special nginx config), and I wanted to avoid setting whole buckets public when we have often stored things like DB dump and log backups on S3, so I decided to stick with S3's private by default... I had just fixed up our use of django-storages to make it consistent (previously sometimes we got signed URLs, sometimes we didn't) and I thought, if we just use the storage backend whenever we access files, we don't need to care about the S3 object being private... and that works fine, except for the case which has come up where a direct link is cached for longer than the expiry (which is max 7 days)... somewhat separately to that, we now have singular concrete image and file libraries in icekit, which themselves have no concept of public/private file access and use a single storage backend, so we can't easily change the storage class for a single project, and we can't easily store both public and private files in these image/file libraries... this is probably a feature limitation of icekit which may be somewhat orthogonal to the immediate problem, and not one we often run up against... but now that we are using S3 storage for everything, our options to deal with it are more limited because nginx internal redirects for protection won't work anymore without local file access... what I am disagreeing with is the assumption that a "thumbnail" (an image repurposed by easy-thumbnails) should always be public... that's only one use-case where we want to protect the original high res file and allow smaller (traditional "thumbnails") to be public, and not all repurposed images are small, anyway... I think a repurposed image should have the same public/private status as its original... in general I am in favour of "safe by default" (e.g. private) and explicit exceptions in code in the cases where we know it is appropriate... I don't care if it's an original image, or a zip/pdf/other file, or a thumbnail repurposed by easy thumbnails... I think private by default, and having to explicitly render a public permalink via template tag or other method, without having to hack the concrete models in ICEkit, aligns with this "safe by default" pattern and should be more easily defined/customised/overridden in project codebase ... so before we change ICEkit, perhaps we should lay out all the other known use cases for public/private restrictions on image/file library content, and then discuss changes to support them... the most common one might be that all media can probably be public, but if implementing that as a default storage backend makes it harder to keep some files private because the concrete image/file library models are in ICEkit, then it might not be the best way forward... |
Thanks for the feedback @mrmachine. Leaving aside the implementation details of S3 as a backend, and thumbnails as a specific kind of media content, it sounds like we might need to take a step back and properly (or at least more clearly) handle different classes of stored assets. So we need:
The key goal in everything above is to percolate the concepts of two storages – private and public – all through ICEkit, and select the appropriate one to use at points in code where it matters. The next step might be to add explicit configuration options for the various ICEkit features – such as IIIF Image API, collection image processing, whatever else – so you can configure whether you want to use the public or private storage engine (or some variation) for the different use cases. As an aside, I don't think we should use easy-thumbnails as a general toolkit for repurposing images anyway. I tried this with the IIIF Image API work and it simply didn't give us the control we needed, and was too opaque and poorly document for me to easily understand what it did do. The intended use of easy-thumbnails is to generate smaller representations of images for inclusion in web pages. Using it for anything other than this is asking for trouble, or at least confusion. |
So I'm not wedded to I agree that having ICEkit itself be able to store both public and private files, and possibly determine which via I think having the user assign public permissions in the file or image library change form page (checkbox, or |
Make it possible to more easily configure a GLAMkit site to use public instead of private S3 storage. The implementation is drawn from the languishing and contentious pull request at #217
When the
ENABLE_S3_MEDIA
setting is applied, continue to use "private"S3 storage for media assets by default, but switch to using "public"
S3 storage for thumbnails generated by the thumbnailing system.
Here "private" storage means that images are only accessible through
time-limited signed S3 URLs, while "public" means that thumbnails will
be accessible through non-time-limited standard S3 URLs and the
thumbnail images will be stored with the
public-read
ACL behind thescenes.
As part of this change, the
S3DefaultStorage
storage class is renamedto
S3DefaultPrivateStorage
to be explicit (the original class nameremains available for backwards-compatibility) and the new
S3DefaultPublicStorage
class is added.