Skip to content
This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Use public-read for S3-enabled storage of thumbnails by default #217

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

jmurty
Copy link
Contributor

@jmurty jmurty commented Mar 16, 2017

When the ENABLE_S3_MEDIA setting is applied, continue to use "private"
S3 storage for media assets by default, but switch to using "public"
S3 storage for thumbnails generated by the thumbnailing system.

Here "private" storage means that images are only accessible through
time-limited signed S3 URLs, while "public" means that thumbnails will
be accessible through non-time-limited standard S3 URLs and the
thumbnail images will be stored with the public-read ACL behind the
scenes.

As part of this change, the S3DefaultStorage storage class is renamed
to S3DefaultPrivateStorage to be explicit (the original class name
remains available for backwards-compatibility) and the new
S3DefaultPublicStorage class is added.

When the `ENABLE_S3_MEDIA` setting is applied, continue to use "private"
S3 storage for media assets by default, but switch to using "public"
S3 storage for thumbnails generated by the thumbnailing system.

Here "private" storage means that images are only accessible through
time-limited signed S3 URLs, while "public" means that thumbnails will
be accessible through non-time-limited standard S3 URLs and the
thumbnail images will be stored with the `public-read` ACL behind the
scenes.

As part of this change, the `S3DefaultStorage` storage class is renamed
to `S3DefaultPrivateStorage` to be explicit (the original class name
remains available for backwards-compatibility) and the new
`S3DefaultPublicStorage` class is added.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 72.892% when pulling 7719b26 on add-public-read-s3-storage-for-thumbnails into 1efc39c on develop.

@jmurty jmurty requested review from mrmachine and cogat March 24, 2017 04:36
@mrmachine
Copy link
Member

@james you might be right that S3's default behaviour is a bit of a distraction, but we might also have tunnel vision by focusing on thumbnails (in the context of ICEkit, at least)... the default storage backend stores files, images, and thumbnails (repurposed images)...

without S3, we have typically configured the media directory (where the file storage saves all files) as public, served by nginx, and we've only either used a private storage class (attached to model fields in a project, not in a lib like ICEkit) that saves files outside this directory and is protected with a Django view (in the project) or nginx internal redirects...

since we started moving everything to S3, I wanted to keep the config with the project (and not require S3 buckets to be manually configured or special nginx config), and I wanted to avoid setting whole buckets public when we have often stored things like DB dump and log backups on S3, so I decided to stick with S3's private by default... I had just fixed up our use of django-storages to make it consistent (previously sometimes we got signed URLs, sometimes we didn't) and I thought, if we just use the storage backend whenever we access files, we don't need to care about the S3 object being private... and that works fine, except for the case which has come up where a direct link is cached for longer than the expiry (which is max 7 days)...

somewhat separately to that, we now have singular concrete image and file libraries in icekit, which themselves have no concept of public/private file access and use a single storage backend, so we can't easily change the storage class for a single project, and we can't easily store both public and private files in these image/file libraries... this is probably a feature limitation of icekit which may be somewhat orthogonal to the immediate problem, and not one we often run up against... but now that we are using S3 storage for everything, our options to deal with it are more limited because nginx internal redirects for protection won't work anymore without local file access...

what I am disagreeing with is the assumption that a "thumbnail" (an image repurposed by easy-thumbnails) should always be public... that's only one use-case where we want to protect the original high res file and allow smaller (traditional "thumbnails") to be public, and not all repurposed images are small, anyway... I think a repurposed image should have the same public/private status as its original...

in general I am in favour of "safe by default" (e.g. private) and explicit exceptions in code in the cases where we know it is appropriate... I don't care if it's an original image, or a zip/pdf/other file, or a thumbnail repurposed by easy thumbnails... I think private by default, and having to explicitly render a public permalink via template tag or other method, without having to hack the concrete models in ICEkit, aligns with this "safe by default" pattern and should be more easily defined/customised/overridden in project codebase ...

so before we change ICEkit, perhaps we should lay out all the other known use cases for public/private restrictions on image/file library content, and then discuss changes to support them... the most common one might be that all media can probably be public, but if implementing that as a default storage backend makes it harder to keep some files private because the concrete image/file library models are in ICEkit, then it might not be the best way forward...

@jmurty
Copy link
Contributor Author

jmurty commented Mar 31, 2017

Thanks for the feedback @mrmachine.

Leaving aside the implementation details of S3 as a backend, and thumbnails as a specific kind of media content, it sounds like we might need to take a step back and properly (or at least more clearly) handle different classes of stored assets. So we need:

  • a way to configure two storage engines for ICEkit, one private, the other public
  • to make all ICEkit features configurable so they can choose either one of these storage engines, and do the appropriate thing
  • to decide whether to choose the private or public engine as the default thumbnail storage. This choice won't matter too much, however, because we should also...
  • override the default thumbnail storage to ensure it is private in any usage where it doesn't make sense to generate public images. E.g. we might want to ensure ThumbnailAdminMixin always generates private images for display within the admin, even when the default thumbnail engine is the public one.

The key goal in everything above is to percolate the concepts of two storages – private and public – all through ICEkit, and select the appropriate one to use at points in code where it matters.

The next step might be to add explicit configuration options for the various ICEkit features – such as IIIF Image API, collection image processing, whatever else – so you can configure whether you want to use the public or private storage engine (or some variation) for the different use cases.

As an aside, I don't think we should use easy-thumbnails as a general toolkit for repurposing images anyway. I tried this with the IIIF Image API work and it simply didn't give us the control we needed, and was too opaque and poorly document for me to easily understand what it did do. The intended use of easy-thumbnails is to generate smaller representations of images for inclusion in web pages. Using it for anything other than this is asking for trouble, or at least confusion.

@mrmachine
Copy link
Member

easy-thumbnails has additional limitations when paired with a general purpose image library and remote storage:

  • we need to configure a predefined set of aliases, and generate all the required thumbnails at upload time to avoid crippling performance issues with remote storage
  • easy-thumbnails allows us to map a given alias to an app label, a model, or a particular field on a model
  • with a general purpose image library storing all images in a single model/field, we have no granular control over which aliases get rendered for which images -- and it's probably not ideal to simply render all aliases for all images in the library

So I'm not wedded to easy-thumbnails, but I don't know that anything else will have a better solution to the above problem, and besides that issue, easy-thumbnails appears to be pretty widely used and works reasonably well.

I agree that having ICEkit itself be able to store both public and private files, and possibly determine which via django-guardian integration, sounds like a promising way to go.

I think having the user assign public permissions in the file or image library change form page (checkbox, or django-guardian integration), and having one storage backend that abides by that for files, images and thumbnails, without having to use special template tags or Python code for each generated link, would be ideal.

jmurty added a commit that referenced this pull request Oct 19, 2017
Make it possible to more easily configure a
GLAMkit site to use public instead of private
S3 storage.

The implementation is drawn from the languishing
and contentious pull request at
#217
@cogat cogat removed their request for review November 1, 2019 05:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants