Extract file storage features into own project - Common storage abstraction #480

Open
winged opened this issue Mar 7, 2024 · 1 comment

Comments

@winged
Contributor

winged commented Mar 7, 2024

We use file storage in various projects in the Caluma / Inosca space, not just Alexandria.

Users expect features to behave the same regardless of where in the application they are used. It therefore makes sense to extract the file storage code into something reusable that also provides the features we need.

The goal here is a single file storage abstraction that offers features beyond just storing files. Similar to DGAP, it should be configure once, use anywhere, with a few simple interfaces.

Project-wise, the implementation should be similar to DGAP as well: extract what we have, move it to its own project, and start using it immediately in the project where the extraction happened (Alexandria in this case), ideally while developing a feature that requires working on that code anyway.

Integration into other code bases will happen later on when it has been stabilized a bit:

  • Caluma file answers
  • Inosca eBau documents (optional, as this is going to be migrated to Alexandria "someday soon")

Technical proposal:

  • Define a set of configuration options: Storage backend to use and associated settings (File storage, S3)

  • Implement functionality as a FileField ORM field type

  • Proposed features:

    • Encryption at rest (Either S3 SSE, custom, or both)
    • Virus scanning (possibly even with pluggable backends - optional). Must: ClamAV
    • Thumbnailing (Either a trigger for custom thumbnailing code, or integrated - interface to be defined)
    • Full-text search index triggering (or at least stemming / text extraction via Solr or a configurable backend)
  • Interface: Minimal configuration in the settings, and a custom FileField ORM field type.

  • Optional, further development: Storage As-a-Service app, where FileFields and Caluma could offload file storage. This may also happen much further down the line. The interface here could be REST, but might also provide an S3 interface. To be discussed when the problem actually arises.
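
To make the "configure once, use anywhere" idea more concrete, here is a minimal, framework-free sketch. It is not the proposed FileField and uses no Django code. All names in it (`STORAGE_SETTINGS`, `MemoryBackend`, `checksum_hook`, `Storage`) are hypothetical and exist only for illustration. The in-memory backend stands in for the filesystem or S3 backends, and the checksum hook stands in for optional features such as virus scanning or thumbnailing.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical settings: one place that selects the backend and optional features.
STORAGE_SETTINGS = {
    "BACKEND": "memory",    # a real setup would offer e.g. a filesystem or S3 backend
    "HOOKS": ["checksum"],  # optional features to run on every upload
}


class MemoryBackend:
    """Toy backend that keeps file contents in a dict (stand-in for disk / S3)."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def save(self, name: str, content: bytes) -> None:
        self._files[name] = content

    def load(self, name: str) -> bytes:
        return self._files[name]


def checksum_hook(name: str, content: bytes, metadata: dict) -> None:
    """Example hook: record a SHA-256 digest of the uploaded content."""
    metadata["sha256"] = hashlib.sha256(content).hexdigest()


# Registries mapping the names used in the settings to implementations.
BACKENDS: Dict[str, Callable[[], MemoryBackend]] = {"memory": MemoryBackend}
HOOKS: Dict[str, Callable[[str, bytes, dict], None]] = {"checksum": checksum_hook}


class Storage:
    """Single entry point: callers never deal with backends or hooks directly."""

    def __init__(self, settings: dict) -> None:
        self.backend = BACKENDS[settings["BACKEND"]]()
        self.hooks: List[Callable[[str, bytes, dict], None]] = [
            HOOKS[hook_name] for hook_name in settings.get("HOOKS", [])
        ]

    def save(self, name: str, content: bytes) -> dict:
        metadata = {"name": name, "size": len(content)}
        # Run every configured hook before the content is handed to the backend.
        for hook in self.hooks:
            hook(name, content, metadata)
        self.backend.save(name, content)
        return metadata

    def load(self, name: str) -> bytes:
        return self.backend.load(name)


storage = Storage(STORAGE_SETTINGS)
meta = storage.save("report.txt", b"hello")
```

In this sketch, changing the backend or enabling another feature only means editing `STORAGE_SETTINGS`; the calling code stays the same. How such hooks would map onto a Django field is still to be defined.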

@Yelinz
Member

Yelinz commented Jul 18, 2024

Discussion about the usefulness of the extraction led to:

  • removing the complex dynamic storage backend and field, as it is neither really needed nor usefully implemented (its only use would be per-file encryption keys, which are missing).
  • virus scanning (clamav) can be done as a middleware, scanning all multipart requests
  • Thumbnailing and full-text search are not always useful. TBD how to deal with them
  • generating presigned URLs will be extracted to a separate package
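
As a rough illustration of the middleware idea above, here is a framework-neutral WSGI sketch that inspects every multipart request before it reaches the application. It is an assumption about how this could look, not the planned implementation. `VirusScanMiddleware`, `is_infected`, `demo_app`, and `simulate_request` are hypothetical names. The scanner is a stub that looks for a made-up marker; a real version would hand the payload to ClamAV instead. For brevity the sketch scans the raw request body rather than the individual uploaded parts.

```python
import io
from typing import Callable, Tuple

FAKE_SIGNATURE = b"FAKE-VIRUS-SIGNATURE"  # made-up marker, not a real signature


def is_infected(payload: bytes) -> bool:
    """Hypothetical scanner stub; a real one would ask ClamAV instead."""
    return FAKE_SIGNATURE in payload


class VirusScanMiddleware:
    """WSGI middleware that scans the body of every multipart request."""

    def __init__(self, app: Callable, scanner: Callable[[bytes], bool] = is_infected):
        self.app = app
        self.scanner = scanner

    def __call__(self, environ: dict, start_response: Callable):
        content_type = environ.get("CONTENT_TYPE", "")
        if content_type.lower().startswith("multipart/"):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length)
            if self.scanner(body):
                start_response("400 Bad Request", [("Content-Type", "text/plain")])
                return [b"Upload rejected: malware detected"]
            # Re-wrap the consumed body so the wrapped app can still read it.
            environ["wsgi.input"] = io.BytesIO(body)
        return self.app(environ, start_response)


def demo_app(environ: dict, start_response: Callable):
    """Trivial application standing in for the real upload handler."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def simulate_request(body: bytes, content_type: str) -> Tuple[str, bytes]:
    """Send one fake request through the middleware; return (status, body)."""
    statuses = []
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    result = VirusScanMiddleware(demo_app)(
        environ, lambda status, headers: statuses.append(status)
    )
    return statuses[0], b"".join(result)
```

Because the check sits in front of the application, upload endpoints would not need scanning code of their own. The trade-off is that the whole request body is read into memory first, which may need a size limit or streaming for large uploads.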
