Currently, the contents of the `data` directory are uploaded directly to S3 and retrieved exactly as they are. Since data files generally compress well, substantial storage, bandwidth, and time savings may be possible for larger projects if something like gzip were run on the `data` directory. This raises one problem, idiosyncratic to the AP's use case: we have publicly viewable HTML files in a subdirectory of `data` meant for people to look at.
I have identified three possible approaches:
Option 1
Compress all folders under `data` except for `reports` (or some similarly named subfolder), which is explicitly left uncompressed before being uploaded to S3. Thus, anything put in that subfolder will be accessible directly by S3 pathing.
On S3, the data files would look like this after compression:
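The listing below is a hypothetical sketch of that layout, assuming each top-level subfolder becomes a single gzipped tarball; the bucket, project, and subfolder names are placeholders:

```
s3://my-bucket/my-project/data/
├── source.tar.gz        # was data/source/, now compressed
├── processed.tar.gz     # was data/processed/, now compressed
└── reports/             # exempt: served directly via S3 pathing
    └── index.html
```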
Option 2
Support a 'protect' dotfile in directories that marks that directory and all its subfolders as compression-exempt. For example, `data/reports` would now contain a file, `data/reports/.nocompress`, which would stop it from being compressed before being uploaded to S3. This would yield the same overall S3 folder structure as above.
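A minimal sketch of what the dotfile check could look like, assuming archiving happens per top-level subfolder of `data`; the function names and per-folder `.tar.gz` granularity are assumptions for illustration, not datakit's actual upload code:

```python
import tarfile
from pathlib import Path

NOCOMPRESS_MARKER = ".nocompress"  # marker name proposed above

def is_exempt(folder: Path) -> bool:
    """A folder is compression-exempt if it contains the marker file.
    Since archiving happens at the top level, a marker in data/reports
    also shields everything beneath it."""
    return (folder / NOCOMPRESS_MARKER).exists()

def compress_data_dir(data_dir: Path, out_dir: Path) -> None:
    """Tar+gzip each top-level subfolder of data/ unless it is marked exempt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        if is_exempt(folder):
            continue  # upload this folder's files as-is, e.g. data/reports/
        archive = out_dir / f"{folder.name}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(folder, arcname=folder.name)
```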
Option 3
The `datakit-data.json` config file expands to include a whitelist of folders that are not to be compressed, with the default set to `data/reports` (or just `reports`).
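A sketch of what that expanded config might look like; the `compression_exempt` key (and the shape of the file generally) is an assumption for illustration, not an existing datakit-data option:

```json
{
  "compression_exempt": ["reports"]
}
```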
I like this idea in general and option 1 in particular. A related consideration on the price front is glacierizing data assets, either manually or automatically, after some period of time. But this may be best discussed as part of a new ticket, and may ultimately be something specific to each user (e.g. some may want the cost savings at the expense of losing immediate access, whereas others always require immediate access to files).
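For the automatic variant, an S3 lifecycle rule can transition objects to Glacier after a fixed age. A minimal boto3 sketch, where the bucket name, prefix, and 90-day threshold are placeholder assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Transition everything under the project's data/ prefix to Glacier
# after 90 days. Bucket, prefix, and window are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "glacierize-data-assets",
                "Filter": {"Prefix": "my-project/data/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```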