Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up all temp files created during processing and storage #285

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dmarkow
Copy link
Contributor

@dmarkow dmarkow commented Jul 18, 2019

Fixes #256

Arc creates temp files in the directory that System.tmp_dir returns, but never cleans up after itself. Running unchecked on a production server could result in running out of disk space. These temp files are created in three scenarios:

  1. The input is binary data
  2. The input is a remote URL
  3. During transformations for each version

Scenarios 1 and 2 seem very similar (converting something that isn't a local file path into a local file path), however scenario 1 was happening once per transformation (where 10 versions/transformations meant the binary data would be written to disk 10 separate times), whereas scenario 2 was happening once right at the beginning (the URL is retrieved and saved to disk immediately during Arc.File.new/1).

Scenario 3 creates a temp file for each transformation.

Changes:

  • Write binary input to a tempfile one time during Arc.File.new/1 rather than once per transformation (the tradeoff is that even with no transformations, it'll still write once, but this matches how remote URLs are handled)
  • Got rid of Arc.File.ensure_path as it was only there to convert binary data to a path which now happens automatically
  • Add a tempfile? attribute to %Arc.File{} that can be used later for cleanup (I preferred this over just testing the path to see if it's in the temp dir, otherwise there's a chance of deleting the input file itself if it was also in the temp dir)
  • After storing each version, the version's temp file is deleted (scenario 3)
  • After storing all of the versions, the input temp file is deleted if it exists (scenarios 1 and 2)
  • Added tests for checking temp files (had to change some of the setup/teardown logic in the tests, but they all pass including the S3 tests)


# If this version was transformed in any way, a tempfile was created
# that we need to clean up.
if file.tempfile? do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the same code as in cleanup/2. I would suggest replacing this (and the ending return of result) with the function call cleanup(result, file) to minimize unnecessary code duplication.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you!

@@ -33,6 +33,7 @@ end

def new(%{filename: filename, binary: binary}) do
%Arc.File{binary: binary, file_name: Path.basename(filename)}
|> write_binary()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping new/1 without side-effects. Although this might be a contrived example, imagine a situation where someone is transferring a very large file over the network and they are not using Plug.Upload or any related mechanism to do so. Now, they could store the filename and binary data as part of a GenServer state, but they might also want to store a copy of Arc.File instead, since it already contains the filename and binary data in its structure.

And sure, someone could easily create just use %Arc.File{...} to create a structure without side-effects, but I don't think new/1 would be an obvious place for a side-effect. Also imagine trying to test some code that requires an Arc.File struct but does not require an actual file to exist, or maybe can't write a file for some reason; this would have negative impacts on those test cases where some other flow attempts to use new/1 and the whole thing breaks even though the app code never directly called new/1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to delete temporary file automatically
3 participants