Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Inspired by the S3 hash cache and the slow runtimes on my bad VPS for some file synchronization where scanning a moderate size directory for hashes could take 20-30 seconds on unchanged data, this PR implements an optional hash cache for disk backends.
I initially started only using last-modified times (as in the first commits), but this showed a consistent failure in synchronization tests, which run too fast to detect sub-second edit times. However, I know that S3 usage works, and I also understand that logically the cache should work, and the problem is mostly granularity at test level.
So instead the cache uses overall file info details, which includes modification and change times, but also inode and file size (we ignore access time because reading shouldn't be significant). The session tests validating the cache also got their sample file sizes modified to exercise that part of the flow, and now they pass.
In practice, running the
scan
operation on my bad VPS improves drastically:A ~25x improvement on slower disks/file systems seems worth it as an option, particularly when one wants to synchronize much bigger files.