fs-storage: Millisecond timestamps and proper rounding #37

Closed
kirillt opened this issue Apr 23, 2024 · 3 comments · Fixed by #63

kirillt (Member) commented Apr 23, 2024

Context: #23 (comment)

We might need more granular timestamps than just seconds. Let's consider the following scenario:

  1. We scan the filesystem at the moment HH:MM:ss:xxx, where xxx is the milliseconds part.
  2. A file cat.jpg is modified at the moment HH:MM:ss:yyy.

If we truncate xxx and yyy, then on the next scan we'll skip cat.jpg and miss the update: we performed the scan at HH:MM:ss and the file was modified at HH:MM:ss, so this doesn't look like an update.

As a workaround, it seems possible to always round HH:MM:ss:xxx down when we memoize the timestamp of the latest scan, and to always round HH:MM:ss:yyy up when we retrieve the timestamp of file modification. This would result in excessive updates for files which were modified in the same second as the scan, but it should be better than losing an update.

With this logic implemented, we control the amount of excessive updates by altering the timestamp granularity. What is the smallest division of time that makes sense for file systems? Nanoseconds seem too fine-grained, but what about milliseconds?
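A minimal sketch of this rounding idea, assuming millisecond granularity and plain std::time; the helper names (floor_to_millis, ceil_to_millis, needs_update) are only illustrative, not taken from the codebase:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Round a timestamp down to whole milliseconds (for the memoized scan time).
fn floor_to_millis(t: SystemTime) -> u128 {
    t.duration_since(UNIX_EPOCH)
        .expect("timestamp before UNIX_EPOCH")
        .as_millis()
}

// Round a timestamp up to whole milliseconds (for a file's modification time).
fn ceil_to_millis(t: SystemTime) -> u128 {
    let d = t
        .duration_since(UNIX_EPOCH)
        .expect("timestamp before UNIX_EPOCH");
    let millis = d.as_millis();
    if u128::from(d.subsec_nanos()) % 1_000_000 == 0 {
        millis
    } else {
        millis + 1
    }
}

// A file counts as updated if its rounded-up mtime is not older than the
// rounded-down time of the previous scan; this may re-process a few files
// needlessly, but it never misses an update.
fn needs_update(last_scan: SystemTime, mtime: SystemTime) -> bool {
    ceil_to_millis(mtime) >= floor_to_millis(last_scan)
}
```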

tareknaser (Collaborator) commented:

As a workaround, it seems possible to always round HH:MM:ss:xxx down when we memoize the timestamp of the latest scan, and to always round HH:MM:ss:yyy up when we retrieve the timestamp of file modification.

I think this is a brilliant idea. It's very straightforward to implement as well.

This would result in excessive updates for files which were modified in the same second as the scan,

In my opinion, this seems like a rare case that we can overlook.

What is the smallest division of time that makes sense for file systems? Nanoseconds seem too fine-grained, but what about milliseconds?

I think microseconds or milliseconds should be enough.

We should also account for the fact that we only see this particular error on macOS, either in the open PR or in ARK-Builders/arklib#87 (comment). I notice that our CI doesn't detect these issues either. We should really include macOS CI tests and run them on each PR.

twitu (Collaborator) commented Apr 25, 2024

I agree that including milliseconds will significantly reduce conflicts. It's very unlikely (probably impossible) that the user will edit/update the same file multiple times within the same millisecond. So this solution fits well for use cases based on human interaction.

Systems that might be using the nautilus file system in an automated way, like a server, can use atomic versioning instead.

Also, it's best to avoid nanoseconds, since different OSes have different levels of support: https://doc.rust-lang.org/std/time/struct.SystemTime.html#platform-specific-behavior

kirillt changed the title from "fs-storage: preventing missed updates" to "fs-storage: Millisecond timestamps and proper rounding" on Apr 27, 2024
kirillt moved this to Todo in Development on Apr 27, 2024
twitu mentioned this issue on May 20, 2024
twitu (Collaborator) commented May 29, 2024

Now it seems the problem is not just related to rounding but more about the semantics around syncing. We have two timestamps here: one in the metadata of the physical file stored on disk (T2), and one recorded when the in-memory key-value mapping is updated (T1).

Here's what we have -

  • file metadata timestamp (T2) is updated when

    • file is written to by the BaseStorage write_fs logic
    • file is written to by an external system
  • key-value mapping timestamp (T1) is updated when

    • set or remove is called
    • write_fs is called; the timestamp is made equal to the file metadata timestamp

So we have three cases here -

  • T2 > T1 - (up) sync data from file to mapping
  • T1 > T2 - (down) sync data from mapping to file
  • T1 == T2 - only happens after write_fs call, no syncing needed

Case 1

The most common flow will be one where new entries are added to the mapping, making T1 > T2; then the mapping is written to disk, making T2 == T1.

Case 2

However, there can be cases where both the mapping and the underlying file are updated. This case will need a full sync that uses the monoid implementation to merge the data. However, this is tricky because it can be triggered by either

  • T1 > T2
  • T2 > T1

We have no way of differentiating between case 1 and case 2, so all syncs would need to be full syncs, which would be very inefficient. We want to be able to differentiate case 1 from case 2 so that we can use the more efficient write_fs for case 1.
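A minimal sketch of the two-timestamp decision described above, assuming both timestamps are plain SystemTime values; the SyncAction enum and decide_sync function are only illustrative, not the actual fs-storage API, and the comment inside marks exactly the ambiguity raised here:

```rust
use std::cmp::Ordering;
use std::time::SystemTime;

// Illustrative names, not the actual fs-storage API.
enum SyncAction {
    ReadFs,  // (up) sync: pull data from the file into the mapping
    WriteFs, // (down) sync: push the mapping to the file
    None,    // T1 == T2, only holds right after write_fs
}

// T1 = in-memory mapping timestamp, T2 = file metadata timestamp.
// With only these two values, a bare comparison cannot tell whether one
// side changed or both did, so to stay correct every divergence would
// have to fall back to a full, monoid-based sync.
fn decide_sync(t1: SystemTime, t2: SystemTime) -> SyncAction {
    match t1.cmp(&t2) {
        Ordering::Greater => SyncAction::WriteFs, // T1 > T2
        Ordering::Less => SyncAction::ReadFs,     // T2 > T1
        Ordering::Equal => SyncAction::None,
    }
}
```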

Possible solutions

  1. Only FileStorage can update the file at the given path. No external updates to the file are allowed. This will ensure case 2 cannot happen.

  2. FileStorage stores two timestamp fields, one for the mapping and one for the file. This way we have three timestamps:

    • in-memory mapping T1
    • in-memory file updated timestamp T2
    • file metadata T3

Now we have the following cases (sketched in code below) -

  • T1 > T2 && T2 == T3 -> write_fs (down) sync -> T1 == T2 == T3
  • T1 > T2 && T2 > T3 -> full sync -> T1 == T2 == T3
  • T3 > T2 && T2 == T1 -> read_fs (up) sync -> T1 == T2 == T3
  • T3 > T2 && T1 > T2 -> full sync -> T1 == T2 == T3
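A minimal sketch of this three-timestamp decision, again with purely illustrative names (SyncAction3, decide_sync3) rather than the actual implementation:

```rust
use std::time::SystemTime;

// T1 = in-memory mapping timestamp,
// T2 = in-memory "file updated" timestamp,
// T3 = file metadata timestamp on disk.
enum SyncAction3 {
    WriteFs,  // (down) sync: only the mapping changed
    ReadFs,   // (up) sync: only the on-disk file changed
    FullSync, // both sides changed: merge via the monoid implementation
    None,     // already in sync (T1 == T2 == T3)
}

fn decide_sync3(t1: SystemTime, t2: SystemTime, t3: SystemTime) -> SyncAction3 {
    if t1 > t2 && t2 == t3 {
        SyncAction3::WriteFs
    } else if t3 > t2 && t2 == t1 {
        SyncAction3::ReadFs
    } else if (t1 > t2 && t2 > t3) || (t3 > t2 && t1 > t2) {
        SyncAction3::FullSync
    } else {
        SyncAction3::None
    }
}
```

After any of these sync actions, the three timestamps would be set equal again (T1 == T2 == T3), matching the cases above.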

What are your thoughts?

twitu closed this as completed in #63 on Jun 14, 2024