fs-storage: Millisecond timestamps and proper rounding #37

Closed
kirillt opened this issue Apr 23, 2024 · 3 comments · Fixed by #63

kirillt (Member) commented Apr 23, 2024

Context: #23 (comment)

We might need more granular timestamps than just seconds. Let's consider the following scenario:

  1. We scan the filesystem at the moment HH:MM:ss:xxx, where xxx is the milliseconds part.
  2. A file cat.jpg is modified at the moment HH:MM:ss:yyy.

If we truncate xxx and yyy, then on the next scan we'll skip cat.jpg and miss the update: we performed the scan at HH:MM:ss and the file was modified at HH:MM:ss, so this doesn't look like an update.

As a workaround, it seems possible to always round HH:MM:ss:xxx down when we memoize the timestamp of the latest scan, and to always round HH:MM:ss:yyy up when we retrieve the timestamp of file modification. This would result in excessive updates for files which were modified in the same second as the scan, but it should be better than losing an update.

With this logic implemented, we control the amount of excessive updates by altering the timestamp granularity. What is the smallest division of time that makes sense for file systems? Nanoseconds seem too fine-grained, but what about milliseconds?
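A minimal sketch of this rounding idea, assuming millisecond granularity and plain std::time; the helper names (floor_to_millis, ceil_to_millis, needs_update) are only illustrative, not taken from the codebase:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Round a timestamp down to whole milliseconds (for the memoized scan time).
fn floor_to_millis(t: SystemTime) -> u128 {
    t.duration_since(UNIX_EPOCH)
        .expect("timestamp before UNIX_EPOCH")
        .as_millis()
}

// Round a timestamp up to whole milliseconds (for a file's modification time).
fn ceil_to_millis(t: SystemTime) -> u128 {
    let d = t
        .duration_since(UNIX_EPOCH)
        .expect("timestamp before UNIX_EPOCH");
    let millis = d.as_millis();
    if u128::from(d.subsec_nanos()) % 1_000_000 == 0 {
        millis
    } else {
        millis + 1
    }
}

// A file counts as updated if its rounded-up mtime is not older than the
// rounded-down time of the previous scan; this may re-process a few files
// needlessly, but it never misses an update.
fn needs_update(last_scan: SystemTime, mtime: SystemTime) -> bool {
    ceil_to_millis(mtime) >= floor_to_millis(last_scan)
}
```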

tareknaser (Collaborator) commented:

As a workaround, it seems possible to always round HH:MM:ss:xxx down when we memoize the timestamp of the latest scan, and to always round HH:MM:ss:yyy up when we retrieve the timestamp of file modification.

I think this is a brilliant idea. It's very straightforward to implement as well.

This would result in excessive updates for files which were modified in the same second as the scan,

In my opinion, this seems like a rare case that we can overlook.

What is the smallest division of time that makes sense for file systems? Nanoseconds seem too fine-grained, but what about milliseconds?

I think microseconds or milliseconds should be enough.

We should also account for the fact that we only see this particular error on macOS, either in the open PR or in ARK-Builders/arklib#87 (comment). I notice that our CI doesn't detect these issues either. We should really include macOS CI tests and run them on each PR.

twitu (Collaborator) commented Apr 25, 2024

I agree that including milliseconds will significantly reduce conflicts. It's very unlikely (probably impossible) that the user will edit/update the same file multiple times within the same millisecond. So this solution fits well for use cases based on human interaction.

Systems that might be using the nautilus file system in an automated way, like a server, can use atomic versioning instead.

Also, it's best to avoid nanoseconds, since different OSes have different levels of support: https://doc.rust-lang.org/std/time/struct.SystemTime.html#platform-specific-behavior

kirillt changed the title from "fs-storage: preventing missed updates" to "fs-storage: Millisecond timestamps and proper rounding" on Apr 27, 2024
kirillt moved this to Todo in Development on Apr 27, 2024
twitu mentioned this issue on May 20, 2024
twitu (Collaborator) commented May 29, 2024

Now it seems the problem is not just related to rounding but more about the semantics around syncing. We have two timestamps here: one in the metadata of the physical file stored on disk (T2), and one recorded when the in-memory key-value mapping is updated (T1).

Here's what we have -

  • file metadata timestamp (T2) is updated when

    • file is written to by the BaseStorage write_fs logic
    • file is written to by an external system
  • key-value mapping timestamp (T1) is updated when

    • set or remove is called
    • write_fs is called; the timestamp is made equal to the file metadata timestamp

So we have three cases here -

  • T2 > T1 - (up) sync data from file to mapping
  • T1 > T2 - (down) sync data from mapping to file
  • T1 == T2 - only happens after write_fs call, no syncing needed

Case 1

The most common flow will be one where new entries are added to the mapping, making T1 > T2; then the mapping is written to disk, making T2 == T1.

Case 2

However, there can be cases where both the mapping and the underlying file are updated. This case will need a full sync that uses the monoid implementation to merge the data. However, this is tricky because it can be triggered by either

  • T1 > T2
  • T2 > T1

We have no way of differentiating between case 1 and case 2, so all syncs would need to be full syncs, which would be very inefficient. We want to be able to differentiate case 1 from case 2 so that we can use the more efficient write_fs for case 1.
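A minimal sketch of the two-timestamp decision described above, assuming both timestamps are plain SystemTime values; the SyncAction enum and decide_sync function are only illustrative, not the actual fs-storage API, and the comment inside marks exactly the ambiguity raised here:

```rust
use std::cmp::Ordering;
use std::time::SystemTime;

// Illustrative names, not the actual fs-storage API.
enum SyncAction {
    ReadFs,  // (up) sync: pull data from the file into the mapping
    WriteFs, // (down) sync: push the mapping to the file
    None,    // T1 == T2, only holds right after write_fs
}

// T1 = in-memory mapping timestamp, T2 = file metadata timestamp.
// With only these two values, a bare comparison cannot tell whether one
// side changed or both did, so to stay correct every divergence would
// have to fall back to a full, monoid-based sync.
fn decide_sync(t1: SystemTime, t2: SystemTime) -> SyncAction {
    match t1.cmp(&t2) {
        Ordering::Greater => SyncAction::WriteFs, // T1 > T2
        Ordering::Less => SyncAction::ReadFs,     // T2 > T1
        Ordering::Equal => SyncAction::None,
    }
}
```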

Possible solutions

  1. Only FileStorage can update the file at the given path. No external updates to the file are allowed. This will ensure case 2 cannot happen.

  2. FileStorage stores two timestamp fields, one for the mapping and one for the file. This way we have three timestamps:

    • in-memory mapping T1
    • in-memory file updated timestamp T2
    • file metadata T3

Now we have the following cases (sketched in code below) -

  • T1 > T2 && T2 == T3 -> write_fs (down) sync -> T1 == T2 == T3
  • T1 > T2 && T2 > T3 -> full sync -> T1 == T2 == T3
  • T3 > T2 && T2 == T1 -> read_fs (up) sync -> T1 == T2 == T3
  • T3 > T2 && T1 > T2 -> full sync -> T1 == T2 == T3
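A minimal sketch of this three-timestamp decision, again with purely illustrative names (SyncAction3, decide_sync3) rather than the actual implementation:

```rust
use std::time::SystemTime;

// T1 = in-memory mapping timestamp,
// T2 = in-memory "file updated" timestamp,
// T3 = file metadata timestamp on disk.
enum SyncAction3 {
    WriteFs,  // (down) sync: only the mapping changed
    ReadFs,   // (up) sync: only the on-disk file changed
    FullSync, // both sides changed: merge via the monoid implementation
    None,     // already in sync (T1 == T2 == T3)
}

fn decide_sync3(t1: SystemTime, t2: SystemTime, t3: SystemTime) -> SyncAction3 {
    if t1 > t2 && t2 == t3 {
        SyncAction3::WriteFs
    } else if t3 > t2 && t2 == t1 {
        SyncAction3::ReadFs
    } else if (t1 > t2 && t2 > t3) || (t3 > t2 && t1 > t2) {
        SyncAction3::FullSync
    } else {
        SyncAction3::None
    }
}
```

After any of these sync actions, the three timestamps would be set equal again (T1 == T2 == T3), matching the cases above.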

What are your thoughts?

twitu closed this as completed in #63 on Jun 14, 2024