[C++] Provide a Sync() abstraction on writable file abstractions provided in arrow::io #39967
Comments
Let's call it
My understanding of
Either way, maybe you could add the details of what is used under the hood for each implementation?
@pitrou Would
Ok, then
That depends on what your needs are. Since I don't know how SSD-to-GPU works, it's difficult to know if that's what you want. Also, regardless,
@pitrou @NicolasDenoyelle the answer about the directory question is easy: no. Reasons: as @pitrou said, (1) a file handle doesn't (can't) have a reference to the directory it's currently in (imagine how hard that would be in a multi-threaded environment!), and (2) an application that needs the directory entry update might have multiple files to sync; it should sync all the files before it syncs the directory, and it should do that only once. You should also note that
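A minimal POSIX sketch of that ordering, using plain syscalls rather than the proposed Arrow API (`SyncFilesThenDirectory` is a made-up name for illustration): sync every file first, then sync the containing directory exactly once.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <vector>

// Made-up helper illustrating the ordering described above: fsync every file
// first, then fsync the containing directory exactly once so the directory
// entries themselves become durable.
bool SyncFilesThenDirectory(const std::vector<int>& file_fds, const char* dir_path) {
  for (int fd : file_fds) {
    if (::fsync(fd) != 0) return false;  // commit each file's data and metadata
  }
  int dir_fd = ::open(dir_path, O_RDONLY | O_DIRECTORY);
  if (dir_fd < 0) return false;
  bool ok = (::fsync(dir_fd) == 0);      // commit the directory entries once
  ::close(dir_fd);
  return ok;
}
```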
Well, the manpage seems to indicate that
+1 on this idea. And from the description, it seems it would not provide any guarantee on object stores when multipart upload is enabled?
What is the status of this feature request? Has there been any work on this? I was just looking through the code for exactly that feature and was a bit surprised that it does not yet exist. I would be happy to chime in and get this done!
I started but got overwhelmed by all the implementations of the IO interfaces.
@felipecrv do you have a branch somewhere with your changes so far?
Describe the enhancement requested
Operating systems don't immediately commit data provided by userland code to storage devices. This is usually not a problem because (1) the kernel will not take very long to asynchronously commit the data on its own, (2) the kernel mediates access to the filesystem and guarantees all processes see the writes performed to the same file so far [1], and (3) applications can handle missing data due to power loss or kernel crashes (e.g. a corrupted cache file can simply be re-downloaded).
Applications with more stringent durability requirements (e.g. SQLite) will force commit of pending data on the kernel by calling `fsync` on transaction commit. But even databases avoid doing this for every file and opt to `fsync` only a special file storing the Write-Ahead Log [2] containing batched updates. Don't think of `Sync()` as a flushing mechanism that you should always call: `fsync` can add a lot of unnecessary latency (in the many hundreds of milliseconds) and wear down storage devices.
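For local files, such a `Sync()` would most naturally delegate to `fsync` (or its platform equivalent). A minimal sketch of that delegation, assuming a raw file descriptor; `SyncFileDescriptor` is a hypothetical helper, not part of `arrow::io`:

```cpp
// Hypothetical helper, not part of arrow::io: the syscall a local-file Sync()
// could delegate to (fsync on POSIX, _commit on Windows).
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

#if defined(_WIN32)
#include <io.h>
#else
#include <unistd.h>
#endif

// Forces the kernel to commit all buffered data (and metadata) for `fd`
// to the storage device before returning.
void SyncFileDescriptor(int fd) {
#if defined(_WIN32)
  int rc = _commit(fd);
#else
  int rc = fsync(fd);
#endif
  if (rc != 0) {
    throw std::runtime_error(std::string("sync failed: ") + std::strerror(errno));
  }
}
```

In the actual interface this would presumably return an `arrow::Status` rather than throw, to match the rest of `arrow::io`.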
Network and Distributed File Systems

Networked file systems usually provide commands that ensure durability of the pending writes on the server storage media; `Sync` would delegate to these commands in these cases.

Distributed filesystems that rely on data replication might provide operations to ensure writes are propagated before returning (Quorum Writes). Since late 2020 this is not an issue with AWS S3, so `Sync` on S3 [3] files would be a no-op.
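To make the shape of the abstraction concrete, here is a hypothetical sketch; neither `WritableWithSync` nor `S3LikeOutputStream` exists in `arrow::io` today, only `arrow::Status` is the real return convention:

```cpp
#include "arrow/status.h"

// Hypothetical interface extension; not part of arrow::io.
class WritableWithSync {
 public:
  virtual ~WritableWithSync() = default;
  // Block until previously written data is durable on the backing storage.
  virtual arrow::Status Sync() = 0;
};

// A local-file backend would delegate to fsync (see the sketch above).
// An S3-backed stream could make Sync() a no-op, since S3 already provides
// strong consistency and durability once an upload succeeds [3].
class S3LikeOutputStream : public WritableWithSync {
 public:
  arrow::Status Sync() override { return arrow::Status::OK(); }
};
```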
Masking Latency

If you must issue `Sync` calls, one way to mask the latency caused by them is to issue writes as soon as possible, do some other work, and only then call `Sync`.
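A self-contained POSIX sketch of this pattern, using plain syscalls rather than the proposed `arrow::io` interface (the file path and the filler work are made up for illustration):

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstring>
#include <numeric>
#include <vector>

int main() {
  const char* data = "batched updates\n";
  int fd = ::open("/tmp/example.wal", O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0) return 1;

  // 1. Issue the write as early as possible; the kernel can start writing
  //    the dirty pages back in the background.
  if (::write(fd, data, std::strlen(data)) < 0) return 1;

  // 2. Do other useful work while writeback proceeds asynchronously.
  std::vector<long long> v(1 << 20);
  std::iota(v.begin(), v.end(), 0LL);
  long long checksum = std::accumulate(v.begin(), v.end(), 0LL);
  (void)checksum;

  // 3. Only now pay for durability; much of the data may already be on the
  //    device, so the blocking fsync tends to be shorter.
  if (::fsync(fd) != 0) return 1;
  return ::close(fd) == 0 ? 0 : 1;
}
```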
[1] exceptions to this exist with the use of flags like `O_DIRECT` on Linux's `open` syscall: https://www.man7.org/linux/man-pages/man2/open.2.html

[2] then in the event of a power loss, the database can replay the write-ahead log and complete any missing write to the more complex structures of the database, like indexes
[3] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
Component(s)
C++