-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use atomic writes for data store file operations #12715
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 073ab1f The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
CodSpeed Performance ReportMerging #12715 will not alter performanceComparing Summary
|
!preview atomic-writes |
Snapshots have been released for the following packages:
Publish Log
Build Log
|
const dir = await fs.mkdtemp(tmpdir() + '/astro-'); | ||
const tempFile = join(dir, 'temp'); | ||
await fs.writeFile(tempFile, data); | ||
try { | ||
await fs.rename(tempFile, filePath); | ||
} finally { | ||
await fs.rmdir(dir).catch(() => {}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too fond of this solution because now ALL projects, without discrimination, will incur in four (create temp dir, write file, rename file, remove directory) writing I/O operations instead of one. And writing operations are slow.
Maybe we could have an internal heuristic, where we enable this atomic rewriting only when we exceed a certain number of collections.
As you said, this is hard to test and incur in a project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a high-volume write though. It's debounced, so only called once in builds. Switching based on collection size is a good idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a new approach which is just a single write and a rename. Renames are cheap, so it shouldn't be much overhead compared to the current approach.
Ah, Windows isn't happy with this approach |
!preview atomic-writes |
Snapshots have been released for the following packages:
Publish Log
Build Log
|
So apparently this breaks differently. I'm seeing if I can reproduce that. |
!preview atomic-writes |
Snapshots have been released for the following packages:
Publish Log
Build Log
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Way better now, thank you very for bearing with me 😅
Changes
The data store has to write a number of different files to disk, which are in turn watched by the filesystem watcher. In some very large sites this was causing issues where the watcher would report the changed file before it had finished writing to disk.
This PR changes all data store file operations to use atomic writes: files are first written to a tmp dir, and then moved.
Fixes #12702
Testing
This is hard to reproduce. Tested manually using a reproduction with 50k pages and a deliberately shortened debounce time.
Docs