Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix[next]: Guard diskcache creation by file lock #1745

Merged
merged 10 commits into from
Dec 2, 2024

Conversation

tehrengruber
Copy link
Contributor

@tehrengruber tehrengruber commented Nov 21, 2024

The disk cache used to cache compilation in the gtfn backend has a race condition manifesting itself in sqlite3.OperationalError: database is locked errors if multiple python processes try to initialize the diskcache.Cache object concurrently. This PR fixes this by guarding the object creation by a file-based lock in the same directory as the database.

While this issue occurred frequently and was observed to be fixed on distributed file systems, the lock does not guarantee correct behavior in particular for accesses to the cache (beyond opening) since the underlying SQLite database is unreliable when stored on an NFS based file system. It does however ensure correctness of concurrent cache accesses on a local file system. See more information here:

https://grantjenks.com/docs/diskcache/tutorial.html#settings
https://www.sqlite.org/faq.html#q5
tox-dev/filelock#73
NFS safe locking:
https://gitlab.com/warsaw/flufl.lock
Barry Warsaw / FLUFL Lock · GitLab

i.e. it ensures that any resources associated with the cache are properly
released when the instance is garbage collected.
This class extends `diskcache.Cache` to ensure the cache is properly
- opened on a distributed file system by using a file lock. This guards the creating of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is writing not a problem? Except that it's much harder to make all processes write at the same time and there less likely to trigger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, writing to the cache is actually fine. SQLite which is used internally by diskcache.Cache handles concurrency very well in most cases, except for changing settings of the database itself, and that is the problem here. On creation of the diskcache.Cache object, some sql PRAGMA statements are executed and they lead to the observed issues. It is not really clear why, this should work, but it doesn't.

Copy link
Contributor

@egparedes egparedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tehrengruber tehrengruber merged commit f57d6e9 into GridTools:main Dec 2, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants