Currently, only the `_cache` variable has a threading lock, and that lock is only taken for write operations. All mutable global variables need to be protected by a lock for both read and write accesses.
To be thread safe, any value which can be changed in one thread and read or written from another needs to be protected by a threading lock before it is either read or written. If the value is protected only while writing, another thread can read it while the writing thread is in the middle of an update, which can result in undefined behavior. While it would be possible to design Python such that a lock isn't needed for reads, Python isn't designed that way, and it's definitely not something which is guaranteed. Lock-free reads would only be safe if every change to a readable value were always accomplished with a single memory write. That's just not the case in Python. So, for thread safety, a lock needs to be obtained before both reads and writes of any variable which is accessible to more than one thread (i.e. basically any value which isn't local to a function).
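To illustrate the point about updates that take more than one step, here is a minimal sketch. The names `_state`, `_state_lock`, `add_pattern`, and `snapshot` are hypothetical and are not taken from the regex package. The writer changes two related values, so a reader that skipped the lock could observe the list already extended while the counter is still stale.

```python
import threading

# Hypothetical shared state: two values that must always agree.
_state_lock = threading.RLock()
_state = {"pattern_count": 0, "patterns": []}


def add_pattern(pattern):
    """Writer: the update takes two steps, so it is not a single write."""
    with _state_lock:
        _state["patterns"].append(pattern)
        _state["pattern_count"] += 1


def snapshot():
    """Reader: takes the same lock, so it never sees a half-finished update."""
    with _state_lock:
        return _state["pattern_count"], list(_state["patterns"])
```

Because both functions take the same lock, every snapshot satisfies `count == len(patterns)`. Dropping the lock from `snapshot` would remove that guarantee.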
Specifically, every access to `_cache` needs to be protected by the `_cache_lock` `RLock`, which is currently used only for writes. In addition, all other mutable global variables need to use thread locking semantics for both reading and writing.
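As a rough sketch of what this could look like (not the package's actual code): `_cache` and `_cache_lock` are the names used above, while `cache_get`, `cache_put`, and `cache_get_or_compute` are hypothetical helpers invented for illustration.

```python
import threading

_cache = {}
_cache_lock = threading.RLock()


def cache_get(key, default=None):
    """Read under the lock, not just write."""
    with _cache_lock:
        return _cache.get(key, default)


def cache_put(key, value):
    """Write under the same lock."""
    with _cache_lock:
        _cache[key] = value


def cache_get_or_compute(key, factory):
    """Look up and, on a miss, fill the entry as one locked operation."""
    with _cache_lock:
        try:
            return _cache[key]
        except KeyError:
            value = _cache[key] = factory(key)
            return value
```

Holding the lock across the lookup and the fill in `cache_get_or_compute` means two threads cannot both miss and both compute the same entry. The trade-off is that `factory` runs while the lock is held, so other threads wait on the cache for that long.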
Ideally, everything which can potentially be changed and accessed by multiple threads should be protected by a lock of some sort (`RLock()` is normally good). You don't need to lock things like constants, which are never changed once they are set during package initialization, but things like `_named_args`, `_replacement_cache`, `_locale_sensitive`, and even `_cache_all`, along with any other global settings, should only be accessed (read or write) once a threading lock has been obtained. A potential alternative for some variables would be to use a thread-local value, which is only accessible to the current thread.
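For the thread-local alternative, a minimal sketch using the standard library's `threading.local` might look like the following. The names `_thread_state` and `get_replacement_cache` are hypothetical, and whether a per-thread cache is acceptable for any particular variable in the package is an assumption, not something I have checked.

```python
import threading

# Hypothetical per-thread storage; each thread sees its own attributes.
_thread_state = threading.local()


def get_replacement_cache():
    """Return this thread's private cache, creating it on first use."""
    cache = getattr(_thread_state, "replacement_cache", None)
    if cache is None:
        cache = _thread_state.replacement_cache = {}
    return cache
```

No lock is needed here because no other thread can reach the dictionary. The cost is that entries are not shared, so each thread fills its own cache from scratch.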
I don't have specific thread safety test cases for the regex package, but I have tested other Python code that obtained locks only for writes and not for reads, and I experienced issues.