Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Premature thread local attribute access: possibly breaking isolation #154

Open
verschmelzen opened this issue Nov 19, 2024 · 4 comments
Open
Labels
bug Something isn't working

Comments

@verschmelzen
Copy link

There might be a serious bug here. Afaiu local() is that resolves to the correct context when it's attribute is accessed from inside the running thread. Here you access the members possibly before other threads fork, thus sharing all the data between them.

https://github.com/netzkolchose/django-computedfields/blob/master/computedfields/handlers.py#L31

@verschmelzen
Copy link
Author

Here is showcase

from threading import local, Thread

# isolated
print('Isolated, so we will see errors')
l = local()
l.a = 1 # is not available to threads

def print_l():
    try:
        print('try access to non-existent attribute', l.a)
    except:
        print('isolation works')
        import traceback
        traceback.print_exc()
    l.a = 2 # is not available to "next" thread

t = Thread(target=print_l)
t.start()
t.join()

t2 = Thread(target=print_l)
t2.start()
t2.join()

print('\n')
print('\n')

# shared
print('Value shared between threads')
l = local()
class B: pass
A = l.a = B()
A.a = 1

def print_A():
    print('state is shared', A.a)
    A.a = 2 # new value is available to "next" thread

t = Thread(target=print_A)
t.start()
t.join()

t2 = Thread(target=print_A)
t2.start()
t2.join()

@jerch
Copy link
Collaborator

jerch commented Nov 19, 2024

@verschmelzen Thx for finding this one - yes you are right, it currently does not decouple data across threads correctly. The issue is line 31-34, were the dict objects get bound back into to module level sharing all stuff across all threads.

A quickfix would be to reshape the data parts and put them behind a guarded access, something like this:

STORAGE = local()

def get_DATA():
    try:
        return STORAGE.DATA
    except AttributeError:
        STORAGE.DATA = {'DELETES': {}, 'M2M_REMOVE': {}, 'M2M_CLEAR': {}, 'UPDATE_OLD': {}}
    return STORAGE.DATA

# access in data handlers:
    # instead of
    DELETES...
    # it is now:
    get_DATA()['DELETES']

(No clue, if the dict abstraction is good enough or if real attributes on STORAGE would be better here ...)

Do you want to write a fix for it?

@jerch jerch added the bug Something isn't working label Nov 19, 2024
@jerch
Copy link
Collaborator

jerch commented Nov 19, 2024

Just benchmarked it - fully decoupling is actually the fastest here:

STORAGE = local()

def get_DELETES():
    try:
        return STORAGE.DELETES
    except AttributeError:
        STORAGE.DELETES = {}
    return STORAGE.DELETES

def get_M2M_REMOVE():
    try:
        return STORAGE.M2M_REMOVE
    except AttributeError:
        STORAGE.M2M_REMOVE = {}
    return STORAGE.M2M_REMOVE
...

Which seems plausible, as it only cares for the datapoint actually needed.

Edit: Fixed SHARED vs. STORAGE naming. 😅

@verschmelzen
Copy link
Author

I would be happy to submit a patch! Eventually ....

I think separate getters are very good, since I didn't found any way to set default value of entire local() per-thread.

On more general note I also think switch to asgiref.Local could be done in the same patch. Since weird stuff can happen if someone uses sync_to_async or async_to_sync otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants