-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash in SNAT during reference counter increase #621
Comments
Also looking at the code, there is a call to |
While testing my temporary test branch that reorders the reference instructions and adds a log line, while also checking reference memory for validity (i.e. not freed before). But already I get crashes in What is more interesting, that the crash in SNAT is still there, but the reason seems to be memory already beeing freed, i.e.:
I will deploy the proper fix tonight and will get back here with the results, I just wanted to note the fact that i's not an "off-by-one error" but a reuse of freed reference. |
A crash happening in commit 4bbcae9 which is OSC branch with increased refcound guards and the change you provided.
I do not have a log for this as I got to it too late, but will try to make it happen again. |
Found the appropriate log |
In the same commit 4bbcae9 there is also crash in aging-out flows:
so I guess the fix for NAT flow removal did not affect these kinds of errors. relevant |
This has happened multiple times in OSC. Looking at the code, this is caused by improper order of operations in
snat_node.c:74
dp_delete_flow()
causesdp_ref_dec()
which can possibly go to zerodp_ref_inc()
is then called on a freed-up referenceI have created a temporary fix for OSC that simply changes the order to:
dp_ref_inc()
dp_delete_flow()
dp_ref_dec()
is needed to revert the previous increaseNow I stand by this order of operations, but I am also aware, that the situation should never happen, as there should always be at least 2 references for a flow. But from a local code review the order simply should be done this way to avoid confusion.
The next question is, why the situation has arisen, because I am simply curing the symptom and not a cause. This is still ongoing in OSC.
I have not yet created a PR because I think this can have better solutions and some discussion is surely needed before doing any big changes.
The text was updated successfully, but these errors were encountered: