iocp: fix crash, GetQueuedCompletionStatus() write freed WSAOVERLAPPED memory #4136
Great! I assume you've tested this and it does fix the issue :) I think we also need to run it under some kind of stress test, e.g. using the ioqueue test in pjlib-test, to make sure all memory pools (of pending ops) are properly released. Note that after an ioqueue key is unregistered, the key is put into the closing-key list and soon after into the free-key list, to be reused by another socket. We need to make sure that all pending ops have been freed before the key is freed and reused. Next, perhaps we can apply a small optimization: instead of a memory pool for each pending op, use one memory pool per ioqueue key to avoid multiple alloc+free cycles for multiple pending ops, using the same mechanism as the ioqueue key (an additional list that keeps unused pending-op instances for later reuse).
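The reuse idea in the comment above can be sketched in portable C. This is only an illustration under stated assumptions: `pending_op`, `op_cache`, and the three functions are hypothetical names, and `malloc()`/`free()` stand in for the pool allocation that the real ioqueue would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pending-op record; the real ioqueue type differs. */
typedef struct pending_op {
    struct pending_op *next;
    int                in_use;
} pending_op;

/* Hypothetical per-key state: one free list shared by all ops of a key. */
typedef struct op_cache {
    pending_op *free_list;    /* unused instances kept for reuse */
    size_t      alloc_count;  /* number of instances ever allocated */
} op_cache;

/* Take an instance from the free list, allocating only when it is empty. */
pending_op *op_acquire(op_cache *c)
{
    pending_op *op = c->free_list;

    if (op) {
        c->free_list = op->next;
    } else {
        op = malloc(sizeof(*op));
        if (!op)
            return NULL;
        c->alloc_count++;
    }
    op->next = NULL;
    op->in_use = 1;
    return op;
}

/* Return a completed op to the free list instead of freeing it. */
void op_release(op_cache *c, pending_op *op)
{
    assert(op && op->in_use);
    op->in_use = 0;
    op->next = c->free_list;
    c->free_list = op;
}

/* Free every cached instance; call only when no op is still pending. */
void op_cache_destroy(op_cache *c)
{
    while (c->free_list) {
        pending_op *op = c->free_list;
        c->free_list = op->next;
        free(op);
    }
}
```

With this shape, a key that issues many operations allocates only as many records as it ever has in flight at once, and everything is released together when the key itself is released.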
Note: |
When |
Tried to run
Not sure if this is the same issue, but this assertion does not happen when using ioqueue select.
@nanangizz without this patch, does this assert still occur?
Yes, same assert without this patch.
I found the reason: the key is unregistered twice.
Thanks @jimying. Honestly, I haven't had a chance to reproduce the original issue and test the proposed solution. I believe you are using this ioqueue in the real world, experienced the issue, and found that this solution works, is that correct? Next, here are a few notes about the proposed solution:
Also, this ioqueue has been disabled for quite some time, and some improvements in the ioqueue area may not have been integrated into it, e.g. group lock for key. So please understand that there may still be some steps required to enable this ioqueue again :)
@nanangizz I wrote a simple demo to reproduce the crash in msys2: #4172. I have tested it; with the old code it reproduces the crash 100% of the time. To test the new code, we can git cherry-pick the demo patch onto this branch.
Thanks @jimying.
…D memory try to fix issue pjsip#985
Hi @jimying, please let us know whether you plan to incorporate @nanangizz's suggestions above.
@sauwming sorry, I will submit it as soon as possible, today or tomorrow.
new commits do:
|
The pool is owned by the key/socket, instead of by the ioqueue, to avoid possible unbounded memory growth in the ioqueue.
Update ioq_stress_test not to use the global group lock for key registration, as otherwise the keys won't be released until the global group lock is destroyed (i.e: after ioqueue destroy).
I think this is ready for review @jimying, @sauwming, @trengginas.
- added info to clarify codes - added copyright for test code - minors.
The last commit should cover all review comments above. @jimying, re: copyright text, feel free to change the name :)
@@ -0,0 +1,208 @@
/*
 * Copyright (C) 2024 jimying at github dot com.
 * Copyright (C) 2024 Teluu Inc. (http://www.teluu.com)
If I'm not mistaken, Teluu is usually put above, for two reasons: 1. to signify that the original author has agreed to contribute it to Teluu (as per CLA), 2. to make it easier to update copyright year (i.e. only the latest/first copyright info will get updated, the rest will remain the same).
 * operations must be cancelled. As cancelling ops is asynchronous,
 * IOCP destroy may need to wait for the maximum time specified here.
 */
#define TIMEOUT_CANCEL_OP 5000
The TIMEOUT_CANCEL_OP macro is unused. Does WAIT_KEY_MS (in pj_ioqueue_destroy()) have the same value?
Actually WAIT_KEY_MS should have been replaced by TIMEOUT_CANCEL_OP, so WAIT_KEY_MS is unused (and undefined).
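As a rough illustration of what a bounded wait of this kind looks like, here is a minimal portable sketch. It is not the code from this PR: `wait_pending_keys`, the callback types, and the fake clock are hypothetical, and the callbacks stand in for `pj_gettickcount()` and for whatever the ioqueue uses to count keys that still have pending operations. Only the `TIMEOUT_CANCEL_OP` value is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

#define TIMEOUT_CANCEL_OP 5000   /* ms, value taken from the diff above */

/* Hypothetical callbacks: a millisecond clock and a pending-key counter. */
typedef unsigned long (*now_ms_fn)(void *ctx);
typedef size_t (*pending_fn)(void *ctx);

/* Poll until no keys are pending or the deadline passes.
 * Returns the number of keys still pending (0 means a clean shutdown). */
size_t wait_pending_keys(now_ms_fn now_ms, pending_fn pending,
                         void *ctx, unsigned long timeout_ms)
{
    unsigned long start = now_ms(ctx);
    size_t left;

    while ((left = pending(ctx)) > 0) {
        if (now_ms(ctx) - start >= timeout_ms)
            break;   /* give up; the caller can log a warning */
    }
    return left;
}

/* Fake clock and counter, used only to exercise the loop. */
typedef struct fake_ctx {
    unsigned long t;     /* current fake time in ms */
    size_t        keys;  /* keys that still have pending ops */
} fake_ctx;

/* Each clock read advances the fake time by 100 ms. */
unsigned long fake_now(void *p)
{
    fake_ctx *c = p;
    c->t += 100;
    return c->t;
}

/* Each poll completes one key, mimicking a cancellation finishing. */
size_t fake_pending(void *p)
{
    fake_ctx *c = p;
    if (c->keys)
        c->keys--;
    return c->keys;
}
```

Passing the clock and the counter as callbacks keeps the loop testable without real sockets. In real code the poll step would dequeue completion packets rather than spin.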
pjlib/src/pj/ioqueue_winnt.c
        pj_list_push_back(&ioqueue->free_list, key);
    }
#endif
    ioqueue->max_fd = pj_list_size(&ioqueue->free_list); // max_fd;
Why compute it again using pj_list_size()? Better to revert to ioqueue->max_fd = max_fd.
oops right, forgot to revert.
    pj_gettickcount(&timeout);
    if (PJ_TIME_VAL_GTE(timeout, stop)) {
        PJ_LOG(3, (THIS_FILE, "Warning, IOCP destroy timeout in waiting "
                   "for cancelling ops, after %dms, pending keys=%d",
minor build warning: format '%d' expects argument of type 'int', but argument 4 has type 'pj_size_t'
fixed
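For readers unfamiliar with this class of warning, one common way to silence it is to cast the size-typed argument so that it matches the conversion specifier. The sketch below uses standard `snprintf()` rather than `PJ_LOG`, and `format_timeout_warning` is a hypothetical helper. It does not claim to show how the PR actually fixed the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the warning with matching specifiers: the elapsed time is cast
 * to int for "%d", and the size_t count is cast to unsigned for "%u". */
int format_timeout_warning(char *buf, size_t buflen,
                           long elapsed_ms, size_t pending_keys)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "IOCP destroy timeout after %dms, pending keys=%u",
                    (int)elapsed_ms, (unsigned)pending_keys);
}
```

The casts are safe here only because both values are small in practice. Where C99 format support is available, `%zu` with an uncast `size_t` is an alternative.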
Try to fix issue #985. The idea is to call CancelIoEx() for the unregistering socket/key to cancel all pending operations of the key. However, as CancelIoEx() is basically asynchronous, this also makes the key unregistration asynchronous, so here are some consequences: