-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the way call-IDs are tracked in the SIP plugin (fixes #3404) #3443
Conversation
Pinging @lmangani as well, since before this fix SIP messages sent to Homer may have been misattributed by the core. |
@lminiero We've successfully built Janus with this PR and are beginning our testing process. Please expect a detailed feedback report within several days. |
@ycherniavskyi any feedback on this? |
@lminiero, unfortunately, we encountered two problematic cases that appear to be related to the same root issue. In both instances, Janus crashed (in our environment, the Janus container exited with code 139). First case Two accounts, Steps to reproduce:
Here is the call flow and detailed SIP message log for this case: Link to Gist (IP addresses, numbers, and names have been anonymized). It’s also worth noting that this issue does not occur with all softswitches we’ve tested. Second case In this scenario, three accounts are registered on our prospect softswitch:
Steps to reproduce:
The only requirement for this crash is that a non-Janus initiates the transfer. Since both cases seem related to the same issue, I didn’t collect a separate SIP dump for the second case. However, if needed, I can gather it ASAP. Let me know if you need additional information or further logs. |
If there's crashes, I'll need a gdb stacktrace or, even better, a libasan dump, to see where it crashes. Just the SIP messages (which is what I see in that gist) won't help unfortunately. |
@ycherniavskyi while we wait for those crash logs I mentioned, the only thing I noticed in your log is that there seem to be many SIP BYEs that are never answered, I guess because, as you pointed out, Janus seems to be crashing when it gets the first one. That said, we don't do anything in |
@ycherniavskyi I should have fixed the crash in the commit above: it was a double free, caused by the fact we were not strdup-ing the call-id when adding it as a key to the |
Merging ✌️ |
…#3404) (meetecho#3443) Thanks to WebTrit for sponsoring the effort!
…#3404) (meetecho#3443) Thanks to WebTrit for sponsoring the effort!
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [meetecho/janus-gateway](https://github.com/meetecho/janus-gateway) | minor | `v1.2.4` -> `v1.3.0` | --- ### Release Notes <details> <summary>meetecho/janus-gateway (meetecho/janus-gateway)</summary> ### [`v1.3.0`](https://github.com/meetecho/janus-gateway/blob/HEAD/CHANGELOG.md#v130---2024-11-25) [Compare Source](meetecho/janus-gateway@v1.2.4...v1.3.0) - Refactored logging internals \[[PR-3428](meetecho/janus-gateway#3428)] - Use strtok to parse SDPs \[[PR-3424](meetecho/janus-gateway#3424)] - Fixed rare condition that could lead to a deadlock in the VideoRoom \[[PR-3446](meetecho/janus-gateway#3446)] - Fixed broken switch when using remote publishers in VideoRoom \[[PR-3447](meetecho/janus-gateway#3447)] - Added SRTP support to VideoRoom remote publishers (thanks [@​spscream](https://github.com/spscream)!) \[[PR-3449](meetecho/janus-gateway#3449)] - Added support for generic JSON metadata to VideoRoom publishers (thanks [@​spscream](https://github.com/spscream)!) \[[PR-3467](meetecho/janus-gateway#3467)] - Fixed deadlock in VideoRoom when failing to open a socket for a new RTP forwarder (thanks [@​spscream](https://github.com/spscream)!) \[[PR-3468](meetecho/janus-gateway#3468)] - Fixed deadlock in VideoRoom caused by reverse ordering of mutex locks \[[PR-3474](meetecho/janus-gateway#3474)] - Fixed memory leaks when using remote publishers in VideoRoom \[[PR-3475](meetecho/janus-gateway#3475)] - Diluted frequency of PLI in the VideoRoom (thanks [@​natikaltura](https://github.com/natikaltura)!) \[[PR-3423](meetecho/janus-gateway#3423)] - Better cleanup after failed mountpoint creations in Streaming plugin \[[PR-3465](meetecho/janus-gateway#3465)] - Fixed compilation of AudioBridge in case libogg isn't available (thanks [@​tmatth](https://github.com/tmatth)!) \[[PR-3438](meetecho/janus-gateway#3438)] - Better management of call cleanup in SIP plugin \[[Issue-3430](meetecho/janus-gateway#3430)] - Change the way call-IDs are tracked in the SIP plugin (thanks WebTrit!) \[[PR-3443](meetecho/janus-gateway#3443)] - Increased maximum size of custom SIP headers \[[Issue-3459](meetecho/janus-gateway#3459)] - Other smaller fixes and improvements (thanks to all who contributed pull requests and reported issues!) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yOC4wIiwidXBkYXRlZEluVmVyIjoiMzkuMjguMCIsInRhcmdldEJyYW5jaCI6Im1hc3RlciIsImxhYmVscyI6W119--> Reviewed-on: https://git.walbeck.it/walbeck-it/docker-janus-gateway/pulls/157 Co-authored-by: renovate-bot <[email protected]> Co-committed-by: renovate-bot <[email protected]>
This PR is an attempt to address the problem @ycherniavskyi reported in issue #3404. We have to thank Yurii for this fix as well, as his company, @WebTrit, sponsored the effort to investigate the issue and come up with a potential solution. Thanks, guys!
Coming to the core of the issue, Yurii identified an inconsistency in how Call-IDs were used as ways to address sessions in the SIP plugin, that would occur specifically when caller and callee were handled by the same Janus instance, and so the map would be overwritten. This could cause problems in two specific cases:
Replaces
header properly.This PR introduces a new kind of mapping where we don't map the call-ID to a single session, but instead map it to a new structure that can keep track of both a caller and a callee, by keeping track of caller tags too. That structure is only updated with info on sessions handled by the Janus instance, and allows us to track both parties in a call from the same call-ID. This allowed us to solve the two problems above in two separate ways:
As usual, while this PR is currently for
master
, I'll backport it to0.x
once it's merged. I tested the PR briefly and it seems to work, but of course my ability to test SIP deployments (and those use cases in particular) is limited. I'm looking forward to feedback, to ensure that first of all this fixes the two problems we mentioned, but also that we're not introducing regressions with these changes. If you use SIP extensively in your deployments, please make sure to test this and give us all the feedback we need to make sure it's merged as soon as possible.