fix crash due to session disconnect #332

Open: stavroskladis wants to merge 1 commit into base: master

@stavroskladis stavroskladis commented Sep 20, 2024

The issue was initially described here:
unispeech/asterisk-unimrcp#67 (comment)

If the ASR server (the MRCP server) crashes, asterisk-unimrcp and mrcp_client do not properly terminate the mpf_engine / stream and the related MPF resources, so the following crash takes place:

(gdb) bt
#0  0x00007f3b6d4f8283 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0
#1  0x00007f3b2f3d2c70 in speech_channel_read (schannel=schannel@entry=0x7f3b502d1048, data=0x7f3b50189a20, len=len@entry=0x7f3b19195ca8, block=block@entry=0) at speech_channel.c:699
#2  0x00007f3b2f3dbc69 in recog_stream_read (stream=<optimized out>, frame=0x7f3b50189928) at app_synthandrecog.c:1190
#3  0x00007f3b2f1a7b29 in mpf_audio_stream_frame_read (frame=0x7f3b50189928, stream=<optimized out>) at ../../libs/mpf/include/mpf_stream.h:136
#4  mpf_decoder_process (stream=<optimized out>, frame=0x7f3b50189c08) at src/mpf_decoder.c:60
#5  0x00007f3b2f1a210b in mpf_bridge_process (object=0x7f3b50189bd0) at src/mpf_bridge.c:63
#6  0x00007f3b2f1a493e in mpf_context_process (context=context@entry=0x7f3b50188490) at src/mpf_context.c:438
#7  0x00007f3b2f1a4979 in mpf_context_factory_process (factory=0x3059ac8) at src/mpf_context.c:105
#8  0x00007f3b2f1a7580 in timer_thread_proc (thread=0x30729a0, data=0x3059b28) at src/mpf_scheduler.c:212
#9  0x00007f3b6d4f444b in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3b6afaa52f in clone () from /lib64/libc.so.6

From the logs we can see that the session has been marked as disconnected:

[2024-09-16 10:22:38.310] NOTICE[20011] src/rtsp_client.c: Cancel RTSP Request 0x7fdf5407ad98 <be04e9f83ffe4fc9aea4e1762d864369> CSeq:6 [500]
[2024-09-16 10:22:38.311] DEBUG[20008] src/mrcp_client_session.c: Mark Session as Disconnected ASR-592 <be04e9f83ffe4fc9aea4e1762d864369>
[2024-09-16 10:22:39.996] ERROR[22835][C-00000250] app_synthandrecog.c: (ASR-592) Unable to load grammar
...
[2024-09-16 10:22:40.000] DEBUG[22835][C-00000250] speech_channel.c: Destroy speech channel: Name=ASR-592, Type=RECOGNIZER, Codec=PCMA, Rate=8000
[2024-09-16 10:22:40.000] DEBUG[22835][C-00000250] src/apt_task.c: Signal Message to [MRCP Client] [0x7fdf60058480;4;0]
[2024-09-16 10:22:40.012] DEBUG[22835][C-00000250] speech_channel.c: (ASR-592) Waiting for MRCP session to terminate
[2024-09-16 10:22:40.013] NOTICE[20008] src/mrcp_client_session.c: Receive App Request ASR-592 <be04e9f83ffe4fc9aea4e1762d864369> [1]
[2024-09-16 10:22:40.013] DEBUG[20008] src/mrcp_client_session.c: Push Request to Queue ASR-592 <be04e9f83ffe4fc9aea4e1762d864369>
[2024-09-16 10:22:42.013] WARNING[22835][C-00000250] speech_channel.c: (ASR-592) MRCP session has not terminated after 2000 ms
[2024-09-16 10:22:42.014] ERROR[22835][C-00000250] speech_channel.c: (ASR-592) Failed to destroy channel. Continuing
[2024-09-16 10:22:42.014] DEBUG[22835][C-00000250] audio_queue.c: (ASR-592) Audio queue destroyed
[2024-09-16 10:22:42.014] DEBUG[22835][C-00000250] speech_channel.c: (ASR-592) MPF generator has been reset
[2024-09-16 10:22:42.015] DEBUG[22835][C-00000250] speech_channel.c: (ASR-592) DTMF generator destroyed
...

I eventually found the issue that leads to the above crash.
When any of its connections was disconnected, mrcp_client of unimrcp (which implements the MRCP protocol and is used by the asterisk-unimrcp Asterisk module) didn't destroy the MRCP channel and its resources (MPF topology, engine, etc.). Without this destroy action, the MPF engine kept trying to read frames from the speech channel even after the asterisk-unimrcp module had terminated it (the pointers pointed to garbage).
After this realisation, the fix was one line of code.
Stress testing for 40 minutes: ~4000 calls processed per pjsip/distributor, 10 ASR server crashes, no Asterisk taskprocessor lock and no Asterisk crash.
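
For readers who want the failure mode in isolation, here is a minimal, single-threaded sketch of the teardown ordering described above: the stream is detached from the processing context before the channel is freed, so a later processing tick has nothing stale to read. This is not the actual one-line change and it does not use the unimrcp API. Every name in it (demo_channel_t, demo_context_t, demo_context_process, demo_channel_destroy, demo_run) is hypothetical. The real code runs the processing loop on a separate timer thread, so it would also need synchronization that this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a speech channel owned by the application module. */
typedef struct {
    int frames_available;
} demo_channel_t;

/* Hypothetical stand-in for a media context that is polled periodically. */
typedef struct {
    demo_channel_t *source; /* NULL once the stream has been detached */
} demo_context_t;

/* One processing tick: read a frame only if a source is still attached.
 * Returns 1 if a frame was read, 0 otherwise. */
int demo_context_process(demo_context_t *ctx)
{
    assert(ctx != NULL);
    if (ctx->source == NULL || ctx->source->frames_available == 0) {
        return 0;
    }
    ctx->source->frames_available--;
    return 1;
}

/* Teardown on disconnect: detach from the context first, then free.
 * Freeing without detaching would leave ctx->source dangling. */
void demo_channel_destroy(demo_context_t *ctx, demo_channel_t *ch)
{
    if (ctx != NULL && ctx->source == ch) {
        ctx->source = NULL;
    }
    free(ch);
}

/* Runs the scenario. Returns 0 if one frame was read before teardown and
 * none after it, 1 if not, and -1 if allocation failed. */
int demo_run(void)
{
    demo_context_t ctx = { NULL };
    demo_channel_t *ch = malloc(sizeof *ch);
    int before;
    int after;

    if (ch == NULL) {
        return -1;
    }
    ch->frames_available = 3;
    ctx.source = ch;

    before = demo_context_process(&ctx); /* channel attached: reads a frame */
    demo_channel_destroy(&ctx, ch);      /* detach, then free */
    after = demo_context_process(&ctx);  /* nothing attached: reads nothing */

    return (before == 1 && after == 0) ? 0 : 1;
}
```

Calling demo_run() from a main function exercises the whole sequence. If the detach step in demo_channel_destroy were skipped, the second demo_context_process call would dereference freed memory, which is the same class of failure as the backtrace above.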
