-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BC_FREE_BUFFER not getting executed after processing oneway RIL indication #62
Comments
Please check if it's still happening with libgbinder 1.1.6, which touched quite a few things related to buffer management. |
It took around 22 hours on the Volla phone to trigger the issue with libgbinder 1.1.6. |
As an additional note, we do use glib2 version 2.48.2, maybe bug there? |
So mine failed also with both libgbinder and libgbinder-radio updated, so 1.1.6 sadly didnt fix the issue :( @monich Maybe the glib2 version have something to say? what version do sailfish use? |
I very much doubt that this has anything to do with glib2, we have tried many different versions and never experienced this sort of problem. Most recently it's been 2.66.2. My feeling is that it's more likely to be related to the binder driver. |
Reason why i ask, is that we have tried many different phones and chipsets and it happens on all, even tried the same phones with the same kernel and hal on sailfish but it does not happen there. All from kernel, hal, and the ofono stack is the same. |
Another way to try to pin down the root cause would be to try running ofono under valgrind (which is something I periodically do). If it's indeed something not being deallocated (and BC_FREE_BUFFER not being sent as a consequence) it must show up as a leak in valgrind. It would be nice to confirm that it's actually a refcount screw-up somewhere in ofono binder plugin. |
I have ran this under valgrind for over 6 days now... no issues at all. So it does not seems to appear when its ran with valgrind... This devices fails within 24h when running without valgrind. |
Hmm, this suggests some sort of a race condition (valgrind slowing things down, shifting timings sufficiently for the race to never occur). That may be a tough one to debug. |
Next step that I suggest to do is to reproduce it like @NotKit originally did, but this time with GUTIL_LOG_TID=1 (and libglibutil >= 1.0.51) on the ofono side. That will prefix each line of debug output with the thread id. I added this GUTIL_LOG_TID thing relatively recently: https://git.sailfishos.org/mer-core/libglibutil/commit/c707a15 |
At some point in time (hours or days after boot) ofono gets stuck on Ubuntu Touch Android 9 adaptations and stops processing incoming async transactions. It results in more and more buffers being allocated for incoming indication transactions from RIL which do not reach ofono until there are no buffers left, which results in binder transaction failure with code -28 in dmesg:
binder: 2989:3012 transaction failed 29201/-28, size 6572-1304 line 3137
It seems to be caused by BC_FREE_BUFFER command not getting issued under rare circumstances which I have not been able to figure out yet. The way async_todo queue is handled in binder kernel driver, next oneway transaction is dequeued only after BC_FREE_BUFFER command is processed.
I have not seen this issue on SailfishOS, however it is common on Ubuntu Touch devices based on Android 9, even though it uses SailfishOS fork of ofono for those devices and I tested with libgbinder 1.1.5 on UT.
In attached binder_trace.log, the last oneway transaction processed by /dev/hwbinder looper is the following and has no matching binder_transaction_buffer_release:
/dev/hwbinder#-13934 [005] ...1 68522.162937: binder_transaction_received: transaction=45804
It corresponds to
2021-02-19 17:21:25 [gbinder] < BC_TRANSACTION 0x00000002 0x00000082
in ofono.log. ofono_binder_error_28.tar.gzFrom my understanding of libgbinder code, the buffers for incoming one-way transactions are supposed to be freed after remote request is unreferenced here: https://github.com/mer-hybris/libgbinder/blob/master/src/gbinder_driver.c#L498. Going through ibgbinder-radio and ofono-ril-binder-plugin code, I have not encountered anything that could increase reference count for it. Would you have any idea what else could this be related to?
The text was updated successfully, but these errors were encountered: