Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of a device being removed from the bus while we are waiting for a change #1329

Closed
wants to merge 1 commit into from

Conversation

bdraco
Copy link
Contributor

@bdraco bdraco commented Jun 8, 2023

We could wait forever if the device was removed from the bus as the _wait_condition callback would never fire because we would can get InterfacesRemoved before PropertiesChanged (which currently results in passed KeyError in the PropertiesChanged path).

This was presented as a long wait when a connection fails if the device is removed from the bus because the connection state change callback will never fire as the device gets removed after the le-connection-abort-by-local or the adapter runs out of slots.

This fix is blocking another fix: esphome/aioesphomeapi#528 (see Bluetooth-Devices/bleak-retry-connector#101 for more info)

The reason everything didn't stall completely in Home Assistant as we have everything wrapped in async_timeout as an additional safety.

2023-06-08 17:15:51.630 DEBUG (MainThread) [bleak.backends.bluezdbus.client] Connecting to BlueZ path /org/bluez/hci1/dev_78_9C_85_0A_94_40
2023-06-08 17:15:51.720 DEBUG (MainThread) [bleak.backends.bluezdbus.client] retry due to le-connection-abort-by-local
2023-06-08 17:15:51.724 DEBUG (MainThread) [bleak.backends.bluezdbus.manager] received D-Bus signal: org.freedesktop.DBus.ObjectManager.InterfacesRemoved (/): ['/org/bluez/hci1/dev_78_9C_85_0A_94_40', ['org.freedesktop.DBus.Properties', 'org.freedesktop.DBus.Introspectable', 'org.bluez.Device1']]
66    17:04:21 connect(204, {sa_family=AF_UNIX, sun_path="/var/run/dbus/system_bus_socket"}, 34) = 0
66    17:04:21 ioctl(204, FIONBIO, [1] <unfinished ...>
66    17:04:21 sendto(204, "\0", 1, 0, NULL, 0 <unfinished ...>
66    17:04:21 sendto(204, "AUTH EXTERNAL 30\r\n", 18, 0, NULL, 0 <unfinished ...>
66    17:04:21 recvfrom(204, "OK eeb9e653007b4524f7fe8b6eff03a22f\r\n", 1024, 0, NULL, NULL) = 37
66    17:04:21 sendto(204, "NEGOTIATE_UNIX_FD\r\n", 19, 0, NULL, 0 <unfinished ...>
66    17:04:21 recvfrom(204,  <unfinished ...>
66    17:04:21 sendto(204, "BEGIN\r\n", 7, 0, NULL, 0 <unfinished ...>
66    17:04:21 sendto(204, "l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\5\0\0\0Hello\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", 128, 0, NULL, 0) = 128
66    17:04:21 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="l\2\1\1\f\0\0\0\377\377\377\377?\0\0\0", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 16
66    17:04:21 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="\5\1u\0\1\0\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\7\0\0\0:1.8882\0\10\1g\0\1s\0\0\7\0\0\0:1.8882\0", iov_len=76}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 76
66    17:04:21 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="l\4\1\1\f\0\0\0\377\377\377\377\217\0\0\0", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 16
66    17:04:21 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\7\0\0\0:1.8882\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\f\0\0\0NameAcquired\0\0\0\0\10\1g\0\1s\0\0\7\0\0\0:1.8882\0", iov_len=156}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 156
66    17:04:21 recvmsg(204, {msg_namelen=110}, 0) = -1 EAGAIN (Resource temporarily unavailable)
66    17:04:21 sendto(204, "l\1\0\1\0\0\0\0\2\0\0\0r\0\0\0\1\1o\0%\0\0\0/org/bluez/hci1/dev_78_9C_85_0A_94_40\0\0\0\2\1s\0\21\0\0\0org.bluez.Device1\0\0\0\0\0\0\0\3\1s\0\7\0\0\0Connect\0\6\1s\0\t\0\0\0org.bluez\0\0\0\0\0\0\0", 136, 0, NULL, 0) = 136
66    17:04:22 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="l\3\1\1!\0\0\0\237\7@\2M\0\0\0", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 16
66    17:04:22 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="\6\1s\0\7\0\0\0:1.8882\0\4\1s\0\26\0\0\0org.bluez.Error.Failed\0\0\5\1u\0\2\0\0\0\10\1g\0\1s\0\0\7\1s\0\4\0\0\0:1.8\0\0\0\0\34\0\0\0le-connection-abort-by-local\0", iov_len=113}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 113
66    17:04:22 recvmsg(204, {msg_namelen=110}, 0) = -1 EAGAIN (Resource temporarily unavailable)
66    17:04:41 sendto(204, "l\1\0\1\0\0\0\0\3\0\0\0z\0\0\0\1\1o\0%\0\0\0/org/bluez/hci1/dev_78_9C_85_0A_94_40\0\0\0\2\1s\0\21\0\0\0org.bluez.Device1\0\0\0\0\0\0\0\3\1s\0\n\0\0\0Disconnect\0\0\0\0\0\0\6\1s\0\t\0\0\0org.bluez\0\0\0\0\0\0\0", 144, 0, NULL, 0) = 144
66    17:04:41 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="l\2\1\1\0\0\0\0\10\20@\2%\0\0\0", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 16
66    17:04:41 recvmsg(204, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=110 => 30, msg_iov=[{iov_base="\6\1s\0\7\0\0\0:1.8882\0\5\1u\0\3\0\0\0\7\1s\0\4\0\0\0:1.8\0\0\0\0", iov_len=40}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 40
66    17:04:41 recvmsg(204, {msg_namelen=110}, 0) = -1 EAGAIN (Resource temporarily unavailable)
66    17:04:41 shutdown(204, SHUT_RDWR) = 0
66    17:04:41 recvmsg(204, {msg_name=0x7ffe572cd400, msg_namelen=110 => 0, msg_iov=[{iov_base="", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 0
66    17:05:24 ioctl(204, FIONBIO, [1]) = 0

@bdraco
Copy link
Contributor Author

bdraco commented Jun 8, 2023

new behavior after change

2023-06-08 17:39:50.358 DEBUG (MainThread) [bleak.backends.bluezdbus.manager] received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci1/dev_78_9C_85_0A_94_40): ['org.bluez.Device1', {'RSSI': <dbus_fast.signature.Variant ('n', -80)>}, []]
2023-06-08 17:39:50.376 DEBUG (MainThread) [bleak.backends.bluezdbus.client] retry due to le-connection-abort-by-local
2023-06-08 17:39:50.384 DEBUG (MainThread) [bleak.backends.bluezdbus.manager] received D-Bus signal: org.freedesktop.DBus.ObjectManager.InterfacesRemoved (/): ['/org/bluez/hci1/dev_78_9C_85_0A_94_40', ['org.freedesktop.DBus.Properties', 'org.freedesktop.DBus.Introspectable', 'org.bluez.Device1']]
2023-06-08 17:39:50.384 DEBUG (MainThread) [bleak.backends.bluezdbus.client] Device disconnected (/org/bluez/hci1/dev_78_9C_85_0A_94_40)
2023-06-08 17:39:50.384 DEBUG (MainThread) [bleak.backends.bluezdbus.client] _cleanup_all(/org/bluez/hci1/dev_78_9C_85_0A_94_40)

@bdraco
Copy link
Contributor Author

bdraco commented Jun 14, 2023

I haven't had a chance to come back to this as I had to do a trip unexpectedly and won't be back home until the end of the month to get a good test on validating this fix.

@bdraco
Copy link
Contributor Author

bdraco commented Jul 25, 2023

It's actually worse because the callback what waits for the state to change has an untrapped key error so the state never resolved because it tries to access the device that is no longer there

@bdraco bdraco changed the title Stop waiting for a device to disconnect if its been removed Fix handling of a device being removed from the bus while we are waiting for a change Jul 25, 2023
Comment on lines 673 to 676
if device_path not in self._properties:
raise BleakError(
f"Device '{device_path}' disappeared while waiting to discover services"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't check here, we could end up with an unretrieved exception when calling _wait_condition since it will raise KeyError

raise BleakError(
f"Device '{device_path}' disappeared while waiting to discover services"
)

services_discovered_wait_task = asyncio.create_task(
Copy link
Contributor Author

@bdraco bdraco Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would wait forever if the device is removed from the bus after we start waiting for the callback which is likely the problem in home-assistant/core#93171

CHANGELOG.rst Outdated Show resolved Hide resolved
@bdraco
Copy link
Contributor Author

bdraco commented Jul 25, 2023

I adjusted the opening text a bit to try to better explain whats going on here

@bdraco
Copy link
Contributor Author

bdraco commented Aug 23, 2023

@dlech Sorry to ask again. Hopefully you can find some time for this one as its blocking some other fixes now.

@dlech
Copy link
Collaborator

dlech commented Aug 25, 2023

I have proposed an alternative to synthesizing the "Connected" event in #1399.

It looks like there are some other changes here to fix linting issues and perhaps some performance improvements as well? Happy to take those as separate commits.

@bdraco
Copy link
Contributor Author

bdraco commented Aug 25, 2023

I have proposed an alternative to synthesizing the "Connected" event in #1399.

Looks good sans the exception handling you already commented on

It looks like there are some other changes here to fix linting issues and perhaps some performance improvements as well? Happy to take those as separate commits.

I'll break out the code that reduces the linear searches into a fresh PR after #1399 merges to reduce the chance of conflicts

@dlech
Copy link
Collaborator

dlech commented Aug 28, 2023

This was split out into other pull requests.

@dlech dlech closed this Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants