[WIFI][ESP32S3][v5.3.1] WIFI provisioning failing to scan and connect to mobile hotspot multiple times (IDFGH-13971) #14801
Comments
Please note that the device fails to even find the hotspot in a scan 3 times out of 4. I tried to enable
but it has no effect
|
Here are the logs with verbose Wi-Fi logging. Scan: After a reboot, the device also struggles to connect to the Wi-Fi (partial log, with the success after multiple timeouts): |
These disconnections indicate an internal timeout. This disconnection indicates that the STA received a disassociation frame from the AP with reason code 7. |
@KonssnoK |
@zhangyanjiaoesp We do have LR mode enabled while the device is provisioning, because we use ESP-NOW to propagate credentials to other devices in the same installation. |
What I don't understand is why the final error on the connection attempt is 205 (AP not found). We were able to scan for it, so why would it fail saying it can no longer see it? |
@KonssnoK If the connection fails, the STA adds the AP to a blacklist. During the next scan, if the STA finds that the AP it wants to connect to is in the blacklist, it returns a 205 error and then removes the AP from the blacklist. On the next reconnect attempt, the STA enters the scanning and connecting state again. You can consider 205 an intermediate state. |
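A minimal sketch of how application code might act on that description, treating 205 as an intermediate state that triggers another attempt instead of being reported as a final failure. It is plain C with no ESP-IDF dependency so it can be read in isolation. The `sketch_*` names, the two budgets, and the policy itself are assumptions made for illustration, not part of ESP-IDF; only the numeric values 7 and 205 come from this thread. On a device, the reason value would come from the `reason` field of the `WIFI_EVENT_STA_DISCONNECTED` event data, and the real enumerator names should be checked in `esp_wifi_types.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Numeric values as quoted in this thread. The names are hypothetical;
 * check esp_wifi_types.h for the real WIFI_REASON_* enumerators. */
enum {
    SKETCH_REASON_DISASSOC_CLASS3 = 7,   /* class 3 frame from non-associated STA */
    SKETCH_REASON_INTERMEDIATE    = 205  /* reported while the AP is blacklisted */
};

typedef struct {
    uint8_t failures;      /* failed attempts charged to the budget so far */
    uint8_t attempts;      /* all disconnect events seen so far */
    uint8_t max_failures;  /* give up after this many charged failures */
    uint8_t max_attempts;  /* hard cap so the loop always terminates */
} sketch_retry_t;

/* Call on every disconnect event. Returns true if another connect attempt
 * should be made, false if the caller should report failure to the user. */
static bool sketch_on_disconnect(sketch_retry_t *st, int reason)
{
    if (st->attempts < UINT8_MAX) {
        st->attempts++;
    }
    /* 205 is treated as intermediate: it is not charged to the failure budget. */
    if (reason != SKETCH_REASON_INTERMEDIATE && st->failures < UINT8_MAX) {
        st->failures++;
    }
    return st->failures < st->max_failures && st->attempts < st->max_attempts;
}

/* Call once the station is connected, to reset both counters. */
static void sketch_on_connected(sketch_retry_t *st)
{
    st->failures = 0;
    st->attempts = 0;
}
```

The separate `max_attempts` cap is a design choice of this sketch: it keeps a device that only ever sees 205 from retrying forever.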
@KonssnoK Can you provide a demo for us to test? Since Wi-Fi provisioning involves multiple modules, if we can reproduce it locally, it will speed up our debugging process. |
@zhangyanjiaoesp Let me see what I can do. |
So I made a change to the provisioning example on my branch, C:\src\esp-idf\examples\provisioning\wifi_prov_mgr, but of course I do not see the logs that I sent from our device, which was migrated from 4.4. Could it be something related to the Wi-Fi calibration done in 4.4? Or does that get reset automatically when we update to 5.3? I will try to check for more differences. |
@zhangyanjiaoesp I am really disturbed by the fact that I now cannot reproduce the behavior I logged above, with either the devkits or our dedicated hardware. |
@KonssnoK Do you mean that you can't reproduce the scan fail and connect fail issue? |
No, I just reproduced it again on our hardware. I do not understand how to trigger it on a devkit, though.
|
@zhangyanjiaoesp My problem is that our device is supposed to be compatible with multiple phones, which are used during installation. Could this issue be related to the commit made on v4.4 where the Pixel 8 was caching the authorization? Point 1 here: |
@KonssnoK |
Are you saying that every time the current channel is different from the AP channel, there will be a connection failure? |
@zhangyanjiaoesp No, sorry, I was wrong in the end. I see multiple
and then disconnection with reason 7. |
|
I'm trying to capture packets, but most of the time the device cannot even find the access point in a scan... |
Here is an example of very strange behavior from the device:
Note that this works in v4.4. Device logs with debug level on Wi-Fi: File .pcapng. AP MAC: d2:0f:7d:8a:8b:29 |
@zhangyanjiaoesp example where
logs File .pcapng |
@zhangyanjiaoesp in this other case
logs File .pcapng |
More random errors generated by the device: log (partial) |
In this log, the scan failed: it did not find the specified AP. If it had found it, the log would show |
For the other three logs and packet captures, I will check and reply to you ASAP. |
I understand it's not finding the AP, but it found it 5 seconds earlier during the scan. How can the device stop seeing an AP less than 1 meter away between 2 consecutive scans? |
|
@zhangyanjiaoesp Do you know when the NULL frames are sent? I don't always manage to reproduce the issue, and I want to reproduce it before testing your library. |
@zhangyanjiaoesp With the new library I have not been able to replicate the issue so far, but again, it doesn't always happen. I'll try some more. |
When the STA receives the association response, the hardware responds with an ACK to the AP and simultaneously begins sending NULL data. This is a feature added to address the issue where some APs may not be able to send EAPOL packets in a timely manner. However, if the hotspot does not receive the STA's ACK in time, it will continuously retransmit the association response. When the hotspot receives the NULL data, it will send a disassociation (disassoc) message to the STA, as described in the scenario above. If the hotspot is able to quickly receive the STA's ACK response, the NULL data sent by the STA will not affect the Wi-Fi connection, as shown in the diagram below: |
@zhangyanjiaoesp OK, so the way to reproduce it is most probably a busy channel. Is the fix that you sent me already final, or does it just temporarily remove the NULL data? I'm not sure it also covers the AP not found error seen some of the time. |
The Wi-Fi library I provided does not delete the NULL data, but instead delays sending it for a period of time. It is not the final version, but it will not differ significantly from the final version.
Are you saying that after using the new wifi lib, the AP not found issue has been resolved? |
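The race described above, and why delaying the NULL data helps, can be illustrated with a toy timing model. This is only a sketch under loud assumptions: the function and type names are hypothetical, the retransmission spacing is a made-up constant, and real hardware timing is far more involved. The model says the hotspot treats the STA as associated once it hears an ACK to any transmission of the association response, and sends a disassociation with reason 7 if the NULL data frame arrives before that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_ASSOCIATED,        /* hotspot heard an ACK before the NULL data */
    SKETCH_DISASSOC_REASON_7  /* NULL data arrived while hotspot still waited */
} sketch_outcome_t;

/*
 * ack_heard[i] says whether the hotspot heard the STA's ACK to its i-th
 * transmission of the association response. Transmissions are spaced
 * retry_interval_us apart (hypothetical constant spacing). The STA sends
 * its NULL data frame null_delay_us after the first association response.
 */
static sketch_outcome_t sketch_assoc_outcome(const bool *ack_heard, size_t n_tx,
                                             uint32_t retry_interval_us,
                                             uint32_t null_delay_us)
{
    for (size_t i = 0; i < n_tx; i++) {
        uint64_t t_us = (uint64_t)i * retry_interval_us;
        if (t_us > null_delay_us) {
            break;  /* the NULL data frame reaches the hotspot first */
        }
        if (ack_heard[i]) {
            return SKETCH_ASSOCIATED;
        }
    }
    return SKETCH_DISASSOC_REASON_7;
}
```

In this model a quiet channel (first ACK heard) succeeds even with no delay, while a busy channel that loses the first ACKs only succeeds when `null_delay_us` is long enough to cover the retransmissions, which matches the direction of the change described for the new Wi-Fi library.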
I have no proof for that, I will try some more in the next days when i can find a busy network in the office |
I think the new wifi lib will not affect the scan results, as it only modifies the connection process code. Scanning occurs before the connection process. |
@KonssnoK
Would it be possible for you to try a different mobile hotspot? |
hi @zhangyanjiaoesp ,
|
@KonssnoK I see that you have switched to a new phone, and it is working properly. |
Hi @zhangyanjiaoesp, Broad compatibility is what we are looking for. |
We understand that you want the product to have better compatibility, but it is not feasible for us to ensure compatibility with all devices. This is because some hotspots or routers exhibit behavior that is either unreasonable or even non-compliant with protocol specifications. What we can do is improve our handling when our response to reasonable behavior on the other end is inadequate. However, if the issue clearly lies with the other device, making adjustments to accommodate such devices seems unreasonable. Moreover, attempting to ensure compatibility with such problematic devices could potentially introduce new issues, leading to incompatibility with other devices. |
@KonssnoK |
@zhangyanjiaoesp Why does v4.4 not have this issue? |
@AxelLin Please take a look at these comments: #14801 (comment), #14801 (comment). v5.3 includes changes related to NULL data, but v4.4 does not. |
Well, I'll continue to use both phones. Could you give me a short technical summary of the problem so that I can open an issue with Google? Thanks |
Just curious if you can connect to pixel 8 using examples/wifi/getting_started/station example on esp-idf-5.3.1? |
@KonssnoK This log indicates that the STA disconnected from the AP because it received a disassociation packet sent by the AP. The disassociation packet contains reason code 7 (Class 3 frame received from a non-associated STA). The Wireshark packet capture is as follows: When the STA receives the association response, it replies with a hardware ACK and sends a NULL data frame. The issue is that the mobile hotspot did not receive the hardware ACK and kept retransmitting the association response. However, it did receive the NULL data frame from the STA, and it responded with the disassociation frame. The normal packet capture is as follows: The hotspot successfully receives the ACK, and the two devices communicate normally. (By the way, the screenshots above are all from your previous packet captures, and those captures do not include MAC ACK packets. If you can perform a new capture that includes the MAC ACK packets, it would likely be more convincing. The MAC ACK packet is similar to the packets with a blue background shown below.) |
Answers checklist.
General issue report
on top of a0f798c
@zhangyanjiaoesp
We are in the middle of moving from v4.4 to v5.3.
We are seeing Wi-Fi issues that did not occur in v4.4.
Specifically,
we are moving the provisioning part of our devices.
Apart from having to change the BLE stack ( #14790 (comment) ),
the device is now apparently unable to connect to phone hotspots. The phone is a Pixel 8.
I think this might be related to NimBLE holding the antenna for too long.
Could you point me to the right options to fix this?