Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIFI][ESP32S3][v5.3.1] WIFI provisioning failing to scan and connect to mobile hotspot multiple times (IDFGH-13971) #14801

Open
3 tasks done
KonssnoK opened this issue Oct 29, 2024 · 47 comments
Assignees
Labels
Status: Reviewing Issue is being reviewed

Comments

@KonssnoK
Copy link
Contributor

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

on top of a0f798c
@zhangyanjiaoesp
we are in the middle of moving to v5.3 from v4.4.
We are seeing issues in the wifi that did not occur in v4.4

Specifically,
we are moving the provisioning part of our devices.
Apart from having to change BLE stack ( #14790 (comment) )

the device is now not apparently able to connect to phone hotspots anymore. Phone is a pixel 8.

I think this might be related to nimble taking the antenna for too long.
Could you point me to the right options to fix this?

I (194914) wifi:new:<7,0>, old:<13,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
I (194916) wifi:state: init -> auth (0xb0)
I (195693) mesh: mesh is not inited
I (196693) mesh: mesh is not inited
I (197693) mesh: mesh is not inited
I (198693) mesh: mesh is not inited
I (199207) wifi:state: auth -> init (0x200)
I (199220) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (199221) wifi:new:<7,0>, old:<7,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
E (01:09:12.802) wifi_prov_mgr: STA Disconnected
E (01:09:12.805) wifi_prov_mgr: Disconnect reason : 2
I (01:09:12.811) prov_BLE: Disconnected. Connecting to the AP again...
E (199243) wifi:sta is connecting, return error
I (199693) mesh: mesh is not inited
I (200693) mesh: mesh is not inited
E (01:09:14.278) now_packet: Receive ESPNOW data too short, len:3
E (01:09:14.298) now_packet: Invalid CRC 274C 445
E (01:09:14.410) now_packet: Invalid CRC 75D8 445
E (01:09:14.465) now_packet: Invalid CRC DA09 4BF
I (201693) mesh: mesh is not inited
I (202693) mesh: mesh is not inited
I (203693) mesh: mesh is not inited
I (204055) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

E (01:09:17.633) wifi_prov_mgr: STA Disconnected
E (01:09:17.634) wifi_prov_mgr: Disconnect reason : 205
I (01:09:17.638) prov_BLE: Disconnected. Connecting to the AP again...
E (204070) wifi:sta is connecting, return error
I (204693) mesh: mesh is not inited
E (01:09:19.175) now_packet: Invalid CRC 7F5D 4C0
E (01:09:19.218) now_packet: Invalid CRC 4E37 445
E (01:09:19.222) now_packet: Invalid CRC 5FB3 4C0
I (205693) mesh: mesh is not inited
I (206693) mesh: mesh is not inited
I (207693) mesh: mesh is not inited
I (208693) mesh: mesh is not inited
I (208882) wifi:new:<7,0>, old:<7,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
I (208883) wifi:state: init -> auth (0xb0)
I (209595) wifi:state: auth -> assoc (0x0)
I (209693) mesh: mesh is not inited
I (01:09:23.600) stats: iteration_time_us=999976 Name=last% IDLE1=60 IDLE0=98 wifi=38 
I (210597) wifi:state: assoc -> init (0x2700)
I (210610) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (210611) wifi:new:<7,0>, old:<7,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
E (01:09:24.192) wifi_prov_mgr: STA Disconnected
E (01:09:24.195) wifi_prov_mgr: Disconnect reason : 39
I (01:09:24.201) prov_BLE: Disconnected. Connecting to the AP again...
E (210633) wifi:sta is connecting, return error
I (210693) mesh: mesh is not inited
I (211693) mesh: mesh is not inited
E (01:09:25.695) now_packet: Invalid CRC B1CA 445
E (01:09:25.729) now_packet: Invalid CRC E3D1 4C0
I (212693) mesh: mesh is not inited
I (213693) mesh: mesh is not inited
I (214693) mesh: mesh is not inited
I (215445) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

E (01:09:29.024) wifi_prov_mgr: STA Disconnected
E (01:09:29.024) wifi_prov_mgr: Disconnect reason : 201
E (01:09:29.028) wifi_prov_mgr: STA AP Not found
I (01:09:29.034) prov_BLE: Disconnected. Connecting to the AP again...
E (01:09:29.043) prov_BLE: Provisioning failed! Reason : Wi-Fi access-point not found
I (215693) mesh: mesh is not inited
I (216693) mesh: mesh is not inited
E (01:09:30.517) now_packet: Invalid CRC E0BD 4E0
E (01:09:30.645) now_packet: Invalid CRC 7937 4C0
W (01:09:30.648) network: LOST IP
D (01:09:30.648) modem_comm: IP event 1
I (217693) mesh: mesh is not inited
I (218693) mesh: mesh is not inited
I (219065) mesh: mesh is not started
W (01:09:32.641) mesh_main: <MESH_MSG_SEND_INTERNAL>send err:16388
I (01:09:32.642) tmpr_propagation: Propagated local temperature
I (219693) mesh: mesh is not inited
I (220285) wifi:new:<7,0>, old:<7,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
I (220285) wifi:state: init -> auth (0xb0)
I (220859) mesh: mesh is not inited
I (220905) wifi:state: auth -> assoc (0x0)
I (220911) wifi:state: assoc -> run (0x10)
I (01:09:34.600) stats: iteration_time_us=1000036 Name=last% IDLE1=50 IDLE0=98 wifi=49 
I (221049) wifi:state: run -> init (0x7a0)
I (221062) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (221063) wifi:new:<7,0>, old:<7,0>, ap:<255,255>, sta:<7,0>, prof:1, snd_ch_cfg:0x0
E (01:09:34.644) wifi_prov_mgr: STA Disconnected
E (01:09:34.647) wifi_prov_mgr: Disconnect reason : 7
I (01:09:34.653) prov_BLE: Disconnected. Connecting to the AP again...
E (221084) wifi:sta is connecting, return error
I (221860) mesh: mesh is not inited
E (01:09:36.174) now_packet: Invalid CRC C084 4C1
E (01:09:36.187) now_packet: Invalid CRC 4409 4E0
E (01:09:36.292) now_packet: Invalid CRC 4543 4C1
E (01:09:36.302) now_packet: Invalid CRC C845 4E0
I (222860) mesh: mesh is not inited
I (223860) mesh: mesh is not inited
I (01:09:37.643) offline: object saved to PSRAM, free 5518 type 10 idx 5
I (224860) mesh: mesh is not inited
I (225860) mesh: mesh is not inited
I (225896) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

E (01:09:39.475) wifi_prov_mgr: STA Disconnected
E (01:09:39.475) wifi_prov_mgr: Disconnect reason : 201
E (01:09:39.479) wifi_prov_mgr: STA AP Not found
I (01:09:39.485) prov_BLE: Disconnected. Connecting to the AP again...
E (01:09:39.494) prov_BLE: Provisioning failed! Reason : Wi-Fi access-point not found
I (226860) mesh: mesh is not inited
E (01:09:41.061) now_packet: Invalid CRC 5E8C 4E0
E (01:09:41.089) now_packet: Invalid CRC 1543 4C1
I (227860) mesh: mesh is not inited
I (228860) mesh: mesh is not inited
I (229860) mesh: mesh is not inited
I (230736) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

E (01:09:44.314) wifi_prov_mgr: STA Disconnected
E (01:09:44.315) wifi_prov_mgr: Disconnect reason : 201
E (01:09:44.319) wifi_prov_mgr: STA AP Not found
I (01:09:44.325) prov_BLE: Disconnected. Connecting to the AP again...
I (01:09:44.340) prov_BLE: Failed to connect with provisioned AP, reseting provisioned credentials
I (230862) mesh: mesh is not inited
I (231862) mesh: mesh is not inited
@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 29, 2024
@github-actions github-actions bot changed the title [WIFI][ESP32S3][v5.3.1] WIFI provisioning failing to scan and connect to mobile hotspot multiple times [WIFI][ESP32S3][v5.3.1] WIFI provisioning failing to scan and connect to mobile hotspot multiple times (IDFGH-13971) Oct 29, 2024
@KonssnoK
Copy link
Contributor Author

KonssnoK commented Oct 29, 2024

please note that the device is not even able to scan for the hotspot 3 times out of 4

i tried to enable

CONFIG_BT_CTRL_COEX_PHY_CODED_TX_RX_TLIM_EN=y
# CONFIG_BT_CTRL_COEX_PHY_CODED_TX_RX_TLIM_DIS is not set
CONFIG_BT_CTRL_COEX_PHY_CODED_TX_RX_TLIM_EFF=1

but it has no effect

I (407412) mesh: mesh is not inited
I (408256) wifi:new:<13,0>, old:<13,0>, ap:<255,255>, sta:<13,0>, prof:1, snd_ch_cfg:0x0
I (408257) wifi:state: init -> auth (0xb0)
I (408274) wifi:state: auth -> assoc (0x0)
I (408300) wifi:state: assoc -> run (0x10)
I (408309) wifi:state: run -> init (0x7a0)
I (408322) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (408323) wifi:new:<13,0>, old:<13,0>, ap:<255,255>, sta:<13,0>, prof:1, snd_ch_cfg:0x0
E (11:18:17.300) wifi_prov_mgr: STA Disconnected
E (11:18:17.303) wifi_prov_mgr: Disconnect reason : 7
I (408412) mesh: mesh is not inited
I (409412) mesh: mesh is not inited
E (11:18:19.240) now_packet: Invalid CRC 67D2 42D
E (11:18:19.251) now_packet: Invalid CRC 9FD7 4DD
E (11:18:19.357) now_packet: Invalid CRC 1295 42D
E (11:18:19.364) now_packet: Invalid CRC 6459 4DD
I (410412) mesh: mesh is not inited
E (11:18:19.406) now_packet: Invalid CRC 9306 42D
E (11:18:19.412) now_packet: Invalid CRC 77E7 458
E (11:18:19.476) now_packet: Invalid CRC C598 4DD
I (411412) mesh: mesh is not inited
I (412412) mesh: mesh is not inited
I (11:18:22.107) mqtt_app: MQTT_EVENT_BEFORE_CONNECT
I (11:18:22.108) mqtt_app: Try [1] connection to mqtts://mqtt-stg.tiko.energy:8883
I (413158) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

E (11:18:22.133) wifi_prov_mgr: STA Disconnected
E (11:18:22.134) wifi_prov_mgr: Disconnect reason : 205
I (413412) mesh: mesh is not inited
I (414412) mesh: mesh is not inited
E (11:18:24.096) now_packet: Invalid CRC DD30 4DD
E (11:18:24.101) now_packet: Invalid CRC B194 458
E (11:18:24.110) now_packet: Invalid CRC 40A3 42D
E (11:18:24.208) now_packet: Invalid CRC A427 4DD
I (415412) mesh: mesh is not inited
I (416412) mesh: mesh is not inited
I (417412) mesh: mesh is not inited
I (417986) wifi:new:<13,0>, old:<13,0>, ap:<255,255>, sta:<13,0>, prof:1, snd_ch_cfg:0x0
I (417986) wifi:state: init -> auth (0xb0)
I (418630) mesh: mesh is not inited
I (418696) wifi:state: auth -> assoc (0x0)
I (418741) wifi:state: assoc -> run (0x10)
I (418778) wifi:state: run -> init (0x7a0)
I (418792) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (418793) wifi:new:<13,0>, old:<13,0>, ap:<255,255>, sta:<13,0>, prof:1, snd_ch_cfg:0x0
E (11:18:27.771) wifi_prov_mgr: STA Disconnected
E (11:18:27.774) wifi_prov_mgr: Disconnect reason : 7
I (11:18:27.999) stats: iteration_time_us=999960 Name=last% IDLE1=50 IDLE0=97 sys_evt=1 wifi=48 
I (419630) mesh: mesh is not inited
I (420630) mesh: mesh is not inited
E (11:18:29.783) now_packet: Invalid CRC 6872 42D
E (11:18:29.893) now_packet: Invalid CRC 9003 42D
E (11:18:29.898) now_packet: Invalid CRC F760 4DE
E (11:18:29.914) now_packet: Invalid CRC 23D0 42D
I (421630) mesh: mesh is not inited
I (422630) mesh: mesh is not inited
I (423627) wifi:Coexist: Wi-Fi connect fail, apply reconnect coex policy

I (423630) mesh: mesh is not inited
E (11:18:32.603) wifi_prov_mgr: STA Disconnected
E (11:18:32.604) wifi_prov_mgr: Disconnect reason : 201
E (11:18:32.610) wifi_prov_mgr: STA AP Not found
E (11:18:32.616) prov_BLE: Provisioning failed! Reason : Wi-Fi access-point not found
I (11:18:33.149) processor: send status... (60000ms)
W (424178) wifi:Haven't to connect to a suitable AP now!
I (424630) mesh: mesh is not inited
I (425630) mesh: mesh is not inited
I (426630) mesh: mesh is not inited

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Oct 29, 2024

here logs with verbose wifi.
The important ones are the "failed"

scan:
24102901_wifi_succeeded_scan.txt
24102901_wifi_failed_scan.txt
connection:
24102901_wifi_success.txt
24102901_wifi_fail.txt

after reboot the device also struggles to connect to the WIFI (partial log with the success after multiple timeouts)
24102901_wifi_fail_success_reboot.txt

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
image

image

These disconnections mean the internal timeout.

image

This disconnection means the STA receives a disassociation frame from the AP, reason code is 7.

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
Have you enabled the LR mode? Because I saw this information in the log: scan specific ssid=KI via LR

@KonssnoK
Copy link
Contributor Author

@zhangyanjiaoesp we do have enabled LR mode while the device is provisioning because we use esp-now for propagating credentials to other devices in the same installation.
Once the credentials are propagated and the installation is finished, we remove the LR flag.

@KonssnoK
Copy link
Contributor Author

@KonssnoK image

image

These disconnections mean the internal timeout.

image

This disconnection means the STA receives a disassociation frame from the AP, reason code is 7.

what i don't understand is why the final error on the connection try is 205 AP not found. we were able to scan for it, why would it fail saying it's not able to see it anymore?

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK If the connection fails, the STA will add the AP to a blacklist. During the next scan, if the STA finds that the AP it wants to connect is in the blacklist, it will return a 205 error and then remove the AP from the blacklist. When the next time it attempts to reconnect,the STA enter the scanning and connecting state again. You can consider the 205 as an intermediate state.

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK Can you provide a demo for us to test? Since Wi-Fi provisioning involves multiple modules, if we can reproduce it locally, it will speed up our debugging process.

@KonssnoK
Copy link
Contributor Author

@zhangyanjiaoesp let me see what i can do

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Oct 30, 2024
@KonssnoK
Copy link
Contributor Author

so i made a change to the provisioning example on my branch
https://github.com/KonssnoK/esp-idf/tree/lc/provisioning

C:\src\esp-idf\examples\provisioning\wifi_prov_mgr

but of course i do not see the logs that i-ve sent from our device that was migrated from 4.4.

Could it be something related to the wifi calibration done in 4.4? Ot that gets reset automatically when we update to 5.3?

I will try to check for more differences

@KonssnoK
Copy link
Contributor Author

@zhangyanjiaoesp i am really disturbed by the fact that now i cannot reproduce the behavior i logged above with both devkits and our dedicated hardware.
Is there any internal setting that needs to be updated when moving from 4.4 to 5.3?
Is there something that could justify having such a behavior for some period after an update?

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK Do you mean that you can't reproduce the scan fail and connect fail issue?

@KonssnoK
Copy link
Contributor Author

no i just reproduced it again on our hardware. I do not understand how to trigger it on a devkit tho.
24103101_wifi_fail.txt

@zhangyanjiaoesp
Copy link
Collaborator

no i just reproduced it again on our hardware. I do not understand how to trigger it on a devkit tho. 24103101_wifi_fail.txt

  1. This log shows the AP always send disassociation to the ESP device, can you capture a packet for this?
  2. How about you change another hotspot?

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Oct 31, 2024

no i just reproduced it again on our hardware. I do not understand how to trigger it on a devkit tho. 24103101_wifi_fail.txt

1. This log shows the AP always send disassociation to the ESP device, can you capture a packet for this?

2. How about you change another hotspot?

@zhangyanjiaoesp my problem is that our device is supposed to be compatible with multiple phones, used during installation.
If I find a phone where it doesn't work, sadly I need to understand why..
We did try with other routers without seeing such a strange behavior. The main issue is that my phone (Pixel8) did not experience the issue before using v4.4.

Could this issue be related to the commit made on v4.4 where the pixel 8 was caching the authorization ?

point 1 here:
#13212 (comment)

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
The fixes in v4.4 have been integrated into v5.3

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 4, 2024

so by looking at logs it seems that the issue might be related to moving from one channel to another before connecting.

immagine

I'm still working on tryin to replicate on a devkit

@zhangyanjiaoesp
Copy link
Collaborator

Are you saying that every time the current channel is different from the AP channel, there will be a connection failure?

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 4, 2024

@zhangyanjiaoesp no sorry i was wrong in the end,
I updated the channel also in the devkit example, and this is a comparison between a failure (left) and success (right)

I see multiple

D (594710) wifi:Drop duplicate SAE auth: seq=1

and then disconnection with reason 7.
The channel change instead seems to be handled the same way both sides

immagine

@zhangyanjiaoesp
Copy link
Collaborator

D (594710) wifi:Drop duplicate SAE auth: seq=1 This log indicates that STA has received the auth packet and will discard the retransmitted auth packet.
Can you capture packets for the device and the hotspot? Both success and failure are necessary. It is best to display the time in hours/minutes/seconds in the log to align with the packet capture time

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 5, 2024

I'm trying to capture packets but most of the time it cannot even scan the access point...
I should have something soon

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 5, 2024

@zhangyanjiaoesp

here is an example of a very strange behavior from the device:

  • scan ok
  • provisioning
  • rescan for connecting
  • failure: AP not found

Note this works in v4.4

device logs with debug level on wifi
241105_wifi_scanok_connect_not_found.txt

File .pcapng
241105_wifi_scanok_fail_notfound.txt

MAC AP: d2:0f:7d;8a:8b:29
MAC DEV: AC:17:54:00:0A:B0

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 5, 2024

@zhangyanjiaoesp example where

  • scan is ok
  • connection fails
  • connection succeeds

logs
24110502_wifi_scanok_connect_error_ok.txt

File .pcapng
24110502_wifi_scanok_ok_errors.txt

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 5, 2024

@zhangyanjiaoesp in this other case

  • scan ok
  • connection fails
  • AP not found

logs
24110503_wifi_scanok_connect_not_found.txt

File .pcapng
24110503_wifi_scanok_not_found.txt

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 5, 2024

more random errors generated by the device

log(partial)
24110504_wifi_scanok_errors.txt

pcapng
24110504_wifi_scanok_errors.txt

@zhangyanjiaoesp
Copy link
Collaborator

@zhangyanjiaoesp

here is an example of a very strange behavior from the device:

* scan ok

* provisioning

* rescan for connecting

* failure: AP not found

Note this works in v4.4

device logs with debug level on wifi 241105_wifi_scanok_connect_not_found.txt

File .pcapng 241105_wifi_scanok_fail_notfound.txt

MAC AP: d2:0f:7d;8a:8b:29 MAC DEV: AC:17:54:00:0A:B0

In this log, the scan is fail, it didn't find the specific AP, if founds, the log will show D (398716) wifi:ap found, mac=d2:0f:7d:8a:8b:29 . The log wifi:connect status 0 -> 1 is only used to represent some internal status and is not a sign of starting a connection. The sign of starting a connection is: wifi:state: init -> auth (0xb0)

@zhangyanjiaoesp
Copy link
Collaborator

For the other three logs and packet captures, I will check and reply to you ASAP.

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 6, 2024

@zhangyanjiaoesp
here is an example of a very strange behavior from the device:

* scan ok

* provisioning

* rescan for connecting

* failure: AP not found

Note this works in v4.4
device logs with debug level on wifi 241105_wifi_scanok_connect_not_found.txt
File .pcapng 241105_wifi_scanok_fail_notfound.txt
MAC AP: d2:0f:7d;8a:8b:29 MAC DEV: AC:17:54:00:0A:B0

In this log, the scan is fail, it didn't find the specific AP, if founds, the log will show D (398716) wifi:ap found, mac=d2:0f:7d:8a:8b:29 . The log wifi:connect status 0 -> 1 is only used to represent some internal status and is not a sign of starting a connection. The sign of starting a connection is: wifi:state: init -> auth (0xb0)

I understand it's not finding the AP, but it found it 5 seconds before during the scan, how can the device stop seeing an AP less than 1 meter away in 2 consecutive scans?

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK

  1. According to the packets, the wifi:state: run -> init (0x7a0) issue seems related to the NULL data sent by STA. The following wifi lib contains a bug fix, please test again.
    wifi_lib_1106_s3.zip
    wifi firmware version: 3eceb8154

Screenshot from 2024-11-06 16-51-16

  1. The wifi:state: assoc -> init (0x3510) indicates that STA has received an associate response with a status code of invalid PMKID.
    If the STA successfully connects to the AP, it will cache the PMK for this time. When reconnecting next time, if it still connects to the exact same AP, it will directly use the cached PMK, which can reduce the computation time of the STA's PMK and speed up the connection. But if the AP finds that the PMK has expired, it will return such an error.

image

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 6, 2024

@zhangyanjiaoesp do you know when the nulls are sent? because i don't always manage to reproduce the issue (i want to reproduce it before testing your library)

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 6, 2024

@zhangyanjiaoesp with the new library for now i'm not able to replicate the issue, but again, it doesn't always happens.. i'll try some more

@zhangyanjiaoesp
Copy link
Collaborator

@zhangyanjiaoesp do you know when the nulls are sent? because i don't always manage to reproduce the issue (i want to reproduce it before testing your library)

When the STA receives the association response, the hardware responds with an ACK to the AP and simultaneously begins sending NULL data. This is a feature added to address the issue where some APs may not be able to send EAPOL packets in a timely manner. However, if the hotspot does not receive the STA's ACK in time, it will continuously retransmit the association response. When the hotspot receives the NULL data, it will send a disassociation (disassoc) message to the STA, as described in the scenario above. If the hotspot is able to quickly receive the STA's ACK response, the NULL data sent by the STA will not affect the Wi-Fi connection, as shown in the diagram below:
image

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 7, 2024

@zhangyanjiaoesp ok so the way to reproduce is most probably a busy channel.

The fix that you sent me is already final? Or it just temporarely removed the NULL data?
Do you need more logs?

I'm not sure this covers also the AP not found error seen some of the times.

@zhangyanjiaoesp
Copy link
Collaborator

The fix that you sent me is already final? Or it just temporarely removed the NULL data? Do you need more logs?

The Wi-Fi library I provided does not delete the NULL data, but instead delays sending it for a period of time. It is not the final version, but it will not differ significantly from the final version.

I'm not sure this covers also the AP not found error seen some of the times.

Are you saying that after using the new wifi lib, the AP not found issue has been resolved?

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Nov 7, 2024

Are you saying that after using the new wifi lib, the AP not found issue has been resolved?

I have no proof for that, I will try some more in the next days when i can find a busy network in the office

@zhangyanjiaoesp
Copy link
Collaborator

I think the new wifi lib will not affect the scan results, as it only modifies the connection process code. Scanning occurs before the connection process.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Nov 20, 2024
@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
Hello, apologies for the inconvenience. After our internal discussion, we believe the original behavior is reasonable. When the device receives the association response, it replies with a hardware ACK and sends a NULL data frame. The issue lies in the fact that the mobile hotspot did not receive the hardware ACK and kept retransmitting the association response, even though it was able to receive the NULL data frame from the device and send the disassociation frame. Clearly, there seems to be a performance issue with the hotspot. If I delay sending the NULL data, it might make the device's behavior appear unusual, and if the device does not send the NULL data, the original issue cannot be resolved.

When the STA receives the association response, the hardware responds with an ACK to the AP and simultaneously begins sending NULL data. This is a feature added to address the issue where some APs may not be able to send EAPOL packets in a timely manner.

Would it be possible for you to try a different mobile hotspot?

@KonssnoK
Copy link
Contributor Author

hi @zhangyanjiaoesp ,
i did the following:

  • Update to 5.3.2, latest commit 083aad9
  • Retried with the pixel8: Still failing, attaching a log
    24121701_wifi_fail.txt
  • Retried with a samsun S21+: seemed to work ok

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
Hi, have you seen my comment above? The previous fix was not merged into the IDF because we determined that the issue was related to the mobile hotspot. We recommend that you try using a different phone.

I see that you have switched to a new phone, and it is working properly.

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Dec 18, 2024

Hi @zhangyanjiaoesp,
yes i read your previous comment.
Once again, our issue is that we cannot force installers to use specific phones when using our devices.
If this issue happens with the pixel phone it could happen with other phones or even WIFI routers.

Broad compatibility is what we are looking for.

@zhangyanjiaoesp
Copy link
Collaborator

We understand that you want the product to have better compatibility, but it is not feasible for us to ensure compatibility with all devices. This is because some hotspots or routers exhibit behavior that is either unreasonable or even non-compliant with protocol specifications. What we can do is improve our handling when our response to reasonable behavior on the other end is inadequate. However, if the issue clearly lies with the other device, making adjustments to accommodate such devices seems unreasonable. Moreover, attempting to ensure compatibility with such problematic devices could potentially introduce new issues, leading to incompatibility with other devices.

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
So, do you agree to switch to a different phone? If yes, can we close this issue?

@AxelLin
Copy link
Contributor

AxelLin commented Dec 19, 2024

@KonssnoK Hi, have you seen my comment above? The previous fix was not merged into the IDF because we determined that the issue was related to the mobile hotspot. We recommend that you try using a different phone.

I see that you have switched to a new phone, and it is working properly.

@zhangyanjiaoesp Why v4.4 does not has such issue?

@zhangyanjiaoesp
Copy link
Collaborator

@AxelLin Please take a look at thess comments: #14801 (comment), #14801 (comment) . v5.3 includes changes related to null data, but v4.4 does not.

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Dec 20, 2024

@KonssnoK So, do you agree to switch to a different phone? If yes, can we close this issue?

well i'll continue to use both phones.

Could you give me a small technical summary of the problem so that i can open an issue to google?

Thanks

@AxelLin
Copy link
Contributor

AxelLin commented Dec 20, 2024

@KonssnoK

Just curious if you can connect to pixel 8 using examples/wifi/getting_started/station example on esp-idf-5.3.1?
(This test excluding BT/wifi:Coexist issue.)

@zhangyanjiaoesp
Copy link
Collaborator

@KonssnoK
The summary is here:
image

This log indicates that the STA disconnected from the AP due to receiving a disassociation packet sent by the AP. The disassociation packet contains a reason code of 7 (Class 3 frame received from a non-associated STA). The Wireshark packet capture is as follows:
image

When the STA receives the association response, it replies with a hardware ACK and sends a NULL data frame. The issue lies in the fact that the mobile hotspot did not receive the hardware ACK and kept retransmitting the association response, however it was able to receive the NULL data frame from the STA and send the disassociation frame.

The normal packet capture is as follows:
image

The hotspot successfully receives the ACK, and the two devices communicate normally.

(By the way, The screenshots above are all from your previous packet captures, and the captures do not include MAC ACK packets. If you can perform a new capture and include the MAC ACK packets, it would likely be more convincing. The MAC ACK packet is similar to the packet captures with a blue background shown below.)
Screenshot from 2024-12-23 10-15-41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Reviewing Issue is being reviewed
Projects
None yet
Development

No branches or pull requests

4 participants