esp_modem: silent disconnection leads to endless OTA (IDFGH-10572) #322
Comments
Here are the related esp logs:
|
I've activated the DEBUG log level for more components in the following logs: I have the impression that the more components have DEBUG logging enabled, the earlier this silent disconnection happens. Could the silent PPP disconnection be caused by the CPU being busy with other work (such as printing debug logs to the UART), so that the HTTP connection then times out? |
Also note that when I change my code to connect through WiFi instead of LTE with BG95 (esp_modem), the OTA always succeeds and is much faster to complete. |
@bbinet Which IDF version are you using? Are you also seeing this issue on IDF v5.0 or v5.1 (if it's easy for you to switch and check)? I could only reproduce the problem when I physically disconnected the UART wires from my modem. When connected normally, it shouldn't happen, and the disconnection should be reported to the HTTP client and OTA layers. By silent disconnection, do you mean that the PPP interface is still considered up, so you receive no event about the disconnection? |
@david-cermak I was using a specific commit between IDF v5.0 and v5.1, but I've just updated to v5.1 and I still face the same issue. By silent disconnection, I mean that I don't receive any connection error and OTA keeps trying to continue indefinitely, so I suppose that there is an issue with the PPP connection but it is still considered up. I will have a look at LCP keepalive and report here when ready. No, the modem does not restart during the connection: I have checked the rx/tx lines of the modem UART, and the modem just stops sending data to the ESP, with no visible error on the UART lines. I also tried increasing the modem UART baud rate to 460800 to shorten the OTA, hoping to reduce the risk of OTA failure, but the same issue occurred. Also, I noticed that when trying to connect to the MQTT server, it regularly fails in the TLS handshake phase:
I think this is a different issue, but this is still an issue with the modem only (I don't see such handshake issues while connecting through wifi). |
@david-cermak I've just implemented LCP keepalive, and it seems the PPP session is still up (I don't receive any PPP STATUS about PPP termination...). Here are the full logs: https://gist.githubusercontent.com/bbinet/b8e7621aed9bdc1183a7b92a05c5dcf9/raw/1dc42a316193717f33653ac869bc5e8d0d4714d9/uart.log If PPP is still up, I don't understand why I can't get any more data for OTA and why MQTT is disconnected. |
Could you please also enable What I find really strange is that the MQTT client got disconnected, while OTA still keeps trying. This definitely looks like a bug, but we need to find out whether it's in the networking layers or in OTA. |
PPP_DEBUG was already enabled. Here are new logs with all DEBUG logs activated except for some verbose components (nvs, gdma, CMUX): Also, here is my sdkconfig file if you want to try to reproduce the issue: https://gist.githubusercontent.com/bbinet/b0cf418a48f5851ab852d6f88d8c6dff/raw/b272e8197bc6316e3a73ec56b965f307cd148779/sdkconfig I kept the MQTT task running for now because it's not easy for me to turn off: I use MQTT to trigger the OTA update. I'll send you an update with the MQTT task disabled soon. I think that the MQTT client got disconnected because the MQTT keepalive timeout expired at the MQTT level (see log: "mqtt_client: No PING_RESP, disconnected"). |
Interesting: it looks like the link is still up and running, so the LCP echo replies are still being sent by the device.
Oh, okay, makes sense. To work around this issue, you can enable TCP keepalive checks for OTA as well. Here is an example: |
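For readers unfamiliar with the mechanism, here is a minimal sketch of what TCP keepalive amounts to at the socket level. It assumes a BSD-style sockets API that provides `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (Linux does, and lwIP offers the same options when its TCP keepalive support is compiled in). The function name `enable_tcp_keepalive` is hypothetical and the code is not taken from ESP-IDF; in ESP-IDF the corresponding knobs are the `keep_alive_enable`, `keep_alive_idle`, `keep_alive_interval` and `keep_alive_count` fields of the HTTP client configuration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive probes on an already-created TCP socket.
 *   idle_s     - seconds of silence before the first probe is sent
 *   interval_s - seconds between unanswered probes
 *   count      - unanswered probes after which the connection is dropped
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_tcp_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) != 0) {
        return -1;
    }
    return 0;
}
```

Calling `enable_tcp_keepalive(fd, 5, 5, 3)` should let the stack declare a silent peer dead after roughly 5 + 3 × 5 = 20 seconds without traffic, so a pending read fails with an error instead of timing out again and again.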
I've just checked and I've already implemented a temporary workaround for the endless OTA: I've set up a timer to restart the ESP after 30 min. Do you have any idea why the connection may not be working while the PPP link is still up? |
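The restart timer bounds the damage from outside the OTA code. A different technique, sketched here under assumptions, is to bound the download loop itself by giving up after a number of consecutive timed-out reads. Every type and function name in this sketch is a hypothetical stand-in, not an ESP-IDF API, and whether the real OTA API surfaces each timed-out read to the caller is not confirmed here.

```c
#include <stddef.h>

/* Outcome of one read attempt, as reported by a caller-supplied reader. */
typedef enum { CHUNK_DATA, CHUNK_TIMEOUT, CHUNK_DONE } chunk_status_t;
typedef chunk_status_t (*read_chunk_fn)(void *ctx);

typedef enum { DL_OK, DL_STALLED } dl_result_t;

/* Pull chunks until the transfer completes, or until max_stalls reads in a
 * row have timed out. Any successful read resets the stall counter. */
dl_result_t download_bounded(read_chunk_fn read_chunk, void *ctx, unsigned max_stalls)
{
    unsigned stalls = 0;

    for (;;) {
        switch (read_chunk(ctx)) {
        case CHUNK_DONE:
            return DL_OK;
        case CHUNK_DATA:
            stalls = 0;
            break;
        case CHUNK_TIMEOUT:
            if (++stalls >= max_stalls) {
                return DL_STALLED;   /* caller can abort and power down the modem */
            }
            break;
        }
    }
}

/* Hypothetical reader for experiments: replays a fixed script of outcomes,
 * then times out forever, which mimics a link that has silently died. */
typedef struct {
    const chunk_status_t *script;
    size_t len;
    size_t pos;
} scripted_reader_t;

chunk_status_t scripted_read(void *ctx)
{
    scripted_reader_t *r = (scripted_reader_t *)ctx;
    if (r->pos < r->len) {
        return r->script[r->pos++];
    }
    return CHUNK_TIMEOUT;
}
```

With a per-read timeout of a few seconds, a small `max_stalls` turns an endless retry loop into a failure that is reported within a bounded time, which matters on battery power because the modem can then be switched off.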
Weird, this line should've appeared in your log, but I didn't see it anywhere...
The TCP-level keepalive mechanism should also work, but we need to understand what's wrong in your configuration.
No idea, one theory could be that the internet connectivity got broken somewhere upstream? In that case you would be able to ping your remote IP, but cannot access the OTA endpoint. |
You're right, this line does not appear, so TCP keepalive is not actually enabled. But I've double-checked, and the
Ok, I will try to use my 1NCE SIM card in a 4G Mobile WiFi Hot Spot product to see if it comes from my upstream 1NCE operator, or if it was really an issue with my own esp+bg95 modem. |
So I've just tried to insert my SIM card in a 4G modem and then connect my esp32c3 to the 4G modem through WiFi: everything is running fine and OTA performs with a much higher data rate than when using my BG95 with esp_modem. |
Some updates about these issues (sorry for the late reply):
diff --git a/components/esp_http_client/esp_http_client.c b/components/esp_http_client/esp_http_client.c
index 0deb011253..f10b487224 100644
--- a/components/esp_http_client/esp_http_client.c
+++ b/components/esp_http_client/esp_http_client.c
@@ -692,6 +692,14 @@ esp_http_client_handle_t esp_http_client_init(const esp_http_client_config_t *co
         goto error;
     }
+    if (config->keep_alive_enable == true) {
+        client->keep_alive_cfg.keep_alive_enable = true;
+        client->keep_alive_cfg.keep_alive_idle = (config->keep_alive_idle == 0) ? DEFAULT_KEEP_ALIVE_IDLE : config->keep_alive_idle;
+        client->keep_alive_cfg.keep_alive_interval = (config->keep_alive_interval == 0) ? DEFAULT_KEEP_ALIVE_INTERVAL : config->keep_alive_interval;
+        client->keep_alive_cfg.keep_alive_count = (config->keep_alive_count == 0) ? DEFAULT_KEEP_ALIVE_COUNT : config->keep_alive_count;
+        esp_transport_ssl_set_keep_alive(ssl, &client->keep_alive_cfg);
+    }
+
     if (config->crt_bundle_attach != NULL) {
 #ifdef CONFIG_MBEDTLS_CERTIFICATE_BUNDLE
(The problem is that the keepalive was enabled correctly for the TCP transport, but wasn't enabled for the SSL transport, which you're using.) |
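The logic of the patch can be illustrated in isolation: resolve the user's keepalive settings (treating 0 as "use the default") and apply the result to every transport rather than to one only. This is a simplified sketch with stand-in types; `keep_alive_t`, `transport_t`, `resolve_keep_alive`, `apply_keep_alive` and the `SKETCH_DEFAULT_*` values are assumptions made for illustration and are not the ESP-IDF definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder defaults for this sketch only; the real DEFAULT_KEEP_ALIVE_*
 * macros are defined in esp_http_client.c and may use other values. */
#define SKETCH_DEFAULT_IDLE_S     5
#define SKETCH_DEFAULT_INTERVAL_S 5
#define SKETCH_DEFAULT_COUNT      3

typedef struct {
    bool enable;
    int idle_s;
    int interval_s;
    int count;
} keep_alive_t;

/* Stand-in for a transport, e.g. "http" over TCP or "https" over SSL. */
typedef struct {
    const char *scheme;
    keep_alive_t keep_alive;
} transport_t;

/* Resolve user settings: a field left at 0 falls back to the default. */
keep_alive_t resolve_keep_alive(const keep_alive_t *user)
{
    assert(user != NULL);
    keep_alive_t out = { false, 0, 0, 0 };

    if (!user->enable) {
        return out;
    }
    out.enable = true;
    out.idle_s = (user->idle_s == 0) ? SKETCH_DEFAULT_IDLE_S : user->idle_s;
    out.interval_s = (user->interval_s == 0) ? SKETCH_DEFAULT_INTERVAL_S : user->interval_s;
    out.count = (user->count == 0) ? SKETCH_DEFAULT_COUNT : user->count;
    return out;
}

/* Apply the same resolved settings to every transport, so that none is
 * silently left without keepalive. */
void apply_keep_alive(transport_t *transports, size_t n, const keep_alive_t *user)
{
    keep_alive_t effective = resolve_keep_alive(user);

    for (size_t i = 0; i < n; i++) {
        transports[i].keep_alive = effective;
    }
}
```

Applying the settings in one loop over all transports avoids the failure mode described in this thread, where the plain TCP transport had keepalive configured but the SSL transport used by HTTPS OTA did not.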
Foundation transport contained TCP properties for both TCP and SSL transport, so it was enough to set the TCP connection properties (keepalive, interface binding) to one transport only. After merging 5778a7c we have separate TCP properties for these transports and need to set the same for both. This commit also fixes unnecessary allocation of 1 more byte for if_name. Closes espressif/esp-protocols#322
Answers checklist.
General issue report
Context: We have some battery-powered esp32c3 systems which use a Quectel BG95-M3 modem to publish sensor data through MQTT.
I've noticed that when pushing OTA updates (using the esp_https_ota component), the OTA update often fails and drains the battery, because the BG95 modem remains powered until the OTA ends, but it never ends...
Looking at the DEBUG logs of the esp_https_ota component, the OTA seems to be stuck in an endless loop saying "ESP_ERR_HTTP_EAGAIN invoked: Call timed out before data was ready".
This is probably caused by a silent PPP disconnection.
So I think I have multiple issues here:
AT+CSQ reports a value of 19). Any idea how I can debug the root cause of these disconnections? I've come across issue #287: may it be related to this one?