This repository has been archived by the owner on Jan 20, 2025. It is now read-only.

[BUG] ESP8266 crashes if different pages are loaded on high frequency #70

Closed
lumapu opened this issue Aug 5, 2024 · 26 comments
Labels
bug help wanted Extra attention is needed platform:8266

Comments

@lumapu

lumapu commented Aug 5, 2024

Please make sure to go through the recommendations before opening a bug report:

https://github.com/mathieucarbou/ESPAsyncWebServer?tab=readme-ov-file#important-recommendations

Done. I set -D CONFIG_ASYNC_TCP_STACK_SIZE=4096, without any change.

Description

I'm the maintainer and developer of AhoyDTU https://github.com/lumapu/ahoy. This project has some configuration pages, which mostly communicate using AJAX.
As some users report that the ESP8266 is really unstable with other forks of the AsyncWebServer, I wanted to try this fork.

To reproduce the issue, simply click on menu items 2-3 times in quick succession; this is enough to crash the ESP.

Board

ESP8266 Wroom

Ethernet adapter used ?

no

Stack trace

I see two different behaviors:

Trace 1
0x4022cc39 in std::_Function_handler<void (void*, AsyncClient*), AsyncEventSourceClient::AsyncEventSourceClient(AsyncWebServerRequest*, AsyncEventSource*)::{lambda(void*, AsyncClient*)#4}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at AsyncEventSource.cpp:?
0x401001e5 in std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at ??:?
0x40229dcc in AsyncClient::_close() at ??:?
0x4022a078 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at ??:?
0x40255abc in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:542 (discriminator 1)
0x4025aaf9 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210  
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x40277438 in ppRecycleRxPkt at ??:?
0x40251b81 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x40251994 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118    
 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x4027a5fd in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365   
0x4027a60f in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373   
0x40277067 in ppPeocessRxPktHdr at ??:?
0x4027b8f3 in ets_snprintf at ??:?
0x40105c4d in call_user_start_local at ??:?
0x40105c53 in call_user_start_local at ??:?
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at ??:?
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x40238b2c in esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter::addDomainCacheItem(void const*, bool, unsigned short) at ??:?
0x40101050 in malloc at ??:?
0x4023ff0b in operator new(unsigned int) at ??:?
0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:?
0x40239a2a in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSRRDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNS_RRDomain const&, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239be4 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:?
0x40239752 in esp8266::MDNSImplementation::MDNSResponder::_udpAppend8(unsigned char) at ??:?
0x4023986a in esp8266::MDNSImplementation::MDNSResponder::_write8(unsigned char, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239b99 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x402781bb in pp_attach at ??:?
0x4027820a in pp_attach at ??:?
0x40278316 in pp_attach at ??:?
0x402772cb in ppTxPkt at ??:?
0x4026050f in ieee80211_output_pbuf at ??:?
0x4010618f in wdt_feed at ??:?
0x40251581 in glue2esp_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:301
0x402517af in new_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:272  
0x40251c0e in ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:312
0x402592a5 in etharp_output_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c:897
0x40100588 in ets_post at ??:?
0x40102ce5 in rcUpdateTxDone at ??:?
0x401028b4 in pp_post at ??:?
0x40100588 in ets_post at ??:?
0x40100588 in ets_post at ??:?
0x40102ce5 in rcUpdateTxDone at ??:?
0x401028b4 in pp_post at ??:?
0x40100588 in ets_post at ??:?
0x40102ce5 in rcUpdateTxDone at ??:?
0x401028b4 in pp_post at ??:?
0x40100588 in ets_post at ??:?
0x4010343f in rcReachRetryLimit at ??:?
0x401028b4 in pp_post at ??:?
0x40105b4b in lmacRxDone at ??:?
0x4010361c in rcReachRetryLimit at ??:?
0x4010343f in rcReachRetryLimit at ??:?
0x401028b4 in pp_post at ??:?
0x4010361c in rcReachRetryLimit at ??:?
0x40103ad6 in wDev_ProcessFiq at ??:?
0x40100588 in ets_post at ??:?
0x401063e5 in ets_timer_disarm at ??:?
0x40100588 in ets_post at ??:?
0x4010101f in free at ??:?
0x4010101c in free at ??:?
0x40228584 in Communication::loop()::{lambda(bool, CommQueue<(unsigned char)100>::queue_s const*)#1}::operator()(bool, CommQueue<(unsigned char)100>::queue_s const*) const at ??:?
0x4021344b in std::queue<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s, std::deque<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s, std::allocator<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s> > >::queue<std::deque<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s, std::allocator<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s> >, void>() at ??:?
0x40100574 in ets_post at ??:?
0x401067de in system_get_time at ??:?
0x40100588 in ets_post at ??:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x4010101c in free at ??:?
0x4021b0aa in std::deque<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s, std::allocator<PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::message_s> >::~deque() at ??:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x4010101c in free at ??:?
0x40225966 in PubMqtt<HmSystem<(unsigned char)4, Inverter<float> > >::loop() at ??:?
0x4020cbae in ah::Scheduler::checkTicker() at ??:?
0x40100169 in std::function<void (bool, CommQueue<(unsigned char)100>::queue_s const*)>::operator()(bool, CommQueue<(unsigned char)100>::queue_s const*) const at ??:?
Trace 2
0x4023ff20 in operator new(unsigned int) at ??:?
0x4022f6d2 in AsyncWebHeader& std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::emplace_back<String const&, String const&>(String const&, String const&) at ??:?
0x4023ed55 in String::copy(char const*, unsigned int) at ??:?
0x4022f7a0 in AsyncWebServerResponse::addHeader(String const&, String const&) at ??:?
0x40222ed2 in RestApi<HmSystem<(unsigned char)4, Inverter<float> > >::onApi(AsyncWebServerRequest*) at ??:?
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x40248b4d in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:196 (discriminator 1)
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x4028e52c in etharp_output at ??:?
0x4028e54c in etharp_output at ??:?
0x4028e564 in etharp_output at ??:?
0x4028e574 in etharp_output at ??:?
0x4028e588 in etharp_output at ??:?
0x4028e5a4 in etharp_output at ??:?
0x4028e5bc in etharp_output at ??:?
0x4028e5cc in etharp_output at ??:?
0x4028e5dc in etharp_output at ??:?
0x4028e5f8 in etharp_output at ??:?
0x4028e620 in etharp_output at ??:?
0x4028e644 in etharp_output at ??:?
0x4028e664 in etharp_output at ??:?
0x4028e664 in etharp_output at ??:?
0x4028e644 in etharp_output at ??:?
0x4028e620 in etharp_output at ??:?
0x4028e5f8 in etharp_output at ??:?
0x4028e5dc in etharp_output at ??:?
0x4028e5cc in etharp_output at ??:?
0x4028e5bc in etharp_output at ??:?
0x4028e5a4 in etharp_output at ??:?
0x4028e588 in etharp_output at ??:?
0x4028e574 in etharp_output at ??:?
0x4028e564 in etharp_output at ??:?
0x4028e54c in etharp_output at ??:?
0x4028e52c in etharp_output at ??:?
0x4024aaf5 in _vsnprintf_r at /workdir/repo/newlib/newlib/libc/stdio/vsnprintf.c:71 (discriminator 4)
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x40101282 in realloc at ??:?
0x40248b4d in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:196 (discriminator 1)
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x4024d598 in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:179
0x4028d2a6 in etharp_output at ??:?
0x40248c7c in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:246
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232
0x4028d2a6 in etharp_output at ??:?
0x4028d2a8 in etharp_output at ??:?
0x4028d2a6 in etharp_output at ??:?
0x4024d859 in _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:528
0x40101050 in malloc at ??:?
0x40252520 in do_memp_malloc_pool at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c:255
0x4024aaf5 in _vsnprintf_r at /workdir/repo/newlib/newlib/libc/stdio/vsnprintf.c:71 (discriminator 4)
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x40101282 in realloc at ??:?
0x4023ec7e in String::changeBuffer(unsigned int) at ??:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x4023f15a in String::concat(char const*, unsigned int) at ??:?
0x4022ebcc in std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at ??:?
0x4022f924 in AsyncWebServerResponse::_assembleHead(unsigned char) at ??:?
0x402781bb in pp_attach at ??:?
0x402781bb in pp_attach at ??:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x40101282 in realloc at ??:?
0x4026050f in ieee80211_output_pbuf at ??:?
0x4022ec44 in void std::vector<String, std::allocator<String> >::_M_realloc_insert<char const*&>(__gnu_cxx::__normal_iterator<String*, std::vector<String, std::allocator<String> > >, char const*&) at ??:?
0x40101050 in malloc at ??:?
0x4023ed55 in String::copy(char const*, unsigned int) at ??:?
0x4022ec54 in void std::vector<String, std::allocator<String> >::_M_realloc_insert<char const*&>(__gnu_cxx::__normal_iterator<String*, std::vector<String, std::allocator<String> > >, char const*&) at ??:?
0x4022edb6 in AsyncWebServerRequest::addInterestingHeader(char const*) at ??:?
0x4023ed55 in String::copy(char const*, unsigned int) at ??:?
0x4022cc80 in _ZZN21AsyncWebServerRequest28_removeNotInterestingHeadersEvENKUlRK6StringE_clES2_$constprop$0 at WebRequest.cpp:?        
0x4022cd4a in AsyncWebServerRequest::_removeNotInterestingHeaders() at ??:?
0x40230534 in AsyncCallbackWebHandler::handleRequest(AsyncWebServerRequest*) at ??:?
0x4022e4af in AsyncWebServerRequest::_parseLine() at ??:?
0x4022e5ea in AsyncWebServerRequest::_onData(void*, unsigned int) at ??:?
0x4010101c in free at ??:?
0x4022a018 in AsyncClient::_recv(std::shared_ptr<ACErrorTracker>&, tcp_pcb*, pbuf*, int) at ??:?
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x4022a078 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at ??:?
0x40255a65 in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:501 (discriminator 1)
0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210  
0x40252520 in do_memp_malloc_pool at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c:255
0x4025aaf9 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210  
0x40100cce in umm_free_core at umm_malloc.cpp:?
0x40277438 in ppRecycleRxPkt at ??:?
0x40251b81 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x40251994 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118    
 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x4027a5fd in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365   
0x4027a60f in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373   
0x40277067 in ppPeocessRxPktHdr at ??:?
0x4027b8f3 in ets_snprintf at ??:?
0x40105c4d in call_user_start_local at ??:?
0x40105c53 in call_user_start_local at ??:?
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at ??:?
0x4026bbfc in cont_ret at cont.S.o:?
0x4026bbad in cont_continue at cont.S.o:?
0x40100588 in ets_post at ??:?
0x401028b4 in pp_post at ??:?
0x40105b4b in lmacRxDone at ??:?
0x4010343f in rcReachRetryLimit at ??:?
0x4010361c in rcReachRetryLimit at ??:?
0x40103ad6 in wDev_ProcessFiq at ??:?
0x401037f8 in wDev_ProcessFiq at ??:?
0x40100588 in ets_post at ??:?
0x40238b2c in esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter::addDomainCacheItem(void const*, bool, unsigned short) at ??:?
0x40101050 in malloc at ??:?
0x4023ff0b in operator new(unsigned int) at ??:?
0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:?
0x40239a2a in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSRRDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNS_RRDomain const&, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239be4 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:?
0x40239752 in esp8266::MDNSImplementation::MDNSResponder::_udpAppend8(unsigned char) at ??:?
0x4023986a in esp8266::MDNSImplementation::MDNSResponder::_write8(unsigned char, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x40239b99 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:?
0x402781bb in pp_attach at ??:?
0x4027820a in pp_attach at ??:?
0x40278316 in pp_attach at ??:?
0x402772cb in ppTxPkt at ??:?
0x4026050f in ieee80211_output_pbuf at ??:?
0x4010618f in wdt_feed at ??:?
0x402781bb in pp_attach at ??:?
0x40100588 in ets_post at ??:?
0x401063e5 in ets_timer_disarm at ??:?
0x401028b4 in pp_post at ??:?
0x40103938 in wDev_ProcessFiq at ??:?
0x40251581 in glue2esp_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:301
0x40100588 in ets_post at ??:?
0x401028b4 in pp_post at ??:?

Additional notes

I'd like to switch to this AsyncWebServer in the future. As my project is multi-platform, I already tested it successfully on ESP32. For now I use the esphome fork, but there the ESP8266 also feels unstable (this is reported by many users).

@lumapu lumapu added the bug label Aug 5, 2024
@vortigont
Collaborator

This is probably due to a memory shortage and not a bug in the server; I'd advise monitoring your heap size and fragmentation.
Or, better, switch to ESP32.

@mathieucarbou
Owner

@lumapu : I know the project (using OpenDTU myself).
The ESP8266 stack traces are quite ugly compared to the ones PlatformIO produces for ESP32.

Questions :

  1. Did you measure the heap usage (and free heap) in some of the functions listed above before they allocate? As @vortigont said, the second trace looks like a failure to allocate.

  2. Do you have this bug only with this fork, or also with the original one and the one from yubox-node-org (see readme)?

@mathieucarbou
Owner

mathieucarbou commented Aug 5, 2024

@lumapu correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-)
Ref: https://github.com/lumapu/ahoy/blob/main/src/platformio.ini

@mathieucarbou mathieucarbou added triage and removed bug labels Aug 5, 2024
@lumapu
Author

lumapu commented Aug 5, 2024

Did you measure the heap usage (and free heap) in some of the functions listed above before they allocate? As @vortigont said, the second trace looks like a failure to allocate.

No, not directly at that point, but I have a function to read it during operation via the API.
For ESP8266 there is the field max_free_block, which reads 9136 bytes with your fork and 9672 bytes with the esphome fork, both after a few clicks in the WebUI. The free heap is in the same region: 9600 and 9800.
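
To put those readings in perspective, here is a minimal host-side sketch (not code from Ahoy or from either fork) that turns a free-heap reading and a largest-free-block reading into a rough fragmentation figure. The function names are hypothetical, the ratio is a simplification rather than the metric any core uses, and it assumes the two free-heap values above are listed in the same order as the two block sizes. On the device, the inputs could come from `ESP.getFreeHeap()` and `ESP.getMaxFreeBlockSize()` of the ESP8266 Arduino core.

```cpp
#include <cassert>
#include <cstdint>

// Rough fragmentation estimate in percent. 0 means the free heap is one
// contiguous block; values near 100 mean the largest block is tiny compared
// with the total. This is a plain ratio, not the metric of any particular core.
inline std::uint32_t fragmentationPercent(std::uint32_t freeHeap, std::uint32_t maxFreeBlock) {
  assert(maxFreeBlock <= freeHeap);  // the largest block cannot exceed the total
  if (freeHeap == 0) {
    return 0;
  }
  const std::uint64_t contiguous = (static_cast<std::uint64_t>(maxFreeBlock) * 100u) / freeHeap;
  return 100u - static_cast<std::uint32_t>(contiguous);
}

// Hypothetical guard: can `bytes` be taken from the largest block while still
// leaving `reserve` bytes of contiguous memory for the network stack?
inline bool fitsWithReserve(std::uint32_t maxFreeBlock, std::uint32_t bytes, std::uint32_t reserve) {
  return maxFreeBlock >= bytes && (maxFreeBlock - bytes) >= reserve;
}
```

Under that pairing assumption, `fragmentationPercent(9600, 9136)` gives 5 and `fragmentationPercent(9800, 9672)` gives 2, which would suggest that the heap is barely fragmented in either case and that the total amount of free memory is the tighter limit.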

Do you have this bug only with this fork, or also with the original one and the one from yubox-node-org (see readme)?

I checked the esphome fork again: no crash could be reproduced. Then, seconds later, I flashed a newly compiled version with your fork, which crashes really fast.

correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-)

Good point. I need to do some basic tests before delivering new software to the folks. Nothing using your fork has been published yet; I only have a feature branch locally.
Yesterday I found your fork and wanted to test it immediately. It works like a charm on ESP32, but on ESP8266 I see some problems.

I really appreciate that you want to maintain the AsyncWebServer, and I read your whole discussion with @egnor. Some months ago I was searching for a better-maintained fork, coming from the yubox-node-org one, and found esphome. Since yesterday I know about yours and hope that I can use it in the near future.

@mathieucarbou
Owner

mathieucarbou commented Aug 5, 2024

@lumapu thank you for these details!

Can you please confirm: you are then using esphome/ESPAsyncTCP-esphome @ 2.0.0 in both your tests and you are then just swapping esphome/ESPAsyncWebServer with mathieucarbou/ESPAsyncWebServer, right ?

The Async lib behind stays the same, but just the ESPAsyncWebServer changes ?

Also, if I look at the traces, you are using SSE, not WebSocket, right ?

I suspect that the difference in heap usage is due to the recent change from @vortigont: the project included a custom-made implementation of a forward linked list, which was replaced with std::list, which is bi-directional and allows constant-time insertion and removal. So the small increase in heap usage is expected.
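
As a rough illustration of why a doubly linked list costs a little more heap per element, the following host-side sketch counts the bytes each container requests through a counting allocator. It uses `std::forward_list` as a stand-in for the custom forward list (an assumption; the original implementation is not shown in this thread), and `Header`, `CountingAllocator` and `bytesFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <list>
#include <new>

// Total bytes requested through CountingAllocator since the last reset.
static std::size_t g_allocatedBytes = 0;

// Minimal allocator that only adds byte counting to operator new/delete.
template <class T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <class U>
  CountingAllocator(const CountingAllocator<U>&) {}
  T* allocate(std::size_t n) {
    g_allocatedBytes += n * sizeof(T);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Bytes the container requests from the heap for `count` elements.
template <class Container>
std::size_t bytesFor(std::size_t count) {
  assert(count > 0);
  g_allocatedBytes = 0;
  Container c;
  const std::size_t baseline = g_allocatedBytes;  // some implementations allocate a sentinel node
  for (std::size_t i = 0; i < count; ++i) {
    c.push_front(typename Container::value_type{});
  }
  return g_allocatedBytes - baseline;
}

// Stand-in payload, roughly "two pointers per header".
struct Header {
  const char* name;
  const char* value;
};

using SingleLinked = std::forward_list<Header, CountingAllocator<Header>>;
using DoubleLinked = std::list<Header, CountingAllocator<Header>>;
```

Comparing `bytesFor<DoubleLinked>(10)` with `bytesFor<SingleLinked>(10)` shows the extra cost of the second link: on common standard library implementations it is about one pointer per element, which is small but not free on a heap of roughly 9 kB.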

Neither of us uses the ESP8266 on a daily basis, so it would help a lot if you had the opportunity to create a minimal reproducible test case in an .ino file that we could add to the project.

If the Async lib stays the same, but only the ESPAsyncWebServer fork is swapped, it would be interesting to find the issue indeed.

The only big changes compared with the ESPHome fork regarding SSE are in commits bb4eb89 and 48968b5 (@vortigont fyi), not counting the more common API (request / response / handlers).

@vortigont
Collaborator

9k of heap is definitely too low to work reliably, even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is that memory-stressed, I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions.
This is just my opinion.

@mathieucarbou
Owner

mathieucarbou commented Aug 6, 2024

@lumapu : you would need to measure the free heap just before the allocation requests that are failing (the lines you see in the stack traces).

@vortigont : what I do not get is why it works with the ESPhome fork. I agree with you that the free memory is too low and this is asking for problems, but the difference between the 2 forks in terms of memory usage is low.

I was wondering whether one of the 2 commits could have introduced an unforeseen side effect. Honestly, I do not see any right now, which is why I was asking for your second opinion.

@proddy

proddy commented Aug 6, 2024

but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

I agree, and I had to make the same hard choice on my projects. The ESP8266 is 10 years old now, and you can easily swap it for an ESP32 for less than 2 euros, which has a lot more memory, power, cores, etc.

@vortigont
Collaborator

what I do not get is why it works with the ESPhome fork

That's what I mean: there might indeed be something very specific that could be investigated and probably even fixed or optimized, but to do this on the 8266 - nah... I have more important things to invest time and effort into :)
As I see from the traces, it fails on malloc or new, with vfprintf around, so the most probable cause is indeed a memory constraint, either for the heap or for the stack. I do not have a working SSE example to test on; I have never actually used it and mostly made the changes heuristically. I can try to dig into this a bit, but only if a minimal reproducible example is provided.

@mathieucarbou
Owner

I can try to dig into this a bit, but only if a minimal reproducible example is provided.

I agree: without more effort from @lumapu to pinpoint the issue and provide a minimal reproducible use case proving an issue in the library, we cannot do anything but suspect a memory constraint, as shown in the stack trace.

@lumapu : you should monitor your free heap at key points where memory is allocated (before these malloc / new / vfprintf calls). CONFIG_ASYNC_TCP_STACK_SIZE is for AsyncTCP, which you are not using since you are on ESP8266. There is no task and stack size to configure.
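
One way to monitor the heap at such key points is to keep a low-watermark and remember where it was seen. The sketch below is only an illustration under assumptions: `HeapWatermark`, `readFreeHeap` and the labels are hypothetical names, and the non-Arduino branch is a host stand-in so the logic can be exercised off-device. On the ESP8266 Arduino core the reading comes from `ESP.getFreeHeap()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

#ifdef ARDUINO
#include <Arduino.h>
// On the ESP8266 Arduino core, ESP.getFreeHeap() reports the free heap in bytes.
static std::uint32_t readFreeHeap() { return ESP.getFreeHeap(); }
#else
// Host stand-in so the logic can be compiled and exercised off-device.
static std::uint32_t g_simulatedFreeHeap = 0;
static std::uint32_t readFreeHeap() { return g_simulatedFreeHeap; }
#endif

// Remembers the lowest free-heap reading and the label of the place it was seen.
struct HeapWatermark {
  std::uint32_t lowest = UINT32_MAX;
  const char* where = "";

  // Call right before an allocation-heavy step, e.g. before building a response.
  void probe(const char* label) {
    assert(label != nullptr);
    const std::uint32_t now = readFreeHeap();
    if (now < lowest) {
      lowest = now;
      where = label;
    }
  }

  // Formats the result into a caller-provided buffer, e.g. for a JSON API field.
  int describe(char* out, std::size_t len) const {
    return std::snprintf(out, len, "{\"min_free_heap\":%lu,\"at\":\"%s\"}",
                         static_cast<unsigned long>(lowest), where);
  }
};
```

Placing `probe("onApi")` or similar calls at the top of the request handlers named in the traces would show which path drives the heap lowest, without allocating anything itself.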

@mathieucarbou
Owner

@lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails?

I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h

I am willing to help more, but the lack of information you give is not helping ;-)

Specifically, what I am searching for, is if a change in method signature regarding PROGMEM usage could have made it so that the content is now not read from flash but loaded into ram.

So I need to know exactly what you are using from the ESPAsync API.

As I understand right now, your html pages are generated with a python script and their type is const uint8_t {}[] PROGMEM right ?

And you are using beginResponse_P to serve them ?

So the method which is called is:

AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);

which is implemented in ESPHome fork and original repo as:

AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

In our repo, this method is deprecated and redirected:

    [[deprecated("Replaced by beginResponse(...)")]]
    AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) {
      return beginResponse(code, contentType, content, len, callback);
    }

and goes to:

AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) {
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

Can you please have a deeper look at the method signatures used like this example ?

Thanks!

@lumapu
Author

lumapu commented Aug 11, 2024

Can you please confirm: you are then using esphome/ESPAsyncTCP-esphome @ 2.0.0 in both your tests and you are then just swapping esphome/ESPAsyncWebServer with mathieucarbou/ESPAsyncWebServer, right ?

From my understanding this comes with the web server; in my platformio.ini there is no extra entry for it. The only other dependency I can see is https://github.com/me-no-dev/ESPAsyncUDP, which is used for NTP.

The Async lib behind stays the same, but just the ESPAsyncWebServer changes ?

I only change line 29 in my platformio.ini, which points to the AsyncWebServer repository.

Also, if I look at the traces, you are using SSE, not WebSocket, right ?

Not completely sure what you mean, so let me describe how it's done in Ahoy:
Almost all pages are static HTML that loads its data dynamically using AJAX. Only the web console uses a WebSocket.

Neither of us uses the ESP8266 on a daily basis, so it would help a lot if you had the opportunity to create a minimal reproducible test case in an .ino file that we could add to the project.

I can try to do so - give me some time - I don't want to waste too much time in ESP8266 (as you also mentioned 😉)

@lumapu
Author

lumapu commented Aug 11, 2024

9k of heap is definitely too low to work reliably, even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is that memory-stressed, I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

Full ack. The ESP8266 is the chip I started with, and somehow it is still possible to run the most recent Ahoy software on it, but sadly not with this fork. It's not high priority for me, but it would still be cool if it were supported.
I know that memory is too low on the ESP8266, but this is by design; the chip simply has no more 😉. Web applications always become really big once they need to look nice.

@lumapu
Author

lumapu commented Aug 11, 2024

@lumapu : you would need to measure the free heap just before the allocation requests that are failing (the lines you see in the stack traces).

Correct, it's measured and stored until it's transferred to the WebUI via the JSON API.

@lumapu
Author

lumapu commented Aug 11, 2024

@lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails?

That's not that easy. I randomly click on different menu items in the WebUI, and from time to time it crashes. I can try to record a screen video to describe it better.

I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h

I am willing to help more, but the lack of information you give is not helping ;-)

I'm sorry for that; I will help as much as I can. You guys are really fast, and I really appreciate it. I was talking about the development branch, which is more than 200 commits apart from main: https://github.com/lumapu/ahoy/tree/development03

Specifically, what I am searching for, is if a change in method signature regarding PROGMEM usage could have made it so that the content is now not read from flash but loaded into ram.

Maybe this line: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L378

So I need to know exactly what you are using from the ESPAsync API.

As I understand right now, your html pages are generated with a python script and their type is const uint8_t {}[] PROGMEM right ?

yes that's correct. The python script is used to do some preprocessor and translation things. Also some generic content like menu and footer are included.

And you are using beginResponse_P to serve them ?

yes, I think so: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L248

So the method which is called is:

AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);

which is implemented in ESPHome fork and original repo as:

AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

In our repo, this method is deprecated and redirected:

    [[deprecated("Replaced by beginResponse(...)")]]
    AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) {
      return beginResponse(code, contentType, content, len, callback);
    }

and goes to:

AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) {
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

yes, I was notified by the deprecation and renamed the beginResponse_P calls to beginResponse. Maybe I missed something around this change. Do I need to change anything else than the function name?

Thank you for all your efforts, it feels really professional here

@mathieucarbou
Owner

Do I need to change anything else than the function name?

No... Just changing the name is enough. The signature and the implementation behind it are the same, as explained.

@lumapu
Author

lumapu commented Sep 2, 2024

I started another (private) project using this AsyncWebServer again. This project does not include WebSockets for now. Even when I request pages at a high frequency, no crash has been seen so far.
I will further try to dig around this to get better information.

The behavior feels the same as described in newer issue:

@mathieucarbou
Owner

@lumapu : the WebSocket implementation in this fork relies on the std::shared_ptr<std::vector<uint8_t>> mechanism from the yubox-node-org fork, which is not in the original repo or the esphome fork... Maybe a lead?
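
For readers unfamiliar with that mechanism, here is a minimal host-side sketch of the general idea: one payload buffer is shared by reference among several client queues and is freed only when the last queue drops it. `ClientQueue` and `broadcast` are hypothetical names for illustration, not the library's classes, and whether this pattern is related to the crash is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <vector>

using SharedBuffer = std::shared_ptr<std::vector<std::uint8_t>>;

// Hypothetical per-client outgoing queue; not the library's class.
struct ClientQueue {
  std::deque<SharedBuffer> pending;

  // Copies the pointer (and bumps the reference count), not the bytes.
  void enqueue(const SharedBuffer& buf) { pending.push_back(buf); }

  // Pretends the front message was sent and drops this client's reference.
  void sendNext() {
    assert(!pending.empty());
    pending.pop_front();
  }
};

// Queues one payload for every client. Returns a weak handle so the caller can
// observe when the bytes are actually released.
inline std::weak_ptr<std::vector<std::uint8_t>> broadcast(std::vector<ClientQueue>& clients,
                                                          const std::uint8_t* data,
                                                          std::size_t len) {
  SharedBuffer buf = std::make_shared<std::vector<std::uint8_t>>(data, data + len);
  for (ClientQueue& c : clients) {
    c.enqueue(buf);
  }
  return buf;  // the local strong reference ends here; only the queues keep the buffer alive
}
```

The design saves memory when many clients receive the same message, but the payload stays on the heap until the slowest client has sent it, which can matter when only a few kilobytes are free.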

@lumapu
Author

lumapu commented Sep 3, 2024

@mathieucarbou I have not used the yubox-node-org fork for a long time. Thanks for the hint; I think I can easily switch the web server library to yubox-node-org for a test and see if the issue is still there.

As a subscriber of the esphome fork, I heard about the following; maybe it is related to my problem, or at least an improvement:

@mathieucarbou
Owner

You are using SSE ?

@mathieucarbou
Owner

mathieucarbou commented Sep 4, 2024

@mathieucarbou I have not used the yubox-node-org fork for a long time. Thanks for the hint; I think I can easily switch the web server library to yubox-node-org for a test and see if the issue is still there.

As a subscriber of the esphome fork, I heard about the following; maybe it is related to my problem, or at least an improvement:

@lumapu I have included this patch in this version => v3.2.3

@mathieucarbou
Owner

Hi @lumapu ,
In latest version, I fixed an issue in the method overload for ESP8266 (regarding the PGM usage).
You were using the methods with const uint8_t* content, not char*, so I guess this fix won't help much, but I wanted to drop a note just in case ;-)
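
To illustrate why the pointer type at the call site can matter, here is a host-side sketch of how overload selection alone can decide whether a body is served from a pointer or copied into RAM. It is not the library's code: `makeResponse`, `Backing` and `Response` are hypothetical stand-ins, `std::string` stands in for Arduino's `String`, and the actual bug fixed in the release is not described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class Backing { FlashPointer, RamCopy };

struct Response {
  Backing backing;
  std::size_t ramBytes;  // bytes of body copied to the heap
};

// Stand-in for a binary overload: keeps only pointer and length, no copy.
inline Response makeResponse(const std::uint8_t* content, std::size_t len) {
  assert(content != nullptr);
  (void)len;
  return Response{Backing::FlashPointer, 0};
}

// Stand-in for a string overload: the body is owned by a string object,
// so constructing that object copies the bytes into RAM.
inline Response makeResponse(const std::string& content) {
  return Response{Backing::RamCopy, content.size()};
}
```

In this sketch, a `const uint8_t[]` page passed with its length resolves to the pointer overload, while a `const char*` literal can only match the string overload and is therefore copied. That is the kind of silent difference worth checking when a method is renamed or its overload set changes.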

@mathieucarbou
Owner

mathieucarbou commented Oct 15, 2024

Hello,

I've just fixed a bug regarding string usage on ESP8266 (a long-standing bug):

https://github.com/mathieucarbou/ESPAsyncWebServer/releases/tag/v3.3.17

If possible, please let me know if it solves the issue.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@mathieucarbou
Owner

Closing as there is no more activity on this issue.
Will reopen if someone is still impacted and willing to troubleshoot the issue.
