Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDNS Implementation can result in unbounded memory consumption. (IDFGH-12294) #13333

Open
3 tasks done
not-my-real-name opened this issue Mar 8, 2024 · 4 comments
Open
3 tasks done
Assignees
Labels
Resolution: Done Issue is done internally Status: Selected for Development Issue is selected for development

Comments

@not-my-real-name
Copy link

not-my-real-name commented Mar 8, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Hi Everyone,

I am using a lightly modified version of IDF 4.4.5 and have run into an issue with the MDNS implementation. Currently the MDNS implementation is unchanged from 4.4.5 in my repository.

Basic Summary of the issue:
Device on a busy network with lots of MDNS queries consumes an unlimited amount of RAM.

Expected behaviour:
Device shouldn't run out of RAM on a busy network.

Background
I found the issue because the PC companion app for this product recently added an MDNS service discovery that I perhaps made a little aggressive. With multiple copies of the application open and multiple network interfaces (wifi + ETH) on the PC the device was seeing a lot of MDNS queries per second (e.g. 30). It would run out of RAM in a few seconds unless I disabled MDNS.

I realise this much network load is an edge case, but it resulted in unrestrained memory consumption and would be a pretty effective DOS vector. I tried altering the MDNS implementation to use external SPI RAM but even with this setup it still consumed an unbounded amount of system memory from the external RAM. I gave up at about 500KB :)

I have since found and patched the root issue in the MDNS implementation. I'll try and describe the issue and my solution here:

As I understand it, the MDNS component uses a queue with a default of 16 items to process actions (basically an event queue). Actions are created when MDNS queries come in from the network and they generate a scheduled TX packet (reply) to go back to the network interface. This is added to a TX packet queue (simple linked list) that has no bounds on upper queue size (this is where all the memory goes).

The trouble really starts with the way the TX packet queue gets emptied. It is checked on a timer (100ms default) to see if there are any TX packets that have passed a scheduled timestamp and are ready to get sent. But it only sends ONE per loop even if multiple packets are past the scheduled time when they should be sent.

This means if events are coming in off the network faster than 10x per second tx packets can get generated faster than they can be sent out and you end up with a snowball effect and no more memory.

I confirmed this by removing the MDNS queries (killed the PC apps) and the memory usage slowly returned to normal. (Confirmed on wireshark too).

Proposed Solution
I have included a patch file that has two changes. One is an adaptation to the MDNS component to add a config option to allow the MDNS library to allocate memory from external RAM. The more significant change is to send as many packets (TX Actions) as will fit in the queue every time the tx packet queue is checked instead of just one.

I think this should do two things:

  1. This should massively speed up the TX response time as if several replies are ready to send they will all be sent closer to the scheduled time.
  2. If it is still not fast enough (>16 messages per 100ms loop) the action queue will be mostly full of TX Actions and this will block new RX actions from coming in and creating more TX packets stopping the memory loss. Incoming queries will get dropped, but memory usage is capped.

[EDIT] Fixed minor issue in patch file.
mdns_memory_fix.patch

I'd be very grateful if someone who knows a bit more about this than me can have a look at this issue and determine if this solution and my assessment of the issue is valid.

My Setup:
IDF4.4.5 (with some minor tweaks and fixes - no changes to MDNS component.)
ESP32-WROVERIE
Custom hardware (well established commercial product)
C# PC app. Actively submitting MDNS queries for a few different records. Multiple copies open leading to high MDNS traffic.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Mar 8, 2024
@github-actions github-actions bot changed the title MDNS Implementation can result in unbounded memory consumption. MDNS Implementation can result in unbounded memory consumption. (IDFGH-12294) Mar 8, 2024
@david-cermak
Copy link
Collaborator

Hi @not-my-real-name

Thanks for the detailed analysis and proposing a solution.
I took a quick look and your patch looks okay to me, would you mind posting it as a PR in https://github.com/espressif/esp-protocols/pulls where mDNS library is currently developed? It would be much easier for the code owners to review and suggest changes. (at this point I'd only suggest that you split the patch into two separate commits: one with the malloc() changes and another with the TX packet handling).

If it is still not fast enough (>16 messages per 100ms loop) the action queue will be mostly full of TX Actions and this will block new RX actions from coming in and creating more TX packets stopping the memory loss. Incoming queries will get dropped, but memory usage is capped.

It's probably a bit more complicated, as the memory would probably get consumed by the incoming Rx packets (which run separably from the action queue), but it is certainly a big improvement compared to the current status.

@not-my-real-name
Copy link
Author

Hi David,

Thanks for looking into the code. I had a go at trying to make a pull request but I couldn't directly apply my patch to the master branch after I cloned the protocols repo for some reason. Sorry, I'm a little new to the PR process.

I think the code is also at a different point to that included in IDF4.4.5 and I'm not sure how to find the correct point in the protocols repo? I guess the idea is to apply a similar fix to the latest point on the master branch of the protocols repo but as I am on the clock for work I don't have time allocated to go through and figure it out again for the new code point and make sure it all works correctly right now.

Maybe someone from the ESP dev team can put this in the backlog to have a look at at some point? Or someone else from the community might be feeling up to the challenge. :) Sorry I can't be more help!

@david-cermak
Copy link
Collaborator

Thanks for the reply and checking the PR process. I would probably then apply similar changes to the lasted mdns. I think the while loop in the Tx packet processing is more important, so I'll push a PR soon. As for the allocation option, we'd add it to our backlog (perhaps we might like to address this more systematically, e.g. use some malloc hooks for each components...)

PS: if you still feel like posting a simple PR with the Tx update (which is a quick oneliner IMO, even with the newer mdns), we'd like to hear from you so can take credit for the change.

david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Mar 20, 2024
david-cermak added a commit to david-cermak/esp-protocols that referenced this issue Mar 20, 2024
@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Mar 20, 2024
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Selected for Development Issue is selected for development labels Apr 18, 2024
@david-cermak
Copy link
Collaborator

Update:

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Done Issue is done internally labels May 2, 2024
kyvaith pushed a commit to kyvaith/mdns that referenced this issue Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Selected for Development Issue is selected for development
Projects
None yet
Development

No branches or pull requests

5 participants