Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32s3 heap capability allocator leads to memory fragmentation. (IDFGH-12587) #13588

Closed
3 tasks done
PaulMartinsen opened this issue Apr 10, 2024 · 19 comments
Closed
3 tasks done
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@PaulMartinsen
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Please rethink the capability heap's allocation strategy.

Currently malloc appears to allocate from the largest heap first. This led to a fragmentation related allocation fault. It would seem better to place smaller objects in smaller heaps first by default. I think the largest heap is just the piece of contiguous memory left after statically scoped variables are placed by the complier.

Details

Yesterday I ran out of heap memory. More precisely, I had about 100k free of MALLOC_CAP_8BIT capable memory, needed 36k but the largest free block was only 34k. I was sad.

It appears the ESP32s3 (and probably other devices):

  1. has five heaps: 1 in PSRAM, 3 in internal RAM (two of which appear equivalent) and 1 in RTC fast RAM.
  2. the internal RAM heaps were roughly 200k, 20k and 32k. The 200k and 20k are in SRAM1 while the 32k one is in SRAM2.
  3. malloc fills up the 200k ram first. This is why I was unable to obtain a 36k block of RAM.

For reference, here's a summary of my memory disposition:
image

Questions

  1. Is it possible, using heap_caps_malloc to place an object in heap 3 or 4 in the diagram above? I can't see any combination of MALLOC_CAP_* flags that would let me do this before heap 2 was too small for the allocation.
  2. Is there any documentation on replacing the default heap strategy with a custom one? I haven't been able to find it yet, unfortunately.
  3. Is it possible for heap 2 and heap 3 to be combined into a single heap? Its not clear to me why these show up as two separate heaps. I saw some kind of post-init cleanup/heap gathering in the source. Is it related to this?
@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 10, 2024
@github-actions github-actions bot changed the title ESP32s3 heap capability allocator leads to memory fragmentation. ESP32s3 heap capability allocator leads to memory fragmentation. (IDFGH-12587) Apr 10, 2024
@SoucheSouche
Copy link
Collaborator

Hi @PaulMartinsen,
thanks for reaching out to us concerning this problem. Let me first address your questions.

  1. It should be possible to allocate from either heaap2/3 or heap4 based on the CAPS from the table you shared. But unfortunately choosing between heap 2 and heap 3 is not.
  2. you can't currently replace the heap allocation strategy but you can create a new heap at runtime in e.g., heap2 and then allocate in your newly created heap region. For that you can use heap_caps_add_region_with_caps() in esp_heap_caps_init.h. That can be a workaround that gives you the choice of where to allocate.
  3. Upon startup a part of the memory has to be reserved because it is used as a stack. After the system has started, this reserved area is no longer needed and can be then used as heap. The currently implemented solution is to create heaps where possible upon startup and leave out those reserved area. When those areas are no longer needed as stacks, the heap component creates a new heap in place of those reserved areas. That is why you can observe that the RAM memory region is split into 2 different heap even though it could technically be covered by 1 heap only.

I am currently working on an alternative approach to the startup stack problem that will result in creating a single heap per memory region. That will solve the issue you are facing at the moment.

@PaulMartinsen
Copy link
Author

Thanks @SoucheSouche for your quick and helpful comments. Particularly for clarifying where heap 3 comes from. Your new approach to join heaps 2 & 3 sounds great and a lot better than allocating into heap 3 explicitly (which would be gone with the new approach). I think this will be a big step forward for us.

When I enabled support for malloc to use PSRAM (possibly a temporary work-around), I noticed there's an option to reserve a section of internal ram that can be used by the capability allocator and is not used by malloc. This does go into heap 2/3. I don't think we will need this, but I did wonder if having an option to place this in "heap 4" would be straight forward and worthwhile.

Thanks again for your comments and work on the memory management.

@SoucheSouche
Copy link
Collaborator

@PaulMartinsen, no problem. we can leave this ticket open and I will link it to the merge request for this heap initialization update so it gets closed when the new implementation gets merged in master.

The modifications are quite big so it might take a little while before I can merge.

@JamieDriver
Copy link

JamieDriver commented May 29, 2024

Also seeing similar issues - unable to start camera because it wants to make a ~30kb dma-capable allocation, but largest single block of dram is only ~28kb, even though >80kb free dram.

esp32 classic does not have this problem afaics because the smaller heaps are first, so get used first, leaving most of the final (largest) heap free.
esp32s3 has the largest heap first, so that gets (partially) used, leaving part of it, and the smaller heaps, free - but no single large block available.

Reordering the [equivalent DRAM] heaps such that the smaller ones are searched/used before the larger would be a huge win, I'm sure.

@AxelLin
Copy link
Contributor

AxelLin commented Jul 2, 2024

@SoucheSouche How is the status of this fix? espressif/tlsf#2

@AxelLin
Copy link
Contributor

AxelLin commented Sep 15, 2024

@SoucheSouche How is the status of this fix? espressif/tlsf#2

@SoucheSouche When will esp-idf update tlsf submodule to get above fix? Will you update tlsf for release branches as well?

@SoucheSouche
Copy link
Collaborator

SoucheSouche commented Sep 16, 2024

@AxelLin, the MR to update the tlsf submodule is created and under review now. No backport can be done for espressif/tlsf#2 because backporting implies patching the ROM version of the TLSF and since this PR updates static inline functions, it simply can't be done.

@PaulMartinsen the new implementation of the heap init process is stalled because of ROM TLSF backport issue as well. In the mean time, I will update the heap init to sort the heaps by increasing size to fix your issue and backport this fix.

@SoucheSouche
Copy link
Collaborator

@PaulMartinsen, I created a fix that sorts the heap by increasing size. It is currently under review.

@AxelLin
Copy link
Contributor

AxelLin commented Oct 17, 2024

@AxelLin, the MR to update the tlsf submodule is created and under review now. No backport can be done for espressif/tlsf#2 because backporting implies patching the ROM version of the TLSF and since this PR updates static inline functions, it simply can't be done.

@SoucheSouche
I don't quite understand this.

I notice the v5.4 branch includes the fix espressif/tlsf#2 .
Now I'm using v5.2 branch.
If rebuild my application using v5.4 branch and OTA firmware on the device, will it work?

@AxelLin
Copy link
Contributor

AxelLin commented Nov 2, 2024

@PaulMartinsen, I created a fix that sorts the heap by increasing size. It is currently under review.

@SoucheSouche How is the status now? Could you share the commit for this fix?

@SoucheSouche
Copy link
Collaborator

@AxelLin, the fix has been reviewed and will be merged soon. The commit will be made available when the work is merged and synced with GH.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: Opened Issue is new labels Nov 21, 2024
@AxelLin
Copy link
Contributor

AxelLin commented Nov 24, 2024

@AxelLin, the fix has been reviewed and will be merged soon. The commit will be made available when the work is merged and synced with GH.

I won't call it soon if it takes more than 3 weeks.
You mentined you have created a fix 2 Months ago (#13588 (comment)), but the fix/review is so slow.
And it will take even more time for backporting to stable branches.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels Nov 25, 2024
@AxelLin
Copy link
Contributor

AxelLin commented Jan 4, 2025

@PaulMartinsen In the mean time, I will update the heap init to sort the heaps by increasing size to fix your issue and backport this fix.

@SoucheSouche
It's 2025 now, the fix is not yet available in any release branches.

@SoucheSouche
Copy link
Collaborator

SoucheSouche commented Jan 8, 2025

@AxelLin, the update to sort heaps by increasing size has been merged on December 3rd.
Since sorting the heaps by size is not fixing a bug but rather changing the limitations of the heap component, it has not been backported.

@AxelLin
Copy link
Contributor

AxelLin commented Jan 8, 2025

@AxelLin, the update to sort heaps by increasing size has been merged on December 3rd. Since sorting the heaps by size is not fixing a bug but rather changing the limitations of the heap component, it has not been backported.

Seems you disagree with your own comment: #13588 (comment)
"In the mean time, I will update the heap init to sort the heaps by increasing size to fix your issue and backport this fix."

@SoucheSouche
Copy link
Collaborator

@AxelLin, yes, and after updating the component and submitting the work for review, it was agreed that the changes do not represent a bugfix so that is the reason why the changes were not backported.

@AxelLin
Copy link
Contributor

AxelLin commented Jan 8, 2025

@AxelLin, yes, and after updating the component and submitting the work for review, it was agreed that the changes do not represent a bugfix so that is the reason why the changes were not backported.

You mean #13588 (comment) is not a bug ?

@SoucheSouche
Copy link
Collaborator

@AxelLin, yes, and after updating the component and submitting the work for review, it was agreed that the changes do not represent a bugfix so that is the reason why the changes were not backported.

You mean #13588 (comment) is not a bug ?

Sorting heaps by increasing size does mitigate certain limitations of the heap component implementation but it does not fix an existing bug because the heap component is perfectly functional with unsorted heaps.

Running out of memory is not a bug.
heap_caps_get_largest_free_block(MALLOC_CAP_DMA) returns 28Kb so allocating 30Kb is expected to fail.
A bug would be heap_caps_get_largest_free_block(MALLOC_CAP_DMA) returning 35Kb and allocating 30Kb of memory fails.

@AxelLin
Copy link
Contributor

AxelLin commented Jan 9, 2025

@AxelLin, yes, and after updating the component and submitting the work for review, it was agreed that the changes do not represent a bugfix so that is the reason why the changes were not backported.

You mean #13588 (comment) is not a bug ?

Sorting heaps by increasing size does mitigate certain limitations of the heap component implementation but it does not fix an existing bug because the heap component is perfectly functional with unsorted heaps.

Running out of memory is not a bug.

OOM is not a bug, but a series of small allocations causes the bigger allocation failure is a bug.

Your commit log mentioned what was fixed:
This prevents the scenario where a series of small allocations would
crowd the bigger heaps making any bigger allocation fail for lack of space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

5 participants