Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/tlsf: Early initialization of memory pool. #12032

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jcarrano
Copy link
Contributor

@jcarrano jcarrano commented Aug 19, 2019

Contribution description

The TLSF allocator needs to be initialized before use. This is an issue whenit is used as a default system allocator since the user expects to be able to call malloc right away. This is made worse by the fact that the C library uses malloc internally to create buffers and that may happen before the user's code has a chance to run.

As a consequence, even doing printf when using USEMODULE=tlsf-malloc will lead to a crash.

A mechanism is needed to:

  1. Initialize the pool early.
  2. Determine which memory should be used as a heap and reserve it.

Initialization

Issue (1) is solved by adding the initializer to the C library's .preinit_array, which is a cross-file array of function pointers that run before the library is initialized -that is before _init(). See the newlib source code for more details.

Getting the heap and its size

Point (2) is important because TLSF dows not support growing the pool, only adding new ones. We would like to initialize it with a pool as big as possible.

Native

In native (2) is handled by defining a static array of fixed size (given by TLSF_NATIVE_HEAPSIZE). Memory is plentiful in native and we down't care about the overhead of zeroing out this array.

Newlib

On embedded targets using newlib (this may be working on other plaforms, I only tested ARM) sbrk() is used to find the start of the heap and reserve it and the _eheap linker symbol is used to determine the end of the usable heap.

An array is a bad choice here because the size would be board dependent and hard to determine without build-system magic and because it would be zeroed by default, making the boot sequence way longer.

sbrk() does nothing more than move a pointer that marks the fraction of the space between _sheap and _eheap that is reserved. Since we are using the whole heap it might be tempting to just use the symbols to derive the pool location and size and to sidestep sbrk(). Especially since the memory allocation functions are expected to be the only users of such a feature. That "trick" would make the OS impossible to debug in case the was a mistake and some of the original allocation functions slipped through non-overriden. If sbrk is used to reserve the entirety of the space then that rogue function will try to call it and fail as no more heap is available. In fact this is how I found out that I was overriding the wrong functions (put a breakpoint int sbrk and show a traceback.) If sbrk is sidestepped one would have nasty and impossible to debug memory corruption errors.

Other

A third option could be to use the heap space directly and not define sbrk. This is beyond the scope of this change, but is probably the route to go for platform that do not define this call (but first do a thoroug investigation of how the libc works in that platform).

Final notes

Messing with the global system allocator is not an easy thing to do. I would say that tslf-malloc is ATM only supported in native and cortex-m.

Testing procedure

Run tests/pkg_tlsf_malloc.

Issues/PRs references

Fixes: #4490
Fixes #5796
Closes: #12021
Depends on: #12031

@jcarrano jcarrano added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: pkg Area: External package ports State: waiting for other PR State: The PR requires another PR to be merged first labels Aug 19, 2019
@miri64
Copy link
Member

miri64 commented Aug 28, 2019

@haukepetersen @benpicco can you have a look?

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick grep through the newlib source code did not reveal any uses of sbrk outside nano-mallocr.c/mallocr.c/sbrkr.c, so this should be fine.

@@ -25,8 +25,24 @@
extern "C" {
#endif

#define ROUND_DOWN4(x) (((x)/4)*4) /* Is this necessary??? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x & ~0x3

@benpicco
Copy link
Contributor

This needs a rebase now.

This test is currently failing because of RIOT-OS#4490, RIOT-OS#5796 and RIOT-OS#12021.

When using TLSF as the system allocator it should be initialized

- Automatically, as that is what the user expects.
- Early in the boot process, since the C library mallocs internal buffers.

Failing to do so will lead to a crash as the issues and this test shows.

The test is blacklisted and will be whitelisted in the next commit with
the fix.
The TLSF allocator needs to be initialized before use. This is an issue when
it is used as a default system allocator since the user expects to be able to
call malloc right away. This is made worse by the fact that the C library uses
malloc internally to create buffers and that may happen before the user's code
has a chance to run.

As a consequence, even doing printf when using USEMODULE=tlsf-malloc will lead
to a crash.

A mechanism is needed to:

1. Initialize the pool early.
2. Determine which memory should be used as a heap and reserve it.

Issue (1) is solved by adding the initializer to the C library's `.preinit_array`,
which is a cross-file array of function pointers that run before the library is
initialized -that is before _init(). See the newlib source code for more details.

Point (2) is important because TLSF dows not support growing the pool, only adding
new ones. We would like to initialize it with a pool as big as possible.

In native (2) is handled by defining a static array of fixed size (given by
TLSF_NATIVE_HEAPSIZE). Memory is plentiful in native and we down't care about
the overhead of zeroing out this array.

On embedded targets using newlib (this may be working on other plaforms, I only
tested ARM) `sbrk()` is used to find the start of the heap and reserve it and
the `_eheap` linker symbol is used to determine the end of the usable heap.
An array is a bad choice here because the size would be board dependent and hard
to determine without build-system magic and because it would be zeroed by default,
making the boot sequence way longer.

sbrk() does nothing more than move a pointer that marks the fraction of the space
between _sheap and _eheap that is reserved. Since we are using the whole heap it
might be tempting to just use the symbols to derive the pool location and size and
to sidestep sbrk(). Especially since the memory allocation functions are expected to
be the only users of such a feature. That "trick" would make the OS impossible to
debug in case the was a mistake and some of the original allocation functions slipped
through non-overriden. If sbrk is used to reserve the entirety of the space then that
rogue function will try to call it and fail as no more heap is available. In fact
this is how I found out that I was overriding the wrong functions (put a breakpoint
int sbrk and show a traceback.) If sbrk is sidestepped one would have nasty and
impossible to debug memory corruption errors.

A third option could be to use the heap space directly and not define sbrk. This
is beyond the scope of this change, but is probably the route to go for platform
that do not define this call (but first do a thoroug investigation of how the libc
works in that platform).

Messing with the global system allocator is not an easy thing to do. I would say
that tslf-malloc is ATM _only_ supported in native and cortex-m.

Testing procedure:

Run `tests/pkg_tlsf_malloc`.

Fixes: RIOT-OS#4490, RIOT-OS#5796.
Closes: RIOT-OS#12021
@jcarrano
Copy link
Contributor Author

@benpicco done.

@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed State: waiting for other PR State: The PR requires another PR to be merged first labels Sep 28, 2019
@benpicco
Copy link
Contributor

benpicco commented Sep 29, 2019

Murdock isn't very happy yet.
#12328 should at least help with the RiscV failures.

@miri64 miri64 added the State: stale State: The issue / PR has no activity for >185 days label Jun 25, 2020
@benpicco benpicco removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 8, 2020
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jul 8, 2020
@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR State: stale State: The issue / PR has no activity for >185 days labels Jul 8, 2020
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jul 8, 2020
@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss
Copy link
Contributor

What is happening with this, I don't know if @jcarrano is still active. Should we close or does someone want to take this over?

@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@jcarrano
Copy link
Contributor Author

@MrKevinWeiss sorry fo the late response. I'm not active. I'd gladly help, but I have no time.

Even if this is not fixed right away, the existence of this issue in itself is good so that when people try to use TLSF and experience issues they can find this writeup.

@MrKevinWeiss
Copy link
Contributor

Good to hear from you @jcarrano, will leave it open until something properly resolves it.

@MrKevinWeiss MrKevinWeiss added the State: don't stale State: Tell state-bot to ignore this issue label Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR State: don't stale State: Tell state-bot to ignore this issue Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

native: tlsf: early malloc will lead to a crash pkg: tlsf: initialize memory pool early
4 participants