Add initial support for async stack traces to Unifex #616

ispeters · 2024-07-05T23:35:32Z

This PR copies the core of Folly's async stack trace support into include/unifex/tracing and builds on it to add support for generalized Senders.

When UNIFEX_NO_ASYNC_STACKS is falsey, unifex::connect returns a wrapped operation state that injects async stack tracing into the operation tree.

The wrapper operation:
- stores an AsyncStackFrame for the wrapped operation; and
- wraps the receiver.

In the wrapper operation's customization of unifex::start we:
- create an AsyncStackRoot on the stack;
- push the wrapper operation's AsyncStackFrame onto the current async stack;
- activate the wrapper operation's AsyncStackFrame on the current AsyncStackRoot; and
- start the wrapped operation.

In the wrapper receiver's completion methods we:
- create an AsyncStackRoot on the stack;
- copy the parent operation's AsyncStackFrame to the stack;
- activate the parent AsyncStackFrame on the current AsyncStackRoot; and
- invoke the parent operation's receiver.
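
To make the start-side steps above concrete, here is a minimal sketch under heavy simplification. `AsyncStackFrame`, `AsyncStackRoot`, `WrapperOp`, `currentRoot`, and `snapshot` are hypothetical stand-ins defined here for illustration; they are not the real Unifex or Folly types, and the wrapped operation is modeled as a plain callback. The receiver-side completion path is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins (NOT the real Unifex/Folly types).
struct AsyncStackFrame {
  AsyncStackFrame* parent = nullptr;  // next frame "up" toward the start
  std::string label;                  // stands in for a return address
};

struct AsyncStackRoot {
  AsyncStackFrame* topFrame = nullptr;  // currently active frame, if any
  AsyncStackRoot* previous = nullptr;   // root that was current before this one
};

thread_local AsyncStackRoot* currentRoot = nullptr;

// Collects labels from the active frame up through its parents.
std::vector<std::string> snapshot() {
  std::vector<std::string> out;
  if (currentRoot == nullptr) return out;
  for (AsyncStackFrame* f = currentRoot->topFrame; f != nullptr; f = f->parent) {
    out.push_back(f->label);
  }
  return out;
}

// Models the wrapper operation state produced by connect.
struct WrapperOp {
  AsyncStackFrame frame;                   // stored for the wrapped operation
  AsyncStackFrame* parentFrame = nullptr;  // assumed to come from the receiver
  std::function<void()> startWrapped;      // stands in for the wrapped operation

  void start() {
    AsyncStackRoot root;          // 1. create a root on the normal stack
    root.previous = currentRoot;
    currentRoot = &root;
    frame.parent = parentFrame;   // 2. push our frame onto the async stack
    root.topFrame = &frame;       // 3. activate our frame on the current root
    startWrapped();               // 4. start the wrapped operation
    root.topFrame = nullptr;      // leave the root with no active frame
    currentRoot = root.previous;  // restore the previously current root
  }
};

// Nests two wrappers and records what the innermost operation observes.
std::vector<std::string> demoNestedStart() {
  std::vector<std::string> seen;
  WrapperOp inner;
  WrapperOp outer;
  outer.frame.label = "let_value";
  outer.startWrapped = [&] { inner.start(); };
  inner.frame.label = "then";
  inner.parentFrame = &outer.frame;
  inner.startWrapped = [&] { seen = snapshot(); };
  outer.start();
  return seen;
}
```

Calling `demoNestedStart()` yields the innermost frame first, followed by its parent, which is the "pointing up toward the start" shape described in this PR.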

The effect is that we build up a linked list (technically a DAG) of AsyncStackFrames pointing "up" toward the start of the operation as unifex::start recurses into the nested operation state and then unwind it on the way back out as the receiver completion methods are invoked. At any given time, the current thread's AsyncStackRoot is sitting on the most recently-activated "normal" stack frame that is participating in async stack management, allowing Folly's co_bt.py debugger extension to figure out when it should stop walking normal stack frames and start walking async stack frames.
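
As a rough illustration of why the root's position matters to a stack walker, the sketch below switches from normal frames to async frames when it reaches the normal frame that a root is anchored on. This is only loosely modeled on the idea described above and is not co_bt.py's actual algorithm; `NormalFrame`, `AsyncFrame`, `Root`, and `mixedBacktrace` are hypothetical names, and the walk stops at the end of the async chain rather than continuing into the frames below the outermost root.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct AsyncFrame {
  const AsyncFrame* parent;  // next async frame toward the start
  std::string name;
};

struct NormalFrame {
  const NormalFrame* caller;  // next normal frame toward thread entry
  std::string name;
};

struct Root {
  const AsyncFrame* topFrame;  // active async frame
  const NormalFrame* anchor;   // normal frame the root is sitting on
};

// Walks normal frames until the root's anchor, then walks async frames.
std::vector<std::string> mixedBacktrace(const NormalFrame* innermost,
                                        const Root& root) {
  std::vector<std::string> out;
  for (const NormalFrame* f = innermost; f != nullptr; f = f->caller) {
    out.push_back(f->name);
    if (f == root.anchor) {
      // Everything below the anchor is scheduler plumbing; follow the
      // async chain instead.
      for (const AsyncFrame* a = root.topFrame; a != nullptr; a = a->parent) {
        out.push_back(a->name);
      }
      break;
    }
  }
  return out;
}

// Builds a tiny scenario: a lambda running under set_value on a worker thread.
std::vector<std::string> demoBacktrace() {
  AsyncFrame whenAll{nullptr, "when_all"};
  AsyncFrame letValue{&whenAll, "let_value"};
  NormalFrame threadEntry{nullptr, "thread_entry"};
  NormalFrame setValue{&threadEntry, "set_value"};
  NormalFrame lambda{&setValue, "lambda"};
  Root root{&letValue, &setValue};
  return mixedBacktrace(&lambda, root);
}
```

In this model the worker thread's own entry frame never appears in the result, because the walk leaves the normal stack at the root's anchor.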

As alluded to above, the behaviour of the async stack tracing machinery is controlled by the UNIFEX_NO_ASYNC_STACKS preprocessor macro. If it's truthy, async stacks are not traced; if it's falsey, they are traced. The default in unifex/config.hpp is to enable async stack tracing in non-Windows debug builds.

Why not Windows builds?
- Because there's something weird about how any_sender_of<> builds on Windows (both Clang and MSVC); the resolution is to land PR #619 (Make any_sender_of<> play nicer with MSVC), but that PR breaks an internal Meta build so I'll have to come back to it.

Why only debug builds?
- The additional work done to track async stacks adds non-trivial binary size to the output so I figure it should default to off for release builds. You can turn it on by defining UNIFEX_NO_ASYNC_STACKS=0 in your release build script if the extra debuggability is worth the extra binary size in production.
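
A sketch of how a default like this could be selected with the preprocessor follows. The macro name `DEMO_NO_ASYNC_STACKS` is hypothetical, and using `NDEBUG` as the signal for a release build is an assumption; the real logic lives in unifex/config.hpp and may differ.

```cpp
// Hypothetical stand-in for the real macro; the conditions are assumptions.
#ifndef DEMO_NO_ASYNC_STACKS
#  if defined(_WIN32) || defined(NDEBUG)
     // Windows builds and release builds: tracing off by default.
#    define DEMO_NO_ASYNC_STACKS 1
#  else
     // Non-Windows debug builds: tracing on by default.
#    define DEMO_NO_ASYNC_STACKS 0
#  endif
#endif

// Exposes the decision to ordinary C++ code.
constexpr bool asyncStacksEnabled() noexcept {
#if DEMO_NO_ASYNC_STACKS
  return false;
#else
  return true;
#endif
}
```

Because the default is wrapped in `#ifndef`, a build script that passes `-DDEMO_NO_ASYNC_STACKS=0` overrides it, which mirrors the opt-in for release builds described above.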

This iteration is an MVP:

- only general senders are supported, not coroutines
- the "return addresses" captured for each sender point to unifex::_get_return_address::default_return_address<T>(), where T is the type of the sender
  - this is better than nothing because the resulting symbol includes the sender's fully-qualified name, but it's not great
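
One way to get an address that symbolizes to a function whose name includes the sender type is sketched below. It assumes GCC or Clang (`__builtin_return_address` and `__attribute__((noinline))`); the names `DEMO_NO_INLINE`, `read_return_address`, and `default_return_address_sketch` are hypothetical, and this is not a copy of what get_return_address.hpp does.

```cpp
#include <cassert>

// GCC/Clang-specific; other compilers need a different spelling.
#define DEMO_NO_INLINE __attribute__((noinline))

// Returns the address this function will return to, i.e. an address
// inside whichever function called it.
DEMO_NO_INLINE const void* read_return_address() noexcept {
  return __builtin_return_address(0);
}

// Each instantiation is a distinct function whose symbol name contains
// Sender, so an address inside it symbolizes to a name mentioning Sender.
template <typename Sender>
DEMO_NO_INLINE const void* default_return_address_sketch() noexcept {
  return read_return_address();
}
```

In an unoptimized build the returned address lies inside `default_return_address_sketch<Sender>`. An optimizer may turn the inner call into a tail call or fold identical instantiations together, so this sketch makes no promise about optimized builds.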

Future PRs will:

- add support for tracing the async stacks of coroutines
- improve the rendering of async stack traces by making senders capture a pointer to the call site of their factory
- maybe shrink the binary size overhead of enabling this feature if I can figure out how to eliminate some of the recursion

Co-authored-by: Ján Ondrušek [email protected]
Co-authored-by: Jessica Wong [email protected]
Co-authored-by: Deniz Evrenci [email protected]

ispeters · 2024-07-09T18:28:09Z

This isn't really ready for review; I'm marking it as such to trigger some Meta-internal automation.

@janondrusek

This diff, originally by @janondrusek and @jesswong, copies the core of Folly's [async stack trace support](https://github.com/facebook/folly/tree/main/folly/tracing) into `include/unifex/tracing`.

Stop using `void*` to represent both instruction pointers and stack frame pointers and start using `unifex::instruction_ptr` and `unifex::frame_ptr`.
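
The benefit of distinct pointer types can be sketched as follows. `instruction_ptr_sketch`, `frame_ptr_sketch`, and `isNullInstruction` are hypothetical illustrations of the strong-typedef idea and do not reproduce the interfaces of `unifex::instruction_ptr` or `unifex::frame_ptr`.

```cpp
#include <cassert>
#include <type_traits>

// Wraps an address that points into code.
class instruction_ptr_sketch {
 public:
  constexpr instruction_ptr_sketch() noexcept : p_(nullptr) {}
  explicit constexpr instruction_ptr_sketch(const void* p) noexcept : p_(p) {}
  constexpr const void* get() const noexcept { return p_; }

 private:
  const void* p_;
};

// Wraps an address that points at a stack frame.
class frame_ptr_sketch {
 public:
  constexpr frame_ptr_sketch() noexcept : p_(nullptr) {}
  explicit constexpr frame_ptr_sketch(const void* p) noexcept : p_(p) {}
  constexpr const void* get() const noexcept { return p_; }

 private:
  const void* p_;
};

// Accepts only an instruction pointer; passing a frame_ptr_sketch or a
// raw void* here is a compile-time error.
constexpr bool isNullInstruction(instruction_ptr_sketch ip) noexcept {
  return ip.get() == nullptr;
}
```

With `void*` for both roles, swapping the two arguments of a function compiles silently; with the wrappers, the mistake is rejected by the type checker.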

We need a way to restore a `ScopedAsyncStackRoot` to the "no active frame" state before destroying it on the way out of a customization of `unifex::start` but the frame we want to deactivate is a member of the operation state, which means it's likely already been destroyed. This diff adds `ScopedAsyncStackRoot::ensureFrameDeactivated()`, which performs most of the same actions as `deactivateAsyncStackFrame()` but without touching the frame. I think this still technically invokes UB by copying and comparing a zapped pointer, but it's better than what we had before.
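
The problem and the shape of the fix can be sketched with simplified types. `Frame`, `Root`, `ScopedRoot`, `activate`, and `deactivate` are hypothetical stand-ins, and this `ensureFrameDeactivated()` only clears the root, which is an assumption about the general idea rather than a copy of the real member function.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins.
struct Root;

struct Frame {
  Root* stackRoot = nullptr;  // root this frame is active on, if any
};

struct Root {
  Frame* topFrame = nullptr;  // active frame, if any
};

// Links the frame and the root to each other.
void activate(Root& root, Frame& frame) noexcept {
  root.topFrame = &frame;
  frame.stackRoot = &root;
}

// The normal path: reads and writes the frame, so the frame must be alive.
void deactivate(Frame& frame) noexcept {
  assert(frame.stackRoot != nullptr);
  frame.stackRoot->topFrame = nullptr;
  frame.stackRoot = nullptr;
}

class ScopedRoot {
 public:
  Root& get() noexcept { return root_; }

  // Puts the root back into the "no active frame" state without
  // dereferencing the frame, which may already have been destroyed.
  void ensureFrameDeactivated() noexcept { root_.topFrame = nullptr; }

 private:
  Root root_;
};

// Simulates an operation that destroys itself before start() returns.
bool demoDeactivateAfterDestruction() {
  ScopedRoot scoped;
  Frame* frame = new Frame;  // lives inside the "operation state"
  activate(scoped.get(), *frame);
  delete frame;                     // the operation completed and is gone
  scoped.ensureFrameDeactivated();  // never touches *frame
  return scoped.get().topFrame == nullptr;
}
```

Calling `deactivate(*frame)` after the `delete` would be a use-after-free, which is the situation the root-only path avoids.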

This diff adds a new receiver query CPO that is expected to return the address of the `AsyncStackFrame` associated with the receiver's operation.

This diff adds a new sender query CPO that is expected to return the instruction pointer best representing the "return address" for the sender; the default implementation returns the return address of a function template instantiation that includes the sender's type in its signature as a kind of "better than nothing" result.

The `instruction_ptr` type is best rendered by the debugger as an "address", which will render as a symbol + offset rather than an arbitrary hexadecimal value. This diff adds a comment to the type documenting this fact.

This diff modifies `unifex::sync_wait()` to establish an `AsyncStackRoot` on the stack while the awaited operation is running.
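
Keeping a root in place for the duration of a blocking wait is naturally expressed with RAII. The sketch below assumes a thread-local "current root" pointer; `AsyncStackRoot`, `tlsCurrentRoot`, `ScopedAsyncStackRootSketch`, and `syncWaitSketch` are hypothetical names, and the awaited operation is modeled as a callable that runs to completion.

```cpp
#include <cassert>

// Hypothetical, simplified root: remembers only the root it displaced.
struct AsyncStackRoot {
  AsyncStackRoot* previous = nullptr;
};

thread_local AsyncStackRoot* tlsCurrentRoot = nullptr;

// Installs a root for the current scope and restores the old one on exit.
class ScopedAsyncStackRootSketch {
 public:
  ScopedAsyncStackRootSketch() noexcept {
    root_.previous = tlsCurrentRoot;
    tlsCurrentRoot = &root_;
  }

  ~ScopedAsyncStackRootSketch() {
    assert(tlsCurrentRoot == &root_);  // roots must unwind in LIFO order
    tlsCurrentRoot = root_.previous;
  }

  ScopedAsyncStackRootSketch(const ScopedAsyncStackRootSketch&) = delete;
  ScopedAsyncStackRootSketch& operator=(const ScopedAsyncStackRootSketch&) = delete;

 private:
  AsyncStackRoot root_;
};

// Runs the operation with a root established on this thread's stack.
template <typename F>
auto syncWaitSketch(F&& runOperation) -> decltype(runOperation()) {
  ScopedAsyncStackRootSketch scoped;
  return runOperation();
}
```

Anything started inside `runOperation` sees a non-null current root, and the previous state is restored when `syncWaitSketch` returns, including when it exits by exception.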

This diff modifies `unifex::connect` to inject async stack tracking into every operation state as it's built.

The Unifex unit test suite won't build for Windows with async stack injection enabled *unless* PR #619 (Make any_sender_of<> play nicer with MSVC) is also merged, but that PR causes Windows + Clang + ASAN errors in Meta-internal builds. This diff works around the above conflict by disabling async stack injection in Windows builds by default so we don't need PR #619. We can change the default once we figure out a proper resolution to the ASAN problem.

ericniebler · 2024-07-17T18:56:14Z

cool! would be extra super duper cool if it came with debugger scripts for dumping the backtrace. but maybe that belongs in a separate PR.

do you have an example of such a backtrace? i'm curious what it looks like.

ispeters · 2024-07-17T19:59:45Z

> cool! would be extra super duper cool if it came with debugger scripts for dumping the backtrace. but maybe that belongs in a separate PR.

This PR makes Unifex's runtime representation of async stacks compatible with Folly's so Folly's co_bt.py can dump Unifex's stacks, too. I've got agreement in principle with the relevant Folly folks that async stacks ought to live in some third library that both Unifex and Folly can depend upon; not sure when/if we'll get there, but I'd love to see other S/R libraries depend upon it, too.

> do you have an example of such a backtrace? i'm curious what it looks like.

Here's an lldb session debugging the Nested test in libunifex/test/let_value_test.cpp with some filename redactions:

(lldb) b let_value_test.cpp:139
Breakpoint 3: where = …`Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()::operator()() const + 20 at let_value_test.cpp:139:29, address = 0x0000000000eb62b4
(lldb) r
Process 197428 launched: '…' (x86_64)
Note: Google Test filter = Let.Nested
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Let
[ RUN      ] Let.Nested
Process 197428 stopped
* thread #1, name = '…', stop reason = breakpoint 1.1
    frame #0: 0x0000000000ea9b0a unittest`Let_Nested_Test::TestBody(this=0x00007ffff601a7f0) at let_value_test.cpp:129:31
   126 	}
   127
   128 	TEST(Let, Nested) {
-> 129 	  timed_single_thread_context context;
   130 	  // More complicated 'let_value' example that shows recursive let_value-scopes,
   131 	  // additional
   132
(lldb) c
Process 197428 resuming
producing vector
Process 197428 stopped
* thread #5, name = '…', stop reason = breakpoint 3.1
    frame #0: 0x0000000000eb62b4 …`Let_Nested_Test::TestBody(this=0x00007fffffffd1c8)::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()::operator()() const at let_value_test.cpp:139:29
   136 	              asyncVector(context),
   137 	              [&](std::vector<int>& v) {
   138 	                return async(context, [&] {
-> 139 	                  std::cout << "printing vector" << std::endl;
   140 	                  for (int& x : v) {
   141 	                    std::cout << x << ", ";
   142 	                  }
(lldb) co_bt
#0  0x0000000000eb62b4 in Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()::operator()() const () at …unifex/test/let_value_test.cpp:139
#1  0x0000000000eb6295 in void std::__invoke_impl<void, Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>(std::__invoke_other, Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()&&) () at …libgcc/include/c++/trunk/bits/invoke.h:61
#2  0x0000000000eb6275 in std::__invoke_result<Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>::type std::__invoke<Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>(Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()&&) () at …libgcc/include/c++/trunk/bits/invoke.h:96
#3  0x0000000000eb6205 in std::invoke_result<Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>::type std::invoke<Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>(Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()&&) () at …libgcc/include/c++/trunk/functional:97
#4  0x0000000000eb6169 in void unifex::_then::_receiver<unifex::_inject::_rcvr_wrapper<unifex::_let_v::_successor_receiver<unifex::_let_v::_op<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, auto (anonymous namespace)::$_3::operator()<unifex::timed_single_thread_context>(unifex::timed_single_thread_context&) const::'lambda'()>::type&&, Let_Nested_Test::TestBody()::$_4, unifex::_inject::_rcvr_wrapper<unifex::_when_all::_element_receiver<0ul, unifex::_inject::_rcvr_wrapper<unifex::_then::_receiver<unifex::_inject::_rcvr_wrapper<unifex::_sync_wait::_receiver<unifex::_unit::unit>::type>::type, Let_Nested_Test::TestBody()::$_6>::type>::type, unifex::_let_v::_sender<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, auto (anonymous namespace)::$_3::operator()<unifex::timed_single_thread_context>(unifex::timed_single_thread_context&) const::'lambda'()>::type, Let_Nested_Test::TestBody()::$_4>::type&&, unifex::_let_v::_sender<unifex::_just::_sender<int>::type, Let_Nested_Test::TestBody()::$_5>::type&&>::type>::type>::type, std::vector<int, std::allocator<int>>>::type>::type, Let_Nested_Test::TestBody()::Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>::type::set_value<>() && () at …unifex/include/unifex/then.hpp:72
#5  0x0000000000eb56ad in unifex::instruction_ptr unifex::_get_return_address::default_return_address<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, Let_Nested_Test::TestBody()::$_4::operator()(std::vector<int, std::allocator<int>>&) const::'lambda'()>::type>() () at …unifex/include/unifex/tracing/get_return_address.hpp:57
#6  0x0000000000eb255d in unifex::instruction_ptr unifex::_get_return_address::default_return_address<unifex::_let_v::_sender<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, auto (anonymous namespace)::$_3::operator()<unifex::timed_single_thread_context>(unifex::timed_single_thread_context&) const::'lambda'()>::type, Let_Nested_Test::TestBody()::$_4>::type>() () at …unifex/include/unifex/tracing/get_return_address.hpp:57
#7  0x0000000000eb17bd in unifex::instruction_ptr unifex::_get_return_address::default_return_address<unifex::_when_all::_sender<unifex::_let_v::_sender<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, auto (anonymous namespace)::$_3::operator()<unifex::timed_single_thread_context>(unifex::timed_single_thread_context&) const::'lambda'()>::type, Let_Nested_Test::TestBody()::$_4>::type, unifex::_let_v::_sender<unifex::_just::_sender<int>::type, Let_Nested_Test::TestBody()::$_5>::type>::type>() () at …unifex/include/unifex/tracing/get_return_address.hpp:57
#8  0x0000000000eb155d in unifex::instruction_ptr unifex::_get_return_address::default_return_address<unifex::_then::_sender<unifex::_when_all::_sender<unifex::_let_v::_sender<unifex::_then::_sender<unifex::_timed_single_thread_context::_schedule_after_sender<std::chrono::duration<long, std::ratio<1l, 1000l>>>::type, auto (anonymous namespace)::$_3::operator()<unifex::timed_single_thread_context>(unifex::timed_single_thread_context&) const::'lambda'()>::type, Let_Nested_Test::TestBody()::$_4>::type, unifex::_let_v::_sender<unifex::_just::_sender<int>::type, Let_Nested_Test::TestBody()::$_5>::type>::type, Let_Nested_Test::TestBody()::$_6>::type>() () at …unifex/include/unifex/tracing/get_return_address.hpp:57
#9  0x0000000000ea9c46 in Let_Nested_Test::TestBody() () at …unifex/test/let_value_test.cpp:133
#10 0x0000000001351539 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) () at …gtest.cc:2677
#11 0x0000000001351269 in testing::Test::Run() () at …gtest.cc:2699
#12 0x0000000001352d93 in testing::TestInfo::Run() () at …gtest.cc:2844
#13 0x0000000001354b9c in testing::TestSuite::Run() () at …gtest.cc:3022
#14 0x0000000001368fdc in testing::internal::UnitTestImpl::RunAllTests() () at …gtest.cc:5926
#15 0x0000000001368b5e in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) () at …gtest.cc:2675
#16 0x0000000001368722 in testing::UnitTest::Run() () at …gtest.cc:5489
#17 0x000000000130f231 in RUN_ALL_TESTS() () at …gtest.h:2317
#18 0x000000000130f162 in main () at …:20
#19 0x00007ffff7c2c657 in __libc_start_call_main () at ???:0
#20 0x00007ffff7c2c718 in __libc_start_main@@GLIBC_2.34 () at ???:0
#21 0x00000000009cdba1 in _start () at …glibc…/sysdeps/x86_64/start.S:118

Things to note:

- the process is stopped on thread 5 (the test schedules some work onto a non-main thread), but the stack traces back to _start() on the main thread
- frames 5-8 are using the ugly, default customizations of unifex::get_return_address to figure out what instruction pointer to use to represent the suspended operation; those frames will provide more useful information when the corresponding algorithms capture their call sites
- frame 9 is the call site of sync_wait in the test body

ericniebler · 2024-07-18T03:25:01Z

I'm looking for a frame that represents the transition from thread 1 to thread 5, something like a transfer. I'm not seeing it tho. Why?

ispeters · 2024-07-18T04:55:04Z

> I'm looking for a frame that represents the transition from thread 1 to thread 5, something like a transfer. I'm not seeing it tho. Why?

Because the sender that did that has already completed by the time the breakpoint I selected hits. Completed operations are, in this respect, analogous to functions that have returned—they're not on the stack because the stack represents the list of suspended operations waiting to be completed.

This change extends the work in #616 to support async stack frames in `task<>` coroutines, including those that invoke `at_coroutine_exit()`.

In `task<>`, when `UNIFEX_NO_ASYNC_STACKS` is falsey, the awaiter returned from `task<>`'s customization of `unifex::await_transform` stores an `AsyncStackFrame`. The awaiter pushes its frame onto the current async stack in `await_suspend()` and pops it again in `await_resume()`; since `await_resume()` is only invoked for value and error completions, this arrangement leaves it up to the waiting task to pop the awaiter's frame when the awaited task completes with done. This can be expressed as a new rule:
- when a coroutine completes with a value or an error, it is responsible for popping its own `AsyncStackFrame`; but
- when a coroutine completes with done, the *caller* is responsible for popping the callee's `AsyncStackFrame` as a part of the caller's `unhandled_done()` coroutine.

To support this new requirement of `unhandled_done()` (that it is responsible for popping the callee's stack frame), this change introduces `popAsyncStackFrameFromCaller`, which takes the caller's stack frame by reference so that it can assert that, after popping the current async frame (whatever it is), the new top frame is the caller's frame.

A `task<>` promise has an `AsyncStackFrame*` that, when it's not `nullptr`, points to the `AsyncStackFrame` in the awaiter waiting for the task. This pointer exists even when `UNIFEX_NO_ASYNC_STACKS` is truthy to help mitigate ODR violations; linking together two TUs with `UNIFEX_NO_ASYNC_STACKS` set differently is not explicitly supported but, by ensuring this pointer always exists, some ODR problems are avoided. When a `task<>` is awaited from a TU with async stack support enabled, the awaited task's awaiter sets the promise's `AsyncStackFrame*` to point to the awaiter's frame; when a `task<>` is awaited from a TU with async stack support disabled, this assignment never happens and the promise's pointer remains null.

The above description of `task<>`'s async stack maintenance only covers the recursive case of one coroutine awaiting another. The base case is handled in `connect_awaitable()`, where an `AsyncStackRoot` is set up before starting the connected awaitable.

`stop_if_requested` used to model both `sender` and `awaitable` so that `co_await stop_if_requested();` could take advantage of symmetric transfer. The `stop_if_requested` sender now customizes `await_transform` to express its participation in async stack management. This means of expressing async stack awareness is unsatisfying but I don't have any better ideas right now.

Lastly, `unifex::await_transform()` now wraps naturally-awaitable arguments in an `awaiter_wrapper` that ensures the `coroutine_handle<>` passed to the wrapped awaitable is one that establishes an active `AsyncStackRoot` before resuming the real waiting coroutine.
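
The "who pops the awaiter's frame" rule can be sketched without any coroutine machinery. The types and functions below (`Frame`, `Root`, `pushFrame`, `popFrame`, `popFrameFromCaller`, `awaitAndComplete`) are hypothetical simplifications; only the idea of asserting that the caller's frame is on top after the pop is taken from the description of `popAsyncStackFrameFromCaller`.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins.
struct Frame {
  Frame* parent = nullptr;
};

struct Root {
  Frame* topFrame = nullptr;
};

// Makes callee the new top frame, linked to the previous top.
void pushFrame(Root& root, Frame& callee) noexcept {
  callee.parent = root.topFrame;
  root.topFrame = &callee;
}

// Removes the current top frame.
void popFrame(Root& root) noexcept {
  Frame* top = root.topFrame;
  assert(top != nullptr);
  root.topFrame = top->parent;
  top->parent = nullptr;
}

// Done path: the caller pops whatever is on top, then checks that its
// own frame is the new top.
void popFrameFromCaller(Root& root, Frame& callerFrame) noexcept {
  popFrame(root);
  assert(root.topFrame == &callerFrame);
  (void)callerFrame;  // silences unused warnings when asserts are off
}

enum class Completion { value, error, done };

// Models one await: push in await_suspend(), then pop on the path that
// matches how the awaited task completed.
bool awaitAndComplete(Completion how) {
  Root root;
  Frame caller;
  Frame awaiter;
  pushFrame(root, caller);
  pushFrame(root, awaiter);  // what await_suspend() would do
  if (how == Completion::done) {
    popFrameFromCaller(root, caller);  // caller's unhandled_done() path
  } else {
    popFrame(root);  // what await_resume() would do
  }
  return root.topFrame == &caller;
}
```

All three completion kinds leave the caller's frame on top; the only difference is which side performs the pop.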

ispeters requested review from janondrusek and jesswong July 5, 2024 23:35

facebook-github-bot added the CLA Signed label Jul 5, 2024

ispeters force-pushed the async_stack_traces branch 2 times, most recently from b578556 to 4f975a7 Compare July 9, 2024 18:22

ispeters marked this pull request as ready for review July 9, 2024 18:27

ispeters marked this pull request as draft July 9, 2024 18:59

ispeters force-pushed the async_stack_traces branch 4 times, most recently from c357920 to 9d3358f Compare July 16, 2024 01:27

Import Folly's async stack library

41d19b2

This diff, originally by @janondrusek and @jesswong, copies the core of Folly's [async stack trace support](https://github.com/facebook/folly/tree/main/folly/tracing) into `include/unifex/tracing`.

ispeters force-pushed the async_stack_traces branch from 9d3358f to 711a18e Compare July 16, 2024 05:31

Add type-safe instruction and frame pointer types

19a1bd3

Stop using `void*` to represent both instruction pointers and stack frame pointers and start using `unifex::instruction_ptr` and `unifex::frame_ptr`.

ispeters force-pushed the async_stack_traces branch 4 times, most recently from ac17292 to a37253f Compare July 16, 2024 20:00

ispeters added 7 commits July 16, 2024 21:38

Add get_async_stack_frame

af69543

This diff adds a new receiver query CPO that is expected to return the address of the `AsyncStackFrame` associated with the receiver's operation.

Add a comment re: lldb type summaries

7b0b6bd

The `instruction_ptr` type is best rendered by the debugger as an "address", which will render as a symbol + offset rather than an arbitrary hexadecimal value. This diff adds a comment to the type documenting this fact.

Establish an AsyncStackRoot in sync_wait()

436d964

This diff modifies `unifex::sync_wait()` to establish an `AsyncStackRoot` on the stack while the awaited operation is running.

Modify unifex::connect to inject async stacks

d2d0f1f

This diff modifies `unifex::connect` to inject async stack tracking into every operation state as it's built.

ispeters force-pushed the async_stack_traces branch from a37253f to 2d1d85b Compare July 17, 2024 05:30

ispeters changed the title ~~Import Folly's async stack library~~ Add initial support for async stack traces to Unifex Jul 17, 2024

ispeters marked this pull request as ready for review July 17, 2024 05:58

jesswong approved these changes Jul 17, 2024

ispeters merged commit 16740d0 into main Jul 18, 2024
151 checks passed

ispeters deleted the async_stack_traces branch July 18, 2024 05:29

ispeters mentioned this pull request Jul 19, 2024

[BUG]: unifex/config.hpp seems to be out of date compiler-explorer/compiler-explorer#6711

Closed

ispeters mentioned this pull request Aug 29, 2024

Add async stack support to coroutines #632

Merged
