Add initial support for async stack traces to Unifex (#616)
This PR copies the core of Folly's [async stack trace support](https://github.com/facebook/folly/tree/main/folly/tracing) into `include/unifex/tracing` and builds on it to add support for generalized *Senders*.

When `UNIFEX_NO_ASYNC_STACKS` is falsey, `unifex::connect` returns a wrapped operation state that injects async stack tracing into the operation tree.
 - The wrapper operation:
    - stores an `AsyncStackFrame` for the wrapped operation; and
    - wraps the receiver.
 - In the wrapper operation's customization of `unifex::start` we:
    - create an `AsyncStackRoot` on the stack;
    - push the wrapper operation's `AsyncStackFrame` onto the current async stack;
    - activate the wrapper operation's `AsyncStackFrame` on the current `AsyncStackRoot`; and
    - start the wrapped operation.
 - In the wrapper receiver's completion methods we:
    - create an `AsyncStackRoot` on the stack;
    - copy the *parent* operation's `AsyncStackFrame` to the stack;
    - activate the parent `AsyncStackFrame` on the current `AsyncStackRoot`; and
    - invoke the parent operation's receiver.
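The start/complete protocol in the list above can be modelled in plain C++17 to make the frame juggling concrete. This is a minimal sketch, not Unifex's code. `Frame`, `Root`, `ScopedRoot`, `just_op`, `op_wrapper`, `wrapped_receiver` and `sink_receiver` are hypothetical stand-ins, and a receiver's `frame()` member plays the role of the `get_async_stack_frame` query. Real `AsyncStackFrame`s also record a return address and their owning root, which the sketch omits.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for AsyncStackFrame / AsyncStackRoot.
struct Frame {
  const char* name = "";
  Frame* parent = nullptr;  // points "up" toward the start of the operation
};
struct Root {
  Frame* top = nullptr;  // the currently active async frame
};
thread_local Root* current_root = nullptr;  // one root pointer per thread

// RAII: install a root on the normal stack for the current scope.
struct ScopedRoot {
  Root root;
  Root* saved;
  ScopedRoot() : saved(current_root) { current_root = &root; }
  ~ScopedRoot() { current_root = saved; }
  ScopedRoot(const ScopedRoot&) = delete;
  ScopedRoot& operator=(const ScopedRoot&) = delete;
};

std::vector<std::string> events;  // which frame was active, for the demo

// A trivial "just(value)" operation: start() completes immediately.
template <typename Receiver>
struct just_op {
  int value;
  Receiver receiver;
  void start() { receiver.set_value(value); }
};

template <typename Receiver>
struct op_wrapper;

// Wraps the downstream receiver so completion re-activates the parent frame.
template <typename Receiver>
struct wrapped_receiver {
  op_wrapper<Receiver>* op;
  Frame* frame() const { return &op->frame; }  // a nested wrapper links here
  void set_value(int v) const {
    ScopedRoot scoped;                     // 1. create a root on the stack
    Frame parent = *op->receiver.frame();  // 2. copy the parent's frame
    scoped.root.top = &parent;             // 3. activate the copy
    events.push_back(std::string("complete:") + current_root->top->name);
    op->receiver.set_value(v);             // 4. invoke the parent's receiver
  }
};

template <typename Receiver>
struct op_wrapper {
  Frame frame;
  Receiver receiver;
  just_op<wrapped_receiver<Receiver>> inner;

  op_wrapper(const char* name, int value, Receiver r)
      : frame{name, nullptr},
        receiver(std::move(r)),
        inner{value, wrapped_receiver<Receiver>{this}} {}
  op_wrapper(const op_wrapper&) = delete;  // inner receiver points at us
  op_wrapper& operator=(const op_wrapper&) = delete;

  void start() {
    ScopedRoot scoped;                // 1. create a root on the stack
    frame.parent = receiver.frame();  // 2. push our frame onto the async stack
    scoped.root.top = &frame;         // 3. activate it on the root
    events.push_back(std::string("start:") + current_root->top->name);
    inner.start();                    // 4. start the wrapped operation
  }
};

// Final receiver: knows the caller's frame and stores the result.
struct sink_receiver {
  Frame* caller_frame;
  int* out;
  Frame* frame() const { return caller_frame; }
  void set_value(int v) const { *out = v; }
};
```

Starting an `op_wrapper<sink_receiver>` named `"just"` whose receiver reports a `"sync_wait"` frame records `start:just` and then `complete:sync_wait`; both temporary roots are gone once `start()` returns.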

The effect is that we build up a linked list (technically a DAG) of `AsyncStackFrame`s pointing "up" toward the start of the operation as `unifex::start` recurses into the nested operation states, and then unwind it on the way back out as the receiver completion methods are invoked. At any given time, the current thread's `AsyncStackRoot` sits in the most recently activated "normal" stack frame that participates in async stack management, which lets Folly's `co_bt.py` debugger extension work out when to stop walking normal stack frames and start walking async stack frames.
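Here is a minimal sketch of the walk a debugger performs, starting from the frame the root points at and following parent links outward. `Frame` and `async_backtrace` are hypothetical. Folly's frames carry a return address that the debugger symbolizes, and a plain name stands in for it here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal frame; a readable name replaces the return address.
struct Frame {
  const char* name;
  const Frame* parent;  // points "up" toward where the operation started
};

// Follow parent links from the most recently activated frame to the
// outermost one, as a debugger extension would once it reaches the root.
std::vector<std::string> async_backtrace(const Frame* top) {
  std::vector<std::string> out;
  for (const Frame* f = top; f != nullptr; f = f->parent) {
    out.emplace_back(f->name);
  }
  return out;
}
```

If two child frames (say `just(1)` and `just(2)`) share a `when_all`-style parent, both walks pass through the same ancestors. That sharing is why the overall structure is a DAG rather than a single list.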

As alluded to above, the behaviour of the async stack tracing machinery is controlled by the `UNIFEX_NO_ASYNC_STACKS` preprocessor macro. If it's truthy, async stacks are not traced; if it's falsey, they are traced. The default in `unifex/config.hpp` is to enable async stack tracing only in debug builds (when `NDEBUG` is not defined) that don't define `_MSC_VER`, so it is off for both MSVC and clang-cl on Windows.
 - Why not Windows builds?
    - Because `any_sender_of<>` doesn't build on Windows (with either Clang or MSVC) when async stack injection is enabled. The resolution is to land PR #619, but that PR causes Windows + Clang + ASAN errors in an internal Meta build, so I'll have to come back to it.
 - Why only debug builds?
    - The additional work done to track async stacks adds non-trivial binary size to the output so I figure it should default to off for release builds. You can turn it on by defining `UNIFEX_NO_ASYNC_STACKS=0` in your release build script if the extra debuggability is worth the extra binary size in production.

This iteration is an MVP:
 - only general senders are supported, not coroutines
 - the "return addresses" captured for each sender point to `unifex::_get_return_address::default_return_address<T>()`, where `T` is the type of the sender
    - this is better than nothing because the resulting symbol includes the sender's fully-qualified name, but it's not great

Future PRs will:
 - add support for tracing the async stacks of coroutines
 - improve the rendering of async stack traces by making senders capture a pointer to the call site of their factory
 - maybe shrink the binary size overhead of enabling this feature if I can figure out how to eliminate some of the recursion

Original diff descriptions:
* Import Folly's async stack library

This diff, originally by @janondrusek and @jesswong, copies the core of
Folly's [async stack trace support](https://github.com/facebook/folly/tree/main/folly/tracing)
into `include/unifex/tracing`.

* Add type-safe instruction and frame pointer types

Stop using `void*` to represent both instruction pointers and stack
frame pointers and start using `unifex::instruction_ptr` and
`unifex::frame_ptr`.
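This minimal sketch shows why distinct types help, assuming wrappers that hold a `void*` and have only explicit constructors. The members (`get()`, explicit `operator bool`) and the `record` consumer are hypothetical and are not the actual interface of `unifex::instruction_ptr` or `unifex::frame_ptr`. The sync_wait hunk below shows that the real types also provide `read_frame_pointer()` and `read_return_address()`, which this sketch leaves out.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical strong wrappers: same representation, distinct types, and no
// implicit construction from an arbitrary void*.
class instruction_ptr {
 public:
  constexpr instruction_ptr() noexcept = default;
  constexpr explicit instruction_ptr(void* p) noexcept : p_(p) {}
  constexpr void* get() const noexcept { return p_; }
  constexpr explicit operator bool() const noexcept { return p_ != nullptr; }

 private:
  void* p_ = nullptr;
};

class frame_ptr {
 public:
  constexpr frame_ptr() noexcept = default;
  constexpr explicit frame_ptr(void* p) noexcept : p_(p) {}
  constexpr void* get() const noexcept { return p_; }
  constexpr explicit operator bool() const noexcept { return p_ != nullptr; }

 private:
  void* p_ = nullptr;
};

// Hypothetical consumer that takes one of each.
inline void record(frame_ptr, instruction_ptr) noexcept {}

// With void* for both, swapping the arguments would compile silently.
static_assert(!std::is_convertible_v<void*, frame_ptr>);
static_assert(!std::is_convertible_v<instruction_ptr, frame_ptr>);
static_assert(!std::is_convertible_v<frame_ptr, instruction_ptr>);
static_assert(std::is_invocable_v<decltype(&record), frame_ptr, instruction_ptr>);
static_assert(!std::is_invocable_v<decltype(&record), instruction_ptr, frame_ptr>);
```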

* Add ScopedAsyncStackRoot::ensureFrameDeactivated()

We need a way to restore a `ScopedAsyncStackRoot` to the "no active
frame" state before destroying it on the way out of a customization of
`unifex::start` but the frame we want to deactivate is a member of the
operation state, which means it's likely already been destroyed. This
diff adds `ScopedAsyncStackRoot::ensureFrameDeactivated()`, which
performs most of the same actions as `deactivateAsyncStackFrame()` but
without touching the frame. I think this still technically invokes UB by
copying and comparing a zapped pointer, but it's better than what we had
before.
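This minimal sketch shows the problem and the fix using hypothetical `Frame`, `Root`, `ScopedRoot` and `op_state` types loosely modelled on Folly's. `deactivate()` has to write through the frame. `ensure_frame_deactivated()` resets only the root, so it can run after the operation state that owned the frame has been destroyed. The description above notes that the real method still copies and compares the stale pointer. This sketch avoids even that by clearing it unconditionally.

```cpp
#include <cassert>
#include <optional>

struct Root;

// Hypothetical, heavily simplified frame/root pair.
struct Frame {
  Root* root = nullptr;  // set while the frame is active
};
struct Root {
  Frame* top = nullptr;  // the active frame, if any
};

inline void activate(Root& r, Frame& f) noexcept {
  r.top = &f;
  f.root = &r;
}

// Normal deactivation writes through the frame, so the frame must be alive.
inline void deactivate(Frame& f) noexcept {
  f.root->top = nullptr;
  f.root = nullptr;
}

class ScopedRoot {
 public:
  ScopedRoot() = default;
  ScopedRoot(const ScopedRoot&) = delete;
  ScopedRoot& operator=(const ScopedRoot&) = delete;
  ~ScopedRoot() { assert(root_.top == nullptr && "frame still active"); }

  Root& get() noexcept { return root_; }

  // Back to "no active frame" by touching only the root, never the frame.
  void ensure_frame_deactivated() noexcept { root_.top = nullptr; }

 private:
  Root root_;
};

// Hypothetical operation state that owns its frame.
struct op_state {
  Frame frame;
};

// Mimics leaving a customization of start(): the operation completed and its
// state (including the still-active frame) died before the root goes away.
inline bool start_then_destroy_op() {
  ScopedRoot scoped;
  std::optional<op_state> op;
  op.emplace();
  activate(scoped.get(), op->frame);
  op.reset();                         // deactivate(op->frame) would now be UB
  scoped.ensure_frame_deactivated();  // safe: only the root is modified
  return scoped.get().top == nullptr;
}
```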

* Add get_async_stack_frame

This diff adds a new receiver query CPO that is expected to return the
address of the `AsyncStackFrame` associated with the receiver's
operation.
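Here is a minimal sketch of a receiver query CPO in the `tag_invoke` style. It is written without Unifex so that it compiles on its own. All `sketch::` names are hypothetical. The hidden-friend customization mirrors the one the diff adds to `sync_wait`'s receiver. The `nullptr` fallback for receivers that don't customize the query is an assumption of this sketch.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

struct AsyncStackFrame {};  // stand-in for the real frame type

// Detects whether tag_invoke(tag, const R&) is found via ADL.
template <typename Tag, typename R, typename = void>
struct has_tag_invoke : std::false_type {};

template <typename Tag, typename R>
struct has_tag_invoke<
    Tag, R,
    std::void_t<decltype(tag_invoke(std::declval<Tag>(), std::declval<const R&>()))>>
    : std::true_type {};

struct get_async_stack_frame_fn {
  template <typename R>
  AsyncStackFrame* operator()(const R& r) const noexcept {
    if constexpr (has_tag_invoke<get_async_stack_frame_fn, R>::value) {
      return tag_invoke(*this, r);  // the receiver's customization
    } else {
      return nullptr;  // assumption: no customization means no frame
    }
  }
};
inline constexpr get_async_stack_frame_fn get_async_stack_frame{};

// Receiver that knows its operation's frame, like sync_wait's receiver.
struct frame_receiver {
  AsyncStackFrame& frame_;
  friend AsyncStackFrame* tag_invoke(
      get_async_stack_frame_fn, const frame_receiver& r) noexcept {
    return &r.frame_;
  }
};

struct plain_receiver {};  // no customization

}  // namespace sketch
```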

* Add get_return_address

This diff adds a new sender query CPO that is expected to return the
instruction pointer best representing the "return address" for the
sender; the default implementation returns the return address of a
function template instantiation that includes the sender's type in its
signature as a kind of "better than nothing" result.
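The same pattern can express a sender query with a per-type default. In this hypothetical sketch, the default returns the address of a function template instantiation whose symbol spells out the sender type. That is a simplification of the approach described above, which returns a return address from inside such an instantiation. `return_address_t`, `factory_call_site` and the sender types are invented for illustration. Aggressive identical-code folding (for example MSVC's `/OPT:ICF`) may merge empty instantiations and defeat the trick.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

using return_address_t = void (*)();

// One instantiation per sender type; its mangled name includes T.
template <typename T>
void default_return_address_marker() {}

template <typename Tag, typename S, typename = void>
struct has_tag_invoke : std::false_type {};

template <typename Tag, typename S>
struct has_tag_invoke<
    Tag, S,
    std::void_t<decltype(tag_invoke(std::declval<Tag>(), std::declval<const S&>()))>>
    : std::true_type {};

struct get_return_address_fn {
  template <typename Sender>
  return_address_t operator()(const Sender& s) const noexcept {
    if constexpr (has_tag_invoke<get_return_address_fn, Sender>::value) {
      return tag_invoke(*this, s);  // the sender knows a better address
    } else {
      return &default_return_address_marker<Sender>;  // "better than nothing"
    }
  }
};
inline constexpr get_return_address_fn get_return_address{};

// Hypothetical stand-in for where a sender factory was called.
inline void factory_call_site() {}

struct plain_sender {};  // falls back to the per-type marker

struct located_sender {
  friend return_address_t tag_invoke(
      get_return_address_fn, const located_sender&) noexcept {
    return &factory_call_site;
  }
};

}  // namespace sketch
```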

* Add a comment re: lldb type summaries

The `instruction_ptr` type is best rendered by the debugger as an
"address", which will render as a symbol + offset rather than an
arbitrary hexadecimal value. This diff adds a comment to the type
documenting this fact.

* Establish an AsyncStackRoot in sync_wait()

This diff modifies `unifex::sync_wait()` to establish an
`AsyncStackRoot` on the stack while the awaited operation is running.
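This minimal sketch shows what establishing a root in a blocking wait provides. For the duration of the wait, the thread has a root, and that root's active frame is the one the operation links to as its parent. `initial_stack_root` echoes the struct the diff adds. `Frame`, `Root`, `current_root` and `sync_wait_sketch` are hypothetical simplifications that skip the frame-pointer and return-address capture.

```cpp
#include <cassert>

namespace sketch {

struct Frame {
  Frame* parent = nullptr;
};
struct Root {
  Frame* top = nullptr;
};
thread_local Root* current_root = nullptr;  // one root pointer per thread

// While alive, the thread has a root whose active frame stands for the
// blocking call itself.
class initial_stack_root {
 public:
  initial_stack_root() noexcept : saved_(current_root) {
    current_root = &root_;
    root_.top = &frame;  // activate the initial frame
  }
  ~initial_stack_root() {
    root_.top = nullptr;  // deactivate before the root goes away
    current_root = saved_;
  }
  initial_stack_root(const initial_stack_root&) = delete;
  initial_stack_root& operator=(const initial_stack_root&) = delete;

  Frame frame;  // handed to the operation's receiver as the parent frame

 private:
  Root root_;
  Root* saved_;
};

// Hypothetical blocking wait: `work` stands in for connect + start + run and
// receives the frame its operation should use as its parent.
template <typename Work>
auto sync_wait_sketch(Work work) {
  initial_stack_root stack_root;
  return work(stack_root.frame);
}

}  // namespace sketch
```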

* Modify unifex::connect to inject async stacks

This diff modifies `unifex::connect` to inject async stack tracking into
every operation state as it's built.

* Work around Windows-only problems

The Unifex unit test suite won't build for Windows with async stack
injection enabled *unless* PR #619 (Make any_sender_of<> play nicer with
MSVC) is also merged, but that PR causes Windows + Clang + ASAN errors
in Meta-internal builds.

This diff works around the above conflict by disabling async stack
injection in Windows builds by default so we don't need PR #619. We can
change the default once we figure out a proper resolution to the ASAN
problem.

Co-authored-by: Ján Ondrušek <[email protected]>
Co-authored-by: Jessica Wong <[email protected]>
Co-authored-by: Deniz Evrenci <[email protected]>
4 people authored Jul 18, 2024
1 parent 1916a2f commit 16740d0
Showing 13 changed files with 1,502 additions and 36 deletions.
20 changes: 20 additions & 0 deletions include/unifex/config.hpp
@@ -291,3 +291,23 @@
# define UNIFEX_LOG_DANGLING_STOP_CALLBACKS 0
# endif
#endif

#if defined(__has_builtin)
# define UNIFEX_HAS_BUILTIN(...) __has_builtin(__VA_ARGS__)
#else
# define UNIFEX_HAS_BUILTIN(...) 0
#endif

#if !defined(UNIFEX_NO_ASYNC_STACKS)
// default:
// - release builds do not have async stacks
// - Windows builds do not have async stacks
//
// adding async stacks adds non-trivial binary size at the moment, and I can't
// figure out how to make all the relevant Windows builds succeed
# if defined(NDEBUG) || defined(_MSC_VER)
# define UNIFEX_NO_ASYNC_STACKS 1
# else
# define UNIFEX_NO_ASYNC_STACKS 0
# endif
#endif // !defined(UNIFEX_NO_ASYNC_STACKS)
5 changes: 5 additions & 0 deletions include/unifex/detail/unifex_fwd.hpp
@@ -53,6 +53,11 @@ extern const struct _fn submit;
} // namespace _submit_cpo
using _submit_cpo::submit;

namespace _start_cpo {
struct _fn;
}
extern const _start_cpo::_fn start;

namespace _connect::_cpo {
struct _fn;
} // namespace _connect::_cpo
93 changes: 74 additions & 19 deletions include/unifex/sender_concepts.hpp
@@ -20,6 +20,10 @@
#include <unifex/blocking.hpp>
#include <unifex/receiver_concepts.hpp>
#include <unifex/tag_invoke.hpp>
#include <unifex/tracing/async_stack.hpp>
#include <unifex/tracing/get_async_stack_frame.hpp>
#include <unifex/tracing/get_return_address.hpp>
#include <unifex/tracing/inject_async_stack.hpp>
#include <unifex/type_list.hpp>
#include <unifex/type_traits.hpp>
#include <unifex/detail/unifex_fwd.hpp>
@@ -195,7 +199,7 @@ template <typename S>
inline constexpr bool typed_bulk_sender = bulk_sender<S>;

namespace _start_cpo {
inline const struct _fn {
struct _fn {
template(typename Operation) //
(requires tag_invocable<_fn, Operation&>) //
auto
@@ -214,35 +218,86 @@ inline const struct _fn {
noexcept(op.start()), "start() customisation must be noexcept");
return op.start();
}
} start{};
};
} // namespace _start_cpo
using _start_cpo::start;
inline const _start_cpo::_fn start{};

namespace _connect {

template <typename Sender, typename Receiver>
using _member_connect_result_t =
decltype((UNIFEX_DECLVAL(Sender&&)).connect(UNIFEX_DECLVAL(Receiver&&)));

template <typename Sender, typename Receiver>
UNIFEX_CONCEPT_FRAGMENT( //
_has_member_connect_, //
requires()( //
typename(_member_connect_result_t<Sender, Receiver>)));

template <typename Sender, typename Receiver>
UNIFEX_CONCEPT //
_is_member_connectible = //
sender<Sender> &&
UNIFEX_FRAGMENT(_connect::_has_member_connect_, Sender, Receiver);

template <typename Sender, typename Receiver>
UNIFEX_CONCEPT //
_is_nothrow_member_connectible = //
_is_member_connectible<Sender, Receiver> &&
noexcept(UNIFEX_DECLVAL(Sender&&).connect(UNIFEX_DECLVAL(Receiver&&)));

namespace _cpo {

struct _fn {
template(typename Sender, typename Receiver) //
(requires sender<Sender> AND receiver<Receiver> AND
tag_invocable<_fn, Sender, Receiver>) //
private:
struct _impl {
template(typename S, typename R) //
(requires tag_invocable<_fn, S, R>) //
auto
operator()(S&& s, R&& r) const
noexcept(is_nothrow_tag_invocable_v<_fn, S, R>)
-> tag_invoke_result_t<_fn, S, R> {
return unifex::tag_invoke(_fn{}, std::forward<S>(s), std::forward<R>(r));
}

template(typename S, typename R) //
(requires(!tag_invocable<_fn, S, R>)
AND _is_member_connectible<S, R>) //
auto
operator()(S&& s, R&& r) const
noexcept(_is_nothrow_member_connectible<S, R>)
-> _member_connect_result_t<S, R> {
return std::forward<S>(s).connect(std::forward<R>(r));
}
};

#if UNIFEX_NO_ASYNC_STACKS
public:
template(typename S, typename R) //
(requires sender<S> AND receiver<R>) //
auto
operator()(Sender&& s, Receiver&& r) const
noexcept(is_nothrow_tag_invocable_v<_fn, Sender, Receiver>)
-> tag_invoke_result_t<_fn, Sender, Receiver> {
return unifex::tag_invoke(
_fn{}, std::forward<Sender>(s), std::forward<Receiver>(r));
operator()(S&& s, R&& r) const
noexcept(noexcept(_impl{}(std::forward<S>(s), std::forward<R>(r))))
-> decltype(_impl{}(std::forward<S>(s), std::forward<R>(r))) {
return _impl{}(std::forward<S>(s), std::forward<R>(r));
}
#else
template <typename S, typename R>
using op_t = _inject::
op_wrapper<std::invoke_result_t<_impl, S, _inject::receiver_t<R>>, R>;

template(typename Sender, typename Receiver) //
(requires sender<Sender> AND
receiver<Receiver> AND(!tag_invocable<_fn, Sender, Receiver>)) //
public:
template(typename S, typename R) //
(requires sender<S> AND receiver<R>) //
auto
operator()(Sender&& s, Receiver&& r) const
noexcept(noexcept(std::forward<Sender>(s).connect(std::forward<Receiver>(
r)))) -> decltype(std::forward<Sender>(s)
.connect(std::forward<Receiver>(r))) {
return std::forward<Sender>(s).connect(std::forward<Receiver>(r));
operator()(S&& s, R&& r) const noexcept(noexcept(_inject::make_op_wrapper(
std::forward<S>(s), std::forward<R>(r), _impl{}))) -> op_t<S, R> {
return _inject::make_op_wrapper(
std::forward<S>(s), std::forward<R>(r), _impl{});
}
#endif
};

} // namespace _cpo
} // namespace _connect
inline const _connect::_cpo::_fn connect{};
89 changes: 73 additions & 16 deletions include/unifex/sync_wait.hpp
@@ -22,6 +22,8 @@
#include <unifex/manual_lifetime.hpp>
#include <unifex/scheduler_concepts.hpp>
#include <unifex/sender_concepts.hpp>
#include <unifex/tracing/async_stack.hpp>
#include <unifex/tracing/get_async_stack_frame.hpp>
#include <unifex/with_query_value.hpp>

#include <condition_variable>
@@ -62,11 +64,12 @@ struct _receiver {
struct type {
promise<T>& promise_;
manual_event_loop& ctx_;
AsyncStackFrame& frame_;

template <typename... Values>
void set_value(Values&&... values) && noexcept {
UNIFEX_TRY {
unifex::activate_union_member(promise_.value_, (Values &&) values...);
unifex::activate_union_member(promise_.value_, (Values&&)values...);
promise_.state_ = promise<T>::state::value;
}
UNIFEX_CATCH(...) {
@@ -91,7 +94,7 @@ struct _receiver {

template <typename Error>
void set_error(Error&& e) && noexcept {
std::move(*this).set_error(make_exception_ptr((Error &&) e));
std::move(*this).set_error(make_exception_ptr((Error&&)e));
}

void set_done() && noexcept {
@@ -103,6 +106,11 @@ struct _receiver {
return r.ctx_.get_scheduler();
}

friend constexpr AsyncStackFrame*
tag_invoke(tag_t<get_async_stack_frame>, const type& r) noexcept {
return &r.frame_;
}

private:
void signal_complete() noexcept { ctx_.stop(); }
};
@@ -111,19 +119,40 @@ struct _receiver {
template <typename T>
using receiver_t = typename _receiver<T>::type;

struct initial_stack_root {
explicit initial_stack_root(
frame_ptr frameAddress, instruction_ptr returnAddress) noexcept
: root{frameAddress, returnAddress} {
frame.setReturnAddress(returnAddress);

root.activateFrame(frame);
}

~initial_stack_root() { deactivateAsyncStackFrame(frame); }

AsyncStackFrame frame;
unifex::detail::ScopedAsyncStackRoot root;
};

template <typename Result, typename Sender>
UNIFEX_CLANG_DISABLE_OPTIMIZATION std::optional<Result> _impl(Sender&& sender) {
UNIFEX_CLANG_DISABLE_OPTIMIZATION std::optional<Result>
_impl(Sender&& sender, frame_ptr frameAddress, instruction_ptr returnAddress) {
using promise_t = _sync_wait::promise<Result>;
promise_t promise;
manual_event_loop ctx;

// Store state for the operation on the stack.
auto operation =
connect((Sender &&) sender, _sync_wait::receiver_t<Result>{promise, ctx});
{
initial_stack_root stackRoot{frameAddress, returnAddress};

start(operation);
// Store state for the operation on the stack.
auto operation = connect(
(Sender&&)sender,
_sync_wait::receiver_t<Result>{promise, ctx, stackRoot.frame});

ctx.run();
start(operation);

ctx.run();
}

switch (promise.state_) {
case promise_t::state::done: return std::nullopt;
@@ -136,19 +165,44 @@ UNIFEX_CLANG_DISABLE_OPTIMIZATION std::optional<Result> _impl(Sender&& sender) {
} // namespace _sync_wait

namespace _sync_wait_cpo {
struct _fn {
class _fn {
struct impl_fn {
template <typename Sender>
auto operator()(
Sender&& sender,
frame_ptr frameAddress,
instruction_ptr returnAddress) const
-> std::optional<sender_single_value_result_t<remove_cvref_t<Sender>>> {
using Result = sender_single_value_result_t<remove_cvref_t<Sender>>;
return _sync_wait::_impl<Result>(
std::forward<Sender>(sender), frameAddress, returnAddress);
}
};

public:
template(typename Sender) //
(requires sender<Sender>) //
auto
operator()(Sender&& sender) const
-> std::optional<sender_single_value_result_t<remove_cvref_t<Sender>>> {
using Result = sender_single_value_result_t<remove_cvref_t<Sender>>;
return _sync_wait::_impl<Result>((Sender &&) sender);
return impl_fn{}(
std::forward<Sender>(sender),
frame_ptr::read_frame_pointer(),
instruction_ptr::read_return_address());
}
constexpr auto operator()() const
noexcept(std::is_nothrow_invocable_v<tag_t<bind_back>, _fn>)
-> bind_back_result_t<_fn> {
return bind_back(*this);

// Not constexpr anymore because __builtin_frame_address(0) (and, presumably,
// __builtin_return_address(0)) isn't constexpr in Clang constexpr
auto operator()() const noexcept(std::is_nothrow_invocable_v<
tag_t<bind_back>,
_fn,
frame_ptr,
instruction_ptr>)
-> bind_back_result_t<impl_fn, frame_ptr, instruction_ptr> {
return bind_back(
impl_fn{},
frame_ptr::read_frame_pointer(),
instruction_ptr::read_return_address());
}
};
} // namespace _sync_wait_cpo
@@ -163,7 +217,10 @@ struct _fn {
decltype(auto)
operator()(Sender&& sender) const {
using Result2 = non_void_t<wrap_reference_t<decay_rvalue_t<Result>>>;
return _sync_wait::_impl<Result2>((Sender &&) sender);
return _sync_wait::_impl<Result2>(
(Sender&&)sender,
frame_ptr::read_frame_pointer(),
instruction_ptr::read_return_address());
}
};
} // namespace _sync_wait_r_cpo