Skip to content

Commit

Permalink
feat: introduce optional handler strategy (getsentry#1027)
Browse files Browse the repository at this point in the history
* reintroduce #ifdef symmetry regarding the usage of the handler_strategy (which must work on all UNIXes) and the query towards the handler_strategy option which must only work on Linux.

* ensure we "leave" the signal-handler when we invoke the CLR/Mono runtime handler

* ensure the page_allocator is only enabled when we have an actual native crash, we don't allocate before

* continuing the signal chain at the end is something we want for both strategies, because CHAIN_AT_START will reach this execution-path only if the runtime-handler decided that it was an actual native crash.

* add trace logs to at-start chaining, so we can see the behavior in the field when debugging is enabled

* ensure page-allocator is only referenced on UNIXes

* add integration test for managed and native crash

* ignore 32-bit Linux build in the integration test

* extract skip_condition to check what provokes the invalid syntax

* clean up sigaltstack initialization

* overwrite it only when the flag `SS_DISABLED` is set and the query didn't result in an error
* if the query was successful but the flag is anything but `SS_DISABLED`, only log the size and flags of the current stack
* if the query failed then log the corresponding error

* clean up run assertion on output, so we can actually see what's happening.

* create a non-faulty `sigaltstack` for the GHA runner

* disable the test on ASAN runs since that would require an instrumented runtime :-)

dotnet/runtime#13458

* Add changelog.
  • Loading branch information
supervacuus authored Nov 15, 2024
1 parent 67cc95c commit 7c1d428
Show file tree
Hide file tree
Showing 9 changed files with 308 additions and 33 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
**Features**:

- Provide version information for non-static Windows binaries. ([#1076](https://github.com/getsentry/sentry-native/pull/1076), [crashpad#110](https://github.com/getsentry/crashpad/pull/110))
- Add an alternative handler strategy to `inproc` to support `.NET` on Linux and `Mono` on Android (specifically, [.NET MAUI](https://github.com/dotnet/android/issues/9055#issuecomment-2261347912)). ([#1027](https://github.com/getsentry/sentry-native/pull/1027))

**Fixes**:

Expand Down
38 changes: 38 additions & 0 deletions include/sentry.h
Original file line number Diff line number Diff line change
Expand Up @@ -768,6 +768,14 @@ typedef enum {
SENTRY_USER_CONSENT_REVOKED = 0,
} sentry_user_consent_t;

/**
* The crash handler strategy.
*/
typedef enum {
SENTRY_HANDLER_STRATEGY_DEFAULT = 0,
SENTRY_HANDLER_STRATEGY_CHAIN_AT_START = 1,
} sentry_handler_strategy_t;

/**
* Creates a new options struct.
* Can be freed with `sentry_options_free`.
Expand Down Expand Up @@ -1492,6 +1500,36 @@ SENTRY_EXPERIMENTAL_API void sentry_options_set_traces_sample_rate(
SENTRY_EXPERIMENTAL_API double sentry_options_get_traces_sample_rate(
sentry_options_t *opts);

#ifdef SENTRY_PLATFORM_LINUX

/**
* Returns the currently set strategy for the handler.
*
* This option does only work for the `inproc` backend on `Linux` and `Android`.
*
* The main use-case is when the Native SDK is used in the context of the
* CLR/Mono runtimes which convert some POSIX signals into managed-code
* exceptions and discontinue the signal chain.
*
* If this happens and we invoke the previous handler at the end (i.e., after
* our handler processed the signal, which is the default strategy) we will end
* up sending a native crash in addition to the managed code exception (which
* will either generate another crash-event if uncaught or could be handled in
* the managed code and neither terminate the application nor create a crash
* event). To correctly process the signals of CLR/Mono applications, we must
* invoke the runtime handler at the start of our handler.
*/
SENTRY_EXPERIMENTAL_API sentry_handler_strategy_t
sentry_options_get_handler_strategy(const sentry_options_t *opts);

/**
* Sets the handler strategy.
*/
SENTRY_EXPERIMENTAL_API void sentry_options_set_handler_strategy(
sentry_options_t *opts, sentry_handler_strategy_t handler_strategy);

#endif // SENTRY_PLATFORM_LINUX

/* -- Session APIs -- */

typedef enum {
Expand Down
73 changes: 40 additions & 33 deletions src/backends/sentry_backend_inproc.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,27 +16,6 @@
#include "transports/sentry_disk_transport.h"
#include <string.h>

/**
* Android's bionic libc seems to allocate alternate signal handler stacks for
* every thread and also references them from their internal maintenance
* structs.
*
* The way we currently set up our sigaltstack seems to interfere with this
* setup and causes crashes whenever an ART signal handler touches the thread
* that called `sentry_init()`.
*
* In addition to this problem, it also means there is no need for our own
* sigaltstack on Android since our signal handler will always be running on
* an alternate stack managed by bionic.
*
* Note: In bionic the sigaltstacks for 32-bit devices have a size of 16KiB and
* on 64-bit devices they have 32KiB. The size of our own was set to 64KiB
* independent of the device. If this is a problem, we need figure out
* together with Google if there is a way in which our configs can coexist.
*
* Both breakpad and crashpad are way more defensive in the setup of their
* signal stacks and take existing stacks into account (or reuse them).
*/
#define SIGNAL_DEF(Sig, Desc) { Sig, #Sig, Desc }

#define MAX_FRAMES 128
Expand Down Expand Up @@ -108,8 +87,8 @@ startup_inproc_backend(

// set up an alternate signal stack if noone defined one before
stack_t old_sig_stack;
if (sigaltstack(NULL, &old_sig_stack) == -1 || old_sig_stack.ss_sp == NULL
|| old_sig_stack.ss_size == 0) {
int ret = sigaltstack(NULL, &old_sig_stack);
if (ret == 0 && old_sig_stack.ss_flags == SS_DISABLE) {
SENTRY_TRACEF("installing signal stack (size: %d)", SIGNAL_STACK_SIZE);
g_signal_stack.ss_sp = sentry_malloc(SIGNAL_STACK_SIZE);
if (!g_signal_stack.ss_sp) {
Expand All @@ -118,9 +97,11 @@ startup_inproc_backend(
g_signal_stack.ss_size = SIGNAL_STACK_SIZE;
g_signal_stack.ss_flags = 0;
sigaltstack(&g_signal_stack, 0);
} else {
SENTRY_TRACEF(
"using existing signal stack (size: %d)", old_sig_stack.ss_size);
} else if (ret == 0) {
SENTRY_TRACEF("using existing signal stack (size: %d, flags: %d)",
old_sig_stack.ss_size, old_sig_stack.ss_flags);
} else if (ret == -1) {
SENTRY_WARNF("Failed to query signal stack size: %s", strerror(errno));
}

// install our own signal handler
Expand Down Expand Up @@ -554,21 +535,47 @@ handle_ucontext(const sentry_ucontext_t *uctx)
}

#ifdef SENTRY_PLATFORM_UNIX
// give us an allocator we can use safely in signals before we tear down.
sentry__page_allocator_enable();

// inform the sentry_sync system that we're in a signal handler. This will
// make mutexes spin on a spinlock instead as it's no longer safe to use a
// pthread mutex.
sentry__enter_signal_handler();
#endif

sentry_value_t event = make_signal_event(sig_slot, uctx);

SENTRY_WITH_OPTIONS (options) {
sentry__write_crash_marker(options);
#ifdef SENTRY_PLATFORM_LINUX
// On Linux (and thus Android) CLR/Mono converts signals provoked by
// AOT/JIT-generated native code into managed code exceptions. In these
// cases, we shouldn't react to the signal at all and let their handler
// discontinue the signal chain by invoking the runtime handler before
// we process the signal.
if (sentry_options_get_handler_strategy(options)
== SENTRY_HANDLER_STRATEGY_CHAIN_AT_START) {
SENTRY_TRACE("defer to runtime signal handler at start");
// there is a good chance that we won't return from the previous
// handler and that would mean we couldn't enter this handler with
// the next signal coming in if we didn't "leave" here.
sentry__leave_signal_handler();

// invoke the previous handler (typically the CLR/Mono
// signal-to-managed-exception handler)
invoke_signal_handler(
uctx->signum, uctx->siginfo, (void *)uctx->user_context);

// let's re-enter because it means this was an actual native crash
sentry__enter_signal_handler();
SENTRY_TRACE(
"return from runtime signal handler, we handle the signal");
}
#endif

#ifdef SENTRY_PLATFORM_UNIX
// use a signal-safe allocator before we tear down.
sentry__page_allocator_enable();
#endif

sentry_value_t event = make_signal_event(sig_slot, uctx);
bool should_handle = true;
sentry__write_crash_marker(options);

if (options->on_crash_func) {
SENTRY_TRACE("invoking `on_crash` hook");
Expand All @@ -580,7 +587,7 @@ handle_ucontext(const sentry_ucontext_t *uctx)
sentry_envelope_t *envelope = sentry__prepare_event(
options, event, NULL, !options->on_crash_func);
// TODO(tracing): Revisit when investigating transaction flushing
// during hard crashes.
// during hard crashes.

sentry_session_t *session = sentry__end_current_session_with_status(
SENTRY_SESSION_STATUS_CRASHED);
Expand Down
18 changes: 18 additions & 0 deletions src/sentry_options.c
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ sentry_options_new(void)
opts->shutdown_timeout = SENTRY_DEFAULT_SHUTDOWN_TIMEOUT;
opts->traces_sample_rate = 0.0;
opts->max_spans = 0;
opts->handler_strategy = SENTRY_HANDLER_STRATEGY_DEFAULT;

return opts;
}
Expand Down Expand Up @@ -594,3 +595,20 @@ sentry_options_set_backend(sentry_options_t *opts, sentry_backend_t *backend)
sentry__backend_free(opts->backend);
opts->backend = backend;
}

#ifdef SENTRY_PLATFORM_LINUX

sentry_handler_strategy_t
sentry_options_get_handler_strategy(const sentry_options_t *opts)
{
return opts->handler_strategy;
}

void
sentry_options_set_handler_strategy(
sentry_options_t *opts, sentry_handler_strategy_t handler_strategy)
{
opts->handler_strategy = handler_strategy;
}

#endif // SENTRY_PLATFORM_LINUX
1 change: 1 addition & 0 deletions src/sentry_options.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ typedef struct sentry_options_s {
long user_consent;
long refcount;
uint64_t shutdown_timeout;
sentry_handler_strategy_t handler_strategy;
} sentry_options_t;

/**
Expand Down
64 changes: 64 additions & 0 deletions tests/fixtures/dotnet_signal/Program.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
namespace dotnet_signal;

using System;
using System.Runtime.InteropServices;

class Program
{
[DllImport("crash", EntryPoint = "native_crash")]
static extern void native_crash();

[DllImport("crash", EntryPoint = "enable_sigaltstack")]
static extern void enable_sigaltstack();

[DllImport("sentry", EntryPoint = "sentry_options_new")]
static extern IntPtr sentry_options_new();

[DllImport("sentry", EntryPoint = "sentry_options_set_handler_strategy")]
static extern IntPtr sentry_options_set_handler_strategy(IntPtr options, int strategy);

[DllImport("sentry", EntryPoint = "sentry_options_set_debug")]
static extern IntPtr sentry_options_set_debug(IntPtr options, int debug);

[DllImport("sentry", EntryPoint = "sentry_init")]
static extern int sentry_init(IntPtr options);

static void Main(string[] args)
{
var githubActions = Environment.GetEnvironmentVariable("GITHUB_ACTIONS") ?? string.Empty;
if (githubActions == "true") {
// Set up our own `sigaltstack` for this thread if we're running on GHA because of a failure to run any
// signal handler after the initial setup. This behavior is locally non-reproducible and likely runner-related.
// I ran this against .net7/8/9 on at least 10 different Linux setups, and it worked on all, but on GHA
// it only works if we __don't__ accept the already installed `sigaltstack`.
enable_sigaltstack();
}

// setup minimal sentry-native
var options = sentry_options_new();
sentry_options_set_handler_strategy(options, 1);
sentry_options_set_debug(options, 1);
sentry_init(options);

var doNativeCrash = args is ["native-crash"];
if (doNativeCrash)
{
native_crash();
}
else
{
try
{
Console.WriteLine("dereference a NULL object from managed code");
var s = default(string);
var c = s.Length;
}
catch (NullReferenceException exception)
{
Console.WriteLine("dereference another NULL object from managed code");
var s = default(string);
var c = s.Length;
}
}
}
}
19 changes: 19 additions & 0 deletions tests/fixtures/dotnet_signal/crash.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#include <signal.h>
#include <stdlib.h>
void native_crash(void)
{
*(int *)10 = 100;
}

void enable_sigaltstack(void)
{
const size_t signal_stack_size = 16384;
stack_t signal_stack;
signal_stack.ss_sp = malloc(signal_stack_size);
if (!signal_stack.ss_sp) {
return;
}
signal_stack.ss_size = signal_stack_size;
signal_stack.ss_flags = 0;
sigaltstack(&signal_stack, 0);
}
8 changes: 8 additions & 0 deletions tests/fixtures/dotnet_signal/test_dotnet.csproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
Loading

0 comments on commit 7c1d428

Please sign in to comment.