reactor: disable stall-detector on aarch64 to fix asan failures
Ceph crimson builds seastar in debug mode, which enables ASAN. The stall-detector uses the glibc backtrace function, which causes ASAN failures on aarch64.
For the reason, see scylladb/scylladb#15090 (comment).

Because the arm CI servers in the lab are "elderly", stalls happen often. This PR disables the stall-detector until seastar upstream has migrated to libunwind; see scylladb#1878.

Signed-off-by: Rongqi Sun <[email protected]>
Svelar committed Oct 17, 2024
1 parent 7d4ae90 commit b033b20
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions src/core/reactor.cc
@@ -1194,14 +1194,18 @@ cpu_stall_detector_posix_timer::cpu_stall_detector_posix_timer(cpu_stall_detecto
#define sigev_notify_thread_id _sigev_un._tid
#endif
    sev.sigev_notify_thread_id = syscall(SYS_gettid);
#if !defined( __aarch64__)
    int err = timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &_timer);
    if (err) {
        throw std::system_error(std::error_code(err, std::system_category()));
    }
#endif
}

cpu_stall_detector_posix_timer::~cpu_stall_detector_posix_timer() {
#if !defined( __aarch64__)
    timer_delete(_timer);
#endif
}

cpu_stall_detector_config
@@ -1274,7 +1278,9 @@ cpu_stall_detector::reset_suppression_state(sched_clock::time_point now) {

void cpu_stall_detector_posix_timer::arm_timer() {
    auto its = posix::to_relative_itimerspec(_threshold * _report_at + _slack, 0s);
#if !defined( __aarch64__)
    timer_settime(_timer, 0, &its, nullptr);
#endif
}

void cpu_stall_detector::start_task_run(sched_clock::time_point now) {
@@ -1296,7 +1302,9 @@ void cpu_stall_detector::end_task_run(sched_clock::time_point now) {

void cpu_stall_detector_posix_timer::start_sleep() {
    auto its = posix::to_relative_itimerspec(0s, 0s);
#if !defined( __aarch64__)
    timer_settime(_timer, 0, &its, nullptr);
#endif
    _rearm_timer_at = reactor::now();
}

@@ -1332,6 +1340,7 @@ cpu_stall_detector_linux_perf_event::arm_timer() {
    // clear out any existing records in the ring buffer, so when we get interrupted next time
    // we have only the stack associated with that interrupt, and so we don't overflow.
    data_area_reader(*this).skip_all();
#if !defined( __aarch64__)
    if (__builtin_expect(_enabled && _current_period == ns, 1)) {
        // Common case - we're re-arming with the same period, the counter
        // is already enabled.
@@ -1356,11 +1365,14 @@ cpu_stall_detector_linux_perf_event::arm_timer() {
        _enabled = true;
        _current_period = ns;
    }
#endif
}

void
cpu_stall_detector_linux_perf_event::start_sleep() {
#if !defined( __aarch64__)
    _fd.ioctl(PERF_EVENT_IOC_DISABLE, 0);
#endif
    _enabled = false;
}

@@ -3273,11 +3285,13 @@ int reactor::do_run() {
    _task_quota_timer.timerfd_settime(0, its);
    auto& task_quote_itimerspec = its;

#if !defined( __aarch64__)
    struct sigaction sa_block_notifier = {};
    sa_block_notifier.sa_handler = &reactor::block_notifier;
    sa_block_notifier.sa_flags = SA_RESTART;
    auto r = sigaction(internal::cpu_stall_detector::signal_number(), &sa_block_notifier, nullptr);
    assert(r == 0);
#endif

    bool idle = false;
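
The guards in this diff follow one pattern: every call that creates, arms, or disarms the timer is compiled out on aarch64, while ordinary bookkeeping such as `_rearm_timer_at` and `_enabled = false` stays outside the guard and runs on every architecture. A minimal sketch of that pattern follows. It is not seastar code: `fake_timer`, `guarded_stall_timer`, and `timer_compiled_in()` are hypothetical names invented for illustration, and the fake timer only records requests instead of calling `timer_settime`.

```cpp
#include <chrono>

// Hypothetical stand-in for the kernel timer; it only records what was requested.
struct fake_timer {
    int settime_calls = 0;               // how often the timer was programmed
    std::chrono::nanoseconds expiry{0};  // last requested expiry, 0 means disarmed
};

// Illustrative class, not seastar's cpu_stall_detector_posix_timer.
class guarded_stall_timer {
    fake_timer _timer;
    bool _sleeping = false;  // bookkeeping that must work on every architecture
public:
    // True when the timer calls are compiled in (every target except aarch64).
    static constexpr bool timer_compiled_in() {
#if !defined(__aarch64__)
        return true;
#else
        return false;
#endif
    }

    void arm_timer(std::chrono::nanoseconds threshold) {
#if !defined(__aarch64__)
        _timer.expiry = threshold;
        ++_timer.settime_calls;
#else
        (void)threshold;  // compiled out: arming is a no-op on aarch64
#endif
    }

    void start_sleep() {
#if !defined(__aarch64__)
        _timer.expiry = std::chrono::nanoseconds{0};  // a zero expiry disarms
        ++_timer.settime_calls;
#endif
        _sleeping = true;  // outside the guard, like _rearm_timer_at in the diff
    }

    int settime_calls() const { return _timer.settime_calls; }
    bool sleeping() const { return _sleeping; }
};
```

With this shape, callers need no architecture checks of their own: on aarch64 the timer is simply never programmed, so no stall signal fires and the backtrace path that trips ASAN is never reached.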
