Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: Add a lame-duck mode to common/trace.c #8108

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cdecker
Copy link
Member

@cdecker cdecker commented Feb 21, 2025

The tracing subsystem is useful, but not useful enough to warrent
taking the entire daemon down when one of its internal consistency
requirements arent met. This commit introduces a lame-duck mode
that the node switches to as soon as it notices an inconsistency
and it prints the line and reason for the switch.

We can then later come back and investigate. The important thing
is that while the tracing instrumentation is incomplete, and can
cause these issues, we prefer no longer emitting traces rather
than taking the node down.

The tracing subsystem has some rather strong consistency requirements
that are easily broken by incorrect use of its external API. For
example not closing a span can cause an assertion on the next call to
crash the daemon. Due to the separation in time and code, it is not
always easy to debug there, and crashing the entire system is bad.

The lame-duck mode is switched to as soon as the first error occurs,
and we just stop emitting traces in that case.

Changelog-None
@cdecker cdecker force-pushed the 202507-trace-lame-duck branch from 5e5b582 to 3cedfaa Compare February 21, 2025 16:22
@@ -532,6 +533,10 @@ def prinErrlog(node):
return 1 if errors else 0


def get_trace_assert_errors(node):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for errors in stderr?
Suspect it might be better to have a function callback for tracing, so we can BROKEN log.

*/
static bool active = true;

#define TRACE_ASSERT_RET(cond, msg, ret) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macros that return are an anti pattern: just have a macro for trace_assert_fail which hands #cond, file and line to bool trace_assert_fail_ and open code it. That keeps it readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants