Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB Retries #860

Merged
merged 14 commits into from
Jun 24, 2024
Merged

DB Retries #860

merged 14 commits into from
Jun 24, 2024

Conversation

neekolas
Copy link
Contributor

@neekolas neekolas commented Jun 22, 2024

tl;dr

  • Add more retries to things that might fail due to locked databases
  • Fixes the retryable! macro to not reference the error returned. A double reference will fall back to the generic RetryableError implementation for &T, instead of using the user-defined implementation
  • Fixed up the tracing so that we aren't logging extremely noisy references to self

@@ -1,3 +1,4 @@
#![recursion_limit = "256"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing this is definitely a smell that we are doing something crazy with macros. It fails to compile without raising the limit saying we are at 130.

Copy link
Contributor

@insipx insipx Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crossing the recursion limit first showed up after adding tracing::instrument attribute macros everywhere for benchmarking. These macros are nested within each other depending on the codepath resulting in us reaching the default limit. The easiest is to just remove the tracing::instrument macros. I haven't tried it but we can also specify a static verbosity level to skip compiling these macros unless we specify a trace level at compile time. This would probably get rid of the attribute for the xmtp_mls crate, but would require the attribute for benchmarks or anything making use of the instrumentation.

xmtp_mls/src/groups/mod.rs Show resolved Hide resolved
@@ -19,6 +22,12 @@ pub enum WrappedApiError {
AssociationDeserialization(#[from] AssociationDeserializationError),
}

impl RetryableError for WrappedApiError {
fn is_retryable(&self) -> bool {
matches!(self, Self::Api(_))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API errors werent being retried either

@neekolas neekolas changed the title [WIP] DB Retries DB Retries Jun 22, 2024
@neekolas neekolas marked this pull request as ready for review June 22, 2024 22:19
@neekolas neekolas requested a review from a team as a code owner June 22, 2024 22:19
Copy link
Contributor

@richardhuaaa richardhuaaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice reliability improvement

@@ -146,6 +146,8 @@ impl EncryptedMessageStore {
conn.batch_execute(&format!("PRAGMA key = \"x'{}'\";", hex::encode(key)))?;
}

conn.batch_execute("PRAGMA busy_timeout = 5000;")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we seeing busy errors that take more than 5s?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, this change did nothing. We aren't seeing busy errors at all, since the DB locks are a different error code altogether.

@neekolas neekolas merged commit 1e09690 into main Jun 24, 2024
5 checks passed
@neekolas neekolas deleted the nm/db-retries branch June 24, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants