Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry PoolNeedsConnection Errors #1010

Merged
merged 7 commits into from
Aug 29, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions xmtp_mls/src/groups/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,8 @@ pub enum GroupError {
SqlKeyStore(#[from] sql_key_store::SqlKeyStoreError),
#[error("No pending commit found")]
MissingPendingCommit,
#[error("Sync failed to wait for intent: {0}")]
insipx marked this conversation as resolved.
Show resolved Hide resolved
SyncFailedToWait(String),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neekolas adding this so it's easier to catch the error that is causing group creates to fail.

insipx marked this conversation as resolved.
Show resolved Hide resolved
}

impl RetryableError for GroupError {
Expand Down
4 changes: 3 additions & 1 deletion xmtp_mls/src/groups/sync.rs
Original file line number Diff line number Diff line change
Expand Up @@ -221,7 +221,9 @@ impl MlsGroup {
num_attempts += 1;
}

Err(last_err.unwrap_or(GroupError::Generic("failed to wait for intent".to_string())))
Err(last_err.unwrap_or(GroupError::SyncFailedToWait(
insipx marked this conversation as resolved.
Show resolved Hide resolved
"failed to wait for intent".to_string(),
insipx marked this conversation as resolved.
Show resolved Hide resolved
)))
insipx marked this conversation as resolved.
Show resolved Hide resolved
}

fn is_valid_epoch(
Expand Down
39 changes: 31 additions & 8 deletions xmtp_mls/src/identity_updates.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
use std::collections::{HashMap, HashSet};

use crate::{retry::RetryableError, retryable, storage::association_state::StoredAssociationState};
use crate::{
retry::{Retry, RetryableError},
retry_async, retryable,
storage::association_state::StoredAssociationState,
};
use prost::Message;
use thiserror::Error;
use xmtp_id::associations::{
Expand Down Expand Up @@ -239,9 +243,13 @@ where
wallets_to_revoke: Vec<String>,
) -> Result<SignatureRequest, ClientError> {
let inbox_id = self.inbox_id();
let current_state = self
.get_association_state(&self.store().conn()?, &inbox_id, None)
.await?;
let current_state = retry_async!(
Retry::default(),
(async {
self.get_association_state(&self.store().conn()?, &inbox_id, None)
.await
})
)?;
let mut builder = SignatureRequestBuilder::new(inbox_id);

for wallet in wallets_to_revoke {
Expand All @@ -259,9 +267,14 @@ where
installation_ids: Vec<Vec<u8>>,
) -> Result<SignatureRequest, ClientError> {
let inbox_id = self.inbox_id();
let current_state = self
.get_association_state(&self.store().conn()?, &inbox_id, None)
.await?;

let current_state = retry_async!(
Retry::default(),
(async {
self.get_association_state(&self.store().conn()?, &inbox_id, None)
.await
})
)?;
Comment on lines +270 to +277
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cover the case converse is hitting when the database is trying to reconnect after getting a signature from a different app for revoking installations.


let mut builder = SignatureRequestBuilder::new(inbox_id);

Expand Down Expand Up @@ -291,7 +304,17 @@ where
.await?;

// Load the identity updates for the inbox so that we have a record in our DB
load_identity_updates(&self.api_client, &self.store().conn()?, vec![inbox_id]).await?;
retry_async!(
Retry::default(),
(async {
load_identity_updates(
&self.api_client,
&self.store().conn()?,
vec![inbox_id.clone()],
)
.await
})
)?;

Ok(())
}
Expand Down
1 change: 1 addition & 0 deletions xmtp_mls/src/storage/errors.rs
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ impl RetryableError for StorageError {
Self::Pool(_) => true,
Self::Lock(_) => true,
Self::SqlCipherNotLoaded => true,
Self::PoolNeedsConnection => true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the error will retry 5 times before erroring. That might be okay? I also took a pass at just doing it in the code and retrying 3 times here 45adbfe open to others thoughts on how we should handle this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely feels like it should be retryable. Thanks for fixing

_ => false,
}
}
Expand Down