CORE-18532: Retry Transient Errors in Flow Engine #5337
Jenkins build for PR 5337 build 21 Build Successful
db9781d to 19bd760
Building E2E Tests on PR-5337
libs/utilities/src/main/kotlin/net/corda/utilities/retry/RetryUtils.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/BackoffStrategy.kt
I added a bunch of comments, none of which need to block merging, so if you want to disregard them I can approve anyway.
libs/utilities/src/main/kotlin/net/corda/utilities/retry/BackoffStrategy.kt
libs/crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/BackoffStrategy.kt
libs/crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoBackoffStrategy.kt
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/BackoffStrategy.kt
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/RetryUtils.kt
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
libs/crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoBackoffStrategy.kt
Some minor comments. The crypto bits look good to me in principle.
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/RetryUtils.kt
libs/crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoBackoffStrategy.kt
libs/crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoBackoffStrategy.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/RetryUtils.kt
libs/utilities/src/main/kotlin/net/corda/utilities/retry/BackoffStrategy.kt
Did you consider using an off-the-shelf library like Resilience4j rather than rolling our own? (Not saying we should, but we should evaluate the pros and cons.) Also, I thought we'd look at moving away from crypto doing its own bespoke thing and consolidating how we do retries etc. with the RPC pattern?
I did at the beginning but our use case is quite simplistic at the moment, adding
Not sure what the background context is here; I've simply created a retry utility to be shared across the code base and refactored the existing crypto retry mechanism to use the new one. Any other work to replace it with the RPC pattern should likely be its own epic/task, not included within this
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
.../crypto/crypto-impl/src/main/kotlin/net/corda/crypto/impl/retrying/CryptoRetryingExecutor.kt
Yeah, I don't remember us discussing or agreeing anything. But as I'm working on handling transient errors for external processors, the crypto retry logic breaks the mould. If @dickon agrees, we could add a story to remove crypto retry handling and fall back on the RPC client's retry logic; the epic is here.
Instead of automatically publishing the original event back to the "flow.event" topic whenever a transient exception occurs while executing the flow pipeline, automatically retry on those exceptions using an exponential backoff retry mechanism, and permanently fail the flow if the configured time or attempt limit is reached.
- Create an internal retry utility to manage retries in an automated fashion with a pluggable backoff strategy (constant, linear and exponential provided out of the box).
- Replace the existing utilities in the crypto components and tests with the new one and delete unused classes.
- Update FlowEventProcessor to automatically retry transient exceptions through the new retry utility, using an exponential backoff with a growth factor of 250ms.
- Remove unnecessary code and update tests to accommodate the new internal retry.
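For readers who want a concrete picture of the approach described above, here is a minimal Kotlin sketch of a retry helper with a pluggable backoff strategy. It is not the code from this PR: `Backoff`, `constant`, `linear`, `exponential`, `TransientException` and `retrying` are hypothetical names chosen for illustration, and reading "growth factor of 250ms" as a 250 ms base delay that doubles on each attempt is an assumption.

```kotlin
import kotlin.math.pow

// Hypothetical interface (not the PR's BackoffStrategy):
// maps a 1-based attempt number to a delay in milliseconds.
fun interface Backoff {
    fun delayMs(attempt: Int): Long
}

// Three illustrative strategies; the formulas are assumptions.
fun constant(delayMs: Long) = Backoff { delayMs }
fun linear(stepMs: Long) = Backoff { attempt -> stepMs * attempt }
fun exponential(baseMs: Long, factor: Double = 2.0) =
    Backoff { attempt -> (baseMs * factor.pow(attempt - 1)).toLong() }

// Marker for errors this sketch treats as transient.
class TransientException(message: String) : RuntimeException(message)

/**
 * Runs [block] up to [maxAttempts] times, retrying only on TransientException.
 * [sleep] is injectable so that tests can record delays instead of waiting.
 */
fun <T> retrying(
    maxAttempts: Int,
    backoff: Backoff,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    for (attempt in 1 until maxAttempts) {
        try {
            return block(attempt)
        } catch (e: TransientException) {
            // Wait according to the chosen strategy, then try again.
            sleep(backoff.delayMs(attempt))
        }
    }
    // Final attempt: any exception propagates, so the caller can fail permanently.
    return block(maxAttempts)
}
```

A caller might wrap a pipeline step as `retrying(maxAttempts = 5, backoff = exponential(250)) { attempt -> ... }`. Non-transient exceptions are never caught, so they fail immediately. A time-based limit, which the description also mentions, is left out of this sketch.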
…cs, Extra Parameter to Recoverable Function
36207ba to 1256290
OK (for crypto)
These changes look fine to me.
...nts/flow/flow-service/src/main/kotlin/net/corda/flow/pipeline/impl/FlowEventProcessorImpl.kt
Quality Gate passed. Kudos, no new issues were introduced! 0 New issues
LGTM
LGTM