CORE-20867 Implement retry topic to handle persistent transient RPC Client errors #6385
…-20867/retry-topic
…id replay logic and set request id correctly
…ct when not to save.
Jenkins build for PR 6385 build 14 Build Successful: |
...saging/src/main/kotlin/net/corda/messaging/api/mediator/config/EventMediatorConfigBuilder.kt
...ging/messaging-impl/src/main/kotlin/net/corda/messaging/mediator/processor/EventProcessor.kt
...ow-service/src/main/kotlin/net/corda/flow/messaging/mediator/FlowEventMediatorFactoryImpl.kt
Quality Gate passed
Design: https://github.com/corda/platform-eng-design/pull/658
API: corda/corda-api#1710
The current mediator messaging pattern in Corda can enter a retry loop when transient errors are received from other Corda workers. This retry loop blocks flow topic partitions from progressing, and the affected Corda cluster has been observed to become permanently unstable due to the effects of consumer lag. The flow worker uses this pattern to perform synchronous HTTP calls to various workers, including the verification, token, crypto, uniqueness, and persistence workers.
To address this issue, a separate Kafka topic is dedicated to handling retries. This allows the primary ingestion topics to continue processing unaffected flows, while introducing finite retry logic for flows impacted by transient errors.
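As a rough illustration of the idea, the routing decision could look something like the sketch below. This is not the code in this PR: `FlowEvent`, `Outcome`, `Routing`, `route` and the `MAX_RETRIES` value of 3 are hypothetical names and values chosen for the example, and the Kafka producer/consumer wiring is left out so that only the finite-retry decision is shown.

```kotlin
// Minimal sketch only. All names and the retry limit are assumptions,
// not taken from the Corda codebase or from this PR.

const val MAX_RETRIES = 3 // assumed limit; the real value would come from config

// Simplified stand-in for an event read from a flow topic.
data class FlowEvent(val key: String, val requestId: String, val retryCount: Int = 0)

// Result of the synchronous RPC call made on behalf of the event.
sealed interface Outcome
object Success : Outcome
data class TransientError(val reason: String) : Outcome

// Where the event should go next.
sealed interface Routing
object Completed : Routing
data class ToRetryTopic(val event: FlowEvent) : Routing
data class Failed(val event: FlowEvent, val reason: String) : Routing

// Decide the next step without blocking the partition the event came from:
// a transient error moves the event to the retry topic until the limit is hit.
fun route(event: FlowEvent, outcome: Outcome, maxRetries: Int = MAX_RETRIES): Routing =
    when (outcome) {
        is Success -> Completed
        is TransientError ->
            if (event.retryCount < maxRetries) {
                ToRetryTopic(event.copy(retryCount = event.retryCount + 1))
            } else {
                Failed(event, outcome.reason)
            }
    }

fun main() {
    // Simulate a downstream worker that keeps returning a transient error.
    var event = FlowEvent(key = "flow-1", requestId = "req-1")
    while (true) {
        when (val routing = route(event, TransientError("HTTP 503"))) {
            is ToRetryTopic -> {
                println("sent to retry topic, attempt ${routing.event.retryCount}")
                event = routing.event
            }
            is Failed -> {
                println("giving up after ${routing.event.retryCount} retries: ${routing.reason}")
                break
            }
            Completed -> break
        }
    }
}
```

The point of the sketch is that the primary consumer never loops on a failing event: it either completes the event, hands it to the retry topic with an incremented counter, or marks it failed once the limit is reached.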
Additionally, the Avro version is bumped to fix a vulnerability.