-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add new retry automated tests #1505
Conversation
7dc74f3
to
5497145
Compare
…n different test suites
681e0fd
to
87375f5
Compare
It looks like the test middleware is still being referenced explicitly in the cache data client. This really needs to be resolved before we can merge this PR. |
packages/client-sdk-nodejs/Makefile
Outdated
test-full-network-outage-retry-service: | ||
@echo "Testing full network outage retry service..." | ||
@npm run integration-test-retry-full-network-outage | ||
|
||
test-temporary-network-outage-retry-service: | ||
@echo "Testing temporary network outage retry service..." | ||
@npm run integration-test-retry-temporary-network-outage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need separate targets for these? Also the -service
suffix is to distinguish services in the canaries, eg leaderboard service vs cache vs topics. You can omit that unless this is something we're going to carry over to the canaries.
"integration-test-retry-full-network-outage": "jest retry/automated-retry-full-network-outage.test.ts --maxWorkers 1 -- useConsistentReads", | ||
"integration-test-retry-temporary-network-outage": "jest retry/automated-retry-temporary-network-outage.test.ts --maxWorkers 1 -- useConsistentReads", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pull these out vs include as part of the respective services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can include these as part of cache service in future. As of now, the docker image for momento-local is not published limiting us to include these from running in the ci and hence the cache service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I can get rid of the scripts from the package.json to eliminate the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests will always be run separately from the other integration tests that run against the live service though. I think separate test targets makes sense so they don't accidentally run as part of the canaries or something, but we will eventually run them in github ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were planning on running our integration tests using momento local too in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the existing cache integration tests against momento-local will be a separate change the team can decide on later. And I think even then there will be some component of CI/CD that runs them against alpha.
But for the automated retry tests, they will only ever run against momento-local in CI, and in that case I think it would be useful to have separate Makefile and/or package.json test targets for starting them.
Example github actions workflows:
// on push to main, against alpha
make test-cache-service
// on pull request
make test-cache-service (could be against momento-local or alpha)
make test-retries-full-outage (against momento-local WITHOUT error count CLI arg)
make test-retries-temporary-outage (against momento-local WITH error count CLI arg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am adding the targets back. We can discuss this synchronously if needed?
@@ -35,7 +35,7 @@ export class FixedCountRetryStrategy implements RetryStrategy { | |||
// null means do not retry | |||
return null; | |||
} | |||
if (props.attemptNumber > this.maxAttempts) { | |||
if (props.attemptNumber >= this.maxAttempts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was maxAttempts
intended to include the initial request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me and Anita tried running the tests such that max retries are 3, but it always retried 4 times which was confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think initial request is included in retry count. I have some sample output from testing a toy program against momento-local which delayed get requests for 500ms before returning an error, and I observed ~2500ms total indicating 5 get requests, the initial one + the observed 4 (when it should've been an observed 3 retries).
logs copied here for reference
Client debug logs from above example:
Key stored successfully!
Time taken for set: 3ms
[2025-01-14T18:11:54.342Z] DEBUG (Momento: FixedCountRetryStrategy): Determining whether request is eligible for retry; status code: 14, request type: /cache_client.Scs/Get, attemptNumber: 0, maxAttempts: 3
[2025-01-14T18:11:54.342Z] DEBUG (Momento: FixedCountRetryStrategy): Request is eligible for retry (attempt 0 of 3, retrying immediately.
[2025-01-14T18:11:54.342Z] DEBUG (Momento: RetryInterceptor): Request eligible for retry: path: /cache_client.Scs/Get; response status code: 14; number of attempts (1); will retry in 0ms
[2025-01-14T18:11:54.344Z] DEBUG (Momento: RetryInterceptor): Retrying request: path: /cache_client.Scs/Get; deadline was: 2025-01-14T18:11:58.838Z, overall deadline is: 2025-01-14T18:11:58.838Z
[2025-01-14T18:11:54.344Z] DEBUG (Momento: RetryInterceptor): Setting next deadline (via offset of 5000 ms) to: 2025-01-14T18:11:59.344Z
[2025-01-14T18:11:54.848Z] DEBUG (Momento: FixedCountRetryStrategy): Determining whether request is eligible for retry; status code: 14, request type: /cache_client.Scs/Get, attemptNumber: 1, maxAttempts: 3
[2025-01-14T18:11:54.848Z] DEBUG (Momento: FixedCountRetryStrategy): Request is eligible for retry (attempt 1 of 3, retrying immediately.
[2025-01-14T18:11:54.848Z] DEBUG (Momento: RetryInterceptor): Request eligible for retry: path: /cache_client.Scs/Get; response status code: 14; number of attempts (2); will retry in 0ms
[2025-01-14T18:11:54.850Z] DEBUG (Momento: RetryInterceptor): Retrying request: path: /cache_client.Scs/Get; deadline was: 2025-01-14T18:11:59.344Z, overall deadline is: 2025-01-14T18:11:58.838Z
[2025-01-14T18:11:54.850Z] DEBUG (Momento: RetryInterceptor): Setting next deadline (via offset of 5000 ms) to: 2025-01-14T18:11:59.850Z
[2025-01-14T18:11:55.358Z] DEBUG (Momento: FixedCountRetryStrategy): Determining whether request is eligible for retry; status code: 14, request type: /cache_client.Scs/Get, attemptNumber: 2, maxAttempts: 3
[2025-01-14T18:11:55.358Z] DEBUG (Momento: FixedCountRetryStrategy): Request is eligible for retry (attempt 2 of 3, retrying immediately.
[2025-01-14T18:11:55.358Z] DEBUG (Momento: RetryInterceptor): Request eligible for retry: path: /cache_client.Scs/Get; response status code: 14; number of attempts (3); will retry in 0ms
[2025-01-14T18:11:55.359Z] DEBUG (Momento: RetryInterceptor): Retrying request: path: /cache_client.Scs/Get; deadline was: 2025-01-14T18:11:59.850Z, overall deadline is: 2025-01-14T18:11:58.838Z
[2025-01-14T18:11:55.359Z] DEBUG (Momento: RetryInterceptor): Setting next deadline (via offset of 5000 ms) to: 2025-01-14T18:12:00.359Z
[2025-01-14T18:11:55.868Z] DEBUG (Momento: FixedCountRetryStrategy): Determining whether request is eligible for retry; status code: 14, request type: /cache_client.Scs/Get, attemptNumber: 3, maxAttempts: 3
[2025-01-14T18:11:55.868Z] DEBUG (Momento: FixedCountRetryStrategy): Request is eligible for retry (attempt 3 of 3, retrying immediately.
[2025-01-14T18:11:55.868Z] DEBUG (Momento: RetryInterceptor): Request eligible for retry: path: /cache_client.Scs/Get; response status code: 14; number of attempts (4); will retry in 0ms
[2025-01-14T18:11:55.869Z] DEBUG (Momento: RetryInterceptor): Retrying request: path: /cache_client.Scs/Get; deadline was: 2025-01-14T18:12:00.359Z, overall deadline is: 2025-01-14T18:11:58.838Z
[2025-01-14T18:11:55.869Z] DEBUG (Momento: RetryInterceptor): Setting next deadline (via offset of 5000 ms) to: 2025-01-14T18:12:00.869Z
[2025-01-14T18:11:56.376Z] DEBUG (Momento: FixedCountRetryStrategy): Determining whether request is eligible for retry; status code: 14, request type: /cache_client.Scs/Get, attemptNumber: 4, maxAttempts: 3
[2025-01-14T18:11:56.376Z] DEBUG (Momento: FixedCountRetryStrategy): Exceeded max attempt count (3)
[2025-01-14T18:11:56.376Z] DEBUG (Momento: RetryInterceptor): Request not eligible for retry: path: /cache_client.Scs/Get; retryable status code: 14; number of attempts (4).
Error: The server was unable to handle the request; consider retrying. If the error persists, please contact us at [email protected]: 14 UNAVAILABLE: Unavailable
Time taken for get: 2542ms
Finished example!
packages/client-sdk-nodejs/test/integration/retry/automated-retry-full-network-outage.test.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good overall! just a couple of requests
...es/client-sdk-nodejs/test/integration/retry/automated-retry-temporary-network-outage.test.ts
Outdated
Show resolved
Hide resolved
...es/client-sdk-nodejs/test/integration/retry/automated-retry-temporary-network-outage.test.ts
Outdated
Show resolved
Hide resolved
…use that in retry tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
would be good to get @pgautier404 and/or @malandis to sign off on the shouldLoadLate
bool changes too though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks! I mostly concentrated on the middleware loading portion of the code, so I'll defer to the other reviewers on the test setup and the tests themselves.
PR Description:
Testing
1. Full-network outage:
Run
momento-local
on this branch using the following params:and
Run the test using the following command:
2. Temporary-network outage:
Run
momento-local
on this branch using the following params:and
Run the test using the following command:
Issue
closes https://github.com/momentohq/dev-eco-issue-tracker/issues/1093