Additional logs for the grpc conn. #1319
Not sure if the timeout isn't too small, but let's test the PR image on our Botkube environments over a longer period of time to see if that helps 👍
Also, consider adding deadlines (https://grpc.io/docs/guides/deadlines/) on the server side too, if that makes sense 👍
As a summary;
Looks very promising!
if !lastFailureTimestamp.IsZero() && time.Since(lastFailureTimestamp) >= successIntervalDuration {
	// if the last run was long enough, we treat it as success, so we reset failures
	log.Infof("Resetting failures counter as last failure was more than %s ago", successIntervalDuration)
	b.failuresNo = 0
We could also reset the failure message here, or does that not make sense?
},
retry.OnRetry(func(_ uint, err error) {
	log.Warnf("Retrying Cloud Teams startup (attempt no %d/%d): %s", b.failuresNo, maxRetries, err)
}),
retry.Delay(retryDelay),
retry.DelayType(resettableBackoff),
retry.Attempts(0), // infinite, we cancel that by our own
retry.LastErrorOnly(true),
retry.Context(ctx),
Should we also add here something similar to the following?
b.failuresNo = 0 // Reset the failures to start exponential back-off from the beginning
b.setFailureReason("")
b.log.Info("Botkube connected to Slack!")
I didn't get the point here at first; could you please elaborate a bit more? My bad, now I get it. I will modify the Teams code.
While it wasn't the actual reason, IMHO it would be good to add the same thing we have in Cloud Teams:
ctxTimeout, cancelFn := context.WithTimeout(ctx, cloudTeamsConnectTimeout)
defer cancelFn()
err = svc.Start(ctxTimeout)
if err != nil {
	return fmt.Errorf("while starting gRPC connector %w", err)
}
Actually, I should remove this from Teams too; I was testing Slack first. The problem with ctxTimeout is that it cancels the existing streaming connection even after the start has completed. Say I provide a 15-second timeout: the connection is created successfully, but the stream is then cancelled when the timeout elapses, because the context is used not only for establishing the connection but also for the gRPC operation itself.
aha, ok, makes sense 👍
LGTM based on the code review and testing on dev 👍
I think we can merge this PR, run Botkube from latest main image for all our envs and observe behavior 👍
0a42954 to e3545a3