-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xdsclient: fix new watcher to get both old good update and nack error (if exist) from the cache #7851
xdsclient: fix new watcher to get both old good update and nack error (if exist) from the cache #7851
Conversation
a3c89bb
to
7e4c73d
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7851 +/- ##
==========================================
+ Coverage 81.74% 81.89% +0.15%
==========================================
Files 375 375
Lines 37980 37986 +6
==========================================
+ Hits 31045 31110 +65
+ Misses 5622 5581 -41
+ Partials 1313 1295 -18
|
30597fe
to
d2aae7d
Compare
d2aae7d
to
5da90a2
Compare
a9ee1a6
to
f57c599
Compare
I think this might be worth release noting. |
|
||
// Register another watch for the same resource. This should get the update | ||
// and error from the cache. | ||
lw2 := newListenerWatcherMultiple(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this new listener watcher type? Why can't we handle this case with the existing listenerWatcher
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current listenerWatcher
has channel size of 1 and notifications gets replaced. For this fix we need both good update and error as 2 different notifications to be verified so we need a channel with buffer size > 1. But yeah we don't need a new listenerWatcher
struct, we can just modify the current one to have another constructor which accept the size and update OnError to not replace if size > 1, which is what I did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the way i used existing listenerWatcher struct, it was missing resource update during race test. I have added the separate struct back for handling variable size update channel and the race went away. Didn't get a chance to fully debug why it was happening. But may be its fine to have separate struct to hold multiple updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct way would be to change the listenerWatcher
to have multiple channels: one each for update, error and resource not found. That way one callback will not interfere with another callback. But this change would touch a lot of tests.
I wanted to do this change when I was working on some of the refactors recently, but never got around to doing that. I would recommend making that change in a separate PR though. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah i think i can send a separate PR for that. The idea of having 3 channels for each callback is a good idea. Should this fix be blocked for that though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be Ok if we create an issue for the same and add a TODO in here to remove this new listener watcher type once that issue is taken care of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issue #7864. It should be simple as well. Added TODO for the new watcher in this PR.
Actually, looks like there is some race condition in the test but feel free to review. Debugging the race condition. |
d4bf100
to
8ed2a3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still missing a test case for the scenario where the first response from the management server is NACKed, and a second watcher is registered for that same resource (and we expect the error callback to be invoked on that watcher). You could enhance the existing TestLDSWatch_NACKError
test for this.
t.Fatalf("timeout when waiting for a listener resource from the management server: %v", err) | ||
} | ||
gotErr = u.(listenerUpdateErrTuple).err | ||
if gotErr == nil || !strings.Contains(gotErr.Error(), wantListenerNACKErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error type in the xdsresource
package provides a way to get to the underlying error type. See: https://github.com/grpc/grpc-go/blob/master/xds/internal/xdsclient/xdsresource/errors.go#L61
I think we should use that instead of string comparison.
I agree existing tests are not doing that. But that is not a good enough reason to make new code not do that. In fact, it would be wonderful in existing code could be cleaned up as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for NACK error of no routing, we don't have any specific error type. So, type will always be unknown. That's why we are verifying the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a new type for NACK errors? If you think it makes sense to do that, we should file an issue to track it and eventually get to it. Might be good to get to (and fixing tests) before making the client public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issue #7863. I think it should be simple change as i described in the issue to set the NACK error type while decoding. Though it still can be separate PR because we will have to update all the tests.
|
||
// Register another watch for the same resource. This should get the update | ||
// and error from the cache. | ||
lw2 := newListenerWatcherMultiple(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct way would be to change the listenerWatcher
to have multiple channels: one each for update, error and resource not found. That way one callback will not interfere with another callback. But this change would touch a lot of tests.
I wanted to do this change when I was working on some of the refactors recently, but never got around to doing that. I would recommend making that change in a separate PR though. What do you think?
I had done the same #7851 (comment) and this is the change https://github.com/grpc/grpc-go/pull/7851/files#diff-33ea1a548fc69853905a83ab6f29daba065a266e4145ff1db859e74ca8064ad3R939. Did i miss anything? |
1d8bbe7
to
df88c19
Compare
if err != nil { | ||
t.Fatalf("Timeout when waiting for a listener resource from the management server: %v", err) | ||
} | ||
gotErr := u.(listenerUpdateErrTuple).err | ||
if gotErr == nil || !strings.Contains(gotErr.Error(), wantListenerNACKErr) { | ||
t.Fatalf("Update received with error: %v, want %q", gotErr, wantListenerNACKErr) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we implement a helper for this?
func verifyListenerError(ctx context.Context, updateCh *testutils.Channel, wantErr string) error {
u, err := updateCh.Receive(ctx)
if err != nil {
return fmt.Errorf("timeout when waiting for a listener update from the management server: %v", err)
}
gotErr := u.(listenerUpdateErrTuple).err
if gotErr == nil || !strings.Contains(gotErr.Error(), wantErr) {
return fmt.Errorf("update received with error: %v, want %q", gotErr, wantErr)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Though we need the same helper for all other resource types as well. Will send a separate PR.
@@ -94,6 +94,8 @@ type listenerWatcherMultiple struct { | |||
updateCh *testutils.Channel | |||
} | |||
|
|||
// TODO: delete this once `newListenerWatcher` is modified to handle multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please link the issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…r, add cache check
86a241c
to
742da1b
Compare
Addresses: #7819
If a watch is registered for a listener resource which is already present in the cache with an old good update as well latest NACK error, the new watcher should receive both good update and error, without a new resource request being sent to the management server.
RELEASE NOTES: