-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update maintenance check to passing before removing #11332
base: main
Are you sure you want to change the base?
Conversation
7471743
to
e03677c
Compare
@@ -3475,6 +3475,9 @@ func (a *Agent) DisableServiceMaintenance(serviceID structs.ServiceID) error { | |||
return nil | |||
} | |||
|
|||
// Update check to trigger an event for watchers | |||
a.State.UpdateCheck(checkID, api.HealthPassing, "") | |||
a.State.SyncChanges() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question to reviewer: should SyncChanges
or SyncFull
be used here? This is to ensure the that watch events are fired before the check is removed.
(See thread for more context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakubgs: I'd wait to address this until you get review feedback from engineering, but as a part of those changes...
I recommend adding a comment explaining why SyncChanges
is necessary, because it's non-obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
e03677c
to
875ab2a
Compare
Any feedback? |
Hi @jakubgs, The engineering team is taking a look at your questions from the description (e.g., any guidance on how to add a test). Additionally, they may consider whether this "update to passing before check removal" change should be done in a more central location. Doing so would affect other check types, which have benefits and/or unintended side effects. I'll get back to you at the end of this week if you haven't heard from us already by then. |
Hi @jakubgs, I think you can build a test based on those tests Let me know if you have questions or need more details on how to test this. |
Yes, that's exactly what I'm asking how to do. How can I possibly check the state of a check that was removed? This is not trivial. |
@jakubgs Yes a good point, let me dig a bit to see if we can intercept the update somehow, or if we can test at a lower level by mocking the update. |
If this fix is related to |
Yes, considering the purpose is to have |
I tried something like this: diff --git a/agent/agent_test.go b/agent/agent_test.go
index 6638fa35c..d2da02ef5 100644
--- a/agent/agent_test.go
+++ b/agent/agent_test.go
@@ -3355,9 +3355,17 @@ func TestAgent_NodeMaintenanceMode(t *testing.T) {
t.Fatalf("bad: %#v", check)
}
+ nodeMaintCheck := a.State.Check(structs.NodeMaintCheckID)
+
+ // Ensure the check state was updated
+ require.Equal(t, api.HealthCritical, nodeMaintCheck.Status)
+
// Leave maintenance mode
a.DisableNodeMaintenance()
+ // Ensure the check state was updated
+ require.Equal(t, api.HealthPassing, nodeMaintCheck.Status)
+
// Ensure the check was deregistered
requireCheckMissing(t, a, structs.NodeMaint) But it fails, since the state is unchanged:
Which I guess means I'm either checking too soon, or the instance I'm checking is outdated. |
I also tried moving the second check after |
It appears the Lines 704 to 707 in cc2abb7
Which appears to be set here to a.sync.SyncChanges.Trigger :Lines 558 to 563 in 3666401
Which appears to be some kind of internal dark magic for service/client syncing. |
Lines 76 to 78 in 3666401
Lines 3 to 19 in 3666401
Yeah, I really don't know what I'm looking at. |
One notable thing is that for the Lines 3444 to 3445 in 3666401
And while normal Agent defines its TriggerSyncChanges :Line 563 in 3666401
The TestAgent defined no such thing in its Start function:Lines 134 to 136 in 3666401
So I'm not sure how this tracking of events is even supposed to work. |
I tried adding some prints in Lines 660 to 670 in 3666401
Which means we can't really monitor a copy previously fetched with a.State.Check() .
|
I'm confused, where in the definition of Lines 38 to 44 in 3666401
I can't find it. |
Yeah. I don't get it, how exactly is |
@dhiaayachi can you explain to me how exactly |
Hi @jakubgs, thanks for working on these tests! The I looked at the watch command, and I found the two API queries it is using here: https://github.com/hashicorp/consul/blob/v1.10.4/api/watch/funcs.go#L197-L199. I'm not sure which one of those two we need to use , but they should both work about the same. Since we are testing To test the "watch" behaviour I think we'll probably need to use at least one goroutine. We can start a blocking query in a goroutine using That should block until the update is received. Then in the main test goroutine we can check to see if the event is received within some time window (probably 100ms is plenty of time to wait). I found an example of something like that here: I hope this information helps. Please do continue to ask questions if anything is not clear. We'll do our best to get back to you promptly, but some times there may be a few days lag. |
Hey @jakubgs, Was the information given above able to help out? As Daniel said we're happy to answer any additional questions |
I just came back from a Christmas break, and this is not high priority for me, more like nice-to-have. Thanks for the help. I'll get to it when I'll get to it. |
113628c
to
02c65d6
Compare
Thanks for the input @jakubgs For the reviewers : I still need feedback on the approach in the commit here -> 02c65d6 cc @blake @zalimeni @dhiaayachi @dnephin @Amier3 @rboyer next steps : I'll add few failed attempts at writing tests and maybe we can find a way to prove our change actually works. Thank you! |
2337e2b
to
0e910fa
Compare
0e910fa
to
2365f0d
Compare
Can we please get some feedback on this PR ? Thank you |
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions. |
It's not inactive, we are waiting for a review from the team. |
2365f0d
to
4de87e1
Compare
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions. |
It's not inactive because of us, we're waiting for a review. |
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions. |
4de87e1
to
4e80718
Compare
I've just rebased this PR on latest main, This is not stale we are waiting on review form the team at hashicorp. |
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions. |
Without this update service using `consul watch` to monitor check changes will never see maintenance mode being disabled. Resolves: hashicorp#11330 Signed-off-by: Jakub Sokołowski <[email protected]>
Instead of deregistering the check on disabling maintenance mode It seemed better to just update its status as passing. This makes it easier to know when maintenance mode was disabled.
This reverts commit 02c65d6.
4e80718
to
ba79811
Compare
Without this update service using
consul watch
to monitor check changes will never see maintenance mode being disabled.Resolves: #11330
Question for reviewer:
SyncChanges
orSyncFull
be used to force the health check update event to fire before the health check is removed? (And are there any negative repercussions of that?)