Added all_down flag #52
base: master
Conversation
Added a new all_down flag (upstream:all_down) to the global dict structure. It is needed to detect situations where all peers of an upstream are down. If that happens, we switch all peers back to the up state. A proper error message will appear in the error log.
Looks like TEST 11 is failing because there is only one peer in upstream foo.com and it is actually down. So, if this pull request is to be accepted, I think the test can be modified to expect status 'up' instead of 'DOWN'.
Switched expected status from DOWN to up.
lib/resty/upstream/healthcheck.lua
Outdated
@@ -134,7 +138,15 @@ local function peer_fail(ctx, is_backup, id, peer)
     -- print("ctx fall: ", ctx.fall, ", peer down: ", peer.down,
     --       ", fails: ", fails)

     if not peer.down and fails >= ctx.fall then
         local u_key = gen_upstream_key(u,"all_down")
Style: needs a space after the comma.
Also, it is a bit sad to introduce a new string concat and creation operation here. Given that upstream should be fixed, can we recycle this key string instead of trying to generate a new one every time?
And it's a bit confusing to call the "all_down" key an "upstream key".
Fixed style issues.
gen_upstream_key() is used 5 more times in exactly this way. It does a simple concatenation, so the code looks cleaner with it. I modeled it on the existing gen_peer_key(), just using a suffix instead of a prefix.
Not sure what to call this key then. Maybe all_peers_down?
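For context, a minimal sketch of what such a suffix-based helper could look like, modeled on the existing gen_peer_key(); the ":" separator is an assumption here, not necessarily what the patch uses:

```lua
-- hypothetical sketch: build a shared-dict key for an upstream-wide
-- flag by appending a suffix to the upstream name.
local function gen_upstream_key(u, suffix)
    return u .. ":" .. suffix
end

-- e.g. gen_upstream_key("foo.com", "all_down")
```

Since the upstream name is fixed for the lifetime of a checker, this string could also be built once and reused, which is the recycling the review suggests.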
I removed gen_upstream_key() from peer_fail() and instead added an extra argument, all_peers_down_key, since peer_fail() is called only once, from peer_error(), and we already have this key in peer_error().
Please check whether you find this better.
Now peer_fail() and peer_ok() have different numbers of arguments, which doesn't look very nice, I think.
Since this key is global for an upstream, it would be possible to call gen_upstream_key() once in do_check(), but then we would need to extend all the intermediate functions (check_peers(), check_peer(), ...) with this extra argument. Not sure which is better.
lib/resty/upstream/healthcheck.lua
Outdated
        local all_down, err = dict:get(u_key)
        if not all_down then
            errlog("Failed to get all down flag for upstream " .. u .. ". ", err)
I think we do not use upper case letter at the beginning of our error messages. Also, do we really want this to be fatal? The health check service would be down when the shm zone is running out of storage, which does not look very graceful to me :)
Fixed style issues.
Agreed. With the fatal return we could in theory never put a peer into the down state. Removed the return.
Also added a return nil to the initialization of this key in spawn_checker().
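A hedged sketch of the non-fatal handling being discussed, assuming dict, u_key, u and errlog are in scope as in the surrounding module:

```lua
-- sketch: treat a failed or missing flag read as non-fatal, so the
-- health checker keeps running even when the shm zone is under
-- memory pressure, instead of aborting the check.
local all_down, err = dict:get(u_key)
if all_down == nil then
    if err then
        errlog("failed to get all_down flag for upstream ", u, ": ", err)
    end
    all_down = 0  -- assume "not all down" and carry on
end
```

The key point is that the code falls back to a safe default rather than returning early, matching the reviewer's graceful-degradation concern.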
lib/resty/upstream/healthcheck.lua
Outdated
            return
        end

        if not peer.down and fails >= ctx.fall and all_down ~= 1 then
I think the all_down flag should be a boolean instead of a number. The Lua shared dict does support boolean-typed values.
I also thought about this. But ngx.shared.DICT:get returns nil if the key does not exist, so all the "if not key" checks after each dict:get() would have to be adapted. That's why I chose {0, 1}.
I mean that "if not key" will be true for both false and nil values.
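The nil-vs-false distinction at issue can be illustrated with a short sketch (assuming dict is an ngx.shared dict and u_key is in scope):

```lua
-- ngx.shared.DICT:get returns nil for a missing key, so with a
-- boolean flag a plain "if not all_down" cannot tell a stored
-- false apart from a key that was never set (or was evicted):
local all_down = dict:get(u_key)
if all_down == nil then
    -- key missing: not yet initialized, or evicted from the zone
elseif all_down == false then
    -- flag explicitly stored as false
else
    -- flag stored as true
end
```

With the {0, 1} encoding, a single "if not all_down" conveniently covers both the missing-key and the not-all-down cases, at the cost of a less self-describing value.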
lib/resty/upstream/healthcheck.lua
Outdated
@@ -193,11 +205,40 @@ local function peer_ok(ctx, is_backup, id, peer)
         peer.down = nil
         set_peer_down_globally(ctx, is_backup, id, nil)
     end

     local u_key = gen_upstream_key(u,"all_down")
Please always add a space after commas (,). There are other places with the same style issue.
Fixed.
lib/resty/upstream/healthcheck.lua
Outdated
    end

    if all_down == 0 then
        local result = true
Please use a more meaningful name for this variable. "result" carries no information about the meaning of its true or false values.
Changed to "is_all_peers_down"
lib/resty/upstream/healthcheck.lua
Outdated
        for i = 1, #ppeers do
            set_peer_down_globally(ctx, false, ppeers[i].id, nil)
            --Flush local cache
Please always use a space after -- and avoid an upper-case letter at the beginning of a comment text.
Fixed
lib/resty/upstream/healthcheck.lua
Outdated
@@ -435,6 +474,49 @@ local function get_lock(ctx)
     return true
 end

 local function set_all_down_flag(ctx, ppeers, bpeers)
Please add more comments to this function, since I have difficulties understanding what it tries to do. It definitely does WAY more magic than simply setting an "all down" flag.
Added a brief description of the function.
lib/resty/upstream/healthcheck.lua
Outdated
        for j = 1, #bpeers do
            result = result and bpeers[j].down
        end
Can we maintain a counter of the number of up'd nodes so that we do not have to scan all the peers' states on every checking operation? The current way looks too expensive to me: an O(1) operation now becomes an O(n) operation.
But the status of any peer can change after any check, so here we really need to go through all of them.
The only improvement I can see is to stop when we find the first 'up' peer.
Added a break condition to both loops.
I think it doesn't change the theoretical complexity, but in the normal situation (when all peers are up) it will of course be O(1).
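The early-exit scan described above could look roughly like this, assuming ppeers and bpeers are arrays of peer records with a `down` field, as in the surrounding module:

```lua
-- sketch: scan primary peers first, then backup peers, but stop at
-- the first healthy peer found, since one up peer already means the
-- upstream is not entirely down.
local is_all_peers_down = true
for i = 1, #ppeers do
    if not ppeers[i].down then
        is_all_peers_down = false
        break  -- one healthy primary peer is enough
    end
end
if is_all_peers_down then
    for j = 1, #bpeers do
        if not bpeers[j].down then
            is_all_peers_down = false
            break  -- one healthy backup peer is enough
        end
    end
end
```

In the common all-up case the first iteration breaks immediately, so the scan costs a single comparison; the worst case (everything down) remains a full O(n) pass.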
Now it gets there as an extra argument from peer_error().
Any chance you could evaluate the changes? :)
Related to issue #51.