Added all_down flag #52

Open

wants to merge 9 commits into base: master
Conversation

eglinux commented Feb 26, 2018

Added a new all_down flag (upstream:all_down) to the global dict structure. It is needed to detect situations when all peers of an upstream are down. If that happens, we switch all peers back to the up state instead.
A proper error message will appear in the error log.

Related to issue #51.

eglinux (Author) commented Feb 26, 2018

Looks like TEST 11 is failing because there is only one peer in upstream foo.com and it is actually down.

So, if this change is to be accepted, I think the test can be modified to expect status 'up' instead of 'DOWN', since that is exactly the behaviour we are trying to achieve.

Switched expected status from DOWN to up.
@@ -134,7 +138,15 @@ local function peer_fail(ctx, is_backup, id, peer)
-- print("ctx fall: ", ctx.fall, ", peer down: ", peer.down,
-- ", fails: ", fails)

if not peer.down and fails >= ctx.fall then
local u_key = gen_upstream_key(u,"all_down")
Member:

Style: needs a space after the comma.

Also, it is a bit sad to introduce a new string concat and creation operation here. Given that upstream should be fixed, can we recycle this key string instead of trying to generate a new one every time?

Member:

And it's a bit confusing to call the "all_down" key an "upstream key".

Author:

Fixed style issues.

gen_upstream_key() is used in five more places exactly like this. It does a simple concatenation, so the code looks cleaner with it. I built it with the same idea as the existing gen_peer_key(), just using a suffix instead of a prefix.

Not sure what to call this key then. Maybe all_peers_down?
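
For illustration only, a minimal sketch of what such a helper could look like; the body is an assumption based on the description above, not the actual patch:

local function gen_upstream_key(u, suffix)
    -- build a per-upstream key by appending a suffix, mirroring how the
    -- existing gen_peer_key() builds per-peer keys with a prefix
    return u .. ":" .. suffix
end

-- gen_upstream_key("foo.com", "all_down") --> "foo.com:all_down"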

Author:

I removed gen_upstream_key() from peer_fail() and instead added an extra argument, all_peers_down_key, since peer_fail() is called only once, from peer_error(), and we already have this key there.
Please check whether you find this better.

Now peer_fail() and peer_ok() take a different number of arguments, which I don't think looks very nice.

Since this key is global for an upstream, it would also be possible to call gen_upstream_key() once in do_check(), but then we would need to extend all the functions below it (check_peers(), check_peer(), ...) with this extra argument. Not sure which is better.


local all_down, err = dict:get(u_key)
if not all_down then
errlog("Failed to get all down flag for upstream " .. u .. ". ", err)
Member:

I think we do not use an upper-case letter at the beginning of our error messages. Also, do we really want this to be fatal? The health check service would be down whenever the shm zone runs out of storage, which does not look very graceful to me :)

Author:

Fixed style issues.

Agreed. Otherwise, in theory, we could end up never putting a peer into the down state. Removed the return.

Also made the initialization of this key in spawn_checker() return nil on failure.
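
A hedged sketch of what that initialization in spawn_checker() might look like; the helper name, the 0/1 encoding, and the key layout are assumptions taken from this thread, and only dict:set() is the real shared-dict API:

local function init_all_down_flag(dict, u)
    -- same key layout as gen_upstream_key(u, "all_down") discussed above
    local key = u .. ":all_down"
    -- start from "not all peers down"
    local ok, err = dict:set(key, 0)
    if not ok then
        -- let spawn_checker() propagate this as: return nil, err
        return nil, "failed to initialize " .. key .. ": " .. (err or "unknown error")
    end
    return true
end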

return
end

if not peer.down and fails >= ctx.fall and all_down ~= 1 then
Member:

I think the all_down flag should be a boolean instead of a number. The lua shared dict does support boolean typed values.

Author:

Also thought about this. But ngx.shared.DICT:get returns nil if a key does not exist, so all the checks like "if not key" after each dict:get() would have to be adapted. That's why I chose {0, 1}.

I mean that "if not key" is true for both false and nil values.
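
A small sketch of that behaviour, assuming an OpenResty handler with a lua_shared_dict named healthcheck configured (the dict name is only for illustration):

local dict = ngx.shared.healthcheck

local missing = dict:get("no_such_key")   -- nil: the key was never set
dict:set("flag_bool", false)
local stored = dict:get("flag_bool")      -- false: the key exists

-- both values fail the same truthiness test, so "if not key" cannot tell
-- "key missing / get failed" apart from "flag is false"
if not missing then ngx.log(ngx.ERR, "missing key is falsy") end
if not stored then ngx.log(ngx.ERR, "a stored false is falsy too") end

-- with the 0/1 encoding, 0 is truthy in Lua, so only a genuinely absent key
-- (or a failed get) takes the error branch
dict:set("flag_num", 0)
if dict:get("flag_num") then ngx.log(ngx.ERR, "0 is truthy, the key is present") end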

@@ -193,11 +205,40 @@ local function peer_ok(ctx, is_backup, id, peer)
peer.down = nil
set_peer_down_globally(ctx, is_backup, id, nil)
end

local u_key = gen_upstream_key(u,"all_down")
Member:

Please always add a space after commas (,). There are other places with the same style issue.

Author:

Fixed.

end

if all_down == 0 then
local result = true
Member:

Please use a more meaningful name for this variable. "result" carries no information about the meaning of its true or false values.

Author:

Changed to "is_all_peers_down"


for i = 1, #ppeers do
set_peer_down_globally(ctx, false, ppeers[i].id, nil)
--Flush local cache
Member:

Please always use a space after -- and avoid using an upper-case letter at the beginning of a comment text.

Author:

Fixed

@@ -435,6 +474,49 @@ local function get_lock(ctx)
return true
end

local function set_all_down_flag(ctx, ppeers, bpeers)
Member:

Please add more comments to this function, since I have difficulty understanding what it tries to do. It definitely does WAY more magic than simply setting an "all down" flag.

Author:

Added a brief description of the function.


for j = 1, #bpeers do
result = result and bpeers[j].down
end
Member:

Can we maintain a counter for the number of up'd nodes so that we do not have to scan all the peers' state upon every checking operation? The current way looks too expensive to me. An O(1) operation now becomes an O(n) operation.

Author:

But the status of any peer can change after any check, so here we really need to go through all of them.
The only improvement I can see is to stop as soon as we find the first 'up' peer.
Added a break condition to both loops.
I think it doesn't change the worst-case complexity, but in the normal situation (when all peers are up) it will be O(1).
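
A minimal sketch of the early-exit scan described here; the peer table layout and variable names are assumptions for illustration (in the module the lists come from ngx.upstream):

-- example peer lists; each record carries a boolean `down` field as in the module
local ppeers = { { id = 0, down = true }, { id = 1, down = false } }
local bpeers = { { id = 0, down = true } }

local is_all_peers_down = true
for i = 1, #ppeers do
    if not ppeers[i].down then
        is_all_peers_down = false
        break  -- found an up primary peer, no need to scan further
    end
end
if is_all_peers_down then
    for j = 1, #bpeers do
        if not bpeers[j].down then
            is_all_peers_down = false
            break  -- found an up backup peer
        end
    end
end
-- is_all_peers_down is false here because primary peer 1 is up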

eglinux (Author) commented Mar 19, 2018

Any chance that you would evaluate the changes? :)
