Added all_down flag #52

Open

wants to merge 9 commits into base: master
Conversation

eglinux commented Feb 26, 2018

Added a new all_down flag (upstream:all_down) to the global dict structure. It is needed to detect situations when all peers of an upstream are down. If that happens, we switch all peers back to the up state instead.
A proper error message will appear in the error log.

Related to issue #51.

eglinux (Author) commented Feb 26, 2018

Looks like TEST 11 is failing because there is only one peer in upstream foo.com and it is actually down.

So, if this change is to be accepted, I think the test can be modified to expect status 'up' instead of 'DOWN', since that is exactly the behaviour we are trying to achieve.

Switched expected status from DOWN to up.
@@ -134,7 +138,15 @@ local function peer_fail(ctx, is_backup, id, peer)
-- print("ctx fall: ", ctx.fall, ", peer down: ", peer.down,
-- ", fails: ", fails)

if not peer.down and fails >= ctx.fall then
local u_key = gen_upstream_key(u,"all_down")
Member:

Style: needs a space after the comma.

Also, it is a bit sad to introduce a new string concat and creation operation here. Given that upstream should be fixed, can we recycle this key string instead of trying to generate a new one every time?

Member:

And it's a bit confusing to call the "all_down" key an "upstream key".

Author:

Fixed style issues.

gen_upstream_key() is used in five more places exactly like this. It does a simple concatenation, so the code looks cleaner with it. I built it with the same idea as the existing gen_peer_key(), just using a suffix instead of a prefix.

Not sure what to call this key then. Maybe all_peers_down?
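
For illustration only, a minimal sketch of what such a helper could look like; the body is an assumption based on the description above, not the actual patch:

local function gen_upstream_key(u, suffix)
    -- build a per-upstream key by appending a suffix, mirroring how the
    -- existing gen_peer_key() builds per-peer keys with a prefix
    return u .. ":" .. suffix
end

-- gen_upstream_key("foo.com", "all_down") --> "foo.com:all_down"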

Author:

I removed gen_upstream_key() from peer_fail() and instead added an extra argument, all_peers_down_key, since peer_fail() is called only once, from peer_error(), and we already have this key there.
Please check whether you find this better.

Now peer_fail() and peer_ok() take a different number of arguments, which I don't think looks very nice.

Since this key is global for an upstream, it would also be possible to call gen_upstream_key() once in do_check(), but then we would need to extend all the functions below it (check_peers(), check_peer(), ...) with this extra argument. Not sure which is better.


local all_down, err = dict:get(u_key)
if not all_down then
errlog("Failed to get all down flag for upstream " .. u .. ". ", err)
Member:

I think we do not use an upper-case letter at the beginning of our error messages. Also, do we really want this to be fatal? The health check service would be down whenever the shm zone runs out of storage, which does not look very graceful to me :)

Author:

Fixed style issues.

Agreed. Otherwise, in theory, we could end up never putting a peer into the down state. Removed the return.

Also made the initialization of this key in spawn_checker() return nil on failure.
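
A hedged sketch of what that initialization in spawn_checker() might look like; the helper name, the 0/1 encoding, and the key layout are assumptions taken from this thread, and only dict:set() is the real shared-dict API:

local function init_all_down_flag(dict, u)
    -- same key layout as gen_upstream_key(u, "all_down") discussed above
    local key = u .. ":all_down"
    -- start from "not all peers down"
    local ok, err = dict:set(key, 0)
    if not ok then
        -- let spawn_checker() propagate this as: return nil, err
        return nil, "failed to initialize " .. key .. ": " .. (err or "unknown error")
    end
    return true
end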

return
end

if not peer.down and fails >= ctx.fall and all_down ~= 1 then
Member:

I think the all_down flag should be a boolean instead of a number. The lua shared dict does support boolean typed values.

Author:

Also thought about this. But ngx.shared.DICT:get returns nil if a key does not exist, so all the checks like "if not key" after each dict:get() would have to be adapted. That's why I chose {0, 1}.

I mean that "if not key" is true for both false and nil values.
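
A small sketch of that behaviour, assuming an OpenResty handler with a lua_shared_dict named healthcheck configured (the dict name is only for illustration):

local dict = ngx.shared.healthcheck

local missing = dict:get("no_such_key")   -- nil: the key was never set
dict:set("flag_bool", false)
local stored = dict:get("flag_bool")      -- false: the key exists

-- both values fail the same truthiness test, so "if not key" cannot tell
-- "key missing / get failed" apart from "flag is false"
if not missing then ngx.log(ngx.ERR, "missing key is falsy") end
if not stored then ngx.log(ngx.ERR, "a stored false is falsy too") end

-- with the 0/1 encoding, 0 is truthy in Lua, so only a genuinely absent key
-- (or a failed get) takes the error branch
dict:set("flag_num", 0)
if dict:get("flag_num") then ngx.log(ngx.ERR, "0 is truthy, the key is present") end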

@@ -193,11 +205,40 @@ local function peer_ok(ctx, is_backup, id, peer)
peer.down = nil
set_peer_down_globally(ctx, is_backup, id, nil)
end

local u_key = gen_upstream_key(u,"all_down")
Member:

Please always add a space after commas (,). There are other places with the same style issue.

Author:

Fixed.

end

if all_down == 0 then
local result = true
Member:

Please use a more meaningful name for this variable. "result" carries no information about the meaning of its true or false values.

Author:

Changed to "is_all_peers_down"


for i = 1, #ppeers do
set_peer_down_globally(ctx, false, ppeers[i].id, nil)
--Flush local cache
Member:

Please always use a space after -- and avoid using an upper-case letter at the beginning of a comment text.

Author:

Fixed

@@ -435,6 +474,49 @@ local function get_lock(ctx)
return true
end

local function set_all_down_flag(ctx, ppeers, bpeers)
Member:

Please add more comments to this function, since I have difficulty understanding what it tries to do. It definitely does WAY more magic than simply setting an "all down" flag.

Author:

Added a brief description of the function.


for j = 1, #bpeers do
result = result and bpeers[j].down
end
Member:

Can we maintain a counter for the number of up'd nodes so that we do not have to scan all the peers' state upon every checking operation? The current way looks too expensive to me. An O(1) operation now becomes an O(n) operation.

Author:

But the status of any peer can change after any check, so here we really need to go through all of them.
The only improvement I can see is to stop as soon as we find the first 'up' peer.
Added a break condition to both loops.
I think it doesn't change the worst-case complexity, but in the normal situation (when all peers are up) it will be O(1).
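
A minimal sketch of the early-exit scan described here; the peer table layout and variable names are assumptions for illustration (in the module the lists come from ngx.upstream):

-- example peer lists; each record carries a boolean `down` field as in the module
local ppeers = { { id = 0, down = true }, { id = 1, down = false } }
local bpeers = { { id = 0, down = true } }

local is_all_peers_down = true
for i = 1, #ppeers do
    if not ppeers[i].down then
        is_all_peers_down = false
        break  -- found an up primary peer, no need to scan further
    end
end
if is_all_peers_down then
    for j = 1, #bpeers do
        if not bpeers[j].down then
            is_all_peers_down = false
            break  -- found an up backup peer
        end
    end
end
-- is_all_peers_down is false here because primary peer 1 is up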

eglinux (Author) commented Mar 19, 2018

Any chance that you would evaluate the changes? :)
