Stuck in probe state after force leave #129
Comments
It is not obvious how the It does seem though, that the It might be that we can fix this for |
Trying to chase what happens with an update from riak_core_claimant to ensemble: either a join or a leave is sent as a separate request. There is a check that a node does not attempt to remove or add itself.
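As a rough illustration of that guard, here is a minimal Erlang sketch (hypothetical module and function names, not the riak_ensemble_manager code referenced below) in which a request to add or remove the local node itself is rejected:

```erlang
%% Illustrative sketch only - not the riak_ensemble implementation.
%% A join/leave request targeting the local node itself is rejected.
-module(membership_guard).
-export([check_request/2]).

-spec check_request(join | leave, node()) -> ok | {error, self_change}.
check_request(_Op, Node) when Node =:= node() ->
    {error, self_change};
check_request(_Op, _Node) ->
    ok.
```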
This is then converted via riak_ensemble/src/riak_ensemble_manager.erl Line 275 in 0303d7e
On receiving a join call, the node to be added must request the remote cluster state of the claimant node (the node already in the cluster) to check whether the join is allowed: riak_ensemble/src/riak_ensemble_manager.erl Lines 315 to 317 in 0303d7e
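A minimal sketch of that shape of interaction, with assumed names (cluster_state_server, join_allowed/1) standing in for the real riak_ensemble_manager calls - the joining node asks the claimant for its cluster state before deciding whether the join can proceed:

```erlang
%% Illustrative sketch: fetch the claimant's cluster state remotely and
%% apply a placeholder policy check before joining.
-module(join_check).
-export([maybe_join/1]).

maybe_join(ClaimantNode) ->
    case rpc:call(ClaimantNode, cluster_state_server, get_state, []) of
        {badrpc, Reason} ->
            {error, Reason};
        RemoteState ->
            case join_allowed(RemoteState) of
                true  -> {ok, RemoteState};
                false -> {error, not_allowed}
            end
    end.

%% Placeholder policy check; the real code inspects the remote state.
join_allowed(_RemoteState) ->
    true.
```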
The riak_ensemble/src/riak_ensemble_manager.erl Lines 521 to 532 in 0303d7e
The joining node (manager) is then given the same cluster_state as the claimant node it is joining: riak_ensemble/src/riak_ensemble_manager.erl Line 391 in 0303d7e
This state is saved, and we call riak_ensemble/src/riak_ensemble_manager.erl Line 322 in 0303d7e
Once there is a reply from the join, this node will prompt all the state changes required in the local ETS table and the local peers (adding and deleting as required): riak_ensemble/src/riak_ensemble_manager.erl Lines 610 to 641 in 0303d7e
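To illustrate the ETS side of this, a minimal sketch (the table and key names are assumptions, not those used by riak_ensemble_manager) of caching the cluster state in a public ETS table so that other local processes can read it without calling the manager:

```erlang
%% Illustrative sketch: keep a cached copy of the cluster state in a
%% public named ETS table.
-module(state_cache).
-export([init/0, save/1, read/0]).

-define(TAB, cluster_state_cache).

init() ->
    _ = ets:new(?TAB, [named_table, public, set]),
    ok.

save(ClusterState) ->
    true = ets:insert(?TAB, {cluster_state, ClusterState}),
    ok.

read() ->
    case ets:lookup(?TAB, cluster_state) of
        [{cluster_state, CS}] -> {ok, CS};
        []                    -> undefined
    end.
```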
This is based on the remote state of the claimant node (State2) i.e. it does not currently reflect the join. I'm not sure what peers can therefore be created at this stage, as none of the ensembles in the remote state will have changed to reflect the join. The remove is relatively simple: riak_ensemble/src/riak_ensemble_manager.erl Lines 335 to 338 in 0303d7e
The call to 'riak_ensemble_root:join/1' will result in a call to the passed-in claimant Node (the one in the cluster), with the message of the call containing the local node. In other words, both the join and the leave requests lead to calls to a node in the cluster with a message containing a reference to the node joining or leaving. riak_ensemble/src/riak_ensemble_root.erl Lines 47 to 65 in 0303d7e
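The shape of these requests might be sketched as follows; the registered name cluster_root and the use of gen_server:call are illustrative assumptions, not how riak_ensemble_root actually dispatches:

```erlang
%% Illustrative sketch: both join and remove end up as a call to a node
%% already in the cluster, carrying the local (joining/leaving) node in
%% the message.
-module(root_requests).
-export([join/1, remove/1]).

join(ClusterNode) ->
    call_remote(ClusterNode, {join, node()}).

remove(ClusterNode) ->
    call_remote(ClusterNode, {remove, node()}).

call_remote(Node, Msg) ->
    gen_server:call({cluster_root, Node}, Msg, 30000).
```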
The call will result in: riak_ensemble/src/riak_ensemble_root.erl Lines 78 to 84 in 0303d7e
Which will send an event: riak_ensemble/src/riak_ensemble_peer.erl Line 297 in 0303d7e
The target here is the atom root and not a pid(), so this will lead to:
Which we can unpack as:
the Target is still root and not a pid, so this results in a call to
This is going to cast to Node: riak_ensemble/src/riak_ensemble_router.erl Line 131 in 0303d7e
Where the Node is the Node which is joining or leaving. The ensemble is root. There appear to be 7 routers which have been generated, so the code next picks one of these routers in the pool at random to use as a destination: riak_ensemble/src/riak_ensemble_router.erl Line 139 in 0303d7e
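A minimal sketch of that random pool selection, with an assumed router naming scheme and an assumed gen_server-based cast; the real pool size and registered names may differ:

```erlang
%% Illustrative sketch: pick one of N registered router processes at
%% random and cast the message to it on the target node.
-module(router_pool).
-export([cast/2]).

-define(POOL_SIZE, 7).

cast(Node, Msg) ->
    N = rand:uniform(?POOL_SIZE),
    Router = list_to_atom("router_" ++ integer_to_list(N)),
    gen_server:cast({Router, Node}, Msg).
```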
This appears to be just and The message being sent was
riak_ensemble/src/riak_ensemble_router.erl Lines 219 to 235 in 0303d7e
The purpose of this appears to be to find the leader of the root ensemble - be it on this node, or any other node - and forward the message to that leader. If we assume that peer is in the leading state: riak_ensemble/src/riak_ensemble_peer.erl Lines 715 to 721 in 0303d7e
This will then select a worker from a pool: riak_ensemble/src/riak_ensemble_peer.erl Lines 1286 to 1289 in 0303d7e
The riak_ensemble/src/riak_ensemble_peer.erl Lines 1369 to 1416 in 0303d7e
At this stage: Key is The PUT starts with an riak_ensemble/src/riak_ensemble_peer.erl Lines 340 to 341 in 0303d7e
The hash is in fact a binary containing the epoch and sqn for the object. Note that this must equal the equivalent information in the store. Note the comments here - the AAE store (synctree) is different in purpose from the usual KV AAE store. There is then a check that the epoch of the object is the same as the epoch in the fact in the state of the peer. If riak_ensemble/src/riak_ensemble_peer.erl Lines 1566 to 1567 in 0303d7e
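A sketch of the idea that the synctree "hash" is really just the object's epoch and sequence number encoded as a binary; the record shape and the 64-bit encoding here are assumptions, not riak_ensemble's actual format:

```erlang
%% Illustrative sketch: the tree hash encodes {epoch, seq}, so checking
%% the tree amounts to comparing that pair with what the store holds.
-module(obj_hash).
-export([hash/1, matches_store/2]).

-record(obj, {epoch :: non_neg_integer(),
              seq   :: non_neg_integer(),
              value :: term()}).

hash(#obj{epoch = Epoch, seq = Seq}) ->
    <<Epoch:64/integer, Seq:64/integer>>.

matches_store(Obj, StoredHash) ->
    hash(Obj) =:= StoredHash.
```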
Assuming that the store is up to date, the key can be modified using do_modify_fsm/6, which is a function that sends a reply after obtaining the result of the modification. Included in the Args of the call: riak_ensemble/src/riak_ensemble_peer.erl Lines 1602 to 1608 in 0303d7e
In this case being:
The next object sequence number is the one returned from this function: riak_ensemble/src/riak_ensemble_peer.erl Lines 1776 to 1791 in 0303d7e
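A sketch of how a next sequence number might be assigned, under the assumption that a newer leader epoch resets the sequence while an unchanged epoch simply increments it; this is the general shape of the logic, not a copy of riak_ensemble_peer:

```erlang
%% Illustrative sketch of assigning the next {epoch, seq} for a modified
%% object.
-module(obj_seq).
-export([next_seq/3]).

%% If the leader's epoch has moved past the object's epoch, restart the
%% sequence under the new epoch; otherwise increment the sequence.
next_seq(ObjEpoch, _ObjSeq, LeaderEpoch) when LeaderEpoch > ObjEpoch ->
    {LeaderEpoch, 0};
next_seq(_ObjEpoch, ObjSeq, LeaderEpoch) ->
    {LeaderEpoch, ObjSeq + 1}.
```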
The do_kmodify function which was passed in is then going to, in turn, call the function within its arguments to get the updated object to be stored: riak_ensemble/src/riak_ensemble_peer.erl Lines 302 to 309 in 0303d7e
riak_ensemble/src/riak_ensemble_root.erl Lines 110 to 111 in 0303d7e
riak_ensemble/src/riak_ensemble_root.erl Lines 123 to 138 in 0303d7e
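The update functions referenced above amount to adding or removing a node from the member list held in the cluster-state object. A sketch with an assumed record shape (not riak_ensemble's actual cluster-state structure):

```erlang
%% Illustrative sketch: take the current cluster-state object and return
%% a new one with the node added to or removed from the member list.
-module(cluster_update_sketch).
-export([add_member/2, del_member/2]).

-record(cluster_state, {members = [] :: [node()]}).

add_member(Node, #cluster_state{members = Members} = CS) ->
    case lists:member(Node, Members) of
        true  -> CS;
        false -> CS#cluster_state{members = [Node | Members]}
    end.

del_member(Node, #cluster_state{members = Members} = CS) ->
    CS#cluster_state{members = lists:delete(Node, Members)}.
```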
With a modified object the actual put can now be performed: riak_ensemble/src/riak_ensemble_peer.erl Lines 1670 to 1698 in 0303d7e
There will then be put message events sent to all the peers in this ensemble, and a local_put message will be sent back to this peer (the leader). The put message will require the peers to be in the following state, and the local_put will require the peer to be in the leading state. Both local_put and put will result in a Mod:put. In riak_kv, the Mod is riak_kv_ensemble_backend: https://github.com/basho/riak_kv/blob/develop-3.0/src/riak_kv_ensemble_backend.erl#L147-L149
The backend itself is not doing the storage; it is forwarding the put on to the vnode. If the vnode is forwarding, it will forward the put, otherwise it will do an actual put: https://github.com/basho/riak_kv/blob/develop-3.0/src/riak_kv_vnode.erl#L2105-L2125 If a handoff target is specified it will do a raw_put to the handoff node: https://github.com/basho/riak_kv/blob/develop-3.0/src/riak_kv_vnode.erl#L2248-L2252
After completing the PUT, the root leader needs to send an updated hash to all sync'd trees to make sure that the AAE state is universally updated with the object state. A reply is then sent back with the updated object (the cluster state), although the object is just ignored: riak_ensemble/src/riak_ensemble_root.erl Lines 86 to 87 in 0303d7e
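The vnode-level decision described above (forward the put, raw_put to a handoff target, or write locally) can be sketched as follows; the record fields and return shapes are assumptions for illustration, not the riak_kv_vnode code:

```erlang
%% Illustrative sketch of the forwarding decision for an ensemble put
%% arriving at a vnode.
-module(vnode_put_sketch).
-export([handle_ensemble_put/2]).

-record(state, {forward        :: node() | undefined,
                handoff_target :: node() | undefined}).

handle_ensemble_put(Obj, #state{forward = Fwd}) when Fwd =/= undefined ->
    %% The vnode is forwarding, so pass the put on.
    {forward_put, Fwd, Obj};
handle_ensemble_put(Obj, #state{handoff_target = HO}) when HO =/= undefined ->
    %% A handoff target is set, so send a raw_put to the handoff node.
    {raw_put, HO, Obj};
handle_ensemble_put(Obj, #state{}) ->
    %% Otherwise store locally.
    {local_put, Obj}.
```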
So this has updated the root ensemble's stored view of the cluster state. However, I don't believe at this stage it has actually done anything with regards to updating the operational state to reflect the change. This must be prompted via the state_changed function in Stage (3), which is called after all these ensemble updates have completed. This calculates any peer changes, and requests them via the peer supervisor: riak_ensemble/src/riak_ensemble_manager.erl Lines 697 to 737 in 0303d7e
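A sketch of what such a reconciliation step typically looks like: diff the peers this node should be running under the new cluster state against those currently running, then ask a supervisor to start or stop accordingly. The function names here are placeholders, not the riak_ensemble peer supervisor API:

```erlang
%% Illustrative sketch: compute peer changes and hand them to a
%% (placeholder) peer supervisor.
-module(state_changed_sketch).
-export([state_changed/2]).

state_changed(WantedPeers, RunningPeers) ->
    ToStart = [P || P <- WantedPeers, not lists:member(P, RunningPeers)],
    ToStop  = [P || P <- RunningPeers, not lists:member(P, WantedPeers)],
    lists:foreach(fun start_peer/1, ToStart),
    lists:foreach(fun stop_peer/1, ToStop),
    {ToStart, ToStop}.

%% Placeholders standing in for calls to the peer supervisor.
start_peer(_Peer) -> ok.
stop_peer(_Peer) -> ok.
```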
.... work in progress ... still more to come |
Looking at examples of the test failing, having added some logs, the JOIN and LEAVE requests are all seen, and all report success: test log:
dev1:
dev2:
dev1:
dev3:
test log:
dev1:
dev4:
dev5:
dev6:
test log:
dev7:
dev1:
dev3 (which now appears to be the claimant):
test log:
Note here is the fault - quorum is never ready. dev3:
dev8:
dev1:
It is odd that there are two remove messages for the final force replace. This appears to be consistent across different runs of the test |
The full update cycle in the previous comment appears to be completed. Some extra logging has been added in. Initial cluster formed - dev1:
dev2:
dev3:
Cluster expanded dev1:
dev4:
Replace dev4:
dev7:
Force Replace dev2:
So these final 2 peer changes are wrong. The dev2 node is being removed here, and it tries to add itself twice to the ensemble as it is closing down. This happens before the join and the leave messages associated with the force replace .. and these join and leave messages do not generate any peer changes associated with this ensemble. |
These additions occur, not prompted by a join/leave message, whilst riak is in the process of shutting down on this node:
|
This is tested with the extra logging in place. However, there is no join or leave message until 6s later (presumably at the next scheduled tick). So I don't believe these peer changes are a result of riak_core triggering them. |
The following ensemble_changes are logged on node2 at this point:
|
Then on dev3 (and other nodes) there are Ensemble changes after the initial join/remove messages:
|
The fact that we see ensemble changes but not peer changes after the join/remove from the force_replace perhaps explains the problem. The peer changes generated by the replaced node may have advanced the peers' {epoch, sqn} beyond that of the manager. |
There is an extra log in the leaving state on riak_ensemble_peer that generates this:
Something is calling update_members here that is not prompted by a join or leave. |
This is it, I think. There are two potential problems here:
|
The tick on the backend is very aggressive, so wait 20 ticks of a stable ring before reacting. It would be preferable if other processes reacted to significant ring changes, because of basho/riak_ensemble#129. Perhaps this shouldn't do anything at all?
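A sketch of the "wait for N stable ticks" idea, assuming the ring can be summarised by a hash that is compared tick to tick; the constant and return values are illustrative only:

```erlang
%% Illustrative sketch: only react once the ring hash has been unchanged
%% for the required number of consecutive ticks.
-module(stable_ring_sketch).
-export([tick/2]).

-define(REQUIRED_STABLE_TICKS, 20).

%% State is {LastRingHash, StableCount}.
tick(RingHash, {RingHash, Count}) when Count + 1 >= ?REQUIRED_STABLE_TICKS ->
    {react, {RingHash, 0}};
tick(RingHash, {RingHash, Count}) ->
    {wait, {RingHash, Count + 1}};
tick(RingHash, {_OldHash, _Count}) ->
    %% The ring changed; restart the stability count.
    {wait, {RingHash, 0}}.
```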
There is also another process in riak_kv monitoring the ring for ensemble: I don't know why there is a need for three separate ticks to monitor here. We appear to have stability if:
|
The riak_test ensemble_ring_changes attempts to test ensemble when subject to various cluster admin changes. Unfortunately the original test didn't wait for some of the cluster operations to complete before making its assertions, and so the test would pass without the scenario actually being proven. There is now an updated version of this test, which fails most of the time - https://github.com/basho/riak_test/pulls.
The failure occurs after the force-replace operation. After this some ensembles are stuck in the probe state. Generally the situation is as follows:
- There are peers probing, but those peers think that there is a quorum of nodes on a now unreachable node (the one which has left under the force-leave).
- The probe state constantly returns with timeout errors (unavailable nodes lead to immediate nacks - these are then interpreted as a quorum for timeout); the peer then does a probe delay and re-probes.
- The riak_ensemble_manager has a more correct view (i.e. one that doesn't show ensemble peers on the unavailable node), and this is checked after the probe has failed.
- The riak_ensemble_manager has a lower {epoch, sqn} than that of the peer view, so the peers persist in using their incorrect view and the cycle of probe failures continues.
- Despite there being no nodes down, multiple ensembles are unusable.
This is true on both the current develop-3.0 branch as well as the develop-3.0-lastgaspring branch, which corrects the issue in #128.
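To make the failure loop concrete, a minimal sketch under the assumptions that peers are identified as {Id, Node} pairs, that an unreachable node replies with an immediate nack, and that a simple majority of nacks is treated as a probe timeout followed by a delayed retry; none of these names come from riak_ensemble itself:

```erlang
%% Illustrative sketch of the probe loop described in this issue.
-module(probe_loop_sketch).
-export([probe/1]).

probe(Members) ->
    Replies = [probe_peer(M) || M <- Members],
    Nacks = length([R || R <- Replies, R =:= nack]),
    %% A majority of nacks counts as a failed probe in this sketch, so a
    %% stale member list pointing at an unreachable node fails every time.
    case Nacks * 2 >= length(Members) of
        true  -> {timeout, retry_after_delay};
        false -> ok
    end.

probe_peer({_Id, Node}) ->
    %% Unreachable nodes are treated as an immediate nack.
    case net_adm:ping(Node) of
        pong -> ack;
        pang -> nack
    end.
```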