From 80fa494bb2e1a9c666a22c95bdd66315a327da6a Mon Sep 17 00:00:00 2001
From: Adam Spiers
Date: Wed, 9 Mar 2016 17:06:34 +0000
Subject: [PATCH] interleave Pacemaker clones to minimise disruption (bsc#965886)

By default, Pacemaker clones aren't interleaved. This means that if
Pacemaker wants to restart a dead clone instance, and there is an
order constraint on that clone, it will do the same restart on all
other nodes, even if all the others are healthy.

More details on interleaving are here:

  https://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones/

This behaviour is far more disruptive than we want. For example, in

  https://bugzilla.suse.com/show_bug.cgi?id=965886

we saw that when a network node dies and Pacemaker wants to stop the
instance of cl-g-neutron-agents on that node, it also stops and
restarts the same clone instances on the healthy nodes. This means
there is a small window in which there are no neutron agents running
anywhere. If neutron-ha-tool attempts a router migration during this
window, it will fail, at which point things start to go badly wrong.

In general, the cloned (i.e. active/active) services on our controller
and compute nodes should all behave like independent vertical stacks,
so that a failure on one node should not cause ripple effects on other
nodes. So we interleave all our clones.

(There is a corresponding commit to crowbar-ha for the Apache clone.)

(cherry picked from commit bdde4b4dc2534e91bf1f2869a66491463134f8c1)
---
 chef/cookbooks/neutron/recipes/network_agents_ha.rb | 3 +++
 chef/cookbooks/neutron/recipes/server_ha.rb         | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/chef/cookbooks/neutron/recipes/network_agents_ha.rb b/chef/cookbooks/neutron/recipes/network_agents_ha.rb
index cfb422f..e17053a 100644
--- a/chef/cookbooks/neutron/recipes/network_agents_ha.rb
+++ b/chef/cookbooks/neutron/recipes/network_agents_ha.rb
@@ -113,6 +113,9 @@ pacemaker_clone agents_clone_name do
   rsc agents_group_name
   action [ :create, :start ]
+  meta ({
+    "interleave" => "true",
+  })
   only_if { CrowbarPacemakerHelper.is_cluster_founder?(node) }
 end
diff --git a/chef/cookbooks/neutron/recipes/server_ha.rb b/chef/cookbooks/neutron/recipes/server_ha.rb
index 7d4d720..ae42bc4 100644
--- a/chef/cookbooks/neutron/recipes/server_ha.rb
+++ b/chef/cookbooks/neutron/recipes/server_ha.rb
@@ -41,6 +41,9 @@ pacemaker_clone "cl-#{primitive_name}" do
   rsc primitive_name
   action [:create, :start]
+  meta ({
+    "interleave" => "true",
+  })
   only_if { CrowbarPacemakerHelper.is_cluster_founder?(node) }
 end