suse_ha: state gets stuck on quorum warning #19

tacerus · 2023-05-17T17:06:40Z

If the cluster lacks quorum (i.e. only one node is active), a state.apply suse_ha hangs indefinitely during the ha_add_node_utilization_primitive run. The process list shows a stuck crm process as the culprit; killing it reveals:

----------
          ID: ha_add_node_utilization_primitive
    Function: cmd.run
        Name: crm configure primitive p-node-utilization ocf:pacemaker:NodeUtilization op start timeout=90 interval=0 op stop timeout=100 interval=0 op monitor timeout=20s interval=60s meta target-role=Started
      Result: False
     Comment: Command "crm configure primitive p-node-utilization ocf:pacemaker:NodeUtilization op start timeout=90 interval=0 op stop timeout=100 interval=0 op monitor timeout=20s interval=60s meta target-role=Started" run
     Started: 14:57:35.486354
    Duration: 3748496.266 ms
     Changes:
              ----------
              pid:
                  172035
              retcode:
                  -15
              stderr:
                  ERROR: (unpack_resources)    error: Resource start-up disabled since no STONITH resources have been defined
                  ERROR: (unpack_resources)    error: Either configure some or disable STONITH with the stonith-enabled option
                  ERROR: (unpack_resources)    error: NOTE: Clusters with shared data need STONITH to ensure data integrity
                  ERROR: crm_verify: Errors found during check: config not valid
              stdout:
                  WARNING: (cluster_status)    warning: Fencing and resource management disabled due to lack of quorum
                  Do you still want to commit (y/n)? Do you still want to commit (y/n)? Do you still want to commit (y/n)? Do you still want to commit (y/n)?

A way to have it automatically answer with "n" or an alternative non-interactive configuration call needs to be implemented.

cboltz · 2023-05-20T21:54:42Z

echo n | crm configure ... - or yes n | crm configure ... if you potentially need more than one n.

That said - a more sane handling in crm would be better.
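
For illustration, the pipe approach can be wrapped in a small helper that also bounds the runtime, so a prompt loop cannot hang the state for hours again. This is only a sketch: `answer_no` is a hypothetical name, not part of the suse_ha formula, it relies on `timeout` from GNU coreutils, and it assumes crm reads its confirmation answer from stdin, as the suggestion above implies.

```shell
#!/bin/sh
# Hypothetical helper (not part of the suse_ha formula): run a command with
# an endless stream of "n" answers on stdin, and give up after a time limit.
# Usage: answer_no SECONDS COMMAND [ARGS...]
answer_no() {
    limit=$1
    shift
    # `yes n` prints "n" lines until the reader exits; `timeout` sends
    # SIGTERM if the command is still running after $limit seconds.
    yes n | timeout "$limit" "$@"
}

# Intended use (assumes crm is installed; arguments elided as in the thread):
# answer_no 120 crm configure ...
```

Without `pipefail`, the pipeline's exit status is that of `timeout`, which passes through the command's own status or returns 124 if the limit was hit, so a calling cmd.run state would still see a failure in either case.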

tacerus · 2023-05-20T22:33:42Z

That's a neat idea, thank you! I'll use that if I don't find a native way.

tacerus · 2023-06-19T19:38:43Z

During my testing for #26, I did not face this issue, despite applying on a clean cluster. I will have to investigate further whether any conditions in the Salt logic could trigger this.

tacerus added the bug and suse_ha-formula labels · May 17, 2023
