Doc: recovering an infra node on RHOSP10 #369

ioggstream · 2017-07-07T13:08:10Z

Note: the following procedure only fixes the Heat part of node recovery.
There is another issue with fragments/bastion-node-cleanup.sh, which is run against the new node; I will fix that in a separate patch / issue.

I wish

to document a recovery procedure for RHOSP10 in README.md
together with

https://github.com/redhat-openstack/openshift-on-openstack/#removing-or-replacing-specific-nodes

When

A node has been removed or has failed at the application level. To simulate this, delete a server:

openstack server delete shift-infra-1.example.com

The following partially worked for me

Find the nested infra stack as $INFRA_STACK_ID

openstack stack resource list --nested-depth 3 shift | grep infra
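
To keep the ID in a variable instead of reading it off the grep output, something like the following might work. It is only a sketch: the helper name `find_infra_stack_id` is made up, and the resource name `openshift_infra_nodes` is inferred from the nested stack name shown in this issue, so check it against your own listing. `-f value` and `-c` are the client's generic output-formatting options.

```shell
# Hypothetical helper: read "resource_name physical_resource_id" pairs on stdin
# and print the physical ID of the infra resource group (the nested stack ID).
# ASSUMPTION: the resource is called "openshift_infra_nodes".
find_infra_stack_id() {
    awk '$1 == "openshift_infra_nodes" { print $2; exit }'
}

# Usage (needs a working openstack client, parent stack assumed to be "shift"):
#   INFRA_STACK_ID=$(openstack stack resource list --nested-depth 3 -f value \
#       -c resource_name -c physical_resource_id shift | find_infra_stack_id)
```

The nested stack name (as used in the commands in this issue) and its ID should both be accepted wherever a stack argument is expected.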


$ openstack stack resource list shift-openshift_infra_nodes-iiwbo4jwsizr
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                              | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+
| 1             | 7b42c7ba-0c12-4860-bd25-be5b9404cca3 | file:///home/shift/ooo-35/infra.yaml       | CHECK_FAILED    | 2017-07-07T09:02:00Z |
| 0             | b4e0a752-3bce-46f5-96c5-7cc9dde21ec8 | file:///home/shift/ooo-35/infra.yaml       | UPDATE_COMPLETE | 2017-07-07T09:02:00Z |
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+

Mark the unhealthy node

openstack stack resource mark unhealthy \
    shift-openshift_infra_nodes-iiwbo4jwsizr \
    1 \
    "node has a broken disk"

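If more than one resource in the group is broken, the marking step could be scripted. The sketch below is not part of the procedure I tested: the helper name `mark_failed_cmds` is hypothetical, it treats every status that does not end in `_COMPLETE` as failed (an assumption), and it only prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: read "resource_name resource_status" pairs on stdin and
# print one "mark unhealthy" command per resource whose status is not *_COMPLETE.
#   $1 = nested stack name or ID
#   $2 = reason text (avoid backslashes and double quotes in this sketch)
mark_failed_cmds() {
    stack="$1"
    reason="$2"
    awk -v stack="$stack" -v reason="$reason" '
        NF >= 2 && $2 !~ /_COMPLETE$/ {
            printf "openstack stack resource mark unhealthy %s %s \"%s\"\n", stack, $1, reason
        }'
}

# Usage (prints the commands; pipe to sh only after checking them):
#   openstack stack resource list -f value -c resource_name -c resource_status \
#       "$INFRA_STACK_ID" | mark_failed_cmds "$INFRA_STACK_ID" "node has a broken disk"
```
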
Update the stack

openstack stack update shift --existing

Check status

$ openstack stack resource list shift-openshift_infra_nodes-iiwbo4jwsizr
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                              | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+
| 1             | 1ea7050f-08fe-4892-9908-36dfa4cffad2 | file:///home/shift/ooo-35/infra.yaml       | CREATE_COMPLETE | 2017-07-07T12:51:21Z |
| 0             | b4e0a752-3bce-46f5-96c5-7cc9dde21ec8 | file:///home/shift/ooo-35/infra.yaml       | UPDATE_COMPLETE | 2017-07-07T12:51:20Z |
+---------------+--------------------------------------+--------------------------------------------+-----------------+----------------------+
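
In the listing above, resource 1 has a new physical_resource_id while resource 0 kept its old one, which is how you can tell that only the unhealthy node was rebuilt. That comparison could be automated roughly as follows. This is a sketch under assumptions: the helper name `replaced_resources` and the file names are made up, and both listings are expected to be non-empty and saved in "name id" form.

```shell
# Hypothetical helper: compare two "resource_name physical_resource_id" listings
# (taken before and after the stack update) and print the names whose ID changed.
#   $1 = listing saved before the update
#   $2 = listing saved after the update
replaced_resources() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && before[$1] != $2 { print $1 }' "$1" "$2"
}

# Usage (needs a working openstack client; before.txt/after.txt are example names):
#   openstack stack resource list -f value -c resource_name \
#       -c physical_resource_id "$INFRA_STACK_ID" > before.txt
#   ... mark unhealthy, update the stack ...
#   openstack stack resource list -f value -c resource_name \
#       -c physical_resource_id "$INFRA_STACK_ID" > after.txt
#   replaced_resources before.txt after.txt
```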

ioggstream added a commit to ioggstream/openshift-on-openstack that referenced this issue Sep 29, 2017

Replace WaitCondition with UpdateWaitConditio to support host replace…

018cc88

…ment without destroying associated resources. redhat-openstack#369
