From 3690d764f3fb3d2755fdd7573ab311d12ff13308 Mon Sep 17 00:00:00 2001 From: github-actions Date: Fri, 2 Feb 2024 13:17:52 +0000 Subject: [PATCH] Rendered docs --- dev/.timestamp-images | 0 dev/images/.gitkeep | 0 dev/index.html | 1331 +++++++ index.html | 9 + user/.timestamp-images | 0 user/images/.gitkeep | 0 user/index.html | 7909 ++++++++++++++++++++++++++++++++++++++++ 7 files changed, 9249 insertions(+) create mode 100644 dev/.timestamp-images create mode 100644 dev/images/.gitkeep create mode 100644 dev/index.html create mode 100644 index.html create mode 100644 user/.timestamp-images create mode 100644 user/images/.gitkeep create mode 100644 user/index.html diff --git a/dev/.timestamp-images b/dev/.timestamp-images new file mode 100644 index 000000000..e69de29bb diff --git a/dev/images/.gitkeep b/dev/images/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/dev/index.html b/dev/index.html new file mode 100644 index 000000000..5acf258f8 --- /dev/null +++ b/dev/index.html @@ -0,0 +1,1331 @@ + + + + + + + +Data Plane Adoption contributor documentation + + + + + + +
+
+

Development environment

+
+
+

The Adoption development environment utilizes +install_yamls +for CRC VM creation and for creation of the VM that hosts the source +Wallaby (or OSP 17.1) OpenStack in Standalone configuration.

+
+
+

Environment prep

+
+

Get dataplane adoption repo:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/data-plane-adoption.git ~/data-plane-adoption
+
+
+
+

Get install_yamls:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/install_yamls.git ~/install_yamls
+
+
+
+

Install tools for operator development:

+
+
+
+
cd ~/install_yamls/devsetup
+make download_tools
+
+
+
+
+

Deployment of CRC with network isolation

+
+
+
cd ~/install_yamls/devsetup
+PULL_SECRET=$HOME/pull-secret.txt CPUS=12 MEMORY=40000 DISK=100 make crc
+
+eval $(crc oc-env)
+oc login -u kubeadmin -p 12345678 https://api.crc.testing:6443
+
+make crc_attach_default_interface
+
+
+
+
+

Development environment with Openstack ironic

+
+

Create the BMaaS network (crc-bmaas) and virtual baremetal nodes controlled by +a RedFish BMC emulator.

+
+
+
+
cd ~/install_yamls
+make nmstate
+make namespace
+cd devsetup  # back to install_yamls/devsetup
+make bmaas
+
+
+
+

A node definition YAML file to use with the openstack baremetal +create <file>.yaml command can be generated for the virtual baremetal +nodes by running the bmaas_generate_nodes_yaml make target. Store it +in a temp file for later.

+
+
+
+
make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml
+
+
+
+

Set variables to deploy edpm Standalone with additional network +(baremetal) and compute driver ironic.

+
+
+
+
cat << EOF > /tmp/addtional_nets.json
+[
+  {
+    "type": "network",
+    "name": "crc-bmaas",
+    "standalone_config": {
+      "type": "ovs_bridge",
+      "name": "baremetal",
+      "mtu": 1500,
+      "vip": true,
+      "ip_subnet": "172.20.1.0/24",
+      "allocation_pools": [
+        {
+          "start": "172.20.1.100",
+          "end": "172.20.1.150"
+        }
+      ],
+      "host_routes": [
+        {
+          "destination": "192.168.130.0/24",
+          "nexthop": "172.20.1.1"
+        }
+      ]
+    }
+  }
+]
+EOF
+export EDPM_COMPUTE_ADDITIONAL_NETWORKS=$(jq -c . /tmp/addtional_nets.json)
+export STANDALONE_COMPUTE_DRIVER=ironic
+export NTP_SERVER=pool.ntp.org  # Only neccecary if not on the RedHat network ...
+export EDPM_COMPUTE_CEPH_ENABLED=false  # Optional
+
+
+
+
+

Use the install_yamls devsetup +to create a virtual machine (edpm-compute-0) connected to the isolated networks.

+
+
+

To use OSP 17.1 content to deploy TripleO Standalone, follow the +guide for setting up downstream content +for make standalone.

+
+
+

To use Wallaby content instead, run the following:

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone
+
+
+
+
+
+

Install the openstack-k8s-operators (openstack-operator)

+
+
+
cd ..  # back to install_yamls
+make crc_storage
+make input
+make openstack
+
+
+
+

Convenience steps

+
+

To make our life easier we can copy the deployment passwords we’ll be using +in the backend services deployment phase of the data plane adoption.

+
+
+
+
scp -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100:/root/tripleo-standalone-passwords.yaml ~/
+
+
+
+

If we want to be able to easily run openstack commands from the host without +actually installing the package and copying the configuration file from the VM +we can create a simple alias:

+
+
+
+
alias openstack="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 OS_CLOUD=standalone openstack"
+
+
+
+
+

Route networks

+
+

Route VLAN20 to have access to the MariaDB cluster:

+
+
+
+
EDPM_BRIDGE=$(sudo virsh dumpxml edpm-compute-0 | grep -oP "(?<=bridge=').*(?=')")
+sudo ip link add link $EDPM_BRIDGE name vlan20 type vlan id 20
+sudo ip addr add dev vlan20 172.17.0.222/24
+sudo ip link set up dev vlan20
+
+
+
+
+

Snapshot/revert

+
+

When the deployment of the Standalone OpenStack is finished, it’s a +good time to snapshot the machine, so that multiple Adoption attempts +can be done without having to deploy from scratch.

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_snapshot
+
+
+
+

And when you wish to revert the Standalone deployment to the +snapshotted state:

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_revert
+
+
+
+

Similar snapshot could be done for the CRC virtual machine, but the +developer environment reset on CRC side can be done sufficiently via +the install_yamls *_cleanup targets. This is further detailed in +the section: +Reset the environment to pre-adoption state

+
+
+
+

Create a workload to adopt

+
+
+
Ironic Steps
+
+
+
# Enroll baremetal nodes
+make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml
+scp -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa /tmp/ironic_nodes.yaml root@192.168.122.100:
+ssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100
+
+export OS_CLOUD=standalone
+openstack baremetal create /root/ironic_nodes.yaml
+export IRONIC_PYTHON_AGENT_RAMDISK_ID=$(openstack image show deploy-ramdisk -c id -f value)
+export IRONIC_PYTHON_AGENT_KERNEL_ID=$(openstack image show deploy-kernel -c id -f value)
+for node in $(openstack baremetal node list -c UUID -f value); do
+  openstack baremetal node set $node \
+    --driver-info deploy_ramdisk=${IRONIC_PYTHON_AGENT_RAMDISK_ID} \
+    --driver-info deploy_kernel=${IRONIC_PYTHON_AGENT_KERNEL_ID} \
+    --resource-class baremetal \
+    --property capabilities='boot_mode:uefi'
+done
+
+# Create a baremetal flavor
+openstack flavor create baremetal --ram 1024 --vcpus 1 --disk 15 \
+  --property resources:VCPU=0 \
+  --property resources:MEMORY_MB=0 \
+  --property resources:DISK_GB=0 \
+  --property resources:CUSTOM_BAREMETAL=1 \
+  --property capabilities:boot_mode="uefi"
+
+# Create image
+IMG=Fedora-Cloud-Base-38-1.6.x86_64.qcow2
+URL=https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/$IMG
+curl -o /tmp/${IMG} -L $URL
+DISK_FORMAT=$(qemu-img info /tmp/${IMG} | grep "file format:" | awk '{print $NF}')
+openstack image create --container-format bare --disk-format ${DISK_FORMAT} Fedora-Cloud-Base-38 < /tmp/${IMG}
+
+export BAREMETAL_NODES=$(openstack baremetal node list -c UUID -f value)
+# Manage nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal node manage $node
+done
+
+# Wait for nodes to reach "manageable" state
+watch openstack baremetal node list
+
+# Inspect baremetal nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal introspection start $node
+done
+
+# Wait for inspection to complete
+watch openstack baremetal introspection list
+
+# Provide nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal node provide $node
+done
+
+# Wait for nodes to reach "available" state
+watch openstack baremetal node list
+
+# Create an instance on baremetal
+openstack server show baremetal-test || {
+    openstack server create baremetal-test --flavor baremetal --image Fedora-Cloud-Base-38 --nic net-id=provisioning --wait
+}
+
+# Check instance status and network connectivity
+openstack server show baremetal-test
+ping -c 4 $(openstack server show baremetal-test -f json -c addresses | jq -r .addresses.provisioning[0])
+
+
+
+
+
+
Virtual Machine Steps
+
+
+
cd ~/data-plane-adoption
+bash tests/roles/development_environment/templates/pre_launch.bash
+
+
+
+
+
+
Ceph Storage Steps
+
+

Make sure a cinder-volume backend is properly configured, or skip below steps +to create a test workload without volume attachments.

+
+
+

Confirm the image UUID can be seen in Ceph’s images pool.

+
+
+
+
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l
+
+
+
+

Create a Cinder volume, a backup from it, and snapshot it.

+
+
+
+
openstack volume create --image cirros --bootable --size 1 disk
+openstack volume backup create --name backup disk
+openstack volume snapshot create --volume disk snapshot
+
+
+
+

Add volume to the test VM

+
+
+
+
openstack server add volume test disk
+
+
+
+
+
+
+

Performing the Data Plane Adoption

+
+

The development environment is now set up, you can go to the Adoption +documentation +and perform adoption manually, or run the test +suite +against your environment.

+
+
+
+

Reset the environment to pre-adoption state

+
+

The development environment must be rolled back in case we want to execute another Adoption run.

+
+
+

Delete the data-plane and control-plane resources from the CRC vm

+
+
+
+
oc delete --ignore-not-found=true --wait=false openstackdataplanedeployment/openstack
+oc delete --ignore-not-found=true --wait=false openstackdataplanedeployment/openstack-nova-compute-ffu
+oc delete --ignore-not-found=true --wait=false openstackcontrolplane/openstack
+oc patch openstackcontrolplane openstack --type=merge --patch '
+metadata:
+  finalizers: []
+' || true
+
+while oc get pod | grep rabbitmq-server-0; do
+    sleep 2
+done
+while oc get pod | grep openstack-galera-0; do
+    sleep 2
+done
+
+oc delete --wait=false pod ovn-copy-data || true
+oc delete secret osp-secret || true
+
+
+
+

Revert the standalone vm to the snapshotted state

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_revert
+
+
+
+

Clean up and initialize the storage PVs in CRC vm

+
+
+
+
cd ..
+for i in {1..3}; do make crc_storage_cleanup crc_storage && break || sleep 5; done
+
+
+
+
+

Experimenting with an additional compute node

+
+

The following is not on the critical path of preparing the development +environment for Adoption, but it shows how to make the environment +work with an additional compute node VM.

+
+
+

The remaining steps should be completed on the hypervisor hosting crc +and edpm-compute-0.

+
+
+

Deploy NG Control Plane with Ceph

+
+

Export the Ceph configuration from edpm-compute-0 into a secret.

+
+
+
+
SSH=$(ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100)
+KEY=$($SSH "cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0")
+CONF=$($SSH "cat /etc/ceph/ceph.conf | base64 -w 0")
+
+cat <<EOF > ceph_secret.yaml
+apiVersion: v1
+data:
+  ceph.client.openstack.keyring: $KEY
+  ceph.conf: $CONF
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+type: Opaque
+EOF
+
+oc create -f ceph_secret.yaml
+
+
+
+

Deploy the NG control plane with Ceph as backend for Glance and +Cinder. As described in +the install_yamls README, +use the sample config located at +https://github.com/openstack-k8s-operators/openstack-operator/blob/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml +but make sure to replace the FSID in the sample with the one from +the secret created in the previous step.

+
+
+
+
curl -o /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml https://raw.githubusercontent.com/openstack-k8s-operators/openstack-operator/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | grep fsid | sed -e 's/fsid = //') && echo $FSID
+sed -i "s/_FSID_/${FSID}/" /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+oc apply -f /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+
+
+
+

A NG control plane which uses the same Ceph backend should now be +functional. If you create a test image on the NG system to confirm +it works from the configuration above, be sure to read the warning +in the next section.

+
+
+

Before beginning adoption testing or development you may wish to +deploy an EDPM node as described in the following section.

+
+
+
+

Warning about two OpenStacks and one Ceph

+
+

Though workloads can be created in the NG deployment to test, be +careful not to confuse them with workloads from the Wallaby cluster +to be migrated. The following scenario is now possible.

+
+
+

A Glance image exists on the Wallaby OpenStack to be adopted.

+
+
+
+
[stack@standalone standalone]$ export OS_CLOUD=standalone
+[stack@standalone standalone]$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 33a43519-a960-4cd0-a593-eca56ee553aa | cirros | active |
++--------------------------------------+--------+--------+
+[stack@standalone standalone]$
+
+
+
+

If you now create an image with the NG cluster, then a Glance image +will exsit on the NG OpenStack which will adopt the workloads of the +wallaby.

+
+
+
+
[fultonj@hamfast ng]$ export OS_CLOUD=default
+[fultonj@hamfast ng]$ export OS_PASSWORD=12345678
+[fultonj@hamfast ng]$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 4ebccb29-193b-4d52-9ffd-034d440e073c | cirros | active |
++--------------------------------------+--------+--------+
+[fultonj@hamfast ng]$
+
+
+
+

Both Glance images are stored in the same Ceph pool.

+
+
+
+
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l
+Inferring fsid 7133115f-7751-5c2f-88bd-fbff2f140791
+Using recent ceph image quay.rdoproject.org/tripleowallabycentos9/daemon@sha256:aa259dd2439dfaa60b27c9ebb4fb310cdf1e8e62aa7467df350baf22c5d992d8
+NAME                                       SIZE     PARENT  FMT  PROT  LOCK
+33a43519-a960-4cd0-a593-eca56ee553aa         273 B            2
+33a43519-a960-4cd0-a593-eca56ee553aa@snap    273 B            2  yes
+4ebccb29-193b-4d52-9ffd-034d440e073c       112 MiB            2
+4ebccb29-193b-4d52-9ffd-034d440e073c@snap  112 MiB            2  yes
+
+
+
+

However, as far as each Glance service is concerned each has one +image. Thus, in order to avoid confusion during adoption the test +Glance image on the NG OpenStack should be deleted.

+
+
+
+
openstack image delete 4ebccb29-193b-4d52-9ffd-034d440e073c
+
+
+
+

Connecting the NG OpenStack to the existing Ceph cluster is part of +the adoption procedure so that the data migration can be minimized +but understand the implications of the above example.

+
+
+
+

Deploy edpm-compute-1

+
+

edpm-compute-0 is not available as a standard EDPM system to be +managed by edpm-ansible +or +dataplane-operator +because it hosts the wallaby deployment which will be adopted +and after adoption it will only host the Ceph server.

+
+
+

Use the install_yamls devsetup +to create additional virtual machines and be sure +that the EDPM_COMPUTE_SUFFIX is set to 1 or greater. +Do not set EDPM_COMPUTE_SUFFIX to 0 or you could delete +the Wallaby system created in the previous section.

+
+
+

When deploying EDPM nodes add an extraMounts like the following in +the OpenStackDataPlaneNodeSet CR nodeTemplate so that they will be +configured to use the same Ceph cluster.

+
+
+
+
    edpm-compute:
+      nodeTemplate:
+        extraMounts:
+        - extraVolType: Ceph
+          volumes:
+          - name: ceph
+            secret:
+              secretName: ceph-conf-files
+          mounts:
+          - name: ceph
+            mountPath: "/etc/ceph"
+            readOnly: true
+
+
+
+

A NG data plane which uses the same Ceph backend should now be +functional. Be careful about not confusing new workloads to test the +NG OpenStack with the Wallaby OpenStack as described in the previous +section.

+
+
+
+

Begin Adoption Testing or Development

+
+

We should now have:

+
+
+
    +
  • +

    An NG glance service based on Antelope running on CRC

    +
  • +
  • +

    An TripleO-deployed glance serviced running on edpm-compute-0

    +
  • +
  • +

    Both services have the same Ceph backend

    +
  • +
  • +

    Each service has their own independent database

    +
  • +
+
+
+

An environment above is assumed to be available in the +Glance Adoption documentation. You +may now follow other Data Plane Adoption procedures described in the +documentation. +The same pattern can be applied to other services.

+
+
+
+
+
+
+

Contributing to documentation

+
+
+

Rendering documentation locally

+
+

Install docs build requirements into virtualenv:

+
+
+
+
python3 -m venv local/docs-venv
+source local/docs-venv/bin/activate
+pip install -r docs/doc_requirements.txt
+
+
+
+

Serve docs site on localhost:

+
+
+
+
mkdocs serve
+
+
+
+

Click the link it outputs. As you save changes to files modified in your editor, +the browser will automatically show the new content.

+
+
+
+

Patterns and tips for contributing to documentation

+
+
    +
  • +

    Pages concerning individual components/services should make sense in +the context of the broader adoption procedure. While adopting a +service in isolation is an option for developers, let’s write the +documentation with the assumption the adoption procedure is being +done in full, going step by step (one doc after another).

    +
  • +
  • +

    The procedure should be written with production use in mind. This +repository could be used as a starting point for product +technical documentation. We should not tie the documentation to +something that wouldn’t translate well from dev envs to production.

    +
    +
      +
    • +

      This includes not assuming that the source environment is +Standalone, and the destination is CRC. We can provide examples for +Standalone/CRC, but it should be possible to use the procedure +with fuller environments in a way that is obvious from the docs.

      +
    • +
    +
    +
  • +
  • +

    If possible, try to make code snippets copy-pastable. Use shell +variables if the snippets should be parametrized. Use oc rather +than kubectl in snippets.

    +
  • +
  • +

    Focus on the "happy path" in the docs as much as possible, +troubleshooting info can go into the Troubleshooting page, or +alternatively a troubleshooting section at the end of the document, +visibly separated from the main procedure.

    +
  • +
  • +

    The full procedure will inevitably happen to be quite long, so let’s +try to be concise in writing to keep the docs consumable (but not to +a point of making things difficult to understand or omitting +important things).

    +
  • +
  • +

    A bash alias can be created for long command however when implementing +them in the test roles you should transform them to avoid command not +found errors. +From:

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +openstack endpoint list | grep network
    +
    +
    +
    +

    To:

    +
    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +${BASH_ALIASES[openstack]} endpoint list | grep network
    +
    +
    +
  • +
+
+
+
+
+
+

Tests

+
+
+

Test suite information

+
+

The adoption docs repository also includes a test suite for Adoption. +There are targets in the Makefile which can be used to execute the +test suite:

+
+
+
    +
  • +

    test-minimal - a minimal test scenario, the eventual set of +services in this scenario should be the "core" services needed to +launch a VM. This scenario assumes local storage backend for +services like Glance and Cinder.

    +
  • +
  • +

    test-with-ceph - like 'minimal' but with Ceph storage backend for +Glance and Cinder.

    +
  • +
+
+
+
+

Configuring the test suite

+
+
    +
  • +

    Create tests/vars.yaml and tests/secrets.yaml by copying the +included samples (tests/vars.sample.yaml, +tests/secrets.sample.yaml).

    +
  • +
  • +

    Walk through the tests/vars.yaml and tests/secrets.yaml files +and see if you need to edit any values. If you are using the +documented development environment, majority of the defaults should +work out of the box. The comments in the YAML files will guide you +regarding the expected values. You may want to double check that +these variables suit your environment:

    +
    +
      +
    • +

      install_yamls_path

      +
    • +
    • +

      tripleo_passwords

      +
    • +
    • +

      controller*_ssh

      +
    • +
    • +

      edpm_privatekey_path

      +
    • +
    • +

      timesync_ntp_servers

      +
    • +
    +
    +
  • +
+
+
+
+

Running the tests

+
+

The interface between the execution infrastructure and the test suite +is an Ansible inventory and variables files. Inventory and variable +samples are provided. To run the tests, follow this procedure:

+
+
+
    +
  • +

    Install dependencies and create a venv:

    +
    +
    +
    sudo dnf -y install python-devel
    +python3 -m venv venv
    +source venv/bin/activate
    +pip install openstackclient osc_placement jmespath
    +ansible-galaxy collection install community.general
    +
    +
    +
  • +
  • +

    Run make test-with-ceph (the documented development environment +does include Ceph).

    +
    +

    If you are using Ceph-less environment, you should run make +test-minimal.

    +
    +
  • +
+
+
+
+

Making patches to the test suite

+
+

Please be aware of the following when changing the test suite:

+
+
+
    +
  • +

    The test suite should follow the docs as much as possible.

    +
    +

    The purpose of the test suite is to verify what the user would run +if they were following the docs. We don’t want to loosely rewrite +the docs into Ansible code following Ansible best practices. We want +to test the exact same bash commands/snippets that are written in +the docs. This often means that we should be using the shell +module and do a verbatim copy/paste from docs, instead of using the +best Ansible module for the task at hand.

    +
    +
  • +
+
+
+
+
+
+ + + + + + + \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 000000000..108b3e977 --- /dev/null +++ b/index.html @@ -0,0 +1,9 @@ + + +

Data Plane Adoption documentation

+ + + diff --git a/user/.timestamp-images b/user/.timestamp-images new file mode 100644 index 000000000..e69de29bb diff --git a/user/images/.gitkeep b/user/images/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/user/index.html b/user/index.html new file mode 100644 index 000000000..8212d26ee --- /dev/null +++ b/user/index.html @@ -0,0 +1,7909 @@ + + + + + + + +OpenStack adoption user documentation + + + + + + +
+
+

OpenStack adoption

+
+
+

Planning the new deployment

+
+

Just like you did back when you installed your Director deployed OpenStack, the +upgrade/migration to the podified OpenStack requires planning various aspects +of the environment such as node roles, planning your network topology, and +storage.

+
+
+

In this document we cover some of this planning, but it is recommended to read +the whole adoption guide before actually starting the process to be sure that +there is a global understanding of the whole process.

+
+
+

Configurations

+
+

There is a fundamental difference between the Director and Operator deployments +regarding the configuration of the services.

+
+
+

In Director deployments many of the service configurations are abstracted by +Director specific configuration options. A single Director option may trigger +changes for multiple services and support for drivers (for example Cinder’s) +required patches to the Director code base.

+
+
+

In Operator deployments this has changed to what we believe is a simpler +approach: reduce the installer specific knowledge and leverage OpenShift and +OpenStack service specific knowledge whenever possible.

+
+
+

To this effect OpenStack services will have sensible defaults for OpenShift +deployments and human operators will provide configuration snippets to provide +necessary configuration, such as cinder backend configuration, or to override +the defaults.

+
+
+

This shortens the distance between a service specific configuration file (such +as cinder.conf) and what the human operator provides in the manifests.

+
+
+

These configuration snippets are passed to the operators in the different +customServiceConfig sections available in the manifests, and then they are +layered in the services available in the following levels. To illustrate this, +if we were to set a configuration at the top Cinder level (spec: cinder: +template:) then it would be applied to all the cinder services; for example to +enable debug in all the cinder services we would do:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      customServiceConfig: |
+        [DEFAULT]
+        debug = True
+< . . . >
+
+
+
+

If we only wanted to set it for one of the cinder services, for example the +scheduler, then we would use the cinderScheduler section instead:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      cinderScheduler:
+        customServiceConfig: |
+          [DEFAULT]
+          debug = True
+< . . . >
+
+
+
+

In openshift it is not recommended to store sensitive information like the +credentials to the cinder storage array in the CRs, so most OpenStack operators +have a mechanism to use OpenShift’s Secrets for sensitive configuration +parameters of the services and then use then by reference in the +customServiceConfigSecrets section which is analogous to the +customServiceConfig.

+
+
+

The contents of the Secret references passed in the +customServiceConfigSecrets will have the same format as customServiceConfig: +a snippet with the section/s and configuration options.

+
+
+

When there are sensitive information in the service configuration then it +becomes a matter of personal preference whether to store all the configuration +in the Secret or only the sensitive parts, but remember that if we split the +configuration between Secret and customServiceConfig we still need the +section header (eg: [DEFAULT]) to be present in both places.

+
+
+

Attention should be paid to each service’s adoption process as they may have +some particularities regarding their configuration.

+
+
+
+

Configuration tooling

+
+

In order to help users to handle the configuration for the TripleO and Openstack +services the tool: https://github.com/openstack-k8s-operators/os-diff has been +develop to compare the configuration files between the TripleO deployment and +the next gen cloud. +Make sure Golang is installed and configured on your env:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/os-diff
+pushd os-diff
+make build
+
+
+
+

Then configure ansible.cfg and ssh-config file according to your environment:

+
+
+
+
Host *
+    IdentitiesOnly yes
+
+Host virthost
+    Hostname virthost
+    IdentityFile ~/.ssh/id_rsa
+    User root
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+
+Host standalone
+    Hostname standalone
+    IdentityFile ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa
+    User root
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+Host crc
+    Hostname crc
+    IdentityFile ~/.ssh/id_rsa
+    User stack
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+
+
+

And test your connection:

+
+
+
+
ssh -F ssh.config standalone
+
+
+
+
+

Node Roles

+
+

In Director deployments we had 4 different standard roles for the nodes: +Controller, Compute, Ceph Storage, Swift Storage, but in podified +OpenStack we just make a distinction based on where things are running, in +OpenShift or external to it.

+
+
+

When adopting a Director OpenStack your Compute nodes will directly become +external nodes, so there should not be much additional planning needed there.

+
+
+

In many deployments being adopted the Controller nodes will require some +thought because we’ll have many OpenShift nodes where the controller services +could run, and we have to decide which ones we want to use, how we are going to +use them, and make sure those nodes are ready to run the services.

+
+
+

In most deployments running OpenStack services on master nodes can have a +seriously adverse impact on the OpenShift cluster, so we recommend placing +OpenStack services on non master nodes.

+
+
+

By default OpenStack Operators deploy OpenStack services on any worker node, but +that is not necessarily what’s best for all deployments, and there may be even +services that won’t even work deployed like that.

+
+
+

When planing a deployment it’s good to remember that not all the services on an +OpenStack deployments are the same as they have very different requirements.

+
+
+

Looking at the Cinder component we can clearly see different requirements for +its services: the cinder-scheduler is a very light service with low +memory, disk, network, and CPU usage; cinder-api service has a higher network +usage due to resource listing requests; the cinder-volume service will have a +high disk and network usage since many of its operations are in the data path +(offline volume migration, create volume from image, etc.), and then we have +the cinder-backup service which has high memory, network, and CPU (to compress +data) requirements.

+
+
+

We also have the Glance and Swift components that are in the data path, and +let’s not forget RabbitMQ and Galera services.

+
+
+

Given these requirements it may be preferable not to let these services wander +all over your OpenShift worker nodes with the possibility of impacting other +workloads, or maybe you don’t mind the light services wandering around but you +want to pin down the heavy ones to a set of infrastructure nodes.

+
+
+

There are also hardware restrictions to take into consideration, because if we +are using a Fibre Channel (FC) Cinder backend we’ll need the cinder-volume, +cinder-backup, and maybe even the glance (if it’s using Cinder as a backend) +services to run on a OpenShift host that has an HBA.

+
+
+

The OpenStack Operators allow a great deal of flexibility on where to run the +OpenStack services, as we can use node labels to define which OpenShift nodes +are eligible to run the different OpenStack services. Refer to the Node +Selector guide to learn more about using labels to define +placement of the OpenStack services.

+
+
+

TODO: Talk about Ceph Storage and Swift Storage nodes, HCI deployments, +etc.

+
+
+
+

Network

+
+

TODO: Write about isolated networks, NetworkAttachmentDefinition, +NetworkAttachmets, etc

+
+
+
+

Storage

+
+

When looking into the storage in an OpenStack deployment we can differentiate +2 different kinds, the storage requirements of the services themselves and the +storage used for the OpenStack users that thee services will manage.

+
+
+

These requirements may drive our OpenShift node selection, as mentioned above, +and may even require us to do some preparations on the OpenShift nodes before +we can deploy the services.

+
+
+

TODO: Galera, RabbitMQ, Swift, Glance, etc.

+
+
+
Cinder requirements
+
+

The Cinder service has both local storage used by the service and OpenStack user +requirements.

+
+
+

Local storage is used for example when downloading a glance image for the create +volume from image operation, which can become considerable when having +concurrent operations and not using cinder volume cache.

+
+
+

In the Operator deployed OpenStack we now have an easy way to configure the +location of the conversion directory to be an NFS share (using the extra +volumes feature), something that needed to be done manually before.

+
+
+

Even if it’s an adoption and it may seem that there’s nothing to consider +regarding the Cinder backends, because we’ll just be using the same ones we are +using in our current deployment, we should still evaluate it, because it may not +be so straightforward.

+
+
+

First we need to check the transport protocol the Cinder backends are using: +RBD, iSCSI, FC, NFS, NVMe-oF, etc.

+
+
+

Once we know all the transport protocols we are using, we can proceed to make +sure we are taking them into consideration when placing the Cinder services +(as mentioned above in the Node Roles section) and the right storage transport +related binaries are running on the OpenShift nodes.

+
+
+

Detailed information about the specifics for each storage transport protocol can +be found in the Cinder Adoption section. Please take a +good look at that document before proceeding to be able to plan the adoption +better.

+
+
+
+
+
+

Node Selector

+
+

There are a variety of reasons why we may want to restrict the nodes where +OpenStack services can be placed:

+
+
+
    +
  • +

    Hardware requirements: System memory, Disk space, Cores, HBAs

    +
  • +
  • +

    Limit the impact of the OpenStack services on other OpenShift workloads.

    +
  • +
  • +

    Avoid collocating OpenStack services.

    +
  • +
+
+
+

The mechanism provided by the OpenStack operators to achieve this is through the +use of labels.

+
+
+

We would either label the OpenShift nodes or use existing labels they already +have, and then use those labels in the OpenStack manifests in the +nodeSelector field.

+
+
+

The nodeSelector field in the OpenStack manifests follows the standard +OpenShift nodeSelector field, please refer to the OpenShift documentation on +the matter +additional information.

+
+
+

This field is present at all the different levels of the OpenStack manifests:

+
+
+
    +
  • +

    Deployment: The OpenStackControlPlane object.

    +
  • +
  • +

    Component: For example the cinder element in the OpenStackControlPlane.

    +
  • +
  • +

    Service: For example the cinderVolume element within the cinder element +in the OpenStackControlPlane.

    +
  • +
+
+
+

This allows a fine grained control of the placement of the OpenStack services +with minimal repetition.

+
+
+

Values of the nodeSelector are propagated to the next levels unless they are +overwritten. This means that a nodeSelector value at the deployment level will +affect all the OpenStack services.

+
+
+

For example we can add label type: openstack to any 3 OpenShift nodes:

+
+
+
+
$ oc label nodes worker0 type=openstack
+$ oc label nodes worker1 type=openstack
+$ oc label nodes worker2 type=openstack
+
+
+
+

And then in our OpenStackControlPlane we can use the label to place all the +services in those 3 nodes:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  nodeSelector:
+    type: openstack
+< . . . >
+
+
+
+

What if we don’t mind where any OpenStack services go but the cinder volume and +backup services because we are using FC and we only have HBAs on a subset of +nodes? Then we can just use the selector on for those specific services, which +for the sake of this example we’ll assume they have the label fc_card: true:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  cinder:
+    template:
+      cinderVolumes:
+          pure_fc:
+            nodeSelector:
+              fc_card: true
+< . . . >
+          lvm-iscsi:
+            nodeSelector:
+              fc_card: true
+< . . . >
+      cinderBackup:
+          nodeSelector:
+            fc_card: true
+< . . . >
+
+
+
+

The Cinder operator does not currently have the possibility of defining +the nodeSelector in cinderVolumes, so we need to specify it on each of the +backends.

+
+
+

It’s possible to leverage labels added by the node feature discovery +operator +to place OpenStack services.

+
+
+

MachineConfig

+
+

Some services require us to have services or kernel modules running on the hosts +where they run, for example iscsid or multipathd daemons, or the +nvme-fabrics kernel module.

+
+
+

For those cases we’ll use MachineConfig manifests, and if we are restricting +the nodes we are placing the OpenStack services using the nodeSelector then +we’ll also want to limit where the MachineConfig is applied.

+
+
+

To define where the MachineConfig can be applied we’ll need to use a +MachineConfigPool that links the MachineConfig to the nodes.

+
+
+

For example to be able to limit MachineConfig to the 3 OpenShift nodes we +marked with the type: openstack label we would create the +MachineConfigPool like this:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfigPool
+metadata:
+  name: openstack
+spec:
+  machineConfigSelector:
+    matchLabels:
+      machineconfiguration.openshift.io/role: openstack
+  nodeSelector:
+    matchLabels:
+      type: openstack
+
+
+
+

And then we could use it in the MachineConfig:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: openstack
+< . . . >
+
+
+ +
+

WARNING: Applying a MachineConfig to an OpenShift node will make the node +reboot.

+
+
+
+
+

Backend services deployment

+
+

The following instructions create OpenStackControlPlane CR with basic +backend services deployed, and all the OpenStack services disabled. +This will be the foundation of the podified control plane.

+
+
+

In subsequent steps, we’ll import the original databases and then add +podified OpenStack control plane services.

+
+
+

Prerequisites

+
+
    +
  • +

    The source cloud which we want to adopt is up and running. It’s on +OpenStack Wallaby release.

    +
  • +
  • +

    A VM instance named test is running on the source cloud and its +floating IP is set into FIP env var. You can use a +helper script +to create that test VM.

    +
  • +
  • +

    The openstack-operator is deployed, but OpenStackControlPlane is +not deployed.

    +
    +

    For developer/CI environments, the openstack operator can be deployed +by running make openstack inside +install_yamls +repo.

    +
    +
    +

    For production environments, the deployment method will likely be +different.

    +
    +
  • +
  • +

    There are free PVs available to be claimed (for MariaDB and RabbitMQ).

    +
    +

    For developer/CI environments driven by install_yamls, make sure +you’ve run make crc_storage.

    +
    +
  • +
+
+
+
+

Variables

+
+
    +
  • +

    Set the desired admin password for the podified deployment. This can +be the original deployment’s admin password or something else.

    +
    +
    +
    ADMIN_PASSWORD=SomePassword
    +
    +
    +
    +

    To use the existing OpenStack deployment password:

    +
    +
    +
    +
    ADMIN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AdminPassword:' | awk -F ': ' '{ print $2; }')
    +
    +
    +
  • +
  • +

    Set service password variables to match the original deployment. +Database passwords can differ in podified environment, but +synchronizing the service account passwords is a required step.

    +
    +

    E.g. in developer environments with TripleO Standalone, the +passwords can be extracted like this:

    +
    +
    +
    +
    AODH_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AodhPassword:' | awk -F ': ' '{ print $2; }')
    +CEILOMETER_METERING_SECRET=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerMeteringSecret:' | awk -F ': ' '{ print $2; }')
    +CEILOMETER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerPassword:' | awk -F ': ' '{ print $2; }')
    +CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')
    +GLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')
    +HEAT_AUTH_ENCRYPTION_KEY=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatAuthEncryptionKey:' | awk -F ': ' '{ print $2; }')
    +HEAT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatPassword:' | awk -F ': ' '{ print $2; }')
    +IRONIC_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' IronicPassword:' | awk -F ': ' '{ print $2; }')
    +MANILA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' ManilaPassword:' | awk -F ': ' '{ print $2; }')
    +NEUTRON_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NeutronPassword:' | awk -F ': ' '{ print $2; }')
    +NOVA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NovaPassword:' | awk -F ': ' '{ print $2; }')
    +OCTAVIA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' OctaviaPassword:' | awk -F ': ' '{ print $2; }')
    +PLACEMENT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' PlacementPassword:' | awk -F ': ' '{ print $2; }')
    +
    +
    +
  • +
+
+
+
+

Pre-checks

+ +
+
+

Procedure - backend services deployment

+
+
    +
  • +

    Make sure you are using the OpenShift namespace where you want the +podified control plane deployed:

    +
    +
    +
    oc project openstack
    +
    +
    +
  • +
  • +

    Create OSP secret.

    +
    +

    The procedure for this will vary, but in developer/CI environments +we use install_yamls:

    +
    +
    +
    +
    # in install_yamls
    +make input
    +
    +
    +
  • +
  • +

    If the $ADMIN_PASSWORD is different than the already set password +in osp-secret, amend the AdminPassword key in the osp-secret +correspondingly:

    +
    +
    +
    oc set data secret/osp-secret "AdminPassword=$ADMIN_PASSWORD"
    +
    +
    +
  • +
  • +

    Set service account passwords in osp-secret to match the service +account passwords from the original deployment:

    +
    +
    +
    oc set data secret/osp-secret "AodhPassword=$AODH_PASSWORD"
    +oc set data secret/osp-secret "CeilometerMeteringSecret=$CEILOMETER_METERING_SECRET"
    +oc set data secret/osp-secret "CeilometerPassword=$CEILOMETER_PASSWORD"
    +oc set data secret/osp-secret "CinderPassword=$CINDER_PASSWORD"
    +oc set data secret/osp-secret "GlancePassword=$GLANCE_PASSWORD"
    +oc set data secret/osp-secret "HeatAuthEncryptionKey=$HEAT_AUTH_ENCRYPTION_KEY"
    +oc set data secret/osp-secret "HeatPassword=$HEAT_PASSWORD"
    +oc set data secret/osp-secret "IronicPassword=$IRONIC_PASSWORD"
    +oc set data secret/osp-secret "ManilaPassword=$MANILA_PASSWORD"
    +oc set data secret/osp-secret "NeutronPassword=$NEUTRON_PASSWORD"
    +oc set data secret/osp-secret "NovaPassword=$NOVA_PASSWORD"
    +oc set data secret/osp-secret "OctaviaPassword=$OCTAVIA_PASSWORD"
    +oc set data secret/osp-secret "PlacementPassword=$PLACEMENT_PASSWORD"
    +
    +
    +
  • +
  • +

    Deploy OpenStackControlPlane. Make sure to only enable DNS, +MariaDB, Memcached, and RabbitMQ services. All other services must +be disabled.

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: core.openstack.org/v1beta1
    +kind: OpenStackControlPlane
    +metadata:
    +  name: openstack
    +spec:
    +  secret: osp-secret
    +  storageClass: local-storage
    +
    +  cinder:
    +    enabled: false
    +    template:
    +      cinderAPI: {}
    +      cinderScheduler: {}
    +      cinderBackup: {}
    +      cinderVolumes: {}
    +
    +  dns:
    +    template:
    +      override:
    +        service:
    +          metadata:
    +            annotations:
    +              metallb.universe.tf/address-pool: ctlplane
    +              metallb.universe.tf/allow-shared-ip: ctlplane
    +              metallb.universe.tf/loadBalancerIPs: 192.168.122.80
    +          spec:
    +            type: LoadBalancer
    +      options:
    +      - key: server
    +        values:
    +        - 192.168.122.1
    +      replicas: 1
    +
    +  glance:
    +    enabled: false
    +    template:
    +      glanceAPIs: {}
    +
    +  horizon:
    +    enabled: false
    +    template: {}
    +
    +  ironic:
    +    enabled: false
    +    template:
    +      ironicConductors: []
    +
    +  keystone:
    +    enabled: false
    +    template: {}
    +
    +  manila:
    +    enabled: false
    +    template:
    +      manilaAPI: {}
    +      manilaScheduler: {}
    +      manilaShares: {}
    +
    +  mariadb:
    +    enabled: false
    +    templates: {}
    +
    +  galera:
    +    enabled: true
    +    templates:
    +      openstack:
    +        secret: osp-secret
    +        replicas: 1
    +        storageRequest: 500M
    +      openstack-cell1:
    +        secret: osp-secret
    +        replicas: 1
    +        storageRequest: 500M
    +
    +  memcached:
    +    enabled: true
    +    templates:
    +      memcached:
    +        replicas: 1
    +
    +  neutron:
    +    enabled: false
    +    template: {}
    +
    +  nova:
    +    enabled: false
    +    template: {}
    +
    +  ovn:
    +    enabled: false
    +    template:
    +      ovnDBCluster:
    +        ovndbcluster-nb:
    +          dbType: NB
    +          storageRequest: 10G
    +          networkAttachment: internalapi
    +        ovndbcluster-sb:
    +          dbType: SB
    +          storageRequest: 10G
    +          networkAttachment: internalapi
    +      ovnNorthd:
    +        networkAttachment: internalapi
    +        replicas: 1
    +      ovnController:
    +        networkAttachment: tenant
    +
    +  placement:
    +    enabled: false
    +    template: {}
    +
    +  rabbitmq:
    +    templates:
    +      rabbitmq:
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.85
    +            spec:
    +              type: LoadBalancer
    +      rabbitmq-cell1:
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.86
    +            spec:
    +              type: LoadBalancer
    +
    +  ceilometer:
    +    enabled: false
    +    template: {}
    +
    +  autoscaling:
    +    enabled: false
    +    template: {}
    +EOF
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check that MariaDB is running.

    +
    +
    +
    oc get pod openstack-galera-0 -o jsonpath='{.status.phase}{"\n"}'
    +oc get pod openstack-cell1-galera-0 -o jsonpath='{.status.phase}{"\n"}'
    +
    +
    +
  • +
+
+
+
+
+

Stop OpenStack services

+
+

Before we can start with the adoption we need to make sure that the OpenStack +services have been stopped.

+
+
+

This is an important step to avoid inconsistencies in the data migrated for the +data-plane adoption procedure caused by resource changes after the DB has been +copied to the new deployment.

+
+
+

Some services are easy to stop because they only perform short asynchronous +operations, but other services are a bit more complex to gracefully stop +because they perform synchronous or long running operations that we may want to +complete instead of aborting them.

+
+
+

Since gracefully stopping all services is non-trivial and beyond the scope of +this guide we’ll proceed with the force method but present a couple of +recommendations on how to check some things in the services.

+
+
+

Note that we should not stop the infrastructure management services now, like +database, RabbitMQ, and HAProxy Load Balancer. Nor should we stop yet the +Nova compute service and containerized modular libvirt daemons.

+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative and refer to a single node standalone director deployment, +use values that are correct for your environment:

+
+
+
+
CONTROLLER1_SSH="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100"
+CONTROLLER2_SSH=""
+CONTROLLER3_SSH=""
+
+
+
+

We chose to use these ssh variables with the ssh commands instead of using +ansible to try to create instructions that are independent on where they are +running, but ansible commands could be used to achieve the same result if we +are in the right host, for example to stop a service:

+
+
+
+
. stackrc ansible -i $(which tripleo-ansible-inventory) Controller -m shell -a "sudo systemctl stop tripleo_horizon.service" -b
+
+
+
+
+

Pre-checks

+
+

We can stop OpenStack services at any moment, but we may leave things in an +undesired state, so at the very least we should have a look to confirm that +there are no long running operations that require other services.

+
+
+

Ensure that there are no ongoing instance live migrations, volume migrations +(online or offline), volume creation, backup restore, attaching, detaching, +etc.

+
+
+
+
openstack server list --all-projects -c ID -c Status |grep -E '\| .+ing \|'
+openstack volume list --all-projects -c ID -c Status |grep -E '\| .+ing \|'| grep -vi error
+openstack volume backup list --all-projects -c ID -c Status |grep -E '\| .+ing \|' | grep -vi error
+openstack share list --all-projects -c ID -c Status |grep -E '\| .+ing \|'| grep -vi error
+openstack image list -c ID -c Status |grep -E '\| .+ing \|'
+
+
+
+

Also collect the topology configuration, +before stopping services required to gather it live. You will need it to compare it +with the post-adoption values later on.

+
+
+
+

Stopping control plane services

+
+

We can stop OpenStack services at any moment, but we may leave things in an +undesired state, so at the very least we should have a look to confirm that +there are no ongoing operations.

+
+
+

1- Connect to all the controller nodes. +2- Stop the control plane services. +3- Verify the control plane services are stopped.

+
+
+

The cinder-backup service on OSP 17.1 could be running as Active-Passive under +pacemaker or as Active-Active, so we’ll have to check how it’s running and +stop it.

+
+
+

These steps can be automated with a simple script that relies on the previously +defined environmental variables and function:

+
+
+
+
# Update the services list to be stopped
+ServicesToStop=("tripleo_horizon.service"
+                "tripleo_keystone.service"
+                "tripleo_cinder_api.service"
+                "tripleo_cinder_api_cron.service"
+                "tripleo_cinder_scheduler.service"
+                "tripleo_cinder_backup.service"
+                "tripleo_glance_api.service"
+                "tripleo_manila_api.service"
+                "tripleo_manila_api_cron.service"
+                "tripleo_manila_scheduler.service"
+                "tripleo_neutron_api.service"
+                "tripleo_nova_api.service"
+                "tripleo_placement_api.service"
+                "tripleo_nova_api_cron.service"
+                "tripleo_nova_api.service"
+                "tripleo_nova_conductor.service"
+                "tripleo_nova_metadata.service"
+                "tripleo_nova_scheduler.service"
+                "tripleo_nova_vnc_proxy.service"
+                "tripleo_aodh_api.service"
+                "tripleo_aodh_api_cron.service"
+                "tripleo_aodh_evaluator.service"
+                "tripleo_aodh_listener.service"
+                "tripleo_aodh_notifier.service"
+                "tripleo_ceilometer_agent_central.service"
+                "tripleo_ceilometer_agent_compute.service"
+                "tripleo_ceilometer_agent_ipmi.service"
+                "tripleo_ceilometer_agent_notification.service")
+
+PacemakerResourcesToStop=("openstack-cinder-volume"
+                          "openstack-cinder-backup"
+                          "openstack-manila-share")
+
+echo "Stopping systemd OpenStack services"
+for service in ${ServicesToStop[*]}; do
+    for i in {1..3}; do
+        SSH_CMD=CONTROLLER${i}_SSH
+        if [ ! -z "${!SSH_CMD}" ]; then
+            echo "Stopping the $service in controller $i"
+            if ${!SSH_CMD} sudo systemctl is-active $service; then
+                ${!SSH_CMD} sudo systemctl stop $service
+            fi
+        fi
+    done
+done
+
+echo "Checking systemd OpenStack services"
+for service in ${ServicesToStop[*]}; do
+    for i in {1..3}; do
+        SSH_CMD=CONTROLLER${i}_SSH
+        if [ ! -z "${!SSH_CMD}" ]; then
+            echo "Checking status of $service in controller $i"
+            if ! ${!SSH_CMD} systemctl show $service | grep ActiveState=inactive >/dev/null; then
+                echo "ERROR: Service $service still running on controller $i"
+            fi
+        fi
+    done
+done
+
+echo "Stopping pacemaker OpenStack services"
+for i in {1..3}; do
+    SSH_CMD=CONTROLLER${i}_SSH
+    if [ ! -z "${!SSH_CMD}" ]; then
+        echo "Using controller $i to run pacemaker commands"
+        for resource in ${PacemakerResourcesToStop[*]}; do
+            if ${!SSH_CMD} sudo pcs resource config $resource; then
+                ${!SSH_CMD} sudo pcs resource disable $resource
+            fi
+        done
+        break
+    fi
+done
+
+
+
+
+
+

Pull Openstack configuration

+
+

Before starting to adoption workflow, we can start by pulling the configuration +from the Openstack services and TripleO on our file system in order to backup +the configuration files and then use it for later, during the configuration of +the adopted services and for the record to compare and make sure nothing has been +missed or misconfigured.

+
+
+

Make sure you have pull the os-diff repository and configure according to your +environment: +link:planning.md#Configuration tooling[Configure os-diff]

+
+
+

Pull configuration from a TripleO deployment

+
+

Before starting you need to update your ssh parameters according to your environment in the os-diff.cfg. +Os-diff will use those parameters to connect to your Director node, query and download the configuration files:

+
+
+
+
ssh_cmd=ssh -F ssh.config standalone
+container_engine=podman
+connection=ssh
+remote_config_path=/tmp/tripleo
+
+
+
+

Make sure the ssh command you provide in ssh_cmd parameter is correct and with key authentication.

+
+
+

Once it’s done, you can start to pull configuration from your OpenStack servies.

+
+
+

All the services are describes in a yaml file:

+
+ +
+

You can enable or disable the services you want then you can start to pull the configuration on your local file system. +Example with default keystone:

+
+
+
+
# service name and file location
+services:
+  # Service name
+  keystone:
+    # Bool to enable/disable a service (not implemented yet)
+    enable: true
+    # Pod name, in both OCP and podman context.
+    # It could be strict match or will only just grep the podman_name
+    # and work with all the pods which matched with pod_name.
+    # To enable/disable use strict_pod_name_match: true/false
+    podman_name: keystone
+    pod_name: keystone
+    container_name: keystone-api
+    # pod options
+    # strict match for getting pod id in TripleO and podman context
+    strict_pod_name_match: false
+    # Path of the config files you want to analyze.
+    # It could be whatever path you want:
+    # /etc/<service_name> or /etc or /usr/share/<something> or even /
+    # @TODO: need to implement loop over path to support multiple paths such as:
+    # - /etc
+    # - /usr/share
+    path:
+      - /etc/
+      - /etc/keystone
+      - /etc/keystone/keystone.conf
+      - /etc/keystone/logging.conf
+
+
+
+

Duplicate the keystone example to each Openstack services you want.

+
+
+

Then, you can pull the configuration with this command:

+
+
+
+
pushd os-diff
+./os-diff pull
+
+
+
+

The configuration will be pulled and stored by default:

+
+
+
+
/tmp/tripleo/
+
+
+
+

Once it’s done, you should have into your local path a directory per services such as:

+
+
+
+
  ▾ tmp/
+    ▾ tripleo/
+      ▾ glance/
+      ▾ keystone/
+
+
+
+
+

Get services topology specific configuration

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified
+SOURCE_MARIADB_IP=192.168.122.100
+SOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')
+
+
+
+

Export shell variables for the following outputs to compare it with post-adoption values later on:

+
+
+
    +
  • +

    Test connection to the original DB:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_DATABASES=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysql -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo "$PULL_OPENSTACK_CONFIGURATION_DATABASES"
    +
    +
    +
    +

    Note the nova, nova_api, nova_cell0 databases residing in the same DB host.

    +
    +
  • +
  • +

    Run mysqlcheck on the original DB to look for things that are not OK:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysqlcheck --all-databases -h $SOURCE_MARIADB_IP -u root "-p$SOURCE_DB_ROOT_PASSWORD" | grep -v OK)
    +echo "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK"
    +
    +
    +
  • +
  • +

    Get Nova cells mappings from database:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" nova_api -e \
    +    'select uuid,name,transport_url,database_connection,disabled from cell_mappings;')
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS"
    +
    +
    +
  • +
  • +

    Get the host names of the registered Nova compute services:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" nova_api -e \
    +    "select host from nova.services where services.binary='nova-compute';")
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES"
    +
    +
    +
  • +
  • +

    Get the list of mapped Nova cells:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS=$($CONTROLLER_SSH sudo podman exec -it nova_api nova-manage cell_v2 list_cells)
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS"
    +
    +
    +
  • +
+
+
+
+
+

MariaDB data copy

+
+

This document describes how to move the databases from the original +OpenStack deployment to the MariaDB instances in the OpenShift +cluster.

+
+
+
+
+

NOTE This example scenario describes a simple single-cell setup. Real +multi-stack topology recommended for production use results in different +cells DBs layout, and should be using different naming schemes (not covered +here this time).

+
+
+
+
+

Prerequisites

+
+
    +
  • +

    Make sure the previous Adoption steps have been performed successfully.

    +
    +
      +
    • +

      The OpenStackControlPlane resource must be already created at this point.

      +
    • +
    • +

      Podified MariaDB and RabbitMQ are running. No other podified +control plane services are running.

      +
    • +
    • +

      Required services specific topology configuration collected

      +
    • +
    • +

      OpenStack services have been stopped

      +
    • +
    • +

      There must be network routability between:

      +
      +
        +
      • +

        The adoption host and the original MariaDB.

        +
      • +
      • +

        The adoption host and the podified MariaDB.

        +
      • +
      • +

        Note that this routability requirement may change in the +future, e.g. we may require routability from the original MariaDB to +podified MariaDB.

        +
      • +
      +
      +
    • +
    +
    +
  • +
  • +

    Podman package is installed

    +
  • +
  • +

    CONTROLLER1_SSH, CONTROLLER2_SSH, and CONTROLLER3_SSH are configured.

    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
PODIFIED_MARIADB_IP=$(oc get svc --selector "mariadb/name=openstack" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_CELL1_MARIADB_IP=$(oc get svc --selector "mariadb/name=openstack-cell1" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
+
+# The CHARACTER_SET and collation should match the source DB
+# if the do not then it will break foreign key relationships
+# for any tables that are created in the future as part of db sync
+CHARACTER_SET=utf8
+COLLATION=utf8_general_ci
+
+MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified
+# Replace with your environment's MariaDB IP:
+SOURCE_MARIADB_IP=192.168.122.100
+SOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')
+
+
+
+
+

Pre-checks

+
+
    +
  • +

    Get the count of not-OK source databases:

    +
    +
    +
    test -z "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK"  || [ "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK" = " " ]
    +
    +
    +
  • +
  • +

    Test connection to podified DBs (show databases):

    +
    +
    +
    oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \
    +    mysql -rsh "$PODIFIED_MARIADB_IP" -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;'
    +oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \
    +    mysql -rsh "$PODIFIED_CELL1_MARIADB_IP" -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;'
    +
    +
    +
  • +
+
+
+
+

Procedure - data copy

+
+
+
+

NOTE: We’ll need to transition Nova services imported later on into a +superconductor architecture. For that, delete the old service records in +cells DBs, starting from the cell1. New records will be registered with +different hostnames provided by the Nova service operator. All Nova +services, except the compute agent, have no internal state, and its service +records can be safely deleted. Also we need to rename the former default cell +as cell1.

+
+
+
+
+
    +
  • +

    Create a temporary folder to store DB dumps and make sure it’s the +working directory for the following steps:

    +
    +
    +
    mkdir ~/adoption-db
    +cd ~/adoption-db
    +
    +
    +
  • +
  • +

    Create a dump of the original databases:

    +
    +
    +
    podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $MARIADB_IMAGE bash <<EOF
    +
    +# Note we do not want to dump the information and performance schema tables so we filter them
    +# Gnocchi is no longer used as a metric store, skip dumping gnocchi database as well
    +mysql -h ${SOURCE_MARIADB_IP} -u root "-p${SOURCE_DB_ROOT_PASSWORD}" -N -e 'show databases' | grep -E -v 'schema|mysql|gnocchi' | while read dbname; do
    +    echo "Dumping \${dbname}"
    +    mysqldump -h $SOURCE_MARIADB_IP -uroot "-p$SOURCE_DB_ROOT_PASSWORD" \
    +        --single-transaction --complete-insert --skip-lock-tables --lock-tables=0 \
    +        "\${dbname}" > "\${dbname}".sql
    +done
    +
    +EOF
    +
    +
    +
  • +
  • +

    Restore the databases from .sql files into the podified MariaDB:

    +
    +
    +
    # db schemas to rename on import
    +declare -A db_name_map
    +db_name_map["nova"]="nova_cell1"
    +db_name_map["ovs_neutron"]="neutron"
    +
    +# db servers to import into
    +declare -A db_server_map
    +db_server_map["default"]=${PODIFIED_MARIADB_IP}
    +db_server_map["nova_cell1"]=${PODIFIED_CELL1_MARIADB_IP}
    +
    +# db server root password map
    +declare -A db_server_password_map
    +db_server_password_map["default"]=${PODIFIED_DB_ROOT_PASSWORD}
    +db_server_password_map["nova_cell1"]=${PODIFIED_DB_ROOT_PASSWORD}
    +
    +all_db_files=$(ls *.sql)
    +for db_file in ${all_db_files}; do
    +    db_name=$(echo ${db_file} | awk -F'.' '{ print $1; }')
    +    if [[ -v "db_name_map[${db_name}]" ]]; then
    +        echo "renaming ${db_name} to ${db_name_map[${db_name}]}"
    +        db_name=${db_name_map[${db_name}]}
    +    fi
    +    db_server=${db_server_map["default"]}
    +    if [[ -v "db_server_map[${db_name}]" ]]; then
    +        db_server=${db_server_map[${db_name}]}
    +    fi
    +    db_password=${db_server_password_map["default"]}
    +    if [[ -v "db_server_password_map[${db_name}]" ]]; then
    +        db_password=${db_server_password_map[${db_name}]}
    +    fi
    +    echo "creating ${db_name} in ${db_server}"
    +    container_name=$(echo "mariadb-client-${db_name}-create" | sed 's/_/-/g')
    +    oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \
    +        mysql -h "${db_server}" -uroot "-p${db_password}" << EOF
    +CREATE DATABASE IF NOT EXISTS ${db_name} DEFAULT CHARACTER SET ${CHARACTER_SET} DEFAULT COLLATE ${COLLATION};
    +EOF
    +    echo "importing ${db_name} into ${db_server}"
    +    container_name=$(echo "mariadb-client-${db_name}-restore" | sed 's/_/-/g')
    +    oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \
    +        mysql -h "${db_server}" -uroot "-p${db_password}" "${db_name}" < "${db_file}"
    +done
    +oc exec -it openstack-galera-0 -c galera -- mysql --user=root --password=${db_server_password_map["default"]} -e \
    +    "update nova_api.cell_mappings set name='cell1' where name='default';"
    +oc exec -it openstack-cell1-galera-0 -c galera -- mysql --user=root --password=${db_server_password_map["default"]} -e \
    +    "delete from nova_cell1.services where host not like '%nova-cell1-%' and services.binary != 'nova-compute';"
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+

Compare the following outputs with the topology specific configuration +collected earlier:

+
+
+
    +
  • +

    Check that the databases were imported correctly:

    +
    +
    +
    # use 'oc exec' and 'mysql -rs' to maintain formatting
    +dbs=$(oc exec openstack-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo $dbs | grep -Eq '\bkeystone\b'
    +
    +# ensure neutron db is renamed from ovs_neutron
    +echo $dbs | grep -Eq '\bneutron\b'
    +echo $PULL_OPENSTACK_CONFIGURATION_DATABASES | grep -Eq '\bovs_neutron\b'
    +
    +# ensure nova cell1 db is extracted to a separate db server and renamed from nova to nova_cell1
    +c1dbs=$(oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo $c1dbs | grep -Eq '\bnova_cell1\b'
    +
    +# ensure default cell renamed to cell1, and the cell UUIDs retained intact
    +novadb_mapped_cells=$(oc exec openstack-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" \
    +  nova_api -e 'select uuid,name,transport_url,database_connection,disabled from cell_mappings;')
    +uuidf='\S{8,}-\S{4,}-\S{4,}-\S{4,}-\S{12,}'
    +left_behind=$(comm -23 \
    +  <(echo $PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS | grep -oE " $uuidf \S+") \
    +  <(echo $novadb_mapped_cells | tr -s "| " " " | grep -oE " $uuidf \S+"))
    +changed=$(comm -13 \
    +  <(echo $PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS | grep -oE " $uuidf \S+") \
    +  <(echo $novadb_mapped_cells | tr -s "| " " " | grep -oE " $uuidf \S+"))
    +test $(grep -Ec ' \S+$' <<<$left_behind) -eq 1
    +default=$(grep -E ' default$' <<<$left_behind)
    +test $(grep -Ec ' \S+$' <<<$changed) -eq 1
    +grep -qE " $(awk '{print $1}' <<<$default) cell1$" <<<$changed
    +
    +# ensure the registered Nova compute service name has not changed
    +novadb_svc_records=$(oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" \
    +  nova_cell1 -e "select host from services where services.binary='nova-compute' order by host asc;")
    +diff -Z <(echo $novadb_svc_records) <(echo $PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES)
    +
    +
    +
  • +
  • +

    During the pre/post checks the pod mariadb-client might have returned a pod security warning +related to the restricted:latest security context constraint. This is due to default security +context constraints and will not prevent pod creation by the admission controller. You’ll see a +warning for the short-lived pod but it will not interfere with functionality. +For more info visit here

    +
  • +
+
+
+
+
+

OVN data migration

+
+

This document describes how to move OVN northbound and southbound databases +from the original OpenStack deployment to ovsdb-server instances running in the +OpenShift cluster.

+
+
+

Rationale

+
+

While it may be argued that the podified Neutron ML2/OVN driver and OVN northd +service will reconstruct the databases on startup, the reconstruction may be +time consuming on large existing clusters. The procedure below allows to speed +up data migration and avoid unnecessary data plane disruptions due to +incomplete OpenFlow table contents.

+
+
+
+

Prerequisites

+
+
    +
  • +

    Make sure the previous Adoption steps have been performed successfully.

    +
    +
      +
    • +

      The OpenStackControlPlane resource must be already created at this point.

      +
    • +
    • +

      NetworkAttachmentDefinition CRDs for the original cluster are already +defined. Specifically, openstack/internalapi network is defined.

      +
    • +
    • +

      Podified MariaDB and RabbitMQ may already run. Neutron and OVN are not +running yet.

      +
    • +
    • +

      Original OVN is older or equal to the podified version.

      +
    • +
    • +

      There must be network routability between:

      +
      +
        +
      • +

        The adoption host and the original OVN.

        +
      • +
      • +

        The adoption host and the podified OVN.

        +
      • +
      +
      +
    • +
    +
    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
STORAGE_CLASS_NAME=crc-csi-hostpath-provisioner
+OVSDB_IMAGE=quay.io/podified-antelope-centos9/openstack-ovn-base:current-podified
+SOURCE_OVSDB_IP=172.17.1.49
+
+# ssh commands to reach the original controller machines
+CONTROLLER_SSH="ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone"
+
+# ssh commands to reach the original compute machines
+COMPUTE_SSH="ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone"
+
+
+
+

The real value of the SOURCE_OVSDB_IP can be get from the puppet generated configs:

+
+
+
+
grep -rI 'ovn_[ns]b_conn' /var/lib/config-data/puppet-generated/
+
+
+
+
+

Procedure

+
+
    +
  • +

    Stop OVN northd on all original cluster controllers.

    +
  • +
+
+
+
+
${CONTROLLER_SSH} sudo systemctl stop tripleo_ovn_cluster_northd.service
+
+
+
+
    +
  • +

    Prepare the OVN DBs copy dir and the adoption helper pod (pick the storage requests to fit the OVN databases sizes)

    +
  • +
+
+
+
+
oc apply -f - <<EOF
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ovn-data
+spec:
+  storageClassName: $STORAGE_CLASS_NAME
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 10Gi
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ovn-copy-data
+  annotations:
+    openshift.io/scc: anyuid
+  labels:
+    app: adoption
+spec:
+  containers:
+  - image: $OVSDB_IMAGE
+    command: [ "sh", "-c", "sleep infinity"]
+    name: adoption
+    volumeMounts:
+    - mountPath: /backup
+      name: ovn-data
+  securityContext:
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ALL
+    runAsNonRoot: true
+    seccompProfile:
+      type: RuntimeDefault
+  volumes:
+  - name: ovn-data
+    persistentVolumeClaim:
+      claimName: ovn-data
+EOF
+
+
+
+
    +
  • +

    Wait for the pod to come up

    +
  • +
+
+
+
+
oc wait --for=condition=Ready pod/ovn-copy-data --timeout=30s
+
+
+
+
    +
  • +

    Backup OVN databases.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client backup tcp:$SOURCE_OVSDB_IP:6641 > /backup/ovs-nb.db"
+oc exec ovn-copy-data -- bash -c "ovsdb-client backup tcp:$SOURCE_OVSDB_IP:6642 > /backup/ovs-sb.db"
+
+
+
+
    +
  • +

    Start podified OVN database services prior to import.

    +
  • +
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  ovn:
+    enabled: true
+    template:
+      ovnDBCluster:
+        ovndbcluster-nb:
+          dbType: NB
+          storageRequest: 10G
+          networkAttachment: internalapi
+        ovndbcluster-sb:
+          dbType: SB
+          storageRequest: 10G
+          networkAttachment: internalapi
+'
+
+
+
+
    +
  • +

    Wait for the OVN DB pods reaching the running phase.

    +
  • +
+
+
+
+
oc wait --for=jsonpath='{.status.phase}'=Running pod --selector=service=ovsdbserver-nb
+oc wait --for=jsonpath='{.status.phase}'=Running pod --selector=service=ovsdbserver-sb
+
+
+
+
    +
  • +

    Fetch podified OVN IP addresses on the clusterIP service network.

    +
  • +
+
+
+
+
PODIFIED_OVSDB_NB_IP=$(oc get svc --selector "statefulset.kubernetes.io/pod-name=ovsdbserver-nb-0" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_OVSDB_SB_IP=$(oc get svc --selector "statefulset.kubernetes.io/pod-name=ovsdbserver-sb-0" -ojsonpath='{.items[0].spec.clusterIP}')
+
+
+
+
    +
  • +

    Upgrade database schema for the backup files.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client get-schema tcp:$PODIFIED_OVSDB_NB_IP:6641 > /backup/ovs-nb.ovsschema && ovsdb-tool convert /backup/ovs-nb.db /backup/ovs-nb.ovsschema"
+oc exec ovn-copy-data -- bash -c "ovsdb-client get-schema tcp:$PODIFIED_OVSDB_SB_IP:6642 > /backup/ovs-sb.ovsschema && ovsdb-tool convert /backup/ovs-sb.db /backup/ovs-sb.ovsschema"
+
+
+
+
    +
  • +

    Restore database backup to podified OVN database servers.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client restore tcp:$PODIFIED_OVSDB_NB_IP:6641 < /backup/ovs-nb.db"
+oc exec ovn-copy-data -- bash -c "ovsdb-client restore tcp:$PODIFIED_OVSDB_SB_IP:6642 < /backup/ovs-sb.db"
+
+
+
+
    +
  • +

    Check that podified OVN databases contain objects from backup, e.g.:

    +
  • +
+
+
+
+
oc exec -it ovsdbserver-nb-0 -- ovn-nbctl show
+oc exec -it ovsdbserver-sb-0 -- ovn-sbctl list Chassis
+
+
+
+
    +
  • +

    Switch ovn-remote on compute nodes to point to the new podified database.

    +
  • +
+
+
+
+
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovs-vsctl set open . external_ids:ovn-remote=tcp:$PODIFIED_OVSDB_SB_IP:6642
+
+
+
+

You should now see the following warning in the ovn_controller container logs:

+
+
+
+
2023-03-16T21:40:35Z|03095|ovsdb_cs|WARN|tcp:172.17.1.50:6642: clustered database server has stale data; trying another server
+
+
+
+
    +
  • +

    Reset RAFT state for all compute ovn-controller instances.

    +
  • +
+
+
+
+
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovn-appctl -t ovn-controller sb-cluster-state-reset
+
+
+
+

This should complete connection of the controller process to the new remote. See in logs:

+
+
+
+
2023-03-16T21:42:31Z|03134|main|INFO|Resetting southbound database cluster state
+2023-03-16T21:42:33Z|03135|reconnect|INFO|tcp:172.17.1.50:6642: connected
+
+
+
+
    +
  • +

    Alternatively, just restart ovn-controller on original compute nodes.

    +
  • +
+
+
+
+
$ ${COMPUTE_SSH} sudo systemctl restart tripleo_ovn_controller.service
+
+
+
+
    +
  • +

    Finally, you can start ovn-northd service that will keep OVN northbound and southbound databases in sync.

    +
  • +
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  ovn:
+    enabled: true
+    template:
+      ovnNorthd:
+        networkAttachment: internalapi
+'
+
+
+
+
    +
  • +

    Delete the ovn-data pod and persistent volume claim with OVN databases backup (consider making a snapshot of it, before deleting)

    +
  • +
+
+
+
+
oc delete pod ovn-copy-data
+oc delete pvc ovn-data
+
+
+
+
+
+

Keystone adoption

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably,

    +
    + +
    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Pre-checks

+ +
+
+

Procedure - Keystone adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Keystone:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  keystone:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      override:
    +        service:
    +          internal:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +      databaseInstance: openstack
    +      secret: osp-secret
    +'
    +
    +
    +
  • +
  • +

    Create alias to use openstack command in the adopted deployment:

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +
    +
  • +
  • +

    Clean up old services and endpoints that still point to the old +control plane (everything except Keystone service and endpoints):

    +
    +
    +
    openstack endpoint list | grep keystone | awk '/admin/{ print $2; }' | xargs ${BASH_ALIASES[openstack]} endpoint delete || true
    +
    +for service in aodh cinderv3 glance manila manilav2 neutron nova placement swift; do
    +  openstack service list | awk "/ $service /{ print \$2; }" | xargs ${BASH_ALIASES[openstack]} service delete || true
    +done
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Keystone endpoints are defined and pointing to the podified +FQDNs:

    +
    +
    +
    openstack endpoint list | grep keystone
    +
    +
    +
  • +
+
+
+
+
+

Neutron adoption

+
+

Adopting Neutron means that an existing OpenStackControlPlane CR, where Neutron +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

When the procedure is over, the expectation is to see the NeutronAPI service +up and running: the Keystone endpoints should be updated and the same backend +of the source Cloud will be available. If the conditions above are met, the +adoption is considered concluded.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone and OVN DBs +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Neutron adoption

+
+

As already done for Keystone, the Neutron Adoption follows the same pattern.

+
+
+

Patch OpenStackControlPlane to deploy Neutron:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  neutron:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      override:
+        service:
+          internal:
+            metadata:
+              annotations:
+                metallb.universe.tf/address-pool: internalapi
+                metallb.universe.tf/allow-shared-ip: internalapi
+                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+            spec:
+              type: LoadBalancer
+      databaseInstance: openstack
+      secret: osp-secret
+      networkAttachments:
+      - internalapi
+'
+
+
+
+
+

Post-checks

+
+
Inspect the resulting neutron pods
+
+
+
NEUTRON_API_POD=`oc get pods -l service=neutron | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $NEUTRON_API_POD -c neutron-api -- cat /etc/neutron/neutron.conf
+
+
+
+
+
Check that Neutron API service is registered in Keystone
+
+
+
openstack service list | grep network
+
+
+
+
+
openstack endpoint list | grep network
+
+| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | neutron      | network      | True    | public    | http://neutron-public-openstack.apps-crc.testing  |
+| b943243e596847a9a317c8ce1800fa98 | regionOne | neutron      | network      | True    | internal  | http://neutron-internal.openstack.svc:9696        |
+| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | neutron      | network      | True    | admin     | http://192.168.122.99:9696                        |
+
+
+
+
+
Create sample resources
+
+

We can now test that user can create networks, subnets, ports, routers etc.

+
+
+
+
openstack network create net
+openstack subnet create --network net --subnet-range 10.0.0.0/24 subnet
+openstack router create router
+
+
+
+ + + + + +
+
Note
+
+this page should be expanded to include information on SR-IOV adoption. +
+
+
+
+
+
+

Ceph backend configuration (if applicable)

+
+

If the original deployment uses a Ceph storage backend for any service +(e.g. Glance, Cinder, Nova, Manila), the same backend must be used in the +adopted deployment and CRs must be configured accordingly.

+
+
+

Prerequisites

+
+
    +
  • +

    The OpenStackControlPlane CR must already exist.

    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
CEPH_SSH="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100"
+CEPH_KEY=$($CEPH_SSH "cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0")
+CEPH_CONF=$($CEPH_SSH "cat /etc/ceph/ceph.conf | base64 -w 0")
+
+
+
+
+

Modify capabilities of the "openstack" user to accommodate Manila

+
+

On TripleO environments, the CephFS driver in Manila is configured to use +its own keypair. For convenience, let’s modify the openstack user so that we +can use it across all OpenStack services.

+
+
+

Using the same user across the services serves two purposes:

+
+
+
    +
  • +

    The capabilities of the user required to interact with the Manila service +became far simpler and hence, more became more secure with RHOSP 18.

    +
  • +
  • +

    It is simpler to create a common ceph secret (keyring and ceph config +file) and propagate the secret to all services that need it.

    +
  • +
+
+
+
+
$CEPH_SSH cephadm shell
+ceph auth caps client.openstack \
+  mgr 'allow *' \
+  mon 'allow r, profile rbd' \
+  osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images, allow rw pool manila_data'
+
+
+
+
+

Ceph backend configuration

+
+

Create the ceph-conf-files secret, containing Ceph configuration:

+
+
+
+
oc apply -f - <<EOF
+apiVersion: v1
+data:
+  ceph.client.openstack.keyring: $CEPH_KEY
+  ceph.conf: $CEPH_CONF
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+type: Opaque
+EOF
+
+
+
+

The content of the file should look something like this:

+
+
+
+
+
+
---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+stringData:
+  ceph.client.openstack.keyring: |
+    [client.openstack]
+        key = <secret key>
+        caps mgr = "allow *"
+        caps mon = "profile rbd"
+        caps osd = "profile rbd pool=images"
+  ceph.conf: |
+    [global]
+    fsid = 7a1719e8-9c59-49e2-ae2b-d7eb08c695d4
+    mon_host = 10.1.1.2,10.1.1.3,10.1.1.4
+
+
+
+
+
+

Configure extraMounts within the OpenStackControlPlane CR:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  extraMounts:
+    - name: v1
+      region: r1
+      extraVol:
+        - propagation:
+          - CinderVolume
+          - CinderBackup
+          - GlanceAPI
+          - ManilaShare
+          extraVolType: Ceph
+          volumes:
+          - name: ceph
+            projected:
+              sources:
+              - secret:
+                  name: ceph-conf-files
+          mounts:
+          - name: ceph
+            mountPath: "/etc/ceph"
+            readOnly: true
+'
+
+
+
+
+

Getting Ceph FSID

+
+

Configuring some OpenStack services to use Ceph backend may require +the FSID value. You can fetch the value from the config like so:

+
+
+
+
CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | grep fsid | sed -e 's/fsid = //')
+
+
+
+
+
+

Glance adoption

+
+

Adopting Glance means that an existing OpenStackControlPlane CR, where Glance +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

When the procedure is over, the expectation is to see the GlanceAPI service +up and running: the Keystone endpoints should be updated and the same backend +of the source Cloud will be available. If the conditions above are met, the +adoption is considered concluded.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side;

    +
  4. +
  5. +

    (optional) an internal/external Ceph cluster is reachable by both crc and +TripleO

    +
  6. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Glance adoption

+
+

As already done for Keystone, the Glance Adoption follows the same pattern.

+
+
+
Using local storage backend
+
+

When Glance should be deployed with local storage backend (not Ceph), +patch OpenStackControlPlane to deploy Glance:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  glance:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          type: single
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+'
+
+
+
+
+
Using NFS backend
+
+

When the source Cloud based on TripleO uses Glance with a NFS backend, before +patching the OpenStackControlPlane to deploy Glance it’s important to validate +a few networking related prerequisites. +In the source cloud, verify the NFS parameters used by the overcloud to configure +the Glance backend. +In particular, find among the TripleO heat templates the following variables:

+
+
+
+

GlanceBackend: file

+
+
+

GlanceNfsEnabled: true

+
+
+

GlanceNfsShare: 192.168.24.1:/var/nfs

+
+
+
+

that are usually an override of the default content provided by +/usr/share/openstack-tripleo-heat-templates/environments/storage/glance-nfs.yaml[glance-nfs.yaml]. +In the example above, as the first variable shows, unlike Cinder, Glance has no +notion of NFS backend: the File driver is used in this scenario, and behind the +scenes, the filesystem_store_datadir which usually points to /var/lib/glance/images/ +is mapped to the export value provided by the GlanceNfsShare variable. +If the GlanceNfsShare is not exported through a network that is supposed to be +propagated to the adopted OpenStack control plane, an extra action is required +by the human administrator, who must stop the nfs-server and remap the export +to the storage network. This action usually happens when the Glance service is +stopped in the source controller nodes. +In the podified control plane, as per the +(network isolation diagram, +Glance is attached to the Storage network, propagated via the associated +NetworkAttachmentsDefinition CR, and the resulting Pods have already the right +permissions to handle the Image Service traffic through this network. +In a deployed OpenStack control plane, you can verify that the network mapping +matches with what has been deployed in the TripleO based environment by checking +both the NodeNetworkConfigPolicy (nncp) and the NetworkAttachmentDefinition +(net-attach-def) with the following commands:

+
+
+
+
$ oc get nncp
+NAME                        STATUS      REASON
+enp6s0-crc-8cf2w-master-0   Available   SuccessfullyConfigured
+
+$ oc get net-attach-def
+NAME
+ctlplane
+internalapi
+storage
+tenant
+
+$ oc get ipaddresspool -n metallb-system
+NAME          AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
+ctlplane      true          false             ["192.168.122.80-192.168.122.90"]
+internalapi   true          false             ["172.17.0.80-172.17.0.90"]
+storage       true          false             ["172.18.0.80-172.18.0.90"]
+tenant        true          false             ["172.19.0.80-172.19.0.90"]
+
+
+
+

The above represents an example of the output that should be checked in the +openshift environment to make sure there are no issues with the propagated +networks.

+
+
+

The following steps assume that:

+
+
+
    +
  1. +

    the Storage network has been propagated to the openstack control plane

    +
  2. +
  3. +

    Glance is able to reach the Storage network and connect to the nfs-server +through the port 2049.

    +
  4. +
+
+
+

If the above conditions are met, it is possible to adopt the Glance service +and create a new default GlanceAPI instance connected with the existing +NFS share.

+
+
+
+
cat << EOF > glance_nfs_patch.yaml
+
+spec:
+  extraMounts:
+  - extraVol:
+    - extraVolType: Nfs
+      mounts:
+      - mountPath: /var/lib/glance/images
+        name: nfs
+      propagation:
+      - Glance
+      volumes:
+      - name: nfs
+        nfs:
+          path: /var/nfs
+          server: 172.17.3.20
+    name: r1
+    region: r1
+  glance:
+    enabled: true
+    template:
+      databaseInstance: openstack
+      customServiceConfig: |
+         [DEFAULT]
+         enabled_backends = default_backend:file
+         [glance_store]
+         default_backend = default_backend
+         [default_backend]
+         filesystem_store_datadir = /var/lib/glance/images/
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          type: single
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+EOF
+
+
+
+

Note:

+
+
+

Replace in glance_nfs_patch.yaml the nfs/server ip address with the IP used +to reach the nfs-server and make sure the nfs/path points to the exported +path in the nfs-server.

+
+
+

Patch OpenStackControlPlane to deploy Glance with a NFS backend:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_nfs_patch.yaml
+
+
+
+

When GlanceAPI is active, you can see a single API instance:

+
+
+
+
$ oc get pods -l service=glance
+NAME                      READY   STATUS    RESTARTS
+glance-default-single-0   3/3     Running   0
+
+
+
+

and the description of the pod must report:

+
+
+
+
Mounts:
+...
+  nfs:
+    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
+    Server:    {{ server ip address }}
+    Path:      {{ nfs export path }}
+    ReadOnly:  false
+...
+
+
+
+

It is also possible to double check the mountpoint by running the following:

+
+
+
+
oc rsh -c glance-api glance-default-single-0
+
+sh-5.1# mount
+...
+...
+{{ ip address }}:/var/nfs on /var/lib/glance/images type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.0.5,local_lock=none,addr=172.18.0.5)
+...
+...
+
+
+
+

You can run an openstack image create command and double check, on the NFS +node, the uuid has been created in the exported directory.

+
+
+

For example:

+
+
+
+
$ oc rsh openstackclient
+$ openstack image list
+
+sh-5.1$  curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img
+...
+...
+
+sh-5.1$ openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img cirros
+...
+...
+
+sh-5.1$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 634482ca-4002-4a6d-b1d5-64502ad02630 | cirros | active |
++--------------------------------------+--------+--------+
+
+
+
+

On the nfs-server node, we can see the same uuid in the exported /var/nfs:

+
+
+
+
$ ls /var/nfs/
+634482ca-4002-4a6d-b1d5-64502ad02630
+
+
+
+
+
Using Ceph storage backend
+
+

If a Ceph backend is used, the customServiceConfig parameter should +be used to inject the right configuration to the GlanceAPI instance.

+
+
+

Make sure the Ceph-related secret (ceph-conf-files) was created in +the openstack namespace and that the extraMounts property of the +OpenStackControlPlane CR has been configured properly. These tasks +are described in an earlier Adoption step Ceph storage backend +configuration.

+
+
+
+
cat << EOF > glance_patch.yaml
+spec:
+  glance:
+    enabled: true
+    template:
+      databaseInstance: openstack
+      customServiceConfig: |
+        [DEFAULT]
+        enabled_backends=default_backend:rbd
+        [glance_store]
+        default_backend=default_backend
+        [default_backend]
+        rbd_store_ceph_conf=/etc/ceph/ceph.conf
+        rbd_store_user=openstack
+        rbd_store_pool=images
+        store_description=Ceph glance store backend.
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+EOF
+
+
+
+
+
+

If you have previously backup your Openstack services configuration file from the old environment: +pull openstack configuration os-diff you can use os-diff to compare +and make sure the configuration is correct.

+
+
+
+
+
+
pushd os-diff
+./os-diff cdiff --service glance -c /tmp/collect_tripleo_configs/glance/etc/glance/glance-api.conf -o glance_patch.yaml
+
+
+
+
+
+

This will produce the difference between both ini configuration files.

+
+
+
+
+

Patch OpenStackControlPlane to deploy Glance with Ceph backend:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_patch.yaml
+
+
+
+
+
+

Post-checks

+
+
Test the glance service from the OpenStack CLI
+
+
+
+

You can compare and make sure the configuration has been correctly applied to the glance pods by running

+
+
+
+
+
+
./os-diff cdiff --service glance -c /etc/glance/glance.conf.d/02-config.conf  -o glance_patch.yaml --frompod -p glance-api
+
+
+
+
+
+

If no line appear, then the configuration is correctly done.

+
+
+
+
+

Inspect the resulting glance pods:

+
+
+
+
GLANCE_POD=`oc get pod |grep glance-default-external-0 | cut -f 1 -d' '`
+oc exec -t $GLANCE_POD -c glance-api -- cat /etc/glance/glance.conf.d/02-config.conf
+
+[DEFAULT]
+enabled_backends=default_backend:rbd
+[glance_store]
+default_backend=default_backend
+[default_backend]
+rbd_store_ceph_conf=/etc/ceph/ceph.conf
+rbd_store_user=openstack
+rbd_store_pool=images
+store_description=Ceph glance store backend.
+
+oc exec -t $GLANCE_POD -c glance-api -- ls /etc/ceph
+ceph.client.openstack.keyring
+ceph.conf
+
+
+
+

Ceph secrets are properly mounted, at this point let’s move to the OpenStack +CLI and check the service is active and the endpoints are properly updated.

+
+
+
+
(openstack)$ service list | grep image
+
+| fc52dbffef36434d906eeb99adfc6186 | glance    | image        |
+
+(openstack)$ endpoint list | grep image
+
+| 569ed81064f84d4a91e0d2d807e4c1f1 | regionOne | glance       | image        | True    | internal  | http://glance-internal-openstack.apps-crc.testing   |
+| 5843fae70cba4e73b29d4aff3e8b616c | regionOne | glance       | image        | True    | public    | http://glance-public-openstack.apps-crc.testing     |
+| 709859219bc24ab9ac548eab74ad4dd5 | regionOne | glance       | image        | True    | admin     | http://glance-admin-openstack.apps-crc.testing      |
+
+
+
+

Check the images that we previously listed in the source Cloud are available +in the adopted service:

+
+
+
+
(openstack)$ image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |
++--------------------------------------+--------+--------+
+
+
+
+
+
Image upload
+
+

We can test that an image can be created on from the adopted service.

+
+
+
+
(openstack)$ alias openstack="oc exec -t openstackclient -- openstack"
+(openstack)$ curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img
+    qemu-img convert -O raw /tmp/cirros-0.5.2-x86_64-disk.img /tmp/cirros-0.5.2-x86_64-disk.img.raw
+    openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img.raw cirros2
+    openstack image list
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100   273  100   273    0     0   1525      0 --:--:-- --:--:-- --:--:--  1533
+  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
+100 15.5M  100 15.5M    0     0  17.4M      0 --:--:-- --:--:-- --:--:-- 17.4M
+
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| Field            | Value                                                                                                                                      |
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| container_format | bare                                                                                                                                       |
+| created_at       | 2023-01-31T21:12:56Z                                                                                                                       |
+| disk_format      | raw                                                                                                                                        |
+| file             | /v2/images/46a3eac1-7224-40bc-9083-f2f0cd122ba4/file                                                                                       |
+| id               | 46a3eac1-7224-40bc-9083-f2f0cd122ba4                                                                                                       |
+| min_disk         | 0                                                                                                                                          |
+| min_ram          | 0                                                                                                                                          |
+| name             | cirros                                                                                                                                     |
+| owner            | 9f7e8fdc50f34b658cfaee9c48e5e12d                                                                                                           |
+| properties       | os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros', owner_specified.openstack.sha256='' |
+| protected        | False                                                                                                                                      |
+| schema           | /v2/schemas/image                                                                                                                          |
+| status           | queued                                                                                                                                     |
+| tags             |                                                                                                                                            |
+| updated_at       | 2023-01-31T21:12:56Z                                                                                                                       |
+| visibility       | shared                                                                                                                                     |
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 46a3eac1-7224-40bc-9083-f2f0cd122ba4 | cirros2| active |
+| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |
++--------------------------------------+--------+--------+
+
+
+(openstack)$ oc rsh ceph
+sh-4.4$ ceph -s
+r  cluster:
+    id:     432d9a34-9cee-4109-b705-0c59e8973983
+    health: HEALTH_OK
+
+  services:
+    mon: 1 daemons, quorum a (age 4h)
+    mgr: a(active, since 4h)
+    osd: 1 osds: 1 up (since 4h), 1 in (since 4h)
+
+  data:
+    pools:   5 pools, 160 pgs
+    objects: 46 objects, 224 MiB
+    usage:   247 MiB used, 6.8 GiB / 7.0 GiB avail
+    pgs:     160 active+clean
+
+sh-4.4$ rbd -p images ls
+46a3eac1-7224-40bc-9083-f2f0cd122ba4
+c3158cad-d50b-452f-bec1-f250562f5c1f
+
+
+
+
+
+
+

Placement adoption

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably,

    +
    +
      +
    • +

      the service databases +must already be imported into the podified MariaDB.

      +
    • +
    • +

      the Keystone service needs to be imported.

      +
    • +
    • +

      the Memcached operator needs to be deployed (nothing to import for it from +the source environment).

      +
    • +
    +
    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Procedure - Placement adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Placement:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  placement:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      databaseInstance: openstack
    +      secret: osp-secret
    +      override:
    +        service:
    +          internal:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +'
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Placement endpoints are defined and pointing to the +podified FQDNs and that Placement API responds.

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +openstack endpoint list | grep placement
    +
    +
    +# Without OpenStack CLI placement plugin installed:
    +PLACEMENT_PUBLIC_URL=$(openstack endpoint list -c 'Service Name' -c 'Service Type' -c URL | grep placement | grep public | awk '{ print $6; }')
    +oc exec -t openstackclient -- curl "$PLACEMENT_PUBLIC_URL"
    +
    +# With OpenStack CLI placement plugin installed:
    +openstack resource class list
    +
    +
    +
  • +
+
+
+
+
+

Nova adoption

+
+
+
+

NOTE This example scenario describes a simple single-cell setup. Real +multi-stack topology recommended for production use results in different +cells DBs layout, and should be using different naming schemes (not covered +here this time).

+
+
+
+
+

Prerequisites

+
+ +
+
+
+

Variables

+
+

Define the shell variables and aliases used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
alias openstack="oc exec -t openstackclient -- openstack"
+
+
+
+
+

Procedure - Nova adoption

+
+
+
+

NOTE: We assume Nova Metadata deployed on the top level and not on each +cell level, so this example imports it the same way. If the source deployment +has a per cell metadata deployment, adjust the given below patch as needed. +Metadata service cannot be run in cell0.

+
+
+
+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Nova:

    +
    +
    +
    oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
    +spec:
    +  nova:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      secret: osp-secret
    +      apiServiceTemplate:
    +        override:
    +          service:
    +            internal:
    +              metadata:
    +                annotations:
    +                  metallb.universe.tf/address-pool: internalapi
    +                  metallb.universe.tf/allow-shared-ip: internalapi
    +                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +              spec:
    +                type: LoadBalancer
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      metadataServiceTemplate:
    +        enabled: true # deploy single nova metadata on the top level
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      schedulerServiceTemplate:
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      cellTemplates:
    +        cell0:
    +          conductorServiceTemplate:
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +        cell1:
    +          metadataServiceTemplate:
    +            enabled: false # enable here to run it in a cell instead
    +            override:
    +                service:
    +                  metadata:
    +                    annotations:
    +                      metallb.universe.tf/address-pool: internalapi
    +                      metallb.universe.tf/allow-shared-ip: internalapi
    +                      metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +                  spec:
    +                    type: LoadBalancer
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +          conductorServiceTemplate:
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +'
    +
    +
    +
  • +
  • +

    Wait for Nova control plane services' CRs to become ready:

    +
    +
    +
    oc wait --for condition=Ready --timeout=300s Nova/nova
    +
    +
    +
    +

    The local Conductor services will be started for each cell, while the superconductor runs in cell0. +Note that disable_compute_service_check_for_ffu is mandatory for all imported Nova services, until +the external dataplane imported, and until Nova Compute services fast-forward upgraded.

    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check that Nova endpoints are defined and pointing to the +podified FQDNs and that Nova API responds.

    +
    +
    +
    openstack endpoint list | grep nova
    +openstack server list
    +
    +
    +
  • +
+
+
+

Compare the following outputs with the topology specific configuration +collected earlier:

+
+
+
    +
  • +

    Query the superconductor for cell1 existance and compare it to pre-adoption values:

    +
    +
    +
    echo $PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS
    +oc rsh nova-cell0-conductor-0 nova-manage cell_v2 list_cells | grep -F '| cell1 |'
    +
    +
    +
    +

    The expected changes to happen:

    +
    +
    +
      +
    • +

      cell1’s nova DB and user name become nova_cell1.

      +
    • +
    • +

      Default cell is renamed to cell1 (in a multi-cell setup, it should become indexed as the last cell instead).

      +
    • +
    • +

      RabbitMQ transport URL no longer uses guest.

      +
    • +
    +
    +
  • +
+
+
+
+
+

NOTE At this point, Nova control plane services have yet taken control over +existing Nova compute workloads. That would become possible to verify only after +EDPM adoption completed.

+
+
+
+
+
+
+

Cinder adoption

+
+

Adopting a director deployed Cinder service into OpenStack may require some +thought because it’s not always a simple process.

+
+
+

Usually the adoption process entails:

+
+
+
    +
  • +

    Checking existing limitations.

    +
  • +
  • +

    Considering the placement of the cinder services.

    +
  • +
  • +

    Preparing the OpenShift nodes where volume and backup services will run.

    +
  • +
  • +

    Crafting the manifest based on the existing cinder.conf file.

    +
  • +
  • +

    Deploying Cinder.

    +
  • +
  • +

    Validating the new deployment.

    +
  • +
+
+
+

This guide provides necessary knowledge to complete these steps in most +situations, but it still requires knowledge on how OpenStack services work and +the structure of a Cinder configuration file.

+
+
+

Limitations

+
+

There are currently some limitations that are worth highlighting; some are +related to this guideline while some to the operator:

+
+
+
    +
  • +

    There is no global nodeSelector for all cinder volumes, so it needs to be +specified per backend. This may change in the future.

    +
  • +
  • +

    There is no global customServiceConfig or customServiceConfigSecrets for +all cinder volumes, so it needs to be specified per backend. This may change in +the future.

    +
  • +
  • +

    Adoption of LVM backends, where the volume data is stored in the compute +nodes, is not currently being documented in this process. It may get documented +in the future.

    +
  • +
  • +

    Support for Cinder backends that require kernel modules not included in RHEL +has not been tested in Operator deployed OpenStack so it is not documented in +this guide.

    +
  • +
  • +

    Adoption of DCN/Edge deployment is not currently described in this guide.

    +
  • +
+
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, cinder service must have been +stopped and the service databases must already be imported into the podified +MariaDB.

    +
  • +
  • +

    Storage network has been properly configured on the OpenShift cluster.

    +
  • +
+
+
+
+

Variables

+
+

No new environmental variables need to be defined, though we use the +CONTROLLER1_SSH that was defined in a previous step for the pre-checks.

+
+
+
+

Pre-checks

+
+

We are going to need the contents of cinder.conf, so we may want to download +it to have it locally accessible:

+
+
+
+
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf > cinder.conf
+
+
+
+
+

Prepare OpenShift

+
+

As explained the planning section before deploying OpenStack in +OpenShift we need to ensure that the networks are ready, that we have decided +the node selection, and also make sure any necessary changes to the OpenShift +nodes have been made. For Cinder volume and backup services all these 3 must +be carefully considered.

+
+
+
Node Selection
+
+

We may need, or want, to restrict the OpenShift nodes where cinder volume and +backup services can run.

+
+
+

The best example of when we need to do node selection for a specific cinder +service in when we deploy Cinder with the LVM driver. In that scenario the +LVM data where the volumes are stored only exists in a specific host, so we +need to pin the cinder-volume service to that specific OpenShift node. Running +the service on any other OpenShift node would not work. Since nodeSelector +only works on labels we cannot use the OpenShift host node name to restrict +the LVM backend and we’ll need to identify it using a unique label, an existing +or new one:

+
+
+
+
$ oc label nodes worker0 lvm=cinder-volumes
+
+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  cinder:
+    enabled: true
+    template:
+      cinderVolumes:
+        lvm-iscsi:
+          nodeSelector:
+            lvm: cinder-volumes
+< . . . >
+
+
+
+

As mentioned in the Node Selector guide, an example where we +need to use labels is when using FC storage and we don’t have HBA cards in all +our OpenShift nodes. In this scenario we would need to restrict all the cinder +volume backends (not only the FC one) as well as the backup services.

+
+
+

Depending on the cinder backends, their configuration, and the usage of Cinder, +we can have network intensive cinder volume services with lots of I/O as well as +cinder backup services that are not only network intensive but also memory and +CPU intensive. This may be a concern for the OpenShift human operators, and +they may want to use the nodeSelector to prevent these service from +interfering with their other OpenShift workloads

+
+
+

Please make sure to read the Nodes Selector guide before +continuing, as we’ll be referring to some of the concepts explained there in the +following sections.

+
+
+

When selecting the nodes where cinder volume is going to run please remember +that cinder-volume may also use local storage when downloading a glance image +for the create volume from image operation, and it can require a considerable +amount of space when having concurrent operations and not using cinder volume +cache.

+
+
+

If we don’t have nodes with enough local disk space for the temporary images we +can use a remote NFS location for the images. This is something that we had to +manually setup in Director deployments, but with operators we can easily do it +automatically using the extra volumes feature ()extraMounts.

+
+
+
+
Transport protocols
+
+

Due to the specifics of the storage transport protocols some changes may be +required on the OpenShift side, and although this is something that must be +documented by the Vendor here wer are going to provide some generic +instructions that can serve as a guide for the different transport protocols.

+
+
+

Check the backend sections in our cinder.conf file that are listed in the +enabled_backends configuration option to figure out the transport storage +protocol used by the backend.

+
+
+

Depending on the backend we can find the transport protocol:

+
+
+
    +
  • +

    Looking at the volume_driver configuration option, as it may contain the +protocol itself: RBD, iSCSI, FC…​

    +
  • +
  • +

    Looking at the target_protocol configuration option

    +
  • +
+
+
+ + + + + +
+
Warning
+
+Any time a MachineConfig is used to make changes to OpenShift +nodes the node will reboot!! Act accordingly. +
+
+
+
NFS
+
+

There’s nothing to do for NFS. OpenShift can connect to NFS backends without +any additional changes.

+
+
+
+
RBD/Ceph
+
+

There’s nothing to do for RBD/Ceph in terms of preparing the nodes, OpenShift +can connect to Ceph backends without any additional changes. Credentials and +configuration files will need to be provided to the services though.

+
+
+
+
iSCSI
+
+

Connecting to iSCSI volumes requires that the iSCSI initiator is running on the +OpenShift hosts hosts where volume and backup services are going to run, because +the Linux Open iSCSI initiator doesn’t currently support network namespaces, so +we must only run 1 instance of the service for the normal OpenShift usage, plus +the OpenShift CSI plugins, plus the OpenStack services.

+
+
+

If we are not already running iscsid on the OpenShift nodes then we’ll need +to apply a MachineConfig similar to this one:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-enable-iscsid
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    systemd:
+      units:
+      - enabled: true
+        name: iscsid.service
+
+
+
+

Remember that if we are using labels to restrict the nodes where cinder +services are running we’ll need to use a MachineConfigPool as described in +the nodes selector guide to limit the effects of the +MachineConfig to only the nodes were our services may run.

+
+
+

If we are using a toy single node deployment to test the process we may need to +replace worker with master in the MachineConfig.

+
+
+

For production deployments using iSCSI volumes we always recommend setting up +multipathing, please look at the multipathing section to see +how to configure it.

+
+
+

TODO: Add, or at least mention, the Nova eDPM side for iSCSI.

+
+
+
+
FC
+
+

There’s nothing to do for FC volumes to work, but the cinder volume and cinder +backup services need to run in an OpenShift host that has HBAs, so if there +are nodes that don’t have HBAs then we’ll need to use labels to restrict where +these services can run, as mentioned in the [node selection section] +(#node-selection).

+
+
+

This also means that for virtualized OpenShift clusters using FC we’ll need to +expose the host’s HBAs inside the VM.

+
+
+

For production deployments using FC volumes we always recommend setting up +multipathing, please look at the multipathing section to see +how to configure it.

+
+
+
+
NVMe-oF
+
+

Connecting to NVMe-oF volumes requires that the nvme kernel modules are loaded +on the OpenShift hosts.

+
+
+

If we are not already loading the nvme-fabrics module on the OpenShift nodes +where volume and backup services are going to run then we’ll need to apply a +MachineConfig similar to this one:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-load-nvme-fabrics
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    storage:
+      files:
+        - path: /etc/modules-load.d/nvme_fabrics.conf
+          overwrite: false
+          # Mode must be decimal, this is 0644
+          mode: 420
+          user:
+            name: root
+          group:
+            name: root
+          contents:
+            # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.
+            # This is the rfc2397 text/plain string format
+            source: data:,nvme-fabrics
+
+
+
+

Remember that if we are using labels to restrict the nodes where cinder +services are running we’ll need to use a MachineConfigPool as described in +the nodes selector guide to limit the effects of the +MachineConfig to only the nodes were our services may run.

+
+
+

If we are using a toy single node deployment to test the process we may need to +replace worker with master in the MachineConfig.

+
+
+

We are only loading the nvme-fabrics module because it takes care of loading +the transport specific modules (tcp, rdma, fc) as needed.

+
+
+

For production deployments using NVMe-oF volumes we always recommend using +multipathing. For NVMe-oF volumes OpenStack uses native multipathing, called +ANA.

+
+
+

Once the OpenShift nodes have rebooted and are loading the nvme-fabrics module +we can confirm that the Operating System is configured and supports ANA by +checking on the host:

+
+
+
+
cat /sys/module/nvme_core/parameters/multipath
+
+
+
+ + + + + +
+
Important
+
+ANA doesn’t use the Linux Multipathing Device Mapper, but the +*current OpenStack +code requires multipathd on compute nodes to be running for Nova to be able to +use multipathing, so please remember to follow the multipathing part for compute +nodes on the multipathing section. +
+
+
+

TODO: Add, or at least mention, the Nova eDPM side for NVMe-oF.

+
+
+
+
Multipathing
+
+

For iSCSI and FC protocols we always recommend using multipathing, which +has 4 parts:

+
+
+
    +
  • +

    Prepare the OpenShift hosts

    +
  • +
  • +

    Configure the Cinder services

    +
  • +
  • +

    Prepare the Nova computes

    +
  • +
  • +

    Configure the Nova service

    +
  • +
+
+
+

To prepare the OpenShift hosts we need to ensure that the Linux Multipath +Device Mapper is configured and running on the OpenShift hosts, and we do +that using MachineConfig like this one:

+
+
+
+
# Includes the /etc/multipathd.conf contents and the systemd unit changes
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-enable-multipathd
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    storage:
+      files:
+        - path: /etc/multipath.conf
+          overwrite: false
+          # Mode must be decimal, this is 0600
+          mode: 384
+          user:
+            name: root
+          group:
+            name: root
+          contents:
+            # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.
+            # This is the rfc2397 text/plain string format
+            source: data:,defaults%20%7B%0A%20%20user_friendly_names%20no%0A%20%20recheck_wwid%20yes%0A%20%20skip_kpartx%20yes%0A%20%20find_multipaths%20yes%0A%7D%0A%0Ablacklist%20%7B%0A%7D
+    systemd:
+      units:
+      - enabled: true
+        name: multipathd.service
+
+
+
+

Remember that if we are using labels to restrict the nodes where cinder +services are running we’ll need to use a MachineConfigPool as described in +the nodes selector guide to limit the effects of the +MachineConfig to only the nodes were our services may run.

+
+
+

If we are using a toy single node deployment to test the process we may need to +replace worker with master in the MachineConfig.

+
+
+

To configure the cinder services to use multipathing we need to enable the +use_multipath_for_image_xfer configuration option in all the backend sections +and in the [DEFAULT] section for the backup service, but in Podified +deployments we don’t need to worry about it, because that’s the default. So as +long as we don’t override it setting use_multipath_for_image_xfer = false then +multipathing will work as long as the service is running on the OpenShift host.

+
+
+

TODO: Add, or at least mention, the Nova eDPM side for Multipathing once +it’s implemented.

+
+
+
+
+
+

Configurations

+
+

As described in the planning Cinder is configured using +configuration snippets instead of using obscure configuration parameters +defined by the installer.

+
+
+

The recommended way to deploy Cinder volume backends has changed to remove old +limitations, add flexibility, and improve operations in general.

+
+
+

When deploying with Director we used to run a single Cinder volume service with +all our backends (each backend would run on its own process), and even though +that way of deploying is still supported, we don’t recommend it. We recommend +using a volume service per backend since it’s a superior deployment model.

+
+
+

So for an LVM and a Ceph backend we would have 2 entries in cinderVolume and, +as mentioned in the limitations section, we cannot set global defaults for all +volume services, so we would have to define it for each of them, like this:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    enabled: true
+    template:
+      cinderVolume:
+        lvm:
+          customServiceConfig: |
+            [DEFAULT]
+            debug = True
+            [lvm]
+< . . . >
+        ceph:
+          customServiceConfig: |
+            [DEFAULT]
+            debug = True
+            [ceph]
+< . . . >
+
+
+
+

Reminder that for volume backends that have sensitive information using Secret +and the customServiceConfigSecrets key is the recommended way to go.

+
+
+
+

Prepare the configuration

+
+

For adoption instead of using a whole deployment manifest we’ll use a targeted +patch, like we did with other services, and in this patch we will enable the +different cinder services with their specific configurations.

+
+
+

WARNING: Check that all configuration options are still valid for the new +OpenStack version, since configuration options may have been deprecated, +removed, or added. This applies to both backend driver specific configuration +options and other generic options.

+
+
+

There are 2 ways to prepare a cinder configuration for adoption, tailor-making +it or doing it quick and dirty. There is no difference in how Cinder will +operate with both methods, so we are free to chose, though we recommend +tailor-making it whenever possible.

+
+
+

The high level explanation of the tailor-made approach is:

+
+
+
    +
  1. +

    Determine what part of the configuration is generic for all the cinder +services and remove anything that would change when deployed in OpenShift, like +the connection in the [dabase] section, the transport_url and log_dir in +[DEFAULT], the whole [coordination] section. This configuration goes into +the customServiceConfig (or a Secret and then used in +customServiceConfigSecrets) at the cinder: template: level.

    +
  2. +
  3. +

    Determine if there’s any scheduler specific configuration and add it to the +customServiceConfig section in cinder: template: cinderScheduler.

    +
  4. +
  5. +

    Determine if there’s any API specific configuration and add it to the +customServiceConfig section in cinder: template: cinderAPI.

    +
  6. +
  7. +

    If we have cinder backup deployed, then we’ll get the cinder backup relevant +configuration options and add them to customServiceConfig (or a Secret and +then used in customServiceConfigSecrets) at the cinder: template: +cinderBackup: level. We should remove the host configuration in the +[DEFAULT] section to facilitate supporting multiple replicas in the future.

    +
  8. +
  9. +

    Determine the individual volume backend configuration for each of the +drivers. The configuration will not only be the specific driver section, it +should also include the [backend_defaults] section and FC zoning sections is +they are being used, because the cinder operator doesn’t support a +customServiceConfig section global for all volume services. Each backend +would have its own section under cinder: template: cinderVolumes and the +configuration would go in customServiceConfig (or a Secret and then used in +customServiceConfigSecrets).

    +
  10. +
  11. +

    Check if any of the cinder volume drivers being used requires a custom vendor +image. If they do, find the location of the image in the vendor’s instruction +available in the w OpenStack Cinder ecosystem +page +and add it under the specific’s driver section using the containerImage key. +For example, if we had a Pure Storage array and the driver was already certified +for OSP18, then we would have something like this:

    +
    +
    +
    spec:
    +  cinder:
    +    enabled: true
    +    template:
    +      cinderVolume:
    +        pure:
    +          containerImage: registry.connect.redhat.com/purestorage/openstack-cinder-volume-pure-rhosp-18-0'
    +          customServiceConfigSecrets:
    +            - openstack-cinder-pure-cfg
    +< . . . >
    +
    +
    +
  12. +
  13. +

    External files: Cinder services sometimes use external files, for example for +a custom policy, or to store credentials, or SSL CA bundles to connect to a +storage array, and we need to make those files available to the right +containers. To achieve this we’ll use Secrets or ConfigMap to store the +information in OpenShift and then the extraMounts key. For example, for the +Ceph credentials stored in a Secret called ceph-conf-files we would patch +the top level extraMounts in OpenstackControlPlane:

    +
    +
    +
    spec:
    +  extraMounts:
    +  - extraVol:
    +    - extraVolType: Ceph
    +      mounts:
    +      - mountPath: /etc/ceph
    +        name: ceph
    +        readOnly: true
    +      propagation:
    +      - CinderVolume
    +      - CinderBackup
    +      - Glance
    +      volumes:
    +      - name: ceph
    +        projected:
    +          sources:
    +          - secret:
    +              name: ceph-conf-files
    +
    +
    +
    +

    But for a service specific one, like the API policy, we would do it directly +on the service itself, in this example we include the cinder API +configuration that references the policy we are adding from a ConfigMap +called my-cinder-conf that has a key policy with the contents of the +policy:

    +
    +
    +
    +
    spec:
    +  cinder:
    +    enabled: true
    +    template:
    +      cinderAPI:
    +        customServiceConfig: |
    +           [oslo_policy]
    +           policy_file=/etc/cinder/api/policy.yaml
    +      extraMounts:
    +      - extraVol:
    +        - extraVolType: Ceph
    +          mounts:
    +          - mountPath: /etc/cinder/api
    +            name: policy
    +            readOnly: true
    +          propagation:
    +          - CinderAPI
    +          volumes:
    +          - name: policy
    +            projected:
    +              sources:
    +              - configMap:
    +                  name: my-cinder-conf
    +                  items:
    +                    - key: policy
    +                      path: policy.yaml
    +
    +
    +
  14. +
+
+
+

The quick and dirty process is more straightforward:

+
+
+
    +
  1. +

    Create an agnostic configuration file removing any specifics from the old +deployment’s cinder.conf file, like the connection in the [dabase] +section, the transport_url and log_dir in [DEFAULT], the whole +[coordination] section, etc..

    +
  2. +
  3. +

    Assuming the configuration has sensitive information, drop the modified +contents of the whole file into a Secret.

    +
  4. +
  5. +

    Reference this secret in all the services, creating a cinder volumes section +for each backend and just adding the respective enabled_backends option.

    +
  6. +
  7. +

    Add external files as mentioned in the last bullet of the tailor-made +configuration explanation.

    +
  8. +
+
+
+

Example of what the quick and dirty configuration patch would look like:

+
+
+
+
   spec:
+     cinder:
+       enabled: true
+       template:
+         cinderAPI:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderScheduler:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderBackup:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderVolume:
+           lvm1:
+             customServiceConfig: |
+               [DEFAULT]
+               enabled_backends = lvm1
+             customServiceConfigSecrets:
+               - cinder-conf
+           lvm2:
+             customServiceConfig: |
+               [DEFAULT]
+               enabled_backends = lvm2
+             customServiceConfigSecrets:
+               - cinder-conf
+
+
+
+
Configuration generation helper tool
+
+

Creating the right Cinder configuration files to deploy using Operators may +sometimes be a complicated experience, especially the first times, so we have a +helper tool that can create a draft of the files from a cinder.conf file.

+
+
+

This tool is not meant to be a automation tool, it’s mostly to help us get the +gist of it, maybe point out some potential pitfalls and reminders.

+
+
+ + + + + +
+
Important
+
+The tools requires PyYAML Python package to be installed (pip +install PyYAML). +
+
+
+

This cinder-cfg.py script defaults to reading the +cinder.conf file from the current directory (unless --config option is used) +and outputs files to the current directory (unless --out-dir option is used).

+
+
+

In the output directory we’ll always get a cinder.patch file with the Cinder +specific configuration patch to apply to the OpenStackControlPlane CR but we +may also get an additional file called cinder-prereq.yaml file with some +Secrets and MachineConfigs.

+
+
+

Example of an invocation setting input and output explicitly to the defaults for +a Ceph backend:

+
+
+
+
$ python cinder-cfg.py --config cinder.conf --out-dir ./
+WARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using "extraMounts" or remove the option.
+
+WARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.
+
+WARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.
+
+WARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!
+
+Output written at ./: cinder.patch
+
+
+
+

The script outputs some warnings to let us know things we may need to do +manually -adding the custom policy, provide the ceph configuration files- and +also let us know a change in how the service_user has been removed.

+
+
+

A different example when using multiple backends, one of them being a 3PAR FC +could be:

+
+
+
+
$ python cinder-cfg.py --config cinder.conf --out-dir ./
+WARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using "extraMounts" or remove the option.
+
+ERROR:root:Backend hpe_fc requires a vendor container image, but there is no certified image available yet. Patch will use the last known image for reference, but IT WILL NOT WORK
+
+WARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.
+
+WARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.
+
+WARNING:root:Configuration is using FC, please ensure all your OpenShift nodes have HBAs or use labels to ensure that Volume and Backup services are scheduled on nodes with HBAs.
+
+WARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!
+
+Output written at ./: cinder.patch, cinder-prereq.yaml
+
+
+
+

In this case we can see that there are additional messages, so let’s quickly go over them:

+
+
+
    +
  • +

    There’s one message mentioning how this backend driver needs external vendor +dependencies so the standard container image will not work. Unfortunately this +image is still not available, so an older image is used in the output patch file +for reference. We can then replace this image with one we build ourselves or +with a Red Hat official one once the image is available. In this case we can see +in our cinder.patch file:

    +
    +
    +
          cinderVolumes:
    +      hpe-fc:
    +        containerImage: registry.connect.redhat.com/hpe3parcinder/openstack-cinder-volume-hpe3parcinder17-0
    +
    +
    +
  • +
  • +

    The FC message reminds us that this transport protocol requires specific HBA +cards to be present on the nodes where cinder services are running.

    +
  • +
  • +

    In this case we also see that it has created the cinder-prereq.yaml file and +if we look into it we’ll see there is one MachineConfig and one Secret. The +MachineConfig is called 99-master-cinder-enable-multipathd and like the name +suggests enables multipathing on all the OCP worker nodes. The Secret is +called openstackcinder-volumes-hpe_fc and contains the 3PAR backend +configuration because it has sensitive information (credentials), and in the +cinder.patch file we’ll see that it uses this configuration:

    +
    +
    +
       cinderVolumes:
    +      hpe-fc:
    +        customServiceConfigSecrets:
    +        - openstackcinder-volumes-hpe_fc
    +
    +
    +
  • +
+
+
+
+
+

Procedure - Cinder adoption

+
+

Assuming we have already stopped cinder services, prepared the OpenShift nodes, +deployed the OpenStack operators and a bare OpenStack manifest, and migrated the +database, and prepared the patch manifest with the Cinder service configuration, +all that’s left is to apply the patch and wait for the operator to apply the +changes and deploy the Cinder services.

+
+
+

Our recommendation is to write the patch manifest into a file, for example +cinder.patch and then apply it with something like:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file=cinder.patch
+
+
+
+

For example, for the RBD deployment from the Development Guide the +cinder.patch would look like this:

+
+
+
+
spec:
+  extraMounts:
+  - extraVol:
+    - extraVolType: Ceph
+      mounts:
+      - mountPath: /etc/ceph
+        name: ceph
+        readOnly: true
+      propagation:
+      - CinderVolume
+      - CinderBackup
+      - Glance
+      volumes:
+      - name: ceph
+        projected:
+          sources:
+          - secret:
+              name: ceph-conf-files
+  cinder:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      cinderAPI:
+        override:
+          service:
+            internal:
+              metadata:
+                annotations:
+                  metallb.universe.tf/address-pool: internalapi
+                  metallb.universe.tf/allow-shared-ip: internalapi
+                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+        replicas: 1
+        customServiceConfig: |
+          [DEFAULT]
+          default_volume_type=tripleo
+      cinderScheduler:
+        replicas: 1
+      cinderBackup:
+        networkAttachments:
+        - storage
+        replicas: 1
+        customServiceConfig: |
+          [DEFAULT]
+          backup_driver=cinder.backup.drivers.ceph.CephBackupDriver
+          backup_ceph_conf=/etc/ceph/ceph.conf
+          backup_ceph_user=openstack
+          backup_ceph_pool=backups
+      cinderVolumes:
+        ceph:
+          networkAttachments:
+          - storage
+          replicas: 1
+          customServiceConfig: |
+            [tripleo_ceph]
+            backend_host=hostgroup
+            volume_backend_name=tripleo_ceph
+            volume_driver=cinder.volume.drivers.rbd.RBDDriver
+            rbd_ceph_conf=/etc/ceph/ceph.conf
+            rbd_user=openstack
+            rbd_pool=volumes
+            rbd_flatten_volume_from_snapshot=False
+            report_discard_supported=True
+
+
+
+

Once the services have been deployed we’ll need to clean up the old scheduler +and backup services which will appear as being down while we have others that +appear as being up:

+
+
+
+
openstack volume service list
+
++------------------+------------------------+------+---------+-------+----------------------------+
+| Binary           | Host                   | Zone | Status  | State | Updated At                 |
++------------------+------------------------+------+---------+-------+----------------------------+
+| cinder-backup    | standalone.localdomain | nova | enabled | down  | 2023-06-28T11:00:59.000000 |
+| cinder-scheduler | standalone.localdomain | nova | enabled | down  | 2023-06-28T11:00:29.000000 |
+| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-06-28T17:00:03.000000 |
+| cinder-scheduler | cinder-scheduler-0     | nova | enabled | up    | 2023-06-28T17:00:02.000000 |
+| cinder-backup    | cinder-backup-0        | nova | enabled | up    | 2023-06-28T17:00:01.000000 |
++------------------+------------------------+------+---------+-------+----------------------------+
+
+
+
+

In this case we need to remove services for hosts standalone.localdomain

+
+
+
+
oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-backup standalone.localdomain
+oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-scheduler standalone.localdomain
+
+
+
+

The reason why we haven’t preserved the name of the backup service is because +we have taken the opportunity to change its configuration to support +Active-Active, even though we are not doing so right now because we have 1 +replica.

+
+
+

Now that we have the Cinder services running we know that the DB schema +migration has been completed and we can proceed to apply the DB data migrations. +While it is not necessary to run these data migrations at this precise moment, +because we can just run them right before the next upgrade, we consider that for +adoption it’s best to run them now to make sure there are no issues before +running production workloads on the deployment.

+
+
+

The command to run the DB data migrations is:

+
+
+
+
oc exec -it cinder-scheduler-0 -- cinder-manage db online_data_migrations
+
+
+
+
+

Post-checks

+
+

Before we can run any checks we need to set the right cloud configuration for +the openstack command to be able to connect to our OpenShift control plane.

+
+
+

Just like we did in the KeyStone adoption step we ensure we have the openstack alias defined:

+
+
+
+
alias openstack="oc exec -t openstackclient -- openstack"
+
+
+
+

Now we can run a set of tests to confirm that the deployment is there using our +old database contents:

+
+
+
    +
  • +

    See that Cinder endpoints are defined and pointing to the podified +FQDNs:

    +
    +
    +
    openstack endpoint list --service cinderv3
    +
    +
    +
  • +
  • +

    Check that the cinder services are running and up. The API won’t show but if +you get a response you know it’s up as well:

    +
    +
    +
    openstack volume service list
    +
    +
    +
  • +
  • +

    Check that our old volume types, volumes, snapshots, and backups are there:

    +
    +
    +
    openstack volume type list
    +openstack volume list
    +openstack volume snapshot list
    +openstack volume backup list
    +
    +
    +
  • +
+
+
+

To confirm that everything not only looks good but it’s also properly working +we recommend doing some basic operations:

+
+
+
    +
  • +

    Create a volume from an image to check that the connection to glance is +working.

    +
    +
    +
    openstack volume create --image cirros --bootable --size 1 disk_new
    +
    +
    +
  • +
  • +

    Backup the old attached volume to a new backup. Example:

    +
    +
    +
    openstack --os-volume-api-version 3.47 volume create --backup backup restored
    +
    +
    +
  • +
+
+
+

We don’t boot a nova instance using the new volume from image or try to detach +the old volume because nova and cinder are still not connected.

+
+
+
+
+

Horizon adoption

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, Memcached and +keystone should be already adopted.

    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Procedure - Horizon adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Horizon:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  horizon:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      memcachedInstance: memcached
    +      secret: osp-secret
    +'
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Horizon instance is successfully deployed and ready

    +
  • +
+
+
+
+
oc get horizon
+
+
+
+
    +
  • +

    Check that dashboard is reachable and returns status code 200

    +
  • +
+
+
+
+
PUBLIC_URL=$(oc get horizon horizon -o jsonpath='{.status.endpoint}')
+curl --silent --output /dev/stderr --head --write-out "%{http_code}" "$PUBLIC_URL/dashboard/auth/login/?next=/dashboard/" | grep 200
+
+
+
+
+
+

Manila adoption

+
+

OpenStack Manila is the Shared File Systems service. It provides OpenStack +users with a self-service API to create and manage file shares. File +shares (or simply, "shares"), are built for concurrent read/write access by +any number of clients. This, coupled with the inherent elasticity of the +underlying storage makes the Shared File Systems service essential in +cloud environments with require RWX ("read write many") persistent storage.

+
+
+

Networking

+
+

File shares in OpenStack are accessed directly over a network. Hence, it is +essential to plan the networking of the cloud to create a successful and +sustainable orchestration layer for shared file systems.

+
+
+

Manila supports two levels of storage networking abstractions - one where +users can directly control the networking for their respective file shares; +and another where the storage networking is configured by the OpenStack +administrator. It is important to ensure that the networking in the Red Hat +OpenStack Platform 17.1 matches the network plans for your new cloud after +adoption. This ensures that tenant workloads remain connected to +storage through the adoption process, even as the control plane suffers a +minor interruption. Manila’s control plane services are not in the data +path; and shutting down the API, scheduler and share manager services will +not impact access to existing shared file systems.

+
+
+

Typically, storage and storage device management networks are separate. +Manila services only need access to the storage device management network. +For example, if a Ceph cluster was used in the deployment, the "storage" +network refers to the Ceph cluster’s public network, and Manila’s share +manager service needs to be able to reach it.

+
+
+
+

Prerequisites

+
+
    +
  • +

    Ensure that manila systemd services (api, cron, scheduler) are +stopped.

    +
  • +
  • +

    Ensure that manila pacemaker services ("openstack-manila-share") are +stopped.

    +
  • +
  • +

    Ensure that the database migration has completed.

    +
  • +
  • +

    Ensure that OpenShift nodes where manila-share service will be deployed +can reach the management network that the storage system is in.

    +
  • +
  • +

    Ensure that services such as keystone +and memcached are available prior to +adopting manila services.

    +
  • +
  • +

    If tenant-driven networking was enabled (driver_handles_share_servers=True), +ensure that neutron has been deployed prior to +adopting manila services.

    +
  • +
+
+
+
+

Procedure - Manila adoption

+
+
Copying configuration from the RHOSP 17.1 deployment
+
+

Define the CONTROLLER1_SSH environment variable, if it hasn’t been +defined already. Then copy the +configuration file from RHOSP 17.1 for reference.

+
+
+
+
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/manila/etc/manila/manila.conf | awk '!/^ *#/ && NF' > ~/manila.conf
+
+
+
+

Review this configuration, alongside any configuration changes that were noted +since RHOSP 17.1. Not all of it makes sense to bring into the new cloud +environment:

+
+
+
    +
  • +

    The manila operator is capable of setting up database related configuration +([database]), service authentication (auth_strategy, +[keystone_authtoken]), message bus configuration +(transport_url, control_exchange), the default paste config +(api_paste_config) and inter-service communication configuration (` +, `[nova], [cinder], [glance] [oslo_messaging_*]). So +all of these can be ignored.

    +
  • +
  • +

    Ignore the osapi_share_listen configuration. In RHOSP 18, we rely on +OpenShift’s routes and ingress.

    +
  • +
  • +

    Pay attention to policy overrides. In RHOSP 18, manila ships with a secure +default RBAC, and overrides may not be necessary. Please review RBAC +defaults by using the Oslo policy generator +tool. If a custom policy is necessary, you must provide it as a +ConfigMap. The following sample spec illustrates how a +ConfigMap called manila-policy can be set up with the contents of a +file called policy.yaml.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        manilaAPI:
+          customServiceConfig: |
+             [oslo_policy]
+             policy_file=/etc/manila/policy.yaml
+        extraMounts:
+        - extraVol:
+          - extraVolType: Undefined
+            mounts:
+            - mountPath: /etc/manila/
+              name: policy
+              readOnly: true
+            propagation:
+            - ManilaAPI
+            volumes:
+            - name: policy
+              projected:
+                sources:
+                - configMap:
+                    name: manila-policy
+                    items:
+                      - key: policy
+                        path: policy.yaml
+
+
+
+
    +
  • +

    The Manila API service needs the enabled_share_protocols option to be +added in the customServiceConfig section in manila: template: manilaAPI.

    +
  • +
  • +

    If you had scheduler overrides, add them to the customServiceConfig +section in manila: template: manilaScheduler.

    +
  • +
  • +

    If you had multiple storage backend drivers configured with RHOSP 17.1, +you will need to split them up when deploying RHOSP 18. Each storage +backend driver needs to use its own instance of the manila-share +service.

    +
  • +
  • +

    If a storage backend driver needs a custom container image, find it on the +RHOSP Ecosystem Catalog +and set manila: template: manilaShares: <custom name> : containerImage +value. The following example illustrates multiple storage backend drivers, +using custom container images.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        manilaAPI:
+          customServiceConfig: |
+            [DEFAULT]
+            enabled_share_protocols = nfs
+          replicas: 3
+        manilaScheduler:
+          replicas: 3
+        manilaShares:
+         netapp:
+           customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends = netapp
+             [netapp]
+             driver_handles_share_servers = False
+             share_backend_name = netapp
+             share_driver = manila.share.drivers.netapp.common.NetAppDriver
+             netapp_storage_family = ontap_cluster
+             netapp_transport_type = http
+           replicas: 1
+         pure:
+            customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends=pure-1
+             [pure-1]
+             driver_handles_share_servers = False
+             share_backend_name = pure-1
+             share_driver = manila.share.drivers.purestorage.flashblade.FlashBladeShareDriver
+             flashblade_mgmt_vip = 203.0.113.15
+             flashblade_data_vip = 203.0.10.14
+            containerImage: registry.connect.redhat.com/purestorage/openstack-manila-share-pure-rhosp-18-0
+            replicas: 1
+
+
+
+
    +
  • +

    If providing sensitive information, such as passwords, hostnames and +usernames, it is recommended to use OpenShift secrets, and the +customServiceConfigSecrets key. An example:

    +
  • +
+
+
+
+
cat << __EOF__ > ~/netapp_secrets.conf
+
+[netapp]
+netapp_server_hostname = 203.0.113.10
+netapp_login = fancy_netapp_user
+netapp_password = secret_netapp_password
+netapp_vserver = mydatavserver
+__EOF__
+
+oc create secret generic osp-secret-manila-netapp --from-file=~/netapp_secrets.conf -n openstack
+
+
+
+
    +
  • +

    customConfigSecrets can be used in any service, the following is a +config example using the secret we created as above.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        < . . . >
+        manilaShares:
+         netapp:
+           customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends = netapp
+             [netapp]
+             driver_handles_share_servers = False
+             share_backend_name = netapp
+             share_driver = manila.share.drivers.netapp.common.NetAppDriver
+             netapp_storage_family = ontap_cluster
+             netapp_transport_type = http
+           customServiceConfigSecrets:
+             - osp-secret-manila-netapp
+           replicas: 1
+    < . . . >
+
+
+
+
    +
  • +

    If you need to present extra files to any of the services, you can use +extraMounts. For example, when using ceph, you’d need Manila’s ceph +user’s keyring file as well as the ceph.conf configuration file +available. These are mounted via extraMounts as done with the example +below.

    +
  • +
  • +

    Ensure that the names of the backends (share_backend_name) remain as they +did on RHOSP 17.1.

    +
  • +
  • +

    It is recommended to set the replica count of the manilaAPI service and +the manilaScheduler service to 3. You should ensure to set the replica +count of the manilaShares service/s to 1.

    +
  • +
  • +

    Ensure that the appropriate storage management network is specified in the +manilaShares section. The example below connects the manilaShares +instance with the CephFS backend driver to the storage network.

    +
  • +
+
+
+
+
Deploying the manila control plane
+
+

Patch OpenStackControlPlane to deploy Manila; here’s an example that uses +Native CephFS:

+
+
+
+
cat << __EOF__ > ~/manila.patch
+spec:
+  manila:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      manilaAPI:
+        override:
+          service:
+            internal:
+              metadata:
+                annotations:
+                  metallb.universe.tf/address-pool: internalapi
+                  metallb.universe.tf/allow-shared-ip: internalapi
+                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+    template:
+      manilaAPI:
+        replicas: 3
+        customServiceConfig: |
+          [DEFAULT]
+          enabled_share_protocols = cephfs
+      manilaScheduler:
+        replicas: 3
+      manilaShares:
+        cephfs:
+          replicas: 1
+          customServiceConfig: |
+            [DEFAULT]
+            enabled_share_backends = tripleo_ceph
+            [tripleo_ceph]
+            driver_handles_share_servers=False
+            share_backend_name=tripleo_ceph
+            share_driver=manila.share.drivers.cephfs.driver.CephFSDriver
+            cephfs_conf_path=/etc/ceph/ceph.conf
+            cephfs_auth_id=openstack
+            cephfs_cluster_name=ceph
+            cephfs_volume_mode=0755
+            cephfs_protocol_helper_type=CEPHFS
+          networkAttachments:
+              - storage
+__EOF__
+
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file=~/manila.patch
+
+
+
+
+
+

Post-checks

+
+
Inspect the resulting manila service pods
+
+
+
oc get pods -l service=manila
+
+
+
+
+
Check that Manila API service is registered in Keystone
+
+
+
openstack service list | grep manila
+
+
+
+
+
openstack endpoint list | grep manila
+
+| 1164c70045d34b959e889846f9959c0e | regionOne | manila       | share        | True    | internal  | http://manila-internal.openstack.svc:8786/v1/%(project_id)s        |
+| 63e89296522d4b28a9af56586641590c | regionOne | manilav2     | sharev2      | True    | public    | https://manila-public-openstack.apps-crc.testing/v2                |
+| af36c57adcdf4d50b10f484b616764cc | regionOne | manila       | share        | True    | public    | https://manila-public-openstack.apps-crc.testing/v1/%(project_id)s |
+| d655b4390d7544a29ce4ea356cc2b547 | regionOne | manilav2     | sharev2      | True    | internal  | http://manila-internal.openstack.svc:8786/v2                       |
+
+
+
+
+
Verify resources
+
+

We can now test the health of the service

+
+
+
+
openstack share service list
+openstack share pool list --detail
+
+
+
+

We can check on existing workloads

+
+
+
+
openstack share list
+openstack share snapshot list
+
+
+
+

We can create further resources

+
+
+
+
openstack share create cephfs 10 --snapshot mysharesnap --name myshareclone
+
+
+
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, the service databases +must already be imported into the podified MariaDB.

    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Pre-checks

+
+

TODO

+
+
+
+

Procedure - Ironic adoption

+
+

TODO

+
+
+
+

Post-checks

+
+

TODO

+
+
+
+
+

Heat adoption

+
+

Adopting Heat means that an existing OpenStackControlPlane CR, where Heat +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

After the adoption process has been completed, a user can expect that they +will then have CR’s for Heat, HeatAPI, HeatEngine and HeatCFNAPI. +Additionally, a user should have endpoints created within Keystone to facilitate +the above mentioned servies.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A OpenShift environment is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone +should be already adopted.

    +
  • +
  • +

    In addition, if your existing Heat stacks contain resources from other services +such as Neutron, Nova, Swift, etc. Those services should be adopted first before +trying to adopt Heat.

    +
  • +
+
+
+
+

Procedure - Heat adoption

+
+

As already done for Keystone, the Heat Adoption follows a similar pattern.

+
+
+

Patch the osp-secret to update the HeatAuthEncryptionKey and HeatPassword. This needs +to match what you have configured in the existing TripleO Heat configuration.

+
+
+

You can retrieve and verify the existing auth_encryption_key and service passwords via:

+
+
+
+
[stack@rhosp17 ~]$ grep -E 'HeatPassword|HeatAuth' ~/overcloud-deploy/overcloud/overcloud-passwords.yaml
+  HeatAuthEncryptionKey: Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2
+  HeatPassword: dU2N0Vr2bdelYH7eQonAwPfI3
+
+
+
+

And verifying on one of the Controllers that this is indeed the value in use:

+
+
+
+
[stack@rhosp17 ~]$ ansible -i overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml overcloud-controller-0 -m shell -a "grep auth_encryption_key /var/lib/config-data/puppet-generated/heat/etc/heat/heat.conf | grep -Ev '^#|^$'" -b
+overcloud-controller-0 | CHANGED | rc=0 >>
+auth_encryption_key=Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2
+
+
+
+

This password needs to be base64 encoded and added to the osp-secret

+
+
+
+
❯ echo Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2 | base64
+UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzIK
+
+❯ oc patch secret osp-secret --type='json' -p='[{"op" : "replace" ,"path" : "/data/HeatAuthEncryptionKey" ,"value" : "UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzIK"}]'
+secret/osp-secret patched
+
+
+
+

Patch OpenStackControlPlane to deploy Heat:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  heat:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      memcachedInstance: memcached
+      passwordSelectors:
+        authEncryptionKey: HeatAuthEncryptionKey
+        database: HeatDatabasePassword
+        service: HeatPassword
+'
+
+
+
+
+

Post-checks

+
+

Ensure all of the CR’s reach the "Setup Complete" state:

+
+
+
+
❯ oc get Heat,HeatAPI,HeatEngine,HeatCFNAPI
+NAME                           STATUS   MESSAGE
+heat.heat.openstack.org/heat   True     Setup complete
+
+NAME                                  STATUS   MESSAGE
+heatapi.heat.openstack.org/heat-api   True     Setup complete
+
+NAME                                        STATUS   MESSAGE
+heatengine.heat.openstack.org/heat-engine   True     Setup complete
+
+NAME                                        STATUS   MESSAGE
+heatcfnapi.heat.openstack.org/heat-cfnapi   True     Setup complete
+
+
+
+
Check that Heat service is registered in Keystone
+
+
+
 oc exec -it openstackclient -- openstack service list -c Name -c Type
++------------+----------------+
+| Name       | Type           |
++------------+----------------+
+| heat       | orchestration  |
+| glance     | image          |
+| heat-cfn   | cloudformation |
+| ceilometer | Ceilometer     |
+| keystone   | identity       |
+| placement  | placement      |
+| cinderv3   | volumev3       |
+| nova       | compute        |
+| neutron    | network        |
++------------+----------------+
+
+
+
+
+
❯ oc exec -it openstackclient -- openstack endpoint list --service=heat -f yaml
+- Enabled: true
+  ID: 1da7df5b25b94d1cae85e3ad736b25a5
+  Interface: public
+  Region: regionOne
+  Service Name: heat
+  Service Type: orchestration
+  URL: http://heat-api-public-openstack-operators.apps.okd.bne-shift.net/v1/%(tenant_id)s
+- Enabled: true
+  ID: 414dd03d8e9d462988113ea0e3a330b0
+  Interface: internal
+  Region: regionOne
+  Service Name: heat
+  Service Type: orchestration
+  URL: http://heat-api-internal.openstack-operators.svc:8004/v1/%(tenant_id)s
+
+
+
+
+
Check Heat engine services are up
+
+
+
 oc exec -it openstackclient -- openstack orchestration service list -f yaml
+- Binary: heat-engine
+  Engine ID: b16ad899-815a-4b0c-9f2e-e6d9c74aa200
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:01.000000'
+- Binary: heat-engine
+  Engine ID: 887ed392-0799-4310-b95c-ac2d3e6f965f
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:00.000000'
+- Binary: heat-engine
+  Engine ID: 26ed9668-b3f2-48aa-92e8-2862252485ea
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:00.000000'
+- Binary: heat-engine
+  Engine ID: 1011943b-9fea-4f53-b543-d841297245fd
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:01.000000'
+
+
+
+
+
Verify you can now see your Heat stacks again
+
+

We can now test that user can create networks, subnets, ports, routers etc.

+
+
+
+
❯ openstack stack list -f yaml
+- Creation Time: '2023-10-11T22:03:20Z'
+  ID: 20f95925-7443-49cb-9561-a1ab736749ba
+  Project: 4eacd0d1cab04427bc315805c28e66c9
+  Stack Name: test-networks
+  Stack Status: CREATE_COMPLETE
+  Updated Time: null
+
+
+
+
+
+
+

Telemetry adoption

+
+

Adopting Telemetry means that an existing OpenStackControlPlane CR, where Telmetry +services are supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. MariaDB, Keystone and EDPM should be already adopted.

    +
  • +
+
+
+
+

Procedure - Telemetry adoption

+
+

Patch OpenStackControlPlane to deploy Ceilometer services:

+
+
+
+
cat << EOF > ceilometer_patch.yaml
+spec:
+  ceilometer:
+    enabled: true
+    template:
+      centralImage: quay.io/podified-antelope-centos9/openstack-ceilometer-central:current-podified
+      computeImage: quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
+      customServiceConfig: |
+        [DEFAULT]
+        debug=true
+      ipmiImage: quay.io/podified-antelope-centos9/openstack-ceilometer-ipmi:current-podified
+      nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
+      notificationImage: quay.io/podified-antelope-centos9/openstack-ceilometer-notification:current-podified
+      secret: osp-secret
+      sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
+EOF
+
+
+
+
+
+

If you have previously backup your Openstack services configuration file from the old environment: +pull openstack configuration os-diff you can use os-diff to compare +and make sure the configuration is correct.

+
+
+
+
+
+
pushd os-diff
+./os-diff cdiff --service ceilometer -c /tmp/collect_tripleo_configs/ceilometer/etc/ceilometer/ceilometer.conf -o ceilometer_patch.yaml
+
+
+
+
+
+

This will producre the difference between both ini configuration files.

+
+
+
+
+

Patch OpenStackControlPlane to deploy Ceilometer services:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file ceilometer_patch.yaml
+
+
+
+
+

Post-checks

+
+
Inspect the resulting Ceilometer pods
+
+
+
CEILOMETETR_POD=`oc get pods -l service=ceilometer | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $CEILOMETETR_POD -c ceilometer-central-agent -- cat /etc/ceilometer/ceilometer.conf
+
+
+
+
+
Inspect the resulting Ceilometer IPMI agent pod on Data Plane nodes
+
+
+
podman ps | grep ceilometer-ipmi
+
+
+
+
+
Inspecting enabled pollsters
+
+
+
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml']}"  | base64 -d
+
+
+
+
+
Enabling pollsters according to requirements
+
+
+
cat << EOF > polling.yaml
+---
+sources:
+    - name: pollsters
+      interval: 300
+      meters:
+        - volume.size
+        - image.size
+        - cpu
+        - memory
+EOF
+
+oc patch secret ceilometer-config-data  --patch="{\"data\": { \"polling.yaml\": \"$(base64 -w0 polling.yaml)\"}}"
+
+
+
+
+
+
+

Autoscaling adoption

+
+

Adopting Autoscaling means that an existing OpenStackControlPlane CR, where Aodh +services are supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. MariaDB, Keystone, Heat and Telemetry +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Autoscaling adoption

+
+

Patch OpenStackControlPlane to deploy Autoscaling services:

+
+
+
+
cat << EOF > aodh_patch.yaml
+spec:
+  autoscaling:
+    enabled: true
+    prometheus:
+      deployPrometheus: false
+    aodh:
+      customServiceConfig: |
+        [DEFAULT]
+        debug=true
+      secret: osp-secret
+      apiImage: "quay.io/podified-antelope-centos9/openstack-aodh-api:current-podified"
+      evaluatorImage: "quay.io/podified-antelope-centos9/openstack-aodh-evaluator:current-podified"
+      notifierImage: "quay.io/podified-antelope-centos9/openstack-aodh-notifier:current-podified"
+      listenerImage: "quay.io/podified-antelope-centos9/openstack-aodh-listener:current-podified"
+      passwordSelectors:
+      databaseUser: aodh
+      databaseInstance: openstack
+      memcachedInstance: memcached
+EOF
+
+
+
+
+
+

If you have previously backup your Openstack services configuration file from the old environment: +pull openstack configuration os-diff you can use os-diff to compare +and make sure the configuration is correct.

+
+
+
+
+
+
pushd os-diff
+./os-diff cdiff --service aodh -c /tmp/collect_tripleo_configs/aodh/etc/aodh/aodh.conf -o aodh_patch.yaml
+
+
+
+
+
+

This will producre the difference between both ini configuration files.

+
+
+
+
+

Patch OpenStackControlPlane to deploy Aodh services:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file aodh_patch.yaml
+
+
+
+
+

Post-checks

+
+
If autoscaling services are enabled inspect Aodh pods
+
+
+
AODH_POD=`oc get pods -l service=aodh | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $AODH_POD -c aodh-api -- cat /etc/aodh/aodh.conf
+
+
+
+
+
Check whether Aodh API service is registered in Keystone
+
+
+
openstack endpoint list | grep aodh
+| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | aodh      | network      | True    | public    | http://aodh-public-openstack.apps-crc.testing  |
+| b943243e596847a9a317c8ce1800fa98 | regionOne | aodh      | network      | True    | internal  | http://aodh-internal.openstack.svc:9696        |
+| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | aodh      | network      | True    | admin     | http://192.168.122.99:9696                     |
+
+
+
+
+
Create sample resources
+
+

We can now test that user can create alarms.

+
+
+
+
openstack alarm create \
+--name low_alarm \
+--type gnocchi_resources_threshold \
+--metric cpu \
+--resource-id b7ac84e4-b5ca-4f9e-a15c-ece7aaf68987 \
+--threshold 35000000000 \
+--comparison-operator lt \
+--aggregation-method rate:mean \
+--granularity 300 \
+--evaluation-periods 3 \
+--alarm-action 'log:\\' \
+--ok-action 'log:\\' \
+--resource-type instance
+
+
+
+
+
(TODO)
+
+
    +
  • +

    Include adopted autoscaling heat templates

    +
  • +
  • +

    Include adopted Aodh alarm create commands of type prometheus

    +
  • +
+
+
+
+
+
+

Stop other infrastructure management and compute services

+
+

Before we can start with the EDPM adoption we need to make sure that compute, +libvirt, load balancing, messaging and database services have been stopped on +the source cloud. Also, we need to disable repositories for +modular libvirt daemons on compute hosts.

+
+
+

After this step, the source cloud’s control plane can be decomissioned, +which is taking down only cloud controllers, database and messaging nodes. +Nodes that must remain functional are those running the compute, storage, +or networker roles (in terms of composable roles covered by Tripleo Heat +Templates).

+
+
+

Variables

+
+

Define the shell variables used in the steps below. +Define the map of compute node name, IP pairs. +The values are just illustrative and refer to a single node standalone director deployment, +use values that are correct for your environment:

+
+
+
+
EDPM_PRIVATEKEY_PATH="~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa"
+declare -A computes
+computes=(
+  ["standalone.localdomain"]="192.168.122.100"
+  # ...
+)
+
+
+
+

We chose to use these ssh variables with the ssh commands instead of using +ansible to try to create instructions that are independent on where they are +running, but ansible commands could be used to achieve the same result if we +are in the right host, for example to stop a service:

+
+
+
+
. stackrc
+ansible -i $(which tripleo-ansible-inventory) Compute -m shell -a "sudo systemctl stop tripleo_virtqemud.service" -b
+
+
+
+
+

Stopping remaining services

+
+

Remove the conflicting repositories and packages (in case of a devsetup that +uses Standalone TripleO) from all compute hosts. That is required to install +libvirt packages, when these hosts become adopted as External DataPlane Managed +(EDPM) nodes, where modular libvirt daemons are no longer running in podman +containers.

+
+
+

These steps can be automated with a simple script that relies on the previously +defined environmental variables and function:

+
+
+
+
ComputeServicesToStop=(
+                "tripleo_nova_compute.service"
+                "tripleo_nova_libvirt.target"
+                "tripleo_nova_migration_target.service"
+                "tripleo_nova_virtlogd_wrapper.service"
+                "tripleo_nova_virtnodedevd.service"
+                "tripleo_nova_virtproxyd.service"
+                "tripleo_nova_virtqemud.service"
+                "tripleo_nova_virtsecretd.service"
+                "tripleo_nova_virtstoraged.service")
+
+PacemakerResourcesToStop=(
+                "galera-bundle"
+                "haproxy-bundle"
+                "rabbitmq-bundle")
+
+echo "Disabling systemd units and cleaning up for compute services"
+for i in "${!computes[@]}"; do
+    SSH_CMD="ssh -i $EDPM_PRIVATEKEY_PATH root@${computes[$i]}"
+    for service in ${ComputeServicesToStop[*]}; do
+        echo "Stopping the $service in compute $i"
+        if ${SSH_CMD} sudo systemctl is-active $service; then
+            ${SSH_CMD} sudo systemctl disable --now $service
+            ${SSH_CMD} test -f /etc/systemd/system/$service '||' sudo systemctl mask $service
+        fi
+    done
+done
+
+echo "Stopping pacemaker services"
+for i in {1..3}; do
+    SSH_CMD=CONTROLLER${i}_SSH
+    if [ ! -z "${!SSH_CMD}" ]; then
+        echo "Using controller $i to run pacemaker commands"
+        for resource in ${PacemakerResourcesToStop[*]}; do
+            if ${!SSH_CMD} sudo pcs resource config $resource; then
+                ${!SSH_CMD} sudo pcs resource disable $resource
+            fi
+        done
+        break
+    fi
+done
+
+
+
+
+
+

EDPM adoption

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed.

    +
  • +
  • +

    Remaining source cloud services are stopped on compute hosts.

    +
  • +
+
+
+
+
+

WARNING This step is a "point of no return" in the EDPM adoption +procedure. The source control plane and data plane services must not +be ever enabled back, after EDPM is deployed, and Podified control +plane has taken control over it!

+
+
+
+
+
+

Variables

+
+

Define the shell variables used in the Fast-forward upgrade steps below. +Set FIP to the floating IP address of the test VM pre-created earlier on the source cloud. +Define the map of compute node name, IP pairs. +The values are just illustrative, use values that are correct for your environment:

+
+
+
+
PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
+
+alias openstack="oc exec -t openstackclient -- openstack"
+FIP=192.168.122.20
+declare -A computes
+export computes=(
+  ["standalone.localdomain"]="192.168.122.100"
+  # ...
+)
+
+
+
+
+

Pre-checks

+
+
    +
  • +

    Make sure the IPAM is configured

    +
  • +
+
+
+
+
oc apply -f - <<EOF
+apiVersion: network.openstack.org/v1beta1
+kind: NetConfig
+metadata:
+  name: netconfig
+spec:
+  networks:
+  - name: CtlPlane
+    dnsDomain: ctlplane.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 192.168.122.120
+        start: 192.168.122.100
+      - end: 192.168.122.200
+        start: 192.168.122.150
+      cidr: 192.168.122.0/24
+      gateway: 192.168.122.1
+  - name: InternalApi
+    dnsDomain: internalapi.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.17.0.250
+        start: 172.17.0.100
+      cidr: 172.17.0.0/24
+      vlan: 20
+  - name: External
+    dnsDomain: external.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 10.0.0.250
+        start: 10.0.0.100
+      cidr: 10.0.0.0/24
+      gateway: 10.0.0.1
+  - name: Storage
+    dnsDomain: storage.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.18.0.250
+        start: 172.18.0.100
+      cidr: 172.18.0.0/24
+      vlan: 21
+  - name: StorageMgmt
+    dnsDomain: storagemgmt.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.20.0.250
+        start: 172.20.0.100
+      cidr: 172.20.0.0/24
+      vlan: 23
+  - name: Tenant
+    dnsDomain: tenant.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.19.0.250
+        start: 172.19.0.100
+      cidr: 172.19.0.0/24
+      vlan: 22
+EOF
+
+
+
+
+

Procedure - EDPM adoption

+
+
    +
  • +

    Temporary fix until the OSP 17 backport of the stable compute UUID feature +lands.

    +
    +

    For each compute node grab the UUID of the compute service and write it too +the stable compute_id file in /var/lib/nova/ directory.

    +
    +
    +
    +
    for name in "${!computes[@]}";
    +do
    +  uuid=$(\
    +    openstack hypervisor show $name \
    +    -f value -c 'id'\
    +  )
    +  echo "Writing $uuid to /var/lib/nova/compute_id on $name"
    +  ssh \
    +    -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa \
    +    root@"${computes[$name]}" \
    +    "echo $uuid > /var/lib/nova/compute_id"
    +done
    +
    +
    +
  • +
  • +

    Create a ssh authentication secret for the EDPM nodes:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: v1
    +kind: Secret
    +metadata:
    +    name: dataplane-adoption-secret
    +    namespace: openstack
    +data:
    +    ssh-privatekey: |
    +$(cat ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa | base64 | sed 's/^/        /')
    +EOF
    +
    +
    +
  • +
  • +

    Generate an ssh key-pair nova-migration-ssh-key secret

    +
    +
    +
    cd "$(mktemp -d)"
    +ssh-keygen -f ./id -t ecdsa-sha2-nistp521 -N ''
    +oc get secret nova-migration-ssh-key || oc create secret generic nova-migration-ssh-key \
    +  -n openstack \
    +  --from-file=ssh-privatekey=id \
    +  --from-file=ssh-publickey=id.pub \
    +  --type kubernetes.io/ssh-auth
    +rm -f id*
    +cd -
    +
    +
    +
  • +
  • +

    Create a Nova Compute Extra Config service

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: v1
    +kind: ConfigMap
    +metadata:
    +  name: nova-compute-extraconfig
    +  namespace: openstack
    +data:
    +  19-nova-compute-cell1-workarounds.conf: |
    +    [workarounds]
    +    disable_compute_service_check_for_ffu=true
    +---
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneService
    +metadata:
    +  name: nova-compute-extraconfig
    +  namespace: openstack
    +spec:
    +  label: nova.compute.extraconfig
    +  configMaps:
    +    - nova-compute-extraconfig
    +  secrets:
    +    - nova-cell1-compute-config
    +    - nova-migration-ssh-key
    +  playbook: osp.edpm.nova
    +EOF
    +
    +
    +
    +

    The secret nova-cell<X>-compute-config is auto-generated for each +cell<X>. That secret, alongside nova-migration-ssh-key, should +always be specified for each custom OpenStackDataPlaneService related to Nova.

    +
    +
  • +
  • +

    Create a repo-setup service to configure Antelope repositories

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneService
    +metadata:
    +  name: repo-setup
    +  namespace: openstack
    +spec:
    +  label: dataplane.deployment.repo.setup
    +  play: |
    +    - hosts: all
    +      strategy: linear
    +      tasks:
    +        - name: Enable podified-repos
    +          become: true
    +          ansible.builtin.shell: |
    +            # TODO: Use subscription-manager and a valid OSP18 repos instead
    +            # This is a hack to deploy RDO Delorean repos to RHEL as if it were Centos 9 Stream
    +            set -euxo pipefail
    +            curl -sL https://github.com/openstack-k8s-operators/repo-setup/archive/refs/heads/main.tar.gz | tar -xz
    +            python3 -m venv ./venv
    +            PBR_VERSION=0.0.0 ./venv/bin/pip install ./repo-setup-main
    +            # This is required for FIPS enabled until trunk.rdoproject.org
    +            # is not being served from a centos7 host, tracked by
    +            # https://issues.redhat.com/browse/RHOSZUUL-1517
    +            dnf -y install crypto-policies
    +            update-crypto-policies --set FIPS:NO-ENFORCE-EMS
    +            # FIXME: perform dnf upgrade for other packages in EDPM ansible
    +            # here we only ensuring that decontainerized libvirt can start
    +            ./venv/bin/repo-setup current-podified -b antelope -d centos9 --stream
    +            dnf -y upgrade openstack-selinux
    +            rm -f /run/virtlogd.pid
    +            rm -rf repo-setup-main
    +EOF
    +
    +
    +
  • +
  • +

    Deploy OpenStackDataPlaneNodeSet:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneNodeSet
    +metadata:
    +  name: openstack
    +spec:
    +  networkAttachments:
    +      - ctlplane
    +  preProvisioned: true
    +  services:
    +    - repo-setup
    +    - download-cache
    +    - bootstrap
    +    - configure-network
    +    - validate-network
    +    - install-os
    +    - configure-os
    +    - run-os
    +    - libvirt
    +    - nova-compute-extraconfig
    +    - ovn
    +  env:
    +    - name: ANSIBLE_CALLBACKS_ENABLED
    +      value: "profile_tasks"
    +    - name: ANSIBLE_FORCE_COLOR
    +      value: "True"
    +  nodes:
    +    standalone:
    +      hostName: standalone
    +      ansible:
    +        ansibleHost: ${computes[standalone.localdomain]}
    +      networks:
    +      - defaultRoute: true
    +        fixedIP: ${computes[standalone.localdomain]}
    +        name: CtlPlane
    +        subnetName: subnet1
    +      - name: InternalApi
    +        subnetName: subnet1
    +      - name: Storage
    +        subnetName: subnet1
    +      - name: Tenant
    +        subnetName: subnet1
    +  nodeTemplate:
    +    ansibleSSHPrivateKeySecret: dataplane-adoption-secret
    +    managementNetwork: ctlplane
    +    ansible:
    +      ansibleUser: root
    +      ansiblePort: 22
    +      ansibleVars:
    +        service_net_map:
    +          nova_api_network: internalapi
    +          nova_libvirt_network: internalapi
    +
    +        # edpm_network_config
    +        # Default nic config template for a EDPM compute node
    +        # These vars are edpm_network_config role vars
    +        edpm_network_config_override: ""
    +        edpm_network_config_template: |
    +           ---
    +           {% set mtu_list = [ctlplane_mtu] %}
    +           {% for network in role_networks %}
    +           {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
    +           {%- endfor %}
    +           {% set min_viable_mtu = mtu_list | max %}
    +           network_config:
    +           - type: ovs_bridge
    +             name: {{ neutron_physical_bridge_name }}
    +             mtu: {{ min_viable_mtu }}
    +             use_dhcp: false
    +             dns_servers: {{ ctlplane_dns_nameservers }}
    +             domain: {{ dns_search_domains }}
    +             addresses:
    +             - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
    +             routes: {{ ctlplane_host_routes }}
    +             members:
    +             - type: interface
    +               name: nic1
    +               mtu: {{ min_viable_mtu }}
    +               # force the MAC address of the bridge to this interface
    +               primary: true
    +           {% for network in role_networks %}
    +             - type: vlan
    +               mtu: {{ lookup('vars', networks_lower[network] ~ '_mtu') }}
    +               vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}
    +               addresses:
    +               - ip_netmask:
    +                   {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}
    +               routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}
    +           {% endfor %}
    +
    +        edpm_network_config_hide_sensitive_logs: false
    +        #
    +        # These vars are for the network config templates themselves and are
    +        # considered EDPM network defaults.
    +        neutron_physical_bridge_name: br-ctlplane
    +        neutron_public_interface_name: eth0
    +        role_networks:
    +        - InternalApi
    +        - Storage
    +        - Tenant
    +        networks_lower:
    +          External: external
    +          InternalApi: internalapi
    +          Storage: storage
    +          Tenant: tenant
    +
    +        # edpm_nodes_validation
    +        edpm_nodes_validation_validate_controllers_icmp: false
    +        edpm_nodes_validation_validate_gateway_icmp: false
    +
    +        timesync_ntp_servers:
    +        - hostname: clock.redhat.com
    +        - hostname: clock2.redhat.com
    +
    +        edpm_ovn_controller_agent_image: quay.io/podified-antelope-centos9/openstack-ovn-controller:current-podified
    +        edpm_iscsid_image: quay.io/podified-antelope-centos9/openstack-iscsid:current-podified
    +        edpm_logrotate_crond_image: quay.io/podified-antelope-centos9/openstack-cron:current-podified
    +        edpm_nova_compute_container_image: quay.io/podified-antelope-centos9/openstack-nova-compute:current-podified
    +        edpm_nova_libvirt_container_image: quay.io/podified-antelope-centos9/openstack-nova-libvirt:current-podified
    +        edpm_ovn_metadata_agent_image: quay.io/podified-antelope-centos9/openstack-neutron-metadata-agent-ovn:current-podified
    +
    +        gather_facts: false
    +        enable_debug: false
    +        # edpm firewall, change the allowed CIDR if needed
    +        edpm_sshd_configure_firewall: true
    +        edpm_sshd_allowed_ranges: ['192.168.122.0/24']
    +        # SELinux module
    +        edpm_selinux_mode: enforcing
    +        plan: overcloud
    +
    +        # Do not attempt OVS 3.2 major upgrades here
    +        edpm_ovs_packages:
    +        - openvswitch3.1
    +EOF
    +
    +
    +
  • +
  • +

    Deploy OpenStackDataPlaneDeployment:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneDeployment
    +metadata:
    +  name: openstack
    +spec:
    +  nodeSets:
    +  - openstack
    +EOF
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check if all the Ansible EE pods reaches Completed status:

    +
    +
    +
      # watching the pods
    +  watch oc get pod -l app=openstackansibleee
    +
    +
    +
    +
    +
      # following the ansible logs with:
    +  oc logs -l app=openstackansibleee -f --max-log-requests 10
    +
    +
    +
  • +
  • +

    Wait for the dataplane node set to reach the Ready status:

    +
    +
    +
      oc wait --for condition=Ready osdpns/openstack --timeout=30m
    +
    +
    +
  • +
+
+
+
+

Nova compute services fast-forward upgrade from Wallaby to Antelope

+
+

Nova services rolling upgrade cannot be done during adoption, +there is in a lock-step with Nova control plane services, because those +are managed independently by EDPM ansible, and Kubernetes operators. +Nova service operator and OpenStack Dataplane operator ensure upgrading +is done independently of each other, by configuring +[upgrade_levels]compute=auto for Nova services. Nova control plane +services apply the change right after CR is patched. Nova compute EDPM +services will catch up the same config change with ansible deployment +later on.

+
+
+
+
+

NOTE: Additional orchestration happening around the FFU workarounds +configuration for Nova compute EDPM service is a subject of future changes.

+
+
+
+
+
    +
  • +

    Wait for cell1 Nova compute EDPM services version updated (it may take some time):

    +
    +
    +
      oc exec -it openstack-galera-0 -c galera -- mysql --user=root --password=${PODIFIED_DB_ROOT_PASSWORD} \
    +    -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
    +
    +
    +
    +

    The above query should return an empty result as a completion criterion.

    +
    +
  • +
  • +

    Remove pre-FFU workarounds for Nova control plane services:

    +
    +
    +
      oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
    +  spec:
    +    nova:
    +      template:
    +        cellTemplates:
    +          cell0:
    +            conductorServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +          cell1:
    +            metadataServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +            conductorServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +        apiServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +        metadataServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +        schedulerServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +  '
    +
    +
    +
  • +
  • +

    Wait for Nova control plane services' CRs to become ready:

    +
    +
    +
      oc wait --for condition=Ready --timeout=300s Nova/nova
    +
    +
    +
  • +
  • +

    Remove pre-FFU workarounds for Nova compute EDPM services:

    +
    +
    +
      oc apply -f - <<EOF
    +  apiVersion: v1
    +  kind: ConfigMap
    +  metadata:
    +    name: nova-compute-ffu
    +    namespace: openstack
    +  data:
    +    20-nova-compute-cell1-ffu-cleanup.conf: |
    +      [workarounds]
    +      disable_compute_service_check_for_ffu=false
    +  ---
    +  apiVersion: dataplane.openstack.org/v1beta1
    +  kind: OpenStackDataPlaneService
    +  metadata:
    +    name: nova-compute-ffu
    +    namespace: openstack
    +  spec:
    +    label: nova.compute.ffu
    +    configMaps:
    +      - nova-compute-ffu
    +    secrets:
    +      - nova-cell1-compute-config
    +      - nova-migration-ssh-key
    +    playbook: osp.edpm.nova
    +  ---
    +  apiVersion: dataplane.openstack.org/v1beta1
    +  kind: OpenStackDataPlaneDeployment
    +  metadata:
    +    name: openstack-nova-compute-ffu
    +    namespace: openstack
    +  spec:
    +    nodeSets:
    +      - openstack
    +    servicesOverride:
    +      - nova-compute-ffu
    +  EOF
    +
    +
    +
  • +
  • +

    Wait for Nova compute EDPM service to become ready:

    +
    +
    +
      oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m
    +
    +
    +
  • +
  • +

    Run Nova DB online migrations to complete FFU:

    +
    +
    +
      oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations
    +  oc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations
    +
    +
    +
  • +
  • +

    Verify if Nova services con stop the existing test VM instance:

    +
    +
    +
    ${BASH_ALIASES[openstack]} server list | grep -qF '| test | ACTIVE |' && openstack server stop test
    +${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |'
    +${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test | grep "it is in power state shutdown" || echo PASS
    +
    +
    +
  • +
  • +

    Verify if Nova services can start the existing test VM instance:

    +
    +
    +
    ${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |' && openstack server start test
    +${BASH_ALIASES[openstack]} server list | grep -F '| test | ACTIVE |'
    +${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test --fit-width -f json | jq -r '.state' | grep running
    +
    +
    +
  • +
+
+
+
+
+

Troubleshooting

+
+

This document contains information about various issues you might face +and how to solve them.

+
+
+

ErrImagePull due to missing authentication

+
+

The deployed containers pull the images from private containers registries that +can potentially return authentication errors like:

+
+
+
+
Failed to pull image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0":
+rpc error: code = Unknown desc = unable to retrieve auth token: invalid
+username/password: unauthorized: Please login to the Red Hat Registry using
+your Customer Portal credentials.
+
+
+
+

An example of a failed pod:

+
+
+
+
  Normal   Scheduled       3m40s                  default-scheduler  Successfully assigned openstack/rabbitmq-server-0 to worker0
+  Normal   AddedInterface  3m38s                  multus             Add eth0 [10.101.0.41/23] from ovn-kubernetes
+  Warning  Failed          2m16s (x6 over 3m38s)  kubelet            Error: ImagePullBackOff
+  Normal   Pulling         2m5s (x4 over 3m38s)   kubelet            Pulling image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0"
+  Warning  Failed          2m5s (x4 over 3m38s)   kubelet            Failed to pull image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0": rpc error: code  ... can be found here: https://access.redhat.com/RegistryAuthentication
+  Warning  Failed          2m5s (x4 over 3m38s)   kubelet            Error: ErrImagePull
+  Normal   BackOff         110s (x7 over 3m38s)   kubelet            Back-off pulling image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0"
+
+
+
+

To solve this issue we need to get a valid pull-secret from the official Red +Hat console site, +store this pull secret locally in a machine with access to the Kubernetes API +(service node), and then run:

+
+
+
+
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull_secret_location.json>
+
+
+
+

The previous command will make available the authentication information in all +the cluster’s compute nodes, then trigger a new pod deployment to pull the +container image with:

+
+
+
+
kubectl delete pod rabbitmq-server-0 -n openstack
+
+
+
+

And the pod should be able to pull the image successfully. For more +information about what container registries requires what type of +authentication, check the official +docs.

+
+
+
+
+
+
+

Ceph migration

+
+
+

Ceph RBD Migration

+
+

In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated +Storage nodes, the daemons living in the OpenStack control plane should be +moved/migrated into the existing external RHEL nodes (typically the compute +nodes for an HCI environment or dedicated storage nodes in all the remaining +use cases).

+
+
+

Requirements

+
+
    +
  • +

    Ceph is >= 5 and managed by cephadm/orchestrator.

    +
  • +
  • +

    Ceph NFS (ganesha) migrated from a TripleO based deployment to cephadm.

    +
  • +
  • +

    Both the Ceph public and cluster networks are propagated, via TripleO, to the target nodes.

    +
  • +
  • +

    Ceph Mons need to keep their IPs (to avoid cold migration).

    +
  • +
+
+
+
+

Scenario 1: Migrate mon and mgr from controller nodes

+
+

The goal of the first POC is to prove we are able to successfully drain a +controller node, in terms of ceph daemons, and move them to a different node. +The initial target of the POC is RBD only, which means we’re going to move only +mon and mgr daemons. For the purposes of this POC, we’ll deploy a ceph cluster +with only mon, mgrs, and osds to simulate the environment a customer will be in +before starting the migration. +The goal of the first POC is to ensure that:

+
+
+
    +
  • +

    We can keep the mon IP addresses moving them to the CephStorage nodes.

    +
  • +
  • +

    We can drain the existing controller nodes and shut them down.

    +
  • +
  • +

    We can deploy additional monitors to the existing nodes, promoting them as +_admin nodes that can be used by administrators to manage the ceph cluster +and perform day2 operations against it.

    +
  • +
  • +

    We can keep the cluster operational during the migration.

    +
  • +
+
+
+
Prerequisites
+
+

The Storage Nodes should be configured to have both storage and storage_mgmt +network to make sure we can use both Ceph public and cluster networks.

+
+
+

This step is the only one where the interaction with TripleO is required. From +17+ we don’t have to run any stack update, however, we have commands that +should be performed to run os-net-config on the bare-metal node and configure +additional networks.

+
+
+

Make sure the network is defined in metalsmith.yaml for the CephStorageNodes:

+
+
+
+
  - name: CephStorage
+    count: 2
+    instances:
+      - hostname: oc0-ceph-0
+        name: oc0-ceph-0
+      - hostname: oc0-ceph-1
+        name: oc0-ceph-1
+    defaults:
+      networks:
+        - network: ctlplane
+          vif: true
+        - network: storage_cloud_0
+            subnet: storage_cloud_0_subnet
+        - network: storage_mgmt_cloud_0
+            subnet: storage_mgmt_cloud_0_subnet
+      network_config:
+        template: templates/single_nic_vlans/single_nic_vlans_storage.j2
+
+
+
+

Then run:

+
+
+
+
openstack overcloud node provision \
+  -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \
+  --network-config -y --concurrency 2 /home/stack/metalsmith-0.yam
+
+
+
+

Verify that the storage network is running on the node:

+
+
+
+
(undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a
+Warning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+
+
+
+
+
Migrate mon(s) and mgr(s) on the two existing CephStorage nodes
+
+

Create a ceph spec based on the default roles with the mon/mgr on the +controller nodes.

+
+
+
+
openstack overcloud ceph spec -o ceph_spec.yaml -y  \
+   --stack overcloud-0     overcloud-baremetal-deployed-0.yaml
+
+
+
+

Deploy the Ceph cluster

+
+
+
+
 openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \
+    --stack overcloud-0 -o deployed_ceph.yaml \
+    --network-data ~/oc0-network-data.yaml \
+    --ceph-spec ~/ceph_spec.yaml
+
+
+
+

Note:

+
+
+

The ceph_spec.yaml, which is the OSP-generated description of the ceph cluster, +will be used, later in the process, as the basic template required by cephadm +to update the status/info of the daemons.

+
+
+

Check the status of the cluster:

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)
+    mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk
+    osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host ls
+HOST              ADDR           LABELS          STATUS
+oc0-ceph-0        192.168.24.14  osd
+oc0-ceph-1        192.168.24.7   osd
+oc0-controller-0  192.168.24.15  _admin mgr mon
+oc0-controller-1  192.168.24.23  _admin mgr mon
+oc0-controller-2  192.168.24.13  _admin mgr mon
+
+
+
+

The goal of the next section is to migrate the oc0-controller-{1,2} daemons +into oc0-ceph-{0,1} as the very basic scenario that demonstrates we can +actually make this kind of migration using cephadm.

+
+
+
+
Migrate oc0-controller-1 into oc0-ceph-0
+
+

ssh into controller-0, then

+
+
+
+
cephadm shell -v /home/ceph-admin/specs:/specs
+
+
+
+

ssh into ceph-0, then

+
+
+
+
sudo “watch podman ps”  # watch the new mon/mgr being deployed here
+
+
+
+

(optional) if mgr is active in the source node, then:

+
+
+
+
ceph mgr fail <mgr instance>
+
+
+
+

From the cephadm shell, remove the labels on oc0-controller-1

+
+
+
+
    for label in mon mgr _admin; do
+           ceph orch host rm label oc0-controller-1 $label;
+    done
+
+
+
+

Add the missing labels to oc0-ceph-0

+
+
+
+
[ceph: root@oc0-controller-0 /]#
+> for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done
+Added label mon to host oc0-ceph-0
+Added label mgr to host oc0-ceph-0
+Added label _admin to host oc0-ceph-0
+
+
+
+

Drain and force-remove the oc0-controller-1 node

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1
+Scheduled to remove the following daemons from host 'oc0-controller-1'
+type                 id
+-------------------- ---------------
+mon                  oc0-controller-1
+mgr                  oc0-controller-1.mtxohd
+crash                oc0-controller-1
+
+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force
+Removed  host 'oc0-controller-1'
+
+[ceph: root@oc0-controller-0 /]# ceph orch host ls
+HOST              ADDR           LABELS          STATUS
+oc0-ceph-0        192.168.24.14  osd
+oc0-ceph-1        192.168.24.7   osd
+oc0-controller-0  192.168.24.15  mgr mon _admin
+oc0-controller-2  192.168.24.13  _admin mgr mon
+
+
+
+

If you have only 3 mon nodes, and the drain of the node doesn’t work as +expected (the containers are still there), then SSH to controller-1 and +force-purge the containers in the node:

+
+
+
+
[root@oc0-controller-1 ~]# sudo podman ps
+CONTAINER ID  IMAGE                                                                                        COMMAND               CREATED         STATUS             PORTS       NAMES
+5c1ad36472bc  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
+3b14cc7bf4dd  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd
+
+[root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force
+
+[root@oc0-controller-1 ~]# sudo podman ps
+CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
+
+
+
+ + + + + +
+
Note
+
+cephadm rm-cluster on a node that is not part of the cluster anymore has the +effect of removing all the containers and doing some cleanup on the filesystem. +
+
+
+

Before shutting the oc0-controller-1 down, move the IP address (on the same +network) to the oc0-ceph-0 node:

+
+
+
+
mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]
+
+[root@oc0-controller-1 ~]# ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-ex    inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\       valid_lft forever preferred_lft forever
+6: vlan100    inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\       valid_lft forever preferred_lft forever
+7: vlan12    inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+8: vlan11    inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+9: vlan13    inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\       valid_lft forever preferred_lft forever
+10: vlan70    inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\       valid_lft forever preferred_lft forever
+11: vlan1    inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+12: vlan14    inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\       valid_lft forever preferred_lft forever
+
+
+
+

On the oc0-ceph-0:

+
+
+
+
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+[heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11
+[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.121/32 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+
+
+
+

Poweroff oc0-controller-1.

+
+
+

Add the new mon on oc0-ceph-0 using the old IP address:

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121
+Deployed mon.oc0-ceph-0 on host 'oc0-ceph-0'
+
+
+
+

Check the new container in the oc0-ceph-0 node:

+
+
+
+
b581dc8bbb78  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-ceph-0...  24 seconds ago  Up 24 seconds ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0
+
+
+
+

On the cephadm shell, backup the existing ceph_spec.yaml, edit the spec +removing any oc0-controller-1 entry, and replacing it with oc0-ceph-0:

+
+
+
+
cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file
+
+[ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml
+
+--- ceph_spec.yaml.bkp  2022-07-29 15:41:34.516329643 +0000
++++ ceph_spec.yaml      2022-07-29 15:28:26.455329643 +0000
+@@ -7,14 +7,6 @@
+ - mgr
+ service_type: host
+ ---
+-addr: 192.168.24.12
+-hostname: oc0-controller-1
+-labels:
+-- _admin
+-- mon
+-- mgr
+-service_type: host
+ ----
+ addr: 192.168.24.19
+ hostname: oc0-controller-2
+ labels:
+@@ -38,7 +30,7 @@
+ placement:
+   hosts:
+   - oc0-controller-0
+-  - oc0-controller-1
++  - oc0-ceph-0
+   - oc0-controller-2
+ service_id: mon
+ service_name: mon
+@@ -47,8 +39,8 @@
+ placement:
+   hosts:
+   - oc0-controller-0
+-  - oc0-controller-1
+   - oc0-controller-2
++  - oc0-ceph-0
+ service_id: mgr
+ service_name: mgr
+ service_type: mgr
+
+
+
+

Apply the resulting spec:

+
+
+
+
ceph orch apply -i ceph_spec.yaml
+
+ The result of 12 is having a new mgr deployed on the oc0-ceph-0 node, and the spec reconciled within cephadm
+
+[ceph: root@oc0-controller-0 specs]# ceph orch ls
+NAME                     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
+crash                               4/4  5m ago     61m  *
+mgr                                 3/3  5m ago     69s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
+mon                                 3/3  5m ago     70s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
+osd.default_drive_group               8  2m ago     69s  oc0-ceph-0;oc0-ceph-1
+
+[ceph: root@oc0-controller-0 specs]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_WARN
+            1 stray host(s) with 1 daemon(s) not managed by cephadm
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)
+    mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb
+    osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+

Fix the warning by refreshing the mgr:

+
+
+
+
ceph mgr fail oc0-controller-0.xzgtvo
+
+
+
+

And at this point the cluster is clean:

+
+
+
+
[ceph: root@oc0-controller-0 specs]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)
+    mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb
+    osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+

oc0-controller-1 has been removed and powered off without leaving traces on the ceph cluster.

+
+
+

The same approach and the same steps can be applied to migrate oc0-controller-2 to oc0-ceph-1.

+
+
+
+
Screen Recording:
+ +
+
+
+

What’s next

+ +
+
+

Useful resources

+ +
+
+
+

Ceph RGW Migration

+
+

In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated +Storage nodes, the RGW daemons living in the OpenStack Controller nodes will be +migrated into the existing external RHEL nodes (typically the Compute nodes +for an HCI environment or CephStorage nodes in the remaining use cases).

+
+
+

Requirements

+
+
    +
  • +

    Ceph is >= 5 and managed by cephadm/orchestrator

    +
  • +
  • +

    An undercloud is still available: nodes and networks are managed by TripleO

    +
  • +
+
+
+
+

Ceph Daemon Cardinality

+
+

Ceph 5+ applies strict constraints in the way daemons can be colocated +within the same node. The resulting topology depends on the available hardware, +as well as the amount of Ceph services present in the Controller nodes which are +going to be retired. The following document describes the procedure required +to migrate the RGW component (and keep an HA model using the Ceph Ingress +daemon in a common TripleO scenario where Controller nodes represent the +spec placement where the service is deployed. As a general rule, the +number of services that can be migrated depends on the number of available +nodes in the cluster. The following diagrams cover the distribution of the Ceph +daemons on the CephStorage nodes where at least three nodes are required in a +scenario that sees only RGW and RBD (no dashboard):

+
+
+
+
|    |                     |             |
+|----|---------------------|-------------|
+| osd | mon/mgr/crash      | rgw/ingress |
+| osd | mon/mgr/crash      | rgw/ingress |
+| osd | mon/mgr/crash      | rgw/ingress |
+
+
+
+

With dashboard, and without Manila at least four nodes are required (dashboard +has no failover):

+
+
+
+
|     |                     |             |
+|-----|---------------------|-------------|
+| osd | mon/mgr/crash | rgw/ingress       |
+| osd | mon/mgr/crash | rgw/ingress       |
+| osd | mon/mgr/crash | dashboard/grafana |
+| osd | rgw/ingress   | (free)            |
+
+
+
+

With dashboard and Manila 5 nodes minimum are required (and dashboard has no +failover):

+
+
+
+
|     |                     |                         |
+|-----|---------------------|-------------------------|
+| osd | mon/mgr/crash       | rgw/ingress             |
+| osd | mon/mgr/crash       | rgw/ingress             |
+| osd | mon/mgr/crash       | mds/ganesha/ingress     |
+| osd | rgw/ingress         | mds/ganesha/ingress     |
+| osd | mds/ganesha/ingress | dashboard/grafana       |
+
+
+
+
+

Current Status

+
+
+
(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+
+
+    +------------------------+    +----------------+
+    | IP Addresses           |    |  Hostname      |
+    +------------------------+    +----------------+
+    | ctlplane=192.168.24.25 |    | cephstorage-0  |
+    | ctlplane=192.168.24.10 |    | cephstorage-1  |
+    | ctlplane=192.168.24.32 |    | cephstorage-2  |
+    | ctlplane=192.168.24.28 |    | compute-0      |
+    | ctlplane=192.168.24.26 |    | compute-1      |
+    | ctlplane=192.168.24.43 |    | controller-0   |
+    | ctlplane=192.168.24.7  |    | controller-1   |
+    | ctlplane=192.168.24.41 |    | controller-2   |
+    +------------------------+    +----------------+
+
+
+
+

SSH into controller-0 and check the pacemaker status: this will help +identify the relevant information that we need to know before starting the +RGW migration.

+
+
+
+
Full List of Resources:
+  * ip-192.168.24.46	(ocf:heartbeat:IPaddr2):     	Started controller-0
+  * ip-10.0.0.103   	(ocf:heartbeat:IPaddr2):     	Started controller-1
+  * ip-172.17.1.129 	(ocf:heartbeat:IPaddr2):     	Started controller-2
+  * ip-172.17.3.68  	(ocf:heartbeat:IPaddr2):     	Started controller-0
+  * ip-172.17.4.37  	(ocf:heartbeat:IPaddr2):     	Started controller-1
+  * Container bundle set: haproxy-bundle
+
+[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:
+    * haproxy-bundle-podman-0   (ocf:heartbeat:podman):  Started controller-2
+    * haproxy-bundle-podman-1   (ocf:heartbeat:podman):  Started controller-0
+    * haproxy-bundle-podman-2   (ocf:heartbeat:podman):  Started controller-1
+
+
+
+

Use the ip command to identify the ranges of the storage networks.

+
+
+
+
[heat-admin@controller-0 ~]$ ip -o -4 a
+
+1: lo	inet 127.0.0.1/8 scope host lo\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+7: br-ex	inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+8: vlan70	inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\   	valid_lft forever preferred_lft forever
+8: vlan70	inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\   	valid_lft forever preferred_lft forever
+9: vlan50	inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\   	valid_lft forever preferred_lft forever
+10: vlan30	inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+10: vlan30	inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+11: vlan20	inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\   	valid_lft forever preferred_lft forever
+12: vlan40	inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\   	valid_lft forever preferred_lft forever
+
+
+
+

In this example:

+
+
+
    +
  • +

    vlan30 represents the Storage Network, where the new RGW instances should be +started on the CephStorage nodes

    +
  • +
  • +

    br-ex represents the External Network, which is where in the current +environment, haproxy has the frontend VIP assigned

    +
  • +
+
+
+
+

Prerequisite: check the frontend network (Controller nodes)

+
+

Identify the network that we previously had in haproxy and propagate it (via +TripleO) to the CephStorage nodes. This network is used to reserve a new VIP +that will be owned by Ceph and used as the entry point for the RGW service.

+
+
+

ssh into controller-0 and check the current HaProxy configuration until we +find ceph_rgw section:

+
+
+
+
$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
+
+...
+...
+listen ceph_rgw
+  bind 10.0.0.103:8080 transparent
+  bind 172.17.3.68:8080 transparent
+  mode http
+  balance leastconn
+  http-request set-header X-Forwarded-Proto https if { ssl_fc }
+  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
+  http-request set-header X-Forwarded-Port %[dst_port]
+  option httpchk GET /swift/healthcheck
+  option httplog
+  option forwardfor
+  server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2
+  server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2
+  server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2
+
+
+
+

Double check the network used as HaProxy frontend:

+
+
+
+
[controller-0]$ ip -o -4 a
+
+...
+7: br-ex	inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+...
+
+
+
+

As described in the previous section, the check on controller-0 shows that we +are exposing the services using the external network, which is not present in +the CephStorage nodes, and we need to propagate it via TripleO.

+
+
+
+

Propagate the HaProxy frontend network to CephStorage nodes

+
+

Change the nic template used to define the ceph-storage network interfaces and +add the new config section.

+
+
+
+
---
+network_config:
+- type: interface
+  name: nic1
+  use_dhcp: false
+  dns_servers: {{ ctlplane_dns_nameservers }}
+  addresses:
+  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
+  routes: {{ ctlplane_host_routes }}
+- type: vlan
+  vlan_id: {{ storage_mgmt_vlan_id }}
+  device: nic1
+  addresses:
+  - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }}
+  routes: {{ storage_mgmt_host_routes }}
+- type: interface
+  name: nic2
+  use_dhcp: false
+  defroute: false
+- type: vlan
+  vlan_id: {{ storage_vlan_id }}
+  device: nic2
+  addresses:
+  - ip_netmask: {{ storage_ip }}/{{ storage_cidr }}
+  routes: {{ storage_host_routes }}
+- type: ovs_bridge
+  name: {{ neutron_physical_bridge_name }}
+  dns_servers: {{ ctlplane_dns_nameservers }}
+  domain: {{ dns_search_domains }}
+  use_dhcp: false
+  addresses:
+  - ip_netmask: {{ external_ip }}/{{ external_cidr }}
+  routes: {{ external_host_routes }}
+  members:
+  - type: interface
+    name: nic3
+    primary: true
+
+
+
+

In addition, add the External Network to the baremetal.yaml file used by +metalsmith and run the overcloud node provision command passing the +--network-config option:

+
+
+
+
- name: CephStorage
+  count: 3
+  hostname_format: cephstorage-%index%
+  instances:
+  - hostname: cephstorage-0
+  name: ceph-0
+  - hostname: cephstorage-1
+  name: ceph-1
+  - hostname: cephstorage-2
+  name: ceph-2
+  defaults:
+  profile: ceph-storage
+  network_config:
+      template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2
+  networks:
+  - network: ctlplane
+      vif: true
+  - network: storage
+  - network: storage_mgmt
+  - network: external
+
+
+
+
+
(undercloud) [stack@undercloud-0]$
+
+openstack overcloud node provision
+   -o overcloud-baremetal-deployed-0.yaml
+   --stack overcloud
+   --network-config -y
+  $PWD/network/baremetal_deployment.yaml
+
+
+
+

Check the new network on the CephStorage nodes:

+
+
+
+
[root@cephstorage-0 ~]# ip -o -4 a
+
+1: lo	inet 127.0.0.1/8 scope host lo\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+11: vlan40	inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\   	valid_lft forever preferred_lft forever
+12: vlan30	inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+14: br-ex	inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+
+
+
+

And now it’s time to start migrating the RGW backends and build the ingress on +top of them.

+
+
+
+

Migrate the RGW backends

+
+

To match the cardinality diagram we use cephadm labels to refer to a group of +nodes where a given daemon type should be deployed.

+
+
+

Add the RGW label to the cephstorage nodes:

+
+
+
+
for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+
+
+
+
+
[ceph: root@controller-0 /]#
+
+for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+
+Added label rgw to host cephstorage-0
+Added label rgw to host cephstorage-1
+Added label rgw to host cephstorage-2
+
+[ceph: root@controller-0 /]# ceph orch host ls
+
+HOST       	ADDR       	LABELS      	STATUS
+cephstorage-0  192.168.24.54  osd rgw
+cephstorage-1  192.168.24.44  osd rgw
+cephstorage-2  192.168.24.30  osd rgw
+controller-0   192.168.24.45  _admin mon mgr
+controller-1   192.168.24.11  _admin mon mgr
+controller-2   192.168.24.38  _admin mon mgr
+
+6 hosts in cluster
+
+
+
+

During the overcloud deployment, RGW is applied at step2 +(external_deployment_steps), and a cephadm compatible spec is generated in +/home/ceph-admin/specs/rgw from the ceph_mkspec ansible module. +Find and patch the RGW spec, specifying the right placement using the labels +approach, and change the rgw backend port to 8090 to avoid conflicts +with the Ceph Ingress Daemon (*)

+
+
+
+
[root@controller-0 heat-admin]# cat rgw
+
+networks:
+- 172.17.3.0/24
+placement:
+  hosts:
+  - controller-0
+  - controller-1
+  - controller-2
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8080
+  rgw_realm: default
+  rgw_zone: default
+
+
+
+

Patch the spec replacing controller nodes with the label key

+
+
+
+
---
+networks:
+- 172.17.3.0/24
+placement:
+  label: rgw
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8090
+  rgw_realm: default
+  rgw_zone: default
+
+
+ +
+

Apply the new RGW spec using the orchestrator CLI:

+
+
+
+
$ cephadm shell -m /home/ceph-admin/specs/rgw
+$ cephadm shell -- ceph orch apply -i /mnt/rgw
+
+
+
+

Which triggers the redeploy:

+
+
+
+
...
+osd.9                     	cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx  cephstorage-0  172.17.3.23:8090   starting
+rgw.rgw.cephstorage-1.qynkan  cephstorage-1  172.17.3.26:8090   starting
+rgw.rgw.cephstorage-2.krycit  cephstorage-2  172.17.3.81:8090   starting
+rgw.rgw.controller-1.eyvrzw   controller-1   172.17.3.146:8080  running (5h)
+rgw.rgw.controller-2.navbxa   controller-2   172.17.3.66:8080   running (5h)
+
+...
+osd.9                     	cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx  cephstorage-0  172.17.3.23:8090  running (19s)
+rgw.rgw.cephstorage-1.qynkan  cephstorage-1  172.17.3.26:8090  running (16s)
+rgw.rgw.cephstorage-2.krycit  cephstorage-2  172.17.3.81:8090  running (13s)
+
+
+
+

At this point, we need to make sure that the new RGW backends are reachable on +the new ports, but we’re going to enable an IngressDaemon on port 8080 +later in the process. For this reason, ssh on each RGW node (the CephStorage +nodes) and add the iptables rule to allow connections to both 8080 and 8090 +ports in the CephStorage nodes.

+
+
+
+
iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment "ceph rgw ingress" -j ACCEPT
+
+iptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment "ceph rgw backends" -j ACCEPT
+
+for port in 8080 8090; {
+    for i in 25 10 32; {
+       ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \
+       -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \
+       -j ACCEPT;
+   }
+}
+
+
+
+

From a Controller node (e.g. controller-0) try to reach (curl) the rgw backends:

+
+
+
+
for i in 26 23 81; do {
+    echo "---"
+    curl 172.17.3.$i:8090;
+    echo "---"
+    echo
+done
+
+
+
+

And you should observe the following:

+
+
+
+
---
+Query 172.17.3.23
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+---
+Query 172.17.3.26
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+---
+Query 172.17.3.81
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+
+
+
NOTE
+
+

In case RGW backends are migrated in the CephStorage nodes, there’s no +“internalAPI” network(this is not true in the case of HCI). Reconfig the RGW +keystone endpoint, pointing to the external Network that has been propagated +(see the previous section)

+
+
+
+
[ceph: root@controller-0 /]# ceph config dump | grep keystone
+global   basic rgw_keystone_url  http://172.16.1.111:5000
+
+[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000
+
+
+
+
+
+

Deploy a Ceph IngressDaemon

+
+

HaProxy is managed by TripleO via Pacemaker: the three running instances at +this point will point to the old RGW backends, resulting in a wrong, not +working configuration. +Since we’re going to deploy the Ceph Ingress Daemon, the first thing to do +is remove the existing ceph_rgw config, clean up the config created by TripleO +and restart the service to make sure other services are not affected by this +change.

+
+
+

ssh on each Controller node and remove the following is the section from +/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:

+
+
+
+
listen ceph_rgw
+  bind 10.0.0.103:8080 transparent
+  mode http
+  balance leastconn
+  http-request set-header X-Forwarded-Proto https if { ssl_fc }
+  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
+  http-request set-header X-Forwarded-Port %[dst_port]
+  option httpchk GET /swift/healthcheck
+  option httplog
+  option forwardfor
+   server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2
+  server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2
+  server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2
+
+
+
+

Restart haproxy-bundle and make sure it’s started:

+
+
+
+
[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle
+haproxy-bundle successfully restarted
+
+
+[root@controller-0 ~]# sudo pcs status | grep haproxy
+
+  * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:
+    * haproxy-bundle-podman-0   (ocf:heartbeat:podman):  Started controller-0
+    * haproxy-bundle-podman-1   (ocf:heartbeat:podman):  Started controller-1
+    * haproxy-bundle-podman-2   (ocf:heartbeat:podman):  Started controller-2
+
+
+
+

Double check no process is bound to 8080 anymore`"

+
+
+
+
[root@controller-0 ~]# ss -antop | grep 8080
+[root@controller-0 ~]#
+
+
+
+

And the swift CLI should fail at this point:

+
+
+
+
(overcloud) [root@cephstorage-0 ~]# swift list
+
+HTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc41beb0430>: Failed to establish a new connection: [Errno 111] Connection refused'))
+
+
+
+

Now we can start deploying the Ceph IngressDaemon on the CephStorage nodes.

+
+
+

Set the required images for both HaProxy and Keepalived

+
+
+
+
[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3
+
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5
+
+
+
+

Prepare the ingress spec and mount it to cephadm:

+
+
+
+
$ sudo vim /home/ceph-admin/specs/rgw_ingress
+
+
+
+

and paste the following content:

+
+
+
+
---
+service_type: ingress
+service_id: rgw.rgw
+placement:
+  label: rgw
+spec:
+  backend_service: rgw.rgw
+  virtual_ip: 10.0.0.89/24
+  frontend_port: 8080
+  monitor_port: 8898
+  virtual_interface_networks:
+    - 10.0.0.0/24
+
+
+
+

Mount the generated spec and apply it using the orchestrator CLI:

+
+
+
+
$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress
+$ cephadm shell -- ceph orch apply -i /mnt/rgw_ingress
+
+
+
+

Wait until the ingress is deployed and query the resulting endpoint:

+
+
+
+
[ceph: root@controller-0 /]# ceph orch ls
+
+NAME                 	PORTS            	RUNNING  REFRESHED  AGE  PLACEMENT
+crash                                         	6/6  6m ago 	3d   *
+ingress.rgw.rgw      	10.0.0.89:8080,8898  	6/6  37s ago	60s  label:rgw
+mds.mds                   3/3  6m ago 	3d   controller-0;controller-1;controller-2
+mgr                       3/3  6m ago 	3d   controller-0;controller-1;controller-2
+mon                       3/3  6m ago 	3d   controller-0;controller-1;controller-2
+osd.default_drive_group   15  37s ago	3d   cephstorage-0;cephstorage-1;cephstorage-2
+rgw.rgw   ?:8090          3/3  37s ago	4m   label:rgw
+
+
+
+
+
[ceph: root@controller-0 /]# curl  10.0.0.89:8080
+
+---
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>[ceph: root@controller-0 /]#
+—
+
+
+
+

The result above shows that we’re able to reach the backend from the +IngressDaemon, which means we’re almost ready to interact with it using the +swift CLI.

+
+
+
+

Update the object-store endpoints

+
+

The endpoints still point to the old VIP owned by pacemaker, but given it’s +still used by other services and we reserved a new VIP on the same network, +before any other action we should update the object-store endpoint.

+
+
+

List the current endpoints:

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object
+
+| 1326241fb6b6494282a86768311f48d1 | regionOne | swift    	| object-store   | True	| internal  | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift    	| object-store   | True	| public	| http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s  |
+| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift    	| object-store   | True	| admin 	| http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+
+
+
+

Update the endpoints pointing to the Ingress VIP:

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url "http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s" 95596a2d92c74c15b83325a11a4f07a3
+
+(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store
+| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift    	| object-store   | True	| internal  | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift    	| object-store   | True	| public	| http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s   |
+| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift    	| object-store   | True	| admin 	| http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+
+
+
+

And repeat the same action for both internal and admin. +Test the migrated service.

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ swift list --debug
+
+DEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint
+DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens
+DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000
+DEBUG:urllib3.connectionpool:http://10.0.0.115:5000 "POST /v3/auth/tokens HTTP/1.1" 201 7795
+DEBUG:keystoneclient.auth.identity.v3.base:{"token": {"methods": ["password"], "user": {"domain": {"id": "default", "name": "Default"}, "id": "6f87c7ffdddf463bbc633980cfd02bb3", "name": "admin", "password_expires_at": null},
+
+
+...
+...
+...
+
+DEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H "X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos" -H "Accept-Encoding: gzip"
+DEBUG:swiftclient:RESP STATUS: 200 OK
+DEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'}
+DEBUG:swiftclient:RESP BODY: b'[]'
+
+
+
+

Run tempest tests against object-storage:

+
+
+
+
(overcloud) [stack@undercloud-0 tempest-dir]$  tempest run --regex tempest.api.object_storage
+...
+...
+...
+======
+Totals
+======
+Ran: 141 tests in 606.5579 sec.
+ - Passed: 128
+ - Skipped: 13
+ - Expected Fail: 0
+ - Unexpected Success: 0
+ - Failed: 0
+Sum of execute time for each test: 657.5183 sec.
+
+==============
+Worker Balance
+==============
+ - Worker 0 (1 tests) => 0:10:03.400561
+ - Worker 1 (2 tests) => 0:00:24.531916
+ - Worker 2 (4 tests) => 0:00:10.249889
+ - Worker 3 (30 tests) => 0:00:32.730095
+ - Worker 4 (51 tests) => 0:00:26.246044
+ - Worker 5 (6 tests) => 0:00:20.114803
+ - Worker 6 (20 tests) => 0:00:16.290323
+ - Worker 7 (27 tests) => 0:00:17.103827
+
+
+
+
+

Additional Resources

+
+

A screen recording is available.

+
+
+
+
+
+
+ + + + + + + \ No newline at end of file