diff --git a/dev/.timestamp-images b/dev/.timestamp-images new file mode 100644 index 000000000..e69de29bb diff --git a/dev/images/.gitkeep b/dev/images/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/dev/index.html b/dev/index.html new file mode 100644 index 000000000..9d8361bca --- /dev/null +++ b/dev/index.html @@ -0,0 +1,1331 @@ + + + + + + + +Data Plane Adoption contributor documentation + + + + + + +
+
+

Development environment

+
+
+

The Adoption development environment utilizes +install_yamls +for CRC VM creation and for creation of the VM that hosts the source +Wallaby (or OSP 17.1) OpenStack in Standalone configuration.

+
+
+

Environment prep

+
+

Get dataplane adoption repo:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/data-plane-adoption.git ~/data-plane-adoption
+
+
+
+

Get install_yamls:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/install_yamls.git ~/install_yamls
+
+
+
+

Install tools for operator development:

+
+
+
+
cd ~/install_yamls/devsetup
+make download_tools
+
+
+
+
+

Deployment of CRC with network isolation

+
+
+
cd ~/install_yamls/devsetup
+PULL_SECRET=$HOME/pull-secret.txt CPUS=12 MEMORY=40000 DISK=100 make crc
+
+eval $(crc oc-env)
+oc login -u kubeadmin -p 12345678 https://api.crc.testing:6443
+
+make crc_attach_default_interface
+
+
+
+
+

Development environment with Openstack ironic

+
+

Create the BMaaS network (crc-bmaas) and virtual baremetal nodes controlled by +a RedFish BMC emulator.

+
+
+
+
cd ~/install_yamls
+make nmstate
+make namespace
+cd devsetup  # back to install_yamls/devsetup
+make bmaas
+
+
+
+

A node definition YAML file to use with the openstack baremetal +create <file>.yaml command can be generated for the virtual baremetal +nodes by running the bmaas_generate_nodes_yaml make target. Store it +in a temp file for later.

+
+
+
+
make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml
+
+
+
+

Set variables to deploy edpm Standalone with additional network +(baremetal) and compute driver ironic.

+
+
+
+
cat << EOF > /tmp/addtional_nets.json
+[
+  {
+    "type": "network",
+    "name": "crc-bmaas",
+    "standalone_config": {
+      "type": "ovs_bridge",
+      "name": "baremetal",
+      "mtu": 1500,
+      "vip": true,
+      "ip_subnet": "172.20.1.0/24",
+      "allocation_pools": [
+        {
+          "start": "172.20.1.100",
+          "end": "172.20.1.150"
+        }
+      ],
+      "host_routes": [
+        {
+          "destination": "192.168.130.0/24",
+          "nexthop": "172.20.1.1"
+        }
+      ]
+    }
+  }
+]
+EOF
+export EDPM_COMPUTE_ADDITIONAL_NETWORKS=$(jq -c . /tmp/addtional_nets.json)
+export STANDALONE_COMPUTE_DRIVER=ironic
+export NTP_SERVER=pool.ntp.org  # Only neccecary if not on the RedHat network ...
+export EDPM_COMPUTE_CEPH_ENABLED=false  # Optional
+
+
+
+
+

Use the install_yamls devsetup +to create a virtual machine (edpm-compute-0) connected to the isolated networks.

+
+
+

To use OSP 17.1 content to deploy TripleO Standalone, follow the +guide for setting up downstream content +for make standalone.

+
+
+

To use Wallaby content instead, run the following:

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone
+
+
+
+
+
+

Install the openstack-k8s-operators (openstack-operator)

+
+
+
cd ..  # back to install_yamls
+make crc_storage
+make input
+make openstack
+
+
+
+

Convenience steps

+
+

To make our life easier we can copy the deployment passwords we’ll be using +in the backend services deployment phase of the data plane adoption.

+
+
+
+
scp -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100:/root/tripleo-standalone-passwords.yaml ~/
+
+
+
+

If we want to be able to easily run openstack commands from the host without +actually installing the package and copying the configuration file from the VM +we can create a simple alias:

+
+
+
+
alias openstack="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 OS_CLOUD=standalone openstack"
+
+
+
+
+

Route networks

+
+

Route VLAN20 to have access to the MariaDB cluster:

+
+
+
+
EDPM_BRIDGE=$(sudo virsh dumpxml edpm-compute-0 | grep -oP "(?<=bridge=').*(?=')")
+sudo ip link add link $EDPM_BRIDGE name vlan20 type vlan id 20
+sudo ip addr add dev vlan20 172.17.0.222/24
+sudo ip link set up dev vlan20
+
+
+
+
+

Snapshot/revert

+
+

When the deployment of the Standalone OpenStack is finished, it’s a +good time to snapshot the machine, so that multiple Adoption attempts +can be done without having to deploy from scratch.

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_snapshot
+
+
+
+

And when you wish to revert the Standalone deployment to the +snapshotted state:

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_revert
+
+
+
+

Similar snapshot could be done for the CRC virtual machine, but the +developer environment reset on CRC side can be done sufficiently via +the install_yamls *_cleanup targets. This is further detailed in +the section: +Reset the environment to pre-adoption state

+
+
+
+

Create a workload to adopt

+
+
+
Ironic Steps
+
+
+
# Enroll baremetal nodes
+make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml
+scp -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa /tmp/ironic_nodes.yaml root@192.168.122.100:
+ssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100
+
+export OS_CLOUD=standalone
+openstack baremetal create /root/ironic_nodes.yaml
+export IRONIC_PYTHON_AGENT_RAMDISK_ID=$(openstack image show deploy-ramdisk -c id -f value)
+export IRONIC_PYTHON_AGENT_KERNEL_ID=$(openstack image show deploy-kernel -c id -f value)
+for node in $(openstack baremetal node list -c UUID -f value); do
+  openstack baremetal node set $node \
+    --driver-info deploy_ramdisk=${IRONIC_PYTHON_AGENT_RAMDISK_ID} \
+    --driver-info deploy_kernel=${IRONIC_PYTHON_AGENT_KERNEL_ID} \
+    --resource-class baremetal \
+    --property capabilities='boot_mode:uefi'
+done
+
+# Create a baremetal flavor
+openstack flavor create baremetal --ram 1024 --vcpus 1 --disk 15 \
+  --property resources:VCPU=0 \
+  --property resources:MEMORY_MB=0 \
+  --property resources:DISK_GB=0 \
+  --property resources:CUSTOM_BAREMETAL=1 \
+  --property capabilities:boot_mode="uefi"
+
+# Create image
+IMG=Fedora-Cloud-Base-38-1.6.x86_64.qcow2
+URL=https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/$IMG
+curl -o /tmp/${IMG} -L $URL
+DISK_FORMAT=$(qemu-img info /tmp/${IMG} | grep "file format:" | awk '{print $NF}')
+openstack image create --container-format bare --disk-format ${DISK_FORMAT} Fedora-Cloud-Base-38 < /tmp/${IMG}
+
+export BAREMETAL_NODES=$(openstack baremetal node list -c UUID -f value)
+# Manage nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal node manage $node
+done
+
+# Wait for nodes to reach "manageable" state
+watch openstack baremetal node list
+
+# Inspect baremetal nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal introspection start $node
+done
+
+# Wait for inspection to complete
+watch openstack baremetal introspection list
+
+# Provide nodes
+for node in $BAREMETAL_NODES; do
+  openstack baremetal node provide $node
+done
+
+# Wait for nodes to reach "available" state
+watch openstack baremetal node list
+
+# Create an instance on baremetal
+openstack server show baremetal-test || {
+    openstack server create baremetal-test --flavor baremetal --image Fedora-Cloud-Base-38 --nic net-id=provisioning --wait
+}
+
+# Check instance status and network connectivity
+openstack server show baremetal-test
+ping -c 4 $(openstack server show baremetal-test -f json -c addresses | jq -r .addresses.provisioning[0])
+
+
+
+
+
+
Virtual Machine Steps
+
+
+
cd ~/data-plane-adoption
+bash tests/roles/development_environment/templates/pre_launch.bash
+
+
+
+
+
+
Ceph Storage Steps
+
+

Make sure a cinder-volume backend is properly configured, or skip below steps +to create a test workload without volume attachments.

+
+
+

Confirm the image UUID can be seen in Ceph’s images pool.

+
+
+
+
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l
+
+
+
+

Create a Cinder volume, a backup from it, and snapshot it.

+
+
+
+
openstack volume create --image cirros --bootable --size 1 disk
+openstack volume backup create --name backup disk
+openstack volume snapshot create --volume disk snapshot
+
+
+
+

Add volume to the test VM

+
+
+
+
openstack server add volume test disk
+
+
+
+
+
+
+

Performing the Data Plane Adoption

+
+

The development environment is now set up, you can go to the Adoption +documentation +and perform adoption manually, or run the test +suite +against your environment.

+
+
+
+

Reset the environment to pre-adoption state

+
+

The development environment must be rolled back in case we want to execute another Adoption run.

+
+
+

Delete the data-plane and control-plane resources from the CRC vm

+
+
+
+
oc delete --ignore-not-found=true --wait=false openstackdataplanedeployment/openstack
+oc delete --ignore-not-found=true --wait=false openstackdataplanedeployment/openstack-nova-compute-ffu
+oc delete --ignore-not-found=true --wait=false openstackcontrolplane/openstack
+oc patch openstackcontrolplane openstack --type=merge --patch '
+metadata:
+  finalizers: []
+' || true
+
+while oc get pod | grep rabbitmq-server-0; do
+    sleep 2
+done
+while oc get pod | grep openstack-galera-0; do
+    sleep 2
+done
+
+oc delete --wait=false pod ovn-copy-data || true
+oc delete secret osp-secret || true
+
+
+
+

Revert the standalone vm to the snapshotted state

+
+
+
+
cd ~/install_yamls/devsetup
+make standalone_revert
+
+
+
+

Clean up and initialize the storage PVs in CRC vm

+
+
+
+
cd ..
+for i in {1..3}; do make crc_storage_cleanup crc_storage && break || sleep 5; done
+
+
+
+
+

Experimenting with an additional compute node

+
+

The following is not on the critical path of preparing the development +environment for Adoption, but it shows how to make the environment +work with an additional compute node VM.

+
+
+

The remaining steps should be completed on the hypervisor hosting crc +and edpm-compute-0.

+
+
+

Deploy NG Control Plane with Ceph

+
+

Export the Ceph configuration from edpm-compute-0 into a secret.

+
+
+
+
SSH=$(ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100)
+KEY=$($SSH "cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0")
+CONF=$($SSH "cat /etc/ceph/ceph.conf | base64 -w 0")
+
+cat <<EOF > ceph_secret.yaml
+apiVersion: v1
+data:
+  ceph.client.openstack.keyring: $KEY
+  ceph.conf: $CONF
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+type: Opaque
+EOF
+
+oc create -f ceph_secret.yaml
+
+
+
+

Deploy the NG control plane with Ceph as backend for Glance and +Cinder. As described in +the install_yamls README, +use the sample config located at +https://github.com/openstack-k8s-operators/openstack-operator/blob/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml +but make sure to replace the FSID in the sample with the one from +the secret created in the previous step.

+
+
+
+
curl -o /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml https://raw.githubusercontent.com/openstack-k8s-operators/openstack-operator/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | grep fsid | sed -e 's/fsid = //') && echo $FSID
+sed -i "s/_FSID_/${FSID}/" /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+oc apply -f /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml
+
+
+
+

A NG control plane which uses the same Ceph backend should now be +functional. If you create a test image on the NG system to confirm +it works from the configuration above, be sure to read the warning +in the next section.

+
+
+

Before beginning adoption testing or development you may wish to +deploy an EDPM node as described in the following section.

+
+
+
+

Warning about two OpenStacks and one Ceph

+
+

Though workloads can be created in the NG deployment to test, be +careful not to confuse them with workloads from the Wallaby cluster +to be migrated. The following scenario is now possible.

+
+
+

A Glance image exists on the Wallaby OpenStack to be adopted.

+
+
+
+
[stack@standalone standalone]$ export OS_CLOUD=standalone
+[stack@standalone standalone]$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 33a43519-a960-4cd0-a593-eca56ee553aa | cirros | active |
++--------------------------------------+--------+--------+
+[stack@standalone standalone]$
+
+
+
+

If you now create an image with the NG cluster, then a Glance image +will exsit on the NG OpenStack which will adopt the workloads of the +wallaby.

+
+
+
+
[fultonj@hamfast ng]$ export OS_CLOUD=default
+[fultonj@hamfast ng]$ export OS_PASSWORD=12345678
+[fultonj@hamfast ng]$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 4ebccb29-193b-4d52-9ffd-034d440e073c | cirros | active |
++--------------------------------------+--------+--------+
+[fultonj@hamfast ng]$
+
+
+
+

Both Glance images are stored in the same Ceph pool.

+
+
+
+
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l
+Inferring fsid 7133115f-7751-5c2f-88bd-fbff2f140791
+Using recent ceph image quay.rdoproject.org/tripleowallabycentos9/daemon@sha256:aa259dd2439dfaa60b27c9ebb4fb310cdf1e8e62aa7467df350baf22c5d992d8
+NAME                                       SIZE     PARENT  FMT  PROT  LOCK
+33a43519-a960-4cd0-a593-eca56ee553aa         273 B            2
+33a43519-a960-4cd0-a593-eca56ee553aa@snap    273 B            2  yes
+4ebccb29-193b-4d52-9ffd-034d440e073c       112 MiB            2
+4ebccb29-193b-4d52-9ffd-034d440e073c@snap  112 MiB            2  yes
+
+
+
+

However, as far as each Glance service is concerned each has one +image. Thus, in order to avoid confusion during adoption the test +Glance image on the NG OpenStack should be deleted.

+
+
+
+
openstack image delete 4ebccb29-193b-4d52-9ffd-034d440e073c
+
+
+
+

Connecting the NG OpenStack to the existing Ceph cluster is part of +the adoption procedure so that the data migration can be minimized +but understand the implications of the above example.

+
+
+
+

Deploy edpm-compute-1

+
+

edpm-compute-0 is not available as a standard EDPM system to be +managed by edpm-ansible +or +dataplane-operator +because it hosts the wallaby deployment which will be adopted +and after adoption it will only host the Ceph server.

+
+
+

Use the install_yamls devsetup +to create additional virtual machines and be sure +that the EDPM_COMPUTE_SUFFIX is set to 1 or greater. +Do not set EDPM_COMPUTE_SUFFIX to 0 or you could delete +the Wallaby system created in the previous section.

+
+
+

When deploying EDPM nodes add an extraMounts like the following in +the OpenStackDataPlaneNodeSet CR nodeTemplate so that they will be +configured to use the same Ceph cluster.

+
+
+
+
    edpm-compute:
+      nodeTemplate:
+        extraMounts:
+        - extraVolType: Ceph
+          volumes:
+          - name: ceph
+            secret:
+              secretName: ceph-conf-files
+          mounts:
+          - name: ceph
+            mountPath: "/etc/ceph"
+            readOnly: true
+
+
+
+

A NG data plane which uses the same Ceph backend should now be +functional. Be careful about not confusing new workloads to test the +NG OpenStack with the Wallaby OpenStack as described in the previous +section.

+
+
+
+

Begin Adoption Testing or Development

+
+

We should now have:

+
+
+
    +
  • +

    An NG glance service based on Antelope running on CRC

    +
  • +
  • +

    An TripleO-deployed glance serviced running on edpm-compute-0

    +
  • +
  • +

    Both services have the same Ceph backend

    +
  • +
  • +

    Each service has their own independent database

    +
  • +
+
+
+

An environment above is assumed to be available in the +Glance Adoption documentation. You +may now follow other Data Plane Adoption procedures described in the +documentation. +The same pattern can be applied to other services.

+
+
+
+
+
+
+

Contributing to documentation

+
+
+

Rendering documentation locally

+
+

Install docs build requirements into virtualenv:

+
+
+
+
python3 -m venv local/docs-venv
+source local/docs-venv/bin/activate
+pip install -r docs/doc_requirements.txt
+
+
+
+

Serve docs site on localhost:

+
+
+
+
mkdocs serve
+
+
+
+

Click the link it outputs. As you save changes to files modified in your editor, +the browser will automatically show the new content.

+
+
+
+

Patterns and tips for contributing to documentation

+
+
    +
  • +

    Pages concerning individual components/services should make sense in +the context of the broader adoption procedure. While adopting a +service in isolation is an option for developers, let’s write the +documentation with the assumption the adoption procedure is being +done in full, going step by step (one doc after another).

    +
  • +
  • +

    The procedure should be written with production use in mind. This +repository could be used as a starting point for product +technical documentation. We should not tie the documentation to +something that wouldn’t translate well from dev envs to production.

    +
    +
      +
    • +

      This includes not assuming that the source environment is +Standalone, and the destination is CRC. We can provide examples for +Standalone/CRC, but it should be possible to use the procedure +with fuller environments in a way that is obvious from the docs.

      +
    • +
    +
    +
  • +
  • +

    If possible, try to make code snippets copy-pastable. Use shell +variables if the snippets should be parametrized. Use oc rather +than kubectl in snippets.

    +
  • +
  • +

    Focus on the "happy path" in the docs as much as possible, +troubleshooting info can go into the Troubleshooting page, or +alternatively a troubleshooting section at the end of the document, +visibly separated from the main procedure.

    +
  • +
  • +

    The full procedure will inevitably happen to be quite long, so let’s +try to be concise in writing to keep the docs consumable (but not to +a point of making things difficult to understand or omitting +important things).

    +
  • +
  • +

    A bash alias can be created for long command however when implementing +them in the test roles you should transform them to avoid command not +found errors. +From:

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +openstack endpoint list | grep network
    +
    +
    +
    +

    To:

    +
    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +${BASH_ALIASES[openstack]} endpoint list | grep network
    +
    +
    +
  • +
+
+
+
+
+
+

Tests

+
+
+

Test suite information

+
+

The adoption docs repository also includes a test suite for Adoption. +There are targets in the Makefile which can be used to execute the +test suite:

+
+
+
    +
  • +

    test-minimal - a minimal test scenario, the eventual set of +services in this scenario should be the "core" services needed to +launch a VM. This scenario assumes local storage backend for +services like Glance and Cinder.

    +
  • +
  • +

    test-with-ceph - like 'minimal' but with Ceph storage backend for +Glance and Cinder.

    +
  • +
+
+
+
+

Configuring the test suite

+
+
    +
  • +

    Create tests/vars.yaml and tests/secrets.yaml by copying the +included samples (tests/vars.sample.yaml, +tests/secrets.sample.yaml).

    +
  • +
  • +

    Walk through the tests/vars.yaml and tests/secrets.yaml files +and see if you need to edit any values. If you are using the +documented development environment, majority of the defaults should +work out of the box. The comments in the YAML files will guide you +regarding the expected values. You may want to double check that +these variables suit your environment:

    +
    +
      +
    • +

      install_yamls_path

      +
    • +
    • +

      tripleo_passwords

      +
    • +
    • +

      controller*_ssh

      +
    • +
    • +

      edpm_privatekey_path

      +
    • +
    • +

      timesync_ntp_servers

      +
    • +
    +
    +
  • +
+
+
+
+

Running the tests

+
+

The interface between the execution infrastructure and the test suite +is an Ansible inventory and variables files. Inventory and variable +samples are provided. To run the tests, follow this procedure:

+
+
+
    +
  • +

    Install dependencies and create a venv:

    +
    +
    +
    sudo dnf -y install python-devel
    +python3 -m venv venv
    +source venv/bin/activate
    +pip install openstackclient osc_placement jmespath
    +ansible-galaxy collection install community.general
    +
    +
    +
  • +
  • +

    Run make test-with-ceph (the documented development environment +does include Ceph).

    +
    +

    If you are using Ceph-less environment, you should run make +test-minimal.

    +
    +
  • +
+
+
+
+

Making patches to the test suite

+
+

Please be aware of the following when changing the test suite:

+
+
+
    +
  • +

    The test suite should follow the docs as much as possible.

    +
    +

    The purpose of the test suite is to verify what the user would run +if they were following the docs. We don’t want to loosely rewrite +the docs into Ansible code following Ansible best practices. We want +to test the exact same bash commands/snippets that are written in +the docs. This often means that we should be using the shell +module and do a verbatim copy/paste from docs, instead of using the +best Ansible module for the task at hand.

    +
    +
  • +
+
+
+
+
+
+ + + + + + + \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 000000000..108b3e977 --- /dev/null +++ b/index.html @@ -0,0 +1,9 @@ + + +

Data Plane Adoption documentation

+ + + diff --git a/user/.timestamp-images b/user/.timestamp-images new file mode 100644 index 000000000..e69de29bb diff --git a/user/images/.gitkeep b/user/images/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/user/index.html b/user/index.html new file mode 100644 index 000000000..93cf2fcd1 --- /dev/null +++ b/user/index.html @@ -0,0 +1,7707 @@ + + + + + + + +OpenStack adoption user documentation + + + + + + +
+
+

OpenStack adoption

+
+
+

Planning the new deployment

+
+

Just like you did back when you installed your Director deployed OpenStack, the +upgrade/migration to the podified OpenStack requires planning various aspects +of the environment such as node roles, planning your network topology, and +storage.

+
+
+

This document covers some of this planning, but it is recommended to read +the whole adoption guide before actually starting the process to be sure that +there is a global understanding of the whole process.

+
+
+

Service configurations

+
+

There is a fundamental difference between the Director and Operator deployments +regarding the configuration of the services.

+
+
+

In Director deployments many of the service configurations are abstracted by +Director specific configuration options. A single Director option may trigger +changes for multiple services and support for drivers (for example Cinder’s) +required patches to the Director code base.

+
+
+

In Operator deployments this approach has changed: reduce the installer specific knowledge and leverage OpenShift and +OpenStack service specific knowledge whenever possible.

+
+
+

To this effect OpenStack services will have sensible defaults for OpenShift +deployments and human operators will provide configuration snippets to provide +necessary configuration, such as cinder backend configuration, or to override +the defaults.

+
+
+

This shortens the distance between a service specific configuration file (such +as cinder.conf) and what the human operator provides in the manifests.

+
+
+

These configuration snippets are passed to the operators in the different +customServiceConfig sections available in the manifests, and then they are +layered in the services available in the following levels. To illustrate this, +if you were to set a configuration at the top Cinder level (spec: cinder: +template:) then it would be applied to all the cinder services; for example to +enable debug in all the cinder services you would do:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      customServiceConfig: |
+        [DEFAULT]
+        debug = True
+< . . . >
+
+
+
+

If you only want to set it for one of the cinder services, for example the +scheduler, then you use the cinderScheduler section instead:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    template:
+      cinderScheduler:
+        customServiceConfig: |
+          [DEFAULT]
+          debug = True
+< . . . >
+
+
+
+

In OpenShift it is not recommended to store sensitive information like the +credentials to the cinder storage array in the CRs, so most OpenStack operators +have a mechanism to use OpenShift’s Secrets for sensitive configuration +parameters of the services and then use then by reference in the +customServiceConfigSecrets section which is analogous to the +customServiceConfig.

+
+
+

The contents of the Secret references passed in the +customServiceConfigSecrets will have the same format as customServiceConfig: +a snippet with the section/s and configuration options.

+
+
+

When there are sensitive information in the service configuration then it +becomes a matter of personal preference whether to store all the configuration +in the Secret or only the sensitive parts. However, if you split the +configuration between Secret and customServiceConfig you still need the +section header (eg: [DEFAULT]) to be present in both places.

+
+
+

Attention should be paid to each service’s adoption process as they may have +some particularities regarding their configuration.

+
+
+
+

Configuration tooling

+
+

In order to help users to handle the configuration for the TripleO and OpenStack +services the tool: https://github.com/openstack-k8s-operators/os-diff has been +develop to compare the configuration files between the TripleO deployment and +the next gen cloud. +Make sure Golang is installed and configured on your env:

+
+
+
+
git clone https://github.com/openstack-k8s-operators/os-diff
+pushd os-diff
+make build
+
+
+
+

Then configure ansible.cfg and ssh-config file according to your environment:

+
+
+
+
Host *
+    IdentitiesOnly yes
+
+Host virthost
+    Hostname virthost
+    IdentityFile ~/.ssh/id_rsa
+    User root
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+
+Host standalone
+    Hostname standalone
+    IdentityFile ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa
+    User root
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+Host crc
+    Hostname crc
+    IdentityFile ~/.ssh/id_rsa
+    User stack
+    StrictHostKeyChecking no
+    UserKnownHostsFile=/dev/null
+
+
+
+

And test your connection:

+
+
+
+
ssh -F ssh.config standalone
+
+
+
+
+

Node roles

+
+

In Director deployments you had 4 different standard roles for the nodes: +Controller, Compute, Ceph Storage, Swift Storage, but in podified +OpenStack you make a distinction based on where things are running, in +OpenShift or external to it.

+
+
+

When adopting a Director OpenStack your Compute nodes will directly become +external nodes, so there should not be much additional planning needed there.

+
+
+

In many deployments being adopted the Controller nodes will require some +thought because you have many OpenShift nodes where the controller services +could run, and you have to decide which ones you want to use, how you are going to use them, and make sure those nodes are ready to run the services.

+
+
+

In most deployments running OpenStack services on master nodes can have a +seriously adverse impact on the OpenShift cluster, so it is recommended that you place OpenStack services on non master nodes.

+
+
+

By default OpenStack Operators deploy OpenStack services on any worker node, but +that is not necessarily what’s best for all deployments, and there may be even +services that won’t even work deployed like that.

+
+
+

When planing a deployment it’s good to remember that not all the services on an +OpenStack deployments are the same as they have very different requirements.

+
+
+

Looking at the Cinder component you can clearly see different requirements for +its services: the cinder-scheduler is a very light service with low +memory, disk, network, and CPU usage; cinder-api service has a higher network +usage due to resource listing requests; the cinder-volume service will have a +high disk and network usage since many of its operations are in the data path +(offline volume migration, create volume from image, etc.), and then you have +the cinder-backup service which has high memory, network, and CPU (to compress +data) requirements.

+
+
+

The Glance and Swift components are in the data path, as well as RabbitMQ and Galera services.

+
+
+

Given these requirements it may be preferable not to let these services wander +all over your OpenShift worker nodes with the possibility of impacting other +workloads, or maybe you don’t mind the light services wandering around but you +want to pin down the heavy ones to a set of infrastructure nodes.

+
+
+

There are also hardware restrictions to take into consideration, because if you +are using a Fibre Channel (FC) Cinder backend you need the cinder-volume, +cinder-backup, and maybe even the glance (if it’s using Cinder as a backend) +services to run on a OpenShift host that has an HBA.

+
+
+

The OpenStack Operators allow a great deal of flexibility on where to run the +OpenStack services, as you can use node labels to define which OpenShift nodes +are eligible to run the different OpenStack services. Refer to the About node +selector to learn more about using labels to define +placement of the OpenStack services.

+
+
+
+

Storage

+
+

When looking into the storage in an OpenStack deployment you can differentiate +2 different kinds, the storage requirements of the services themselves and the +storage used for the OpenStack users that the services will manage.

+
+
+

These requirements may drive your OpenShift node selection, as mentioned above, +and may require you to do some preparations on the OpenShift nodes before +you can deploy the services.

+
+
+
Cinder requirements
+
+

The Cinder service has both local storage used by the service and OpenStack user +requirements.

+
+
+

Local storage is used for example when downloading a glance image for the create +volume from image operation, which can become considerable when having +concurrent operations and not using cinder volume cache.

+
+
+

In the Operator deployed OpenStack, there is a way to configure the +location of the conversion directory to be an NFS share (using the extra +volumes feature), something that needed to be done manually before.

+
+
+

Even if it’s an adoption and it may seem that there’s nothing to consider +regarding the Cinder backends, because you are using the same ones that you are +using in your current deployment, you should still evaluate it, because it may not be so straightforward.

+
+
+

First you need to check the transport protocol the Cinder backends are using: +RBD, iSCSI, FC, NFS, NVMe-oF, etc.

+
+
+

Once you know all the transport protocols that you are using, you can make +sure that you are taking them into consideration when placing the Cinder services +(as mentioned above in the Node Roles section) and the right storage transport +related binaries are running on the OpenShift nodes.

+
+
+

Detailed information about the specifics for each storage transport protocol can +be found in the Adopting the Block Storage service.

+
+
+
+
+
+

About node selector

+
+

There are a variety of reasons why you might want to restrict the nodes where +OpenStack services can be placed:

+
+
+
    +
  • +

    Hardware requirements: System memory, Disk space, Cores, HBAs

    +
  • +
  • +

    Limit the impact of the OpenStack services on other OpenShift workloads.

    +
  • +
  • +

    Avoid collocating OpenStack services.

    +
  • +
+
+
+

The mechanism provided by the OpenStack operators to achieve this is through the +use of labels.

+
+
+

You either label the OpenShift nodes or use existing labels, and then use those labels in the OpenStack manifests in the +nodeSelector field.

+
+
+

The nodeSelector field in the OpenStack manifests follows the standard +OpenShift nodeSelector field, please refer to the OpenShift documentation on +the matter +additional information.

+
+
+

This field is present at all the different levels of the OpenStack manifests:

+
+
+
    +
  • +

    Deployment: The OpenStackControlPlane object.

    +
  • +
  • +

    Component: For example the cinder element in the OpenStackControlPlane.

    +
  • +
  • +

    Service: For example the cinderVolume element within the cinder element +in the OpenStackControlPlane.

    +
  • +
+
+
+

This allows a fine grained control of the placement of the OpenStack services +with minimal repetition.

+
+
+

Values of the nodeSelector are propagated to the next levels unless they are +overwritten. This means that a nodeSelector value at the deployment level will +affect all the OpenStack services.

+
+
+

For example, you can add label type: openstack to any 3 OpenShift nodes:

+
+
+
+
$ oc label nodes worker0 type=openstack
+$ oc label nodes worker1 type=openstack
+$ oc label nodes worker2 type=openstack
+
+
+
+

And then in our OpenStackControlPlane you can use the label to place all the +services in those 3 nodes:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  nodeSelector:
+    type: openstack
+< . . . >
+
+
+
+

You can use the selector for specific services. For example, you might want to place your cinder volume and backup services on certain nodes if you are using FC and only have HBAs on a subset of +nodes. The following example assumes that you have the label fc_card: true:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  cinder:
+    template:
+      cinderVolumes:
+          pure_fc:
+            nodeSelector:
+              fc_card: true
+< . . . >
+          lvm-iscsi:
+            nodeSelector:
+              fc_card: true
+< . . . >
+      cinderBackup:
+          nodeSelector:
+            fc_card: true
+< . . . >
+
+
+
+

The Cinder operator does not currently have the possibility of defining +the nodeSelector in cinderVolumes, so you need to specify it on each of the +backends.

+
+
+

It’s possible to leverage labels added by the node feature discovery +operator +to place OpenStack services.

+
+
+

MachineConfig

+
+

Some services require you to have services or kernel modules running on the hosts +where they run, for example iscsid or multipathd daemons, or the +nvme-fabrics kernel module.

+
+
+

For those cases you use MachineConfig manifests, and if you are restricting +the nodes that you are placing the OpenStack services on using the nodeSelector then +you also want to limit where the MachineConfig is applied.

+
+
+

To define where the MachineConfig can be applied, you need to use a +MachineConfigPool that links the MachineConfig to the nodes.

+
+
+

For example to be able to limit MachineConfig to the 3 OpenShift nodes that you +marked with the type: openstack label, you create the +MachineConfigPool like this:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfigPool
+metadata:
+  name: openstack
+spec:
+  machineConfigSelector:
+    matchLabels:
+      machineconfiguration.openshift.io/role: openstack
+  nodeSelector:
+    matchLabels:
+      type: openstack
+
+
+
+

And then you use it in the MachineConfig:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: openstack
+< . . . >
+
+
+ +
+

WARNING: Applying a MachineConfig to an OpenShift node will make the node +reboot.

+
+
+
+
+

Deploying backend services

+
+

The following instructions create OpenStackControlPlane CR with basic +backend services deployed, and all the OpenStack services disabled. +This will be the foundation of the podified control plane.

+
+
+

In subsequent steps, you import the original databases and then add +podified OpenStack control plane services.

+
+
+

Prerequisites

+
+
    +
  • +

    The cloud that you want to adopt is up and running, and it is on the +OpenStack Wallaby release.

    +
  • +
  • +

    A VM instance named test is running on the source cloud and its +floating IP is set into FIP env var. You can use a helper script to create that test VM.

    +
  • +
  • +

    The openstack-operator is deployed, but OpenStackControlPlane is +not deployed.

    +
    +

    For developer/CI environments, the openstack operator can be deployed +by running make openstack inside +install_yamls +repo.

    +
    +
    +

    For production environments, the deployment method will likely be +different.

    +
    +
  • +
  • +

    There are free PVs available to be claimed (for MariaDB and RabbitMQ).

    +
    +

    For developer/CI environments driven by install_yamls, make sure +you’ve run make crc_storage.

    +
    +
  • +
+
+
+
+

Variables

+
+
    +
  • +

    Set the desired admin password for the podified deployment. This can +be the original deployment’s admin password or something else.

    +
    +
    +
    ADMIN_PASSWORD=SomePassword
    +
    +
    +
    +

    To use the existing OpenStack deployment password:

    +
    +
    +
    +
    ADMIN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AdminPassword:' | awk -F ': ' '{ print $2; }')
    +
    +
    +
  • +
  • +

    Set service password variables to match the original deployment. +Database passwords can differ in podified environment, but +synchronizing the service account passwords is a required step.

    +
    +

    E.g. in developer environments with TripleO Standalone, the +passwords can be extracted like this:

    +
    +
    +
    +
    AODH_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AodhPassword:' | awk -F ': ' '{ print $2; }')
    +CEILOMETER_METERING_SECRET=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerMeteringSecret:' | awk -F ': ' '{ print $2; }')
    +CEILOMETER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerPassword:' | awk -F ': ' '{ print $2; }')
    +CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')
    +GLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')
    +HEAT_AUTH_ENCRYPTION_KEY=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatAuthEncryptionKey:' | awk -F ': ' '{ print $2; }')
    +HEAT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatPassword:' | awk -F ': ' '{ print $2; }')
    +IRONIC_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' IronicPassword:' | awk -F ': ' '{ print $2; }')
    +MANILA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' ManilaPassword:' | awk -F ': ' '{ print $2; }')
    +NEUTRON_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NeutronPassword:' | awk -F ': ' '{ print $2; }')
    +NOVA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NovaPassword:' | awk -F ': ' '{ print $2; }')
    +OCTAVIA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' OctaviaPassword:' | awk -F ': ' '{ print $2; }')
    +PLACEMENT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' PlacementPassword:' | awk -F ': ' '{ print $2; }')
    +
    +
    +
  • +
+
+
+
+

Pre-checks

+ +
+
+

Procedure - backend services deployment

+
+
    +
  • +

    Make sure you are using the OpenShift namespace where you want the +podified control plane deployed:

    +
    +
    +
    oc project openstack
    +
    +
    +
  • +
  • +

    Create OSP secret.

    +
    +

    The procedure for this will vary, but in developer/CI environments +you use install_yamls:

    +
    +
    +
    +
    # in install_yamls
    +make input
    +
    +
    +
  • +
  • +

    If the $ADMIN_PASSWORD is different than the already set password +in osp-secret, amend the AdminPassword key in the osp-secret +correspondingly:

    +
    +
    +
    oc set data secret/osp-secret "AdminPassword=$ADMIN_PASSWORD"
    +
    +
    +
  • +
  • +

    Set service account passwords in osp-secret to match the service +account passwords from the original deployment:

    +
    +
    +
    oc set data secret/osp-secret "AodhPassword=$AODH_PASSWORD"
    +oc set data secret/osp-secret "CeilometerMeteringSecret=$CEILOMETER_METERING_SECRET"
    +oc set data secret/osp-secret "CeilometerPassword=$CEILOMETER_PASSWORD"
    +oc set data secret/osp-secret "CinderPassword=$CINDER_PASSWORD"
    +oc set data secret/osp-secret "GlancePassword=$GLANCE_PASSWORD"
    +oc set data secret/osp-secret "HeatAuthEncryptionKey=$HEAT_AUTH_ENCRYPTION_KEY"
    +oc set data secret/osp-secret "HeatPassword=$HEAT_PASSWORD"
    +oc set data secret/osp-secret "IronicPassword=$IRONIC_PASSWORD"
    +oc set data secret/osp-secret "ManilaPassword=$MANILA_PASSWORD"
    +oc set data secret/osp-secret "NeutronPassword=$NEUTRON_PASSWORD"
    +oc set data secret/osp-secret "NovaPassword=$NOVA_PASSWORD"
    +oc set data secret/osp-secret "OctaviaPassword=$OCTAVIA_PASSWORD"
    +oc set data secret/osp-secret "PlacementPassword=$PLACEMENT_PASSWORD"
    +
    +
    +
  • +
  • +

    Deploy OpenStackControlPlane. Make sure to only enable DNS, +MariaDB, Memcached, and RabbitMQ services. All other services must +be disabled.

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: core.openstack.org/v1beta1
    +kind: OpenStackControlPlane
    +metadata:
    +  name: openstack
    +spec:
    +  secret: osp-secret
    +  storageClass: local-storage
    +
    +  cinder:
    +    enabled: false
    +    template:
    +      cinderAPI: {}
    +      cinderScheduler: {}
    +      cinderBackup: {}
    +      cinderVolumes: {}
    +
    +  dns:
    +    template:
    +      override:
    +        service:
    +          metadata:
    +            annotations:
    +              metallb.universe.tf/address-pool: ctlplane
    +              metallb.universe.tf/allow-shared-ip: ctlplane
    +              metallb.universe.tf/loadBalancerIPs: 192.168.122.80
    +          spec:
    +            type: LoadBalancer
    +      options:
    +      - key: server
    +        values:
    +        - 192.168.122.1
    +      replicas: 1
    +
    +  glance:
    +    enabled: false
    +    template:
    +      glanceAPIs: {}
    +
    +  horizon:
    +    enabled: false
    +    template: {}
    +
    +  ironic:
    +    enabled: false
    +    template:
    +      ironicConductors: []
    +
    +  keystone:
    +    enabled: false
    +    template: {}
    +
    +  manila:
    +    enabled: false
    +    template:
    +      manilaAPI: {}
    +      manilaScheduler: {}
    +      manilaShares: {}
    +
    +  mariadb:
    +    enabled: false
    +    templates: {}
    +
    +  galera:
    +    enabled: true
    +    templates:
    +      openstack:
    +        secret: osp-secret
    +        replicas: 1
    +        storageRequest: 500M
    +      openstack-cell1:
    +        secret: osp-secret
    +        replicas: 1
    +        storageRequest: 500M
    +
    +  memcached:
    +    enabled: true
    +    templates:
    +      memcached:
    +        replicas: 1
    +
    +  neutron:
    +    enabled: false
    +    template: {}
    +
    +  nova:
    +    enabled: false
    +    template: {}
    +
    +  ovn:
    +    enabled: false
    +    template:
    +      ovnDBCluster:
    +        ovndbcluster-nb:
    +          dbType: NB
    +          storageRequest: 10G
    +          networkAttachment: internalapi
    +        ovndbcluster-sb:
    +          dbType: SB
    +          storageRequest: 10G
    +          networkAttachment: internalapi
    +      ovnNorthd:
    +        networkAttachment: internalapi
    +        replicas: 1
    +      ovnController:
    +        networkAttachment: tenant
    +
    +  placement:
    +    enabled: false
    +    template: {}
    +
    +  rabbitmq:
    +    templates:
    +      rabbitmq:
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.85
    +            spec:
    +              type: LoadBalancer
    +      rabbitmq-cell1:
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.86
    +            spec:
    +              type: LoadBalancer
    +
    +  ceilometer:
    +    enabled: false
    +    template: {}
    +
    +  autoscaling:
    +    enabled: false
    +    template: {}
    +EOF
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check that MariaDB is running.

    +
    +
    +
    oc get pod openstack-galera-0 -o jsonpath='{.status.phase}{"\n"}'
    +oc get pod openstack-cell1-galera-0 -o jsonpath='{.status.phase}{"\n"}'
    +
    +
    +
  • +
+
+
+
+
+

Stopping OpenStack services

+
+

Before you start the adoption, you must stop the OpenStack services.

+
+
+

This is an important step to avoid inconsistencies in the data migrated for the data-plane adoption procedure caused by resource changes after the DB has been +copied to the new deployment.

+
+
+

Some services are easy to stop because they only perform short asynchronous operations, but other services are a bit more complex to gracefully stop because they perform synchronous or long running operations that you might want to complete instead of aborting them.

+
+
+

Since gracefully stopping all services is non-trivial and beyond the scope of this guide, the following procedure uses the force method and presents +recommendations on how to check some things in the services.

+
+
+

Note that you should not stop the infrastructure management services yet, such as database, RabbitMQ, and HAProxy Load Balancer, nor should you stop the +Nova compute service and containerized modular libvirt daemons.

+
+
+

Variables

+
+

Define the shell variables used in the following steps. The values are illustrative and refer to a single node standalone director deployment. Use values that are correct for your environment:

+
+
+
+
CONTROLLER1_SSH="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100"
+CONTROLLER2_SSH=""
+CONTROLLER3_SSH=""
+
+
+
+

This example uses ssh variables with ssh commands instead of ansible to create instructions that are independent on where they are running. However, you can use ansible commands to achieve the same result if you are in the right host. For example, to stop a service:

+
+
+
+
. stackrc ansible -i $(which tripleo-ansible-inventory) Controller -m shell -a "sudo systemctl stop tripleo_horizon.service" -b
+
+
+
+
+

Pre-checks

+
+

You can stop OpenStack services at any moment, but you might leave your environment in an undesired state. However, you should confirm that there are no long running operations that require other services.

+
+
+

Ensure that there are no ongoing instance live migrations, volume migrations (online or offline), volume creation, backup restore, attaching, detaching, +etc.

+
+
+
+
openstack server list --all-projects -c ID -c Status |grep -E '\| .+ing \|'
+openstack volume list --all-projects -c ID -c Status |grep -E '\| .+ing \|'| grep -vi error
+openstack volume backup list --all-projects -c ID -c Status |grep -E '\| .+ing \|' | grep -vi error
+openstack share list --all-projects -c ID -c Status |grep -E '\| .+ing \|'| grep -vi error
+openstack image list -c ID -c Status |grep -E '\| .+ing \|'
+
+
+
+

Also collect the services topology specific configuration before stopping services required to gather it live. You will need it to compare it with the post-adoption values later on. For more information, see Pulling the OpenStack configuration.

+
+
+
+

Stopping control plane services

+
+

You can stop OpenStack services at any moment, but you might leave your environment in an undesired state. You should confirm that there are no ongoing operations.

+
+
+

1- Connect to all the controller nodes. +2- Stop the control plane services. +3- Verify the control plane services are stopped.

+
+
+

The cinder-backup service on OSP 17.1 could be running as Active-Passive under pacemaker or as Active-Active, so you must check how it is running and stop it.

+
+
+

These steps can be automated with a simple script that relies on the previously defined environmental variables and function:

+
+
+
+
# Update the services list to be stopped
+ServicesToStop=("tripleo_horizon.service"
+                "tripleo_keystone.service"
+                "tripleo_cinder_api.service"
+                "tripleo_cinder_api_cron.service"
+                "tripleo_cinder_scheduler.service"
+                "tripleo_cinder_backup.service"
+                "tripleo_glance_api.service"
+                "tripleo_manila_api.service"
+                "tripleo_manila_api_cron.service"
+                "tripleo_manila_scheduler.service"
+                "tripleo_neutron_api.service"
+                "tripleo_nova_api.service"
+                "tripleo_placement_api.service"
+                "tripleo_nova_api_cron.service"
+                "tripleo_nova_api.service"
+                "tripleo_nova_conductor.service"
+                "tripleo_nova_metadata.service"
+                "tripleo_nova_scheduler.service"
+                "tripleo_nova_vnc_proxy.service"
+                "tripleo_aodh_api.service"
+                "tripleo_aodh_api_cron.service"
+                "tripleo_aodh_evaluator.service"
+                "tripleo_aodh_listener.service"
+                "tripleo_aodh_notifier.service"
+                "tripleo_ceilometer_agent_central.service"
+                "tripleo_ceilometer_agent_compute.service"
+                "tripleo_ceilometer_agent_ipmi.service"
+                "tripleo_ceilometer_agent_notification.service"
+                "tripleo_ovn_cluster_northd.service")
+
+PacemakerResourcesToStop=("openstack-cinder-volume"
+                          "openstack-cinder-backup"
+                          "openstack-manila-share")
+
+echo "Stopping systemd OpenStack services"
+for service in ${ServicesToStop[*]}; do
+    for i in {1..3}; do
+        SSH_CMD=CONTROLLER${i}_SSH
+        if [ ! -z "${!SSH_CMD}" ]; then
+            echo "Stopping the $service in controller $i"
+            if ${!SSH_CMD} sudo systemctl is-active $service; then
+                ${!SSH_CMD} sudo systemctl stop $service
+            fi
+        fi
+    done
+done
+
+echo "Checking systemd OpenStack services"
+for service in ${ServicesToStop[*]}; do
+    for i in {1..3}; do
+        SSH_CMD=CONTROLLER${i}_SSH
+        if [ ! -z "${!SSH_CMD}" ]; then
+            echo "Checking status of $service in controller $i"
+            if ! ${!SSH_CMD} systemctl show $service | grep ActiveState=inactive >/dev/null; then
+                echo "ERROR: Service $service still running on controller $i"
+            fi
+        fi
+    done
+done
+
+echo "Stopping pacemaker OpenStack services"
+for i in {1..3}; do
+    SSH_CMD=CONTROLLER${i}_SSH
+    if [ ! -z "${!SSH_CMD}" ]; then
+        echo "Using controller $i to run pacemaker commands"
+        for resource in ${PacemakerResourcesToStop[*]}; do
+            if ${!SSH_CMD} sudo pcs resource config $resource; then
+                ${!SSH_CMD} sudo pcs resource disable $resource
+            fi
+        done
+        break
+    fi
+done
+
+
+
+
+
+

Pulling the OpenStack configuration

+
+

Before starting the adoption workflow, pull the configuration from the OpenStack services and TripleO on your file system to back up the configuration files. You can then use the files later, during the configuration of the adopted services, and for the record to compare and make sure nothing has been missed or misconfigured.

+
+
+

Make sure you have pull the os-diff repository and configure according to your environment: +link:planning.md#Configuration tooling[Configure os-diff]

+
+
+

Pull configuration from a TripleO deployment

+
+

Before starting you need to update your ssh parameters according to your environment in the os-diff.cfg. +Os-diff will use those parameters to connect to your Director node, query and download the configuration files:

+
+
+
+
ssh_cmd=ssh -F ssh.config standalone
+container_engine=podman
+connection=ssh
+remote_config_path=/tmp/tripleo
+
+
+
+

Make sure the ssh command you provide in ssh_cmd parameter is correct and with key authentication.

+
+
+

Once it’s done, you can start to pull configuration from your OpenStack servies.

+
+
+

All the services are describes in a yaml file:

+
+ +
+

You can enable or disable the services you want then you can start to pull the configuration on your local file system. +Example with default keystone:

+
+
+
+
# service name and file location
+services:
+  # Service name
+  keystone:
+    # Bool to enable/disable a service (not implemented yet)
+    enable: true
+    # Pod name, in both OCP and podman context.
+    # It could be strict match or will only just grep the podman_name
+    # and work with all the pods which matched with pod_name.
+    # To enable/disable use strict_pod_name_match: true/false
+    podman_name: keystone
+    pod_name: keystone
+    container_name: keystone-api
+    # pod options
+    # strict match for getting pod id in TripleO and podman context
+    strict_pod_name_match: false
+    # Path of the config files you want to analyze.
+    # It could be whatever path you want:
+    # /etc/<service_name> or /etc or /usr/share/<something> or even /
+    # @TODO: need to implement loop over path to support multiple paths such as:
+    # - /etc
+    # - /usr/share
+    path:
+      - /etc/
+      - /etc/keystone
+      - /etc/keystone/keystone.conf
+      - /etc/keystone/logging.conf
+
+
+
+

Duplicate the keystone example to each OpenStack services you want.

+
+
+

Then, you can pull the configuration with this command:

+
+
+
+
pushd os-diff
+./os-diff pull
+
+
+
+

The configuration will be pulled and stored by default:

+
+
+
+
/tmp/tripleo/
+
+
+
+

Once it’s done, you should have into your local path a directory per services such as:

+
+
+
+
  ▾ tmp/
+    ▾ tripleo/
+      ▾ glance/
+      ▾ keystone/
+
+
+
+
+

Get services topology specific configuration

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
CONTROLLER_SSH="ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone"
+MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified
+SOURCE_MARIADB_IP=192.168.122.100
+SOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')
+
+
+
+

Export shell variables for the following outputs to compare it with post-adoption values later on:

+
+
+
    +
  • +

    Test connection to the original DB:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_DATABASES=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysql -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo "$PULL_OPENSTACK_CONFIGURATION_DATABASES"
    +
    +
    +
    +

    Note the nova, nova_api, nova_cell0 databases residing in the same DB host.

    +
    +
  • +
  • +

    Run mysqlcheck on the original DB to look for things that are not OK:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysqlcheck --all-databases -h $SOURCE_MARIADB_IP -u root "-p$SOURCE_DB_ROOT_PASSWORD" | grep -v OK)
    +echo "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK"
    +
    +
    +
  • +
  • +

    Get Nova cells mappings from database:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" nova_api -e \
    +    'select uuid,name,transport_url,database_connection,disabled from cell_mappings;')
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS"
    +
    +
    +
  • +
  • +

    Get the host names of the registered Nova compute services:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES=$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" nova_api -e \
    +    "select host from nova.services where services.binary='nova-compute';")
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES"
    +
    +
    +
  • +
  • +

    Get the list of mapped Nova cells:

    +
    +
    +
    export PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS=$($CONTROLLER_SSH sudo podman exec -it nova_api nova-manage cell_v2 list_cells)
    +echo "$PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS"
    +
    +
    +
  • +
+
+
+

After the source control plane services shutdown, if either of the exported +values lost, it could be no longer evaluated again. Preserving the exported +values in an env file should protect you from such a situation:

+
+
+
    +
  • +

    Store exported variables for future use

    +
    +
    +
    cat > ~/.source_cloud_exported_variables << EOF
    +PULL_OPENSTACK_CONFIGURATION_DATABASES="$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysql -rsh $SOURCE_MARIADB_IP -uroot -p$SOURCE_DB_ROOT_PASSWORD -e 'SHOW databases;')"
    +PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK="$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysqlcheck --all-databases -h $SOURCE_MARIADB_IP -u root -p$SOURCE_DB_ROOT_PASSWORD | grep -v OK)"
    +PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS="$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh $SOURCE_MARIADB_IP -uroot -p$SOURCE_DB_ROOT_PASSWORD nova_api -e \
    +    'select uuid,name,transport_url,database_connection,disabled from cell_mappings;')"
    +PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES="$(podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE mysql \
    +    -rsh $SOURCE_MARIADB_IP -uroot -p$SOURCE_DB_ROOT_PASSWORD nova_api -e \
    +    "select host from nova.services where services.binary='nova-compute';")"
    +PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS="$($CONTROLLER_SSH sudo podman exec -it nova_api nova-manage cell_v2 list_cells)"
    +EOF
    +chmod 0600 ~/.source_cloud_exported_variables
    +
    +
    +
  • +
+
+
+
+
+

Migrating databases to MariaDB instances

+
+

This document describes how to move the databases from the original +OpenStack deployment to the MariaDB instances in the OpenShift +cluster.

+
+
+
+
+

NOTE This example scenario describes a simple single-cell setup. Real +multi-stack topology recommended for production use results in different +cells DBs layout, and should be using different naming schemes (not covered +here this time).

+
+
+
+
+

Prerequisites

+
+
    +
  • +

    Make sure the previous Adoption steps have been performed successfully.

    +
    +
      +
    • +

      The OpenStackControlPlane resource must be already created at this point.

      +
    • +
    • +

      Podified MariaDB and RabbitMQ are running. No other podified +control plane services are running.

      +
    • +
    • +

      Required services specific topology. For more information, see Pulling the OpenStack configuration.

      +
    • +
    • +

      OpenStack services have been stopped. For more information, see Stopping OpenStack services.

      +
    • +
    • +

      There must be network routability between:

      +
      +
        +
      • +

        The adoption host and the original MariaDB.

        +
      • +
      • +

        The adoption host and the podified MariaDB.

        +
      • +
      • +

        Note that this routability requirement might change in the +future. For example, you might require routability from the original MariaDB to +podified MariaDB.

        +
      • +
      +
      +
    • +
    +
    +
  • +
  • +

    Podman package is installed

    +
  • +
  • +

    CONTROLLER1_SSH, CONTROLLER2_SSH, and CONTROLLER3_SSH are configured.

    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
PODIFIED_MARIADB_IP=$(oc get svc --selector "mariadb/name=openstack" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_CELL1_MARIADB_IP=$(oc get svc --selector "mariadb/name=openstack-cell1" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
+
+# The CHARACTER_SET and collation should match the source DB
+# if the do not then it will break foreign key relationships
+# for any tables that are created in the future as part of db sync
+CHARACTER_SET=utf8
+COLLATION=utf8_general_ci
+
+MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified
+# Replace with your environment's MariaDB IP:
+SOURCE_MARIADB_IP=192.168.122.100
+SOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')
+
+
+
+
+

Pre-checks

+
+
    +
  • +

    Get the count of not-OK source databases:

    +
    +
    +
    podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \
    +    mysql -h "$SOURCE_MARIADB_IP" -uroot "-p$SOURCE_DB_ROOT_PASSWORD" -e 'SHOW databases;'
    +
    +
    +
  • +
  • +

    Run mysqlcheck on the original DB to look for things that are not OK:

    +
    +
    +
    . ~/.source_cloud_exported_variables
    +test -z "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK"  || [ "$PULL_OPENSTACK_CONFIGURATION_MYSQLCHECK_NOK" = " " ]
    +
    +
    +
  • +
  • +

    Test connection to podified DBs (show databases):

    +
    +
    +
    oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \
    +    mysql -rsh "$PODIFIED_MARIADB_IP" -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;'
    +oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \
    +    mysql -rsh "$PODIFIED_CELL1_MARIADB_IP" -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;'
    +
    +
    +
  • +
+
+
+
+

Procedure - data copy

+
+
+
+

NOTE: You need to transition Nova services imported later on into a +superconductor architecture. For that, delete the old service records in +cells DBs, starting from the cell1. New records will be registered with +different hostnames provided by the Nova service operator. All Nova +services, except the compute agent, have no internal state, and its service +records can be safely deleted. You also need to rename the former default cell +to cell1.

+
+
+
+
+
    +
  • +

    Create a temporary folder to store DB dumps and make sure it’s the +working directory for the following steps:

    +
    +
    +
    mkdir ~/adoption-db
    +cd ~/adoption-db
    +
    +
    +
  • +
  • +

    Create a dump of the original databases:

    +
    +
    +
    podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $MARIADB_IMAGE bash <<EOF
    +
    +# Note Filter the information and performance schema tables
    +# Gnocchi is no longer used as a metric store, skip dumping gnocchi database as well
    +mysql -h ${SOURCE_MARIADB_IP} -u root "-p${SOURCE_DB_ROOT_PASSWORD}" -N -e 'show databases' | grep -E -v 'schema|mysql|gnocchi' | while read dbname; do
    +    echo "Dumping \${dbname}"
    +    mysqldump -h $SOURCE_MARIADB_IP -uroot "-p$SOURCE_DB_ROOT_PASSWORD" \
    +        --single-transaction --complete-insert --skip-lock-tables --lock-tables=0 \
    +        "\${dbname}" > "\${dbname}".sql
    +done
    +
    +EOF
    +
    +
    +
  • +
  • +

    Restore the databases from .sql files into the podified MariaDB:

    +
    +
    +
    # db schemas to rename on import
    +declare -A db_name_map
    +db_name_map["nova"]="nova_cell1"
    +db_name_map["ovs_neutron"]="neutron"
    +
    +# db servers to import into
    +declare -A db_server_map
    +db_server_map["default"]=${PODIFIED_MARIADB_IP}
    +db_server_map["nova_cell1"]=${PODIFIED_CELL1_MARIADB_IP}
    +
    +# db server root password map
    +declare -A db_server_password_map
    +db_server_password_map["default"]=${PODIFIED_DB_ROOT_PASSWORD}
    +db_server_password_map["nova_cell1"]=${PODIFIED_DB_ROOT_PASSWORD}
    +
    +all_db_files=$(ls *.sql)
    +for db_file in ${all_db_files}; do
    +    db_name=$(echo ${db_file} | awk -F'.' '{ print $1; }')
    +    if [[ -v "db_name_map[${db_name}]" ]]; then
    +        echo "renaming ${db_name} to ${db_name_map[${db_name}]}"
    +        db_name=${db_name_map[${db_name}]}
    +    fi
    +    db_server=${db_server_map["default"]}
    +    if [[ -v "db_server_map[${db_name}]" ]]; then
    +        db_server=${db_server_map[${db_name}]}
    +    fi
    +    db_password=${db_server_password_map["default"]}
    +    if [[ -v "db_server_password_map[${db_name}]" ]]; then
    +        db_password=${db_server_password_map[${db_name}]}
    +    fi
    +    echo "creating ${db_name} in ${db_server}"
    +    container_name=$(echo "mariadb-client-${db_name}-create" | sed 's/_/-/g')
    +    oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \
    +        mysql -h "${db_server}" -uroot "-p${db_password}" << EOF
    +CREATE DATABASE IF NOT EXISTS ${db_name} DEFAULT CHARACTER SET ${CHARACTER_SET} DEFAULT COLLATE ${COLLATION};
    +EOF
    +    echo "importing ${db_name} into ${db_server}"
    +    container_name=$(echo "mariadb-client-${db_name}-restore" | sed 's/_/-/g')
    +    oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \
    +        mysql -h "${db_server}" -uroot "-p${db_password}" "${db_name}" < "${db_file}"
    +done
    +oc exec -it openstack-galera-0 -c galera -- mysql --user=root --password=${db_server_password_map["default"]} -e \
    +    "update nova_api.cell_mappings set name='cell1' where name='default';"
    +oc exec -it openstack-cell1-galera-0 -c galera -- mysql --user=root --password=${db_server_password_map["default"]} -e \
    +    "delete from nova_cell1.services where host not like '%nova-cell1-%' and services.binary != 'nova-compute';"
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+

Compare the following outputs with the topology specific configuration. +For more information, see Pulling the OpenStack configuration.

+
+
+
    +
  • +

    Check that the databases were imported correctly:

    +
    +
    +
    . ~/.source_cloud_exported_variables
    +
    +# use 'oc exec' and 'mysql -rs' to maintain formatting
    +dbs=$(oc exec openstack-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo $dbs | grep -Eq '\bkeystone\b'
    +
    +# ensure neutron db is renamed from ovs_neutron
    +echo $dbs | grep -Eq '\bneutron\b'
    +echo $PULL_OPENSTACK_CONFIGURATION_DATABASES | grep -Eq '\bovs_neutron\b'
    +
    +# ensure nova cell1 db is extracted to a separate db server and renamed from nova to nova_cell1
    +c1dbs=$(oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" -e 'SHOW databases;')
    +echo $c1dbs | grep -Eq '\bnova_cell1\b'
    +
    +# ensure default cell renamed to cell1, and the cell UUIDs retained intact
    +novadb_mapped_cells=$(oc exec openstack-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" \
    +  nova_api -e 'select uuid,name,transport_url,database_connection,disabled from cell_mappings;')
    +uuidf='\S{8,}-\S{4,}-\S{4,}-\S{4,}-\S{12,}'
    +left_behind=$(comm -23 \
    +  <(echo $PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS | grep -oE " $uuidf \S+") \
    +  <(echo $novadb_mapped_cells | tr -s "| " " " | grep -oE " $uuidf \S+"))
    +changed=$(comm -13 \
    +  <(echo $PULL_OPENSTACK_CONFIGURATION_NOVADB_MAPPED_CELLS | grep -oE " $uuidf \S+") \
    +  <(echo $novadb_mapped_cells | tr -s "| " " " | grep -oE " $uuidf \S+"))
    +test $(grep -Ec ' \S+$' <<<$left_behind) -eq 1
    +default=$(grep -E ' default$' <<<$left_behind)
    +test $(grep -Ec ' \S+$' <<<$changed) -eq 1
    +grep -qE " $(awk '{print $1}' <<<$default) cell1$" <<<$changed
    +
    +# ensure the registered Nova compute service name has not changed
    +novadb_svc_records=$(oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot "-p$PODIFIED_DB_ROOT_PASSWORD" \
    +  nova_cell1 -e "select host from services where services.binary='nova-compute' order by host asc;")
    +diff -Z <(echo $novadb_svc_records) <(echo $PULL_OPENSTACK_CONFIGURATION_NOVA_COMPUTE_HOSTNAMES)
    +
    +
    +
  • +
  • +

    During the pre/post checks the pod mariadb-client might have returned a pod security warning +related to the restricted:latest security context constraint. This is due to default security +context constraints and will not prevent pod creation by the admission controller. You’ll see a +warning for the short-lived pod but it will not interfere with functionality. +For more information, see About pod security standards and warnings.

    +
  • +
+
+
+
+
+

Migrating OVN data

+
+

This document describes how to move OVN northbound and southbound databases +from the original OpenStack deployment to ovsdb-server instances running in the +OpenShift cluster.

+
+
+

Rationale

+
+

While it may be argued that the podified Neutron ML2/OVN driver and OVN northd +service will reconstruct the databases on startup, the reconstruction may be +time consuming on large existing clusters. The procedure below allows to speed +up data migration and avoid unnecessary data plane disruptions due to +incomplete OpenFlow table contents.

+
+
+
+

Prerequisites

+
+
    +
  • +

    Make sure the previous Adoption steps have been performed successfully.

    +
    +
      +
    • +

      The OpenStackControlPlane resource must be already created at this point.

      +
    • +
    • +

      NetworkAttachmentDefinition CRDs for the original cluster are already +defined. Specifically, openstack/internalapi network is defined.

      +
    • +
    • +

      Podified MariaDB and RabbitMQ may already run. Neutron and OVN are not +running yet.

      +
    • +
    • +

      Original OVN is older or equal to the podified version.

      +
    • +
    • +

      Original Neutron Server and OVN northd services are stopped.

      +
    • +
    • +

      There must be network routability between:

      +
      +
        +
      • +

        The adoption host and the original OVN.

        +
      • +
      • +

        The adoption host and the podified OVN.

        +
      • +
      +
      +
    • +
    +
    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
STORAGE_CLASS_NAME=crc-csi-hostpath-provisioner
+OVSDB_IMAGE=quay.io/podified-antelope-centos9/openstack-ovn-base:current-podified
+SOURCE_OVSDB_IP=172.17.1.49
+
+
+
+

The real value of the SOURCE_OVSDB_IP can be get from the puppet generated configs:

+
+
+
+
grep -rI 'ovn_[ns]b_conn' /var/lib/config-data/puppet-generated/
+
+
+
+
+

Procedure

+
+
    +
  • +

    Prepare the OVN DBs copy dir and the adoption helper pod (pick the storage requests to fit the OVN databases sizes)

    +
  • +
+
+
+
+
oc apply -f - <<EOF
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ovn-data
+spec:
+  storageClassName: $STORAGE_CLASS_NAME
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 10Gi
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ovn-copy-data
+  annotations:
+    openshift.io/scc: anyuid
+  labels:
+    app: adoption
+spec:
+  containers:
+  - image: $OVSDB_IMAGE
+    command: [ "sh", "-c", "sleep infinity"]
+    name: adoption
+    volumeMounts:
+    - mountPath: /backup
+      name: ovn-data
+  securityContext:
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ALL
+    runAsNonRoot: true
+    seccompProfile:
+      type: RuntimeDefault
+  volumes:
+  - name: ovn-data
+    persistentVolumeClaim:
+      claimName: ovn-data
+EOF
+
+
+
+
    +
  • +

    Wait for the pod to come up

    +
  • +
+
+
+
+
oc wait --for=condition=Ready pod/ovn-copy-data --timeout=30s
+
+
+
+
    +
  • +

    Backup OVN databases.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client backup tcp:$SOURCE_OVSDB_IP:6641 > /backup/ovs-nb.db"
+oc exec ovn-copy-data -- bash -c "ovsdb-client backup tcp:$SOURCE_OVSDB_IP:6642 > /backup/ovs-sb.db"
+
+
+
+
    +
  • +

    Start podified OVN database services prior to import.

    +
  • +
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  ovn:
+    enabled: true
+    template:
+      ovnDBCluster:
+        ovndbcluster-nb:
+          dbType: NB
+          storageRequest: 10G
+          networkAttachment: internalapi
+        ovndbcluster-sb:
+          dbType: SB
+          storageRequest: 10G
+          networkAttachment: internalapi
+'
+
+
+
+
    +
  • +

    Wait for the OVN DB pods reaching the running phase.

    +
  • +
+
+
+
+
oc wait --for=jsonpath='{.status.phase}'=Running pod --selector=service=ovsdbserver-nb
+oc wait --for=jsonpath='{.status.phase}'=Running pod --selector=service=ovsdbserver-sb
+
+
+
+
    +
  • +

    Fetch podified OVN IP addresses on the clusterIP service network.

    +
  • +
+
+
+
+
PODIFIED_OVSDB_NB_IP=$(oc get svc --selector "statefulset.kubernetes.io/pod-name=ovsdbserver-nb-0" -ojsonpath='{.items[0].spec.clusterIP}')
+PODIFIED_OVSDB_SB_IP=$(oc get svc --selector "statefulset.kubernetes.io/pod-name=ovsdbserver-sb-0" -ojsonpath='{.items[0].spec.clusterIP}')
+
+
+
+
    +
  • +

    Upgrade database schema for the backup files.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client get-schema tcp:$PODIFIED_OVSDB_NB_IP:6641 > /backup/ovs-nb.ovsschema && ovsdb-tool convert /backup/ovs-nb.db /backup/ovs-nb.ovsschema"
+oc exec ovn-copy-data -- bash -c "ovsdb-client get-schema tcp:$PODIFIED_OVSDB_SB_IP:6642 > /backup/ovs-sb.ovsschema && ovsdb-tool convert /backup/ovs-sb.db /backup/ovs-sb.ovsschema"
+
+
+
+
    +
  • +

    Restore database backup to podified OVN database servers.

    +
  • +
+
+
+
+
oc exec ovn-copy-data -- bash -c "ovsdb-client restore tcp:$PODIFIED_OVSDB_NB_IP:6641 < /backup/ovs-nb.db"
+oc exec ovn-copy-data -- bash -c "ovsdb-client restore tcp:$PODIFIED_OVSDB_SB_IP:6642 < /backup/ovs-sb.db"
+
+
+
+
    +
  • +

    Check that podified OVN databases contain objects from backup, e.g.:

    +
  • +
+
+
+
+
oc exec -it ovsdbserver-nb-0 -- ovn-nbctl show
+oc exec -it ovsdbserver-sb-0 -- ovn-sbctl list Chassis
+
+
+
+
    +
  • +

    Finally, you can start ovn-northd service that will keep OVN northbound and southbound databases in sync.

    +
  • +
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  ovn:
+    enabled: true
+    template:
+      ovnNorthd:
+        networkAttachment: internalapi
+'
+
+
+
+
    +
  • +

    Delete the ovn-data pod and persistent volume claim with OVN databases backup (consider making a snapshot of it, before deleting)

    +
  • +
+
+
+
+
oc delete pod ovn-copy-data
+oc delete pvc ovn-data
+
+
+
+
+
+

Adopting the Identity service

+
+

Prerequisites

+
+ +
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Pre-checks

+ +
+
+

Procedure - Keystone adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Keystone:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  keystone:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      override:
    +        service:
    +          internal:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +      databaseInstance: openstack
    +      secret: osp-secret
    +'
    +
    +
    +
  • +
  • +

    Create alias to use openstack command in the adopted deployment:

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +
    +
  • +
  • +

    Clean up old services and endpoints that still point to the old +control plane (everything except Keystone service and endpoints):

    +
    +
    +
    openstack endpoint list | grep keystone | awk '/admin/{ print $2; }' | xargs ${BASH_ALIASES[openstack]} endpoint delete || true
    +
    +for service in aodh cinderv3 glance manila manilav2 neutron nova placement swift; do
    +  openstack service list | awk "/ $service /{ print \$2; }" | xargs ${BASH_ALIASES[openstack]} service delete || true
    +done
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Keystone endpoints are defined and pointing to the podified +FQDNs:

    +
    +
    +
    openstack endpoint list | grep keystone
    +
    +
    +
  • +
+
+
+
+
+

Adopting the OpenStack Networking service

+
+

Adopting Neutron means that an existing OpenStackControlPlane CR, where Neutron +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

When the procedure is over, the expectation is to see the NeutronAPI service +up and running: the Keystone endpoints should be updated and the same backend +of the source Cloud will be available. If the conditions above are met, the +adoption is considered concluded.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone and Migrating OVN data +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Neutron adoption

+
+

As already done for Keystone, the Neutron Adoption follows the same pattern.

+
+
+

Patch OpenStackControlPlane to deploy Neutron:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  neutron:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      override:
+        service:
+          internal:
+            metadata:
+              annotations:
+                metallb.universe.tf/address-pool: internalapi
+                metallb.universe.tf/allow-shared-ip: internalapi
+                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+            spec:
+              type: LoadBalancer
+      databaseInstance: openstack
+      secret: osp-secret
+      networkAttachments:
+      - internalapi
+'
+
+
+
+
+

Post-checks

+
+
Inspect the resulting neutron pods
+
+
+
NEUTRON_API_POD=`oc get pods -l service=neutron | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $NEUTRON_API_POD -c neutron-api -- cat /etc/neutron/neutron.conf
+
+
+
+
+
Check that Neutron API service is registered in Keystone
+
+
+
openstack service list | grep network
+
+
+
+
+
openstack endpoint list | grep network
+
+| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | neutron      | network      | True    | public    | http://neutron-public-openstack.apps-crc.testing  |
+| b943243e596847a9a317c8ce1800fa98 | regionOne | neutron      | network      | True    | internal  | http://neutron-internal.openstack.svc:9696        |
+| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | neutron      | network      | True    | admin     | http://192.168.122.99:9696                        |
+
+
+
+
+
Create sample resources
+
+

You can test whether the user can create networks, subnets, ports, or routers.

+
+
+
+
openstack network create net
+openstack subnet create --network net --subnet-range 10.0.0.0/24 subnet
+openstack router create router
+
+
+
+
+
+
+

Configuring a Ceph backend

+
+

If the original deployment uses a Ceph storage backend for any service +(e.g. Glance, Cinder, Nova, Manila), the same backend must be used in the +adopted deployment and CRs must be configured accordingly.

+
+
+

Prerequisites

+
+
    +
  • +

    The OpenStackControlPlane CR must already exist.

    +
  • +
+
+
+
+

Variables

+
+

Define the shell variables used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
CEPH_SSH="ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100"
+CEPH_KEY=$($CEPH_SSH "cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0")
+CEPH_CONF=$($CEPH_SSH "cat /etc/ceph/ceph.conf | base64 -w 0")
+
+
+
+
+

Modify capabilities of the "openstack" user to accommodate Manila

+
+

On TripleO environments, the CephFS driver in Manila is configured to use +its own keypair. For convenience, modify the openstack user so that you +can use it across all OpenStack services.

+
+
+

Using the same user across the services serves two purposes:

+
+
+
    +
  • +

    The capabilities of the user required to interact with the Manila service +became far simpler and hence, more became more secure with RHOSP 18.

    +
  • +
  • +

    It is simpler to create a common ceph secret (keyring and ceph config +file) and propagate the secret to all services that need it.

    +
  • +
+
+
+
+
$CEPH_SSH cephadm shell
+ceph auth caps client.openstack \
+  mgr 'allow *' \
+  mon 'allow r, profile rbd' \
+  osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images, allow rw pool manila_data'
+
+
+
+
+

Ceph backend configuration

+
+

Create the ceph-conf-files secret, containing Ceph configuration:

+
+
+
+
oc apply -f - <<EOF
+apiVersion: v1
+data:
+  ceph.client.openstack.keyring: $CEPH_KEY
+  ceph.conf: $CEPH_CONF
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+type: Opaque
+EOF
+
+
+
+

The content of the file should look something like this:

+
+
+
+
+
+
---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ceph-conf-files
+  namespace: openstack
+stringData:
+  ceph.client.openstack.keyring: |
+    [client.openstack]
+        key = <secret key>
+        caps mgr = "allow *"
+        caps mon = "profile rbd"
+        caps osd = "profile rbd pool=images"
+  ceph.conf: |
+    [global]
+    fsid = 7a1719e8-9c59-49e2-ae2b-d7eb08c695d4
+    mon_host = 10.1.1.2,10.1.1.3,10.1.1.4
+
+
+
+
+
+

Configure extraMounts within the OpenStackControlPlane CR:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  extraMounts:
+    - name: v1
+      region: r1
+      extraVol:
+        - propagation:
+          - CinderVolume
+          - CinderBackup
+          - GlanceAPI
+          - ManilaShare
+          extraVolType: Ceph
+          volumes:
+          - name: ceph
+            projected:
+              sources:
+              - secret:
+                  name: ceph-conf-files
+          mounts:
+          - name: ceph
+            mountPath: "/etc/ceph"
+            readOnly: true
+'
+
+
+
+
+

Getting Ceph FSID

+
+

Configuring some OpenStack services to use Ceph backend may require +the FSID value. You can fetch the value from the config like so:

+
+
+
+
CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | grep fsid | sed -e 's/fsid = //')
+
+
+
+
+
+

Adopting the Image service

+
+

Adopting Glance means that an existing OpenStackControlPlane CR, where Glance +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

When the procedure is over, the expectation is to see the GlanceAPI service +up and running: the Keystone endpoints should be updated and the same backend +of the source Cloud will be available. If the conditions above are met, the +adoption is considered concluded.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side;

    +
  4. +
  5. +

    (optional) an internal/external Ceph cluster is reachable by both crc and +TripleO

    +
  6. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Glance adoption

+
+

As already done for Keystone, the Glance Adoption follows the same pattern.

+
+
+
Using local storage backend
+
+

When Glance should be deployed with local storage backend (not Ceph), +patch OpenStackControlPlane to deploy Glance:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  glance:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          type: single
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+'
+
+
+
+
+
Using NFS backend
+
+

When the source Cloud based on TripleO uses Glance with a NFS backend, before +patching the OpenStackControlPlane to deploy Glance it is important to validate +a few networking related prerequisites. +In the source cloud, verify the NFS parameters used by the overcloud to configure +the Glance backend. +In particular, find among the TripleO heat templates the following variables that are usually an override of the default content provided by +/usr/share/openstack-tripleo-heat-templates/environments/storage/glance-nfs.yaml[glance-nfs.yaml].:

+
+
+
+

GlanceBackend: file

+
+
+

GlanceNfsEnabled: true

+
+
+

GlanceNfsShare: 192.168.24.1:/var/nfs

+
+
+
+

In the example above, as the first variable shows, unlike Cinder, Glance has no +notion of NFS backend: the File driver is used in this scenario, and behind the +scenes, the filesystem_store_datadir which usually points to /var/lib/glance/images/ +is mapped to the export value provided by the GlanceNfsShare variable. +If the GlanceNfsShare is not exported through a network that is supposed to be +propagated to the adopted OpenStack control plane, an extra action is required +by the human administrator, who must stop the nfs-server and remap the export +to the storage network. This action usually happens when the Glance service is +stopped in the source controller nodes. +In the podified control plane, as per the +(network isolation diagram, +Glance is attached to the Storage network, propagated via the associated +NetworkAttachmentsDefinition CR, and the resulting Pods have already the right +permissions to handle the Image Service traffic through this network. +In a deployed OpenStack control plane, you can verify that the network mapping +matches with what has been deployed in the TripleO based environment by checking +both the NodeNetworkConfigPolicy (nncp) and the NetworkAttachmentDefinition +(net-attach-def) with the following commands:

+
+
+
+
$ oc get nncp
+NAME                        STATUS      REASON
+enp6s0-crc-8cf2w-master-0   Available   SuccessfullyConfigured
+
+$ oc get net-attach-def
+NAME
+ctlplane
+internalapi
+storage
+tenant
+
+$ oc get ipaddresspool -n metallb-system
+NAME          AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
+ctlplane      true          false             ["192.168.122.80-192.168.122.90"]
+internalapi   true          false             ["172.17.0.80-172.17.0.90"]
+storage       true          false             ["172.18.0.80-172.18.0.90"]
+tenant        true          false             ["172.19.0.80-172.19.0.90"]
+
+
+
+

The above represents an example of the output that should be checked in the +openshift environment to make sure there are no issues with the propagated +networks.

+
+
+

The following steps assume that:

+
+
+
    +
  1. +

    the Storage network has been propagated to the openstack control plane

    +
  2. +
  3. +

    Glance is able to reach the Storage network and connect to the nfs-server +through the port 2049.

    +
  4. +
+
+
+

If the above conditions are met, it is possible to adopt the Glance service +and create a new default GlanceAPI instance connected with the existing +NFS share.

+
+
+
+
cat << EOF > glance_nfs_patch.yaml
+
+spec:
+  extraMounts:
+  - extraVol:
+    - extraVolType: Nfs
+      mounts:
+      - mountPath: /var/lib/glance/images
+        name: nfs
+      propagation:
+      - Glance
+      volumes:
+      - name: nfs
+        nfs:
+          path: /var/nfs
+          server: 172.17.3.20
+    name: r1
+    region: r1
+  glance:
+    enabled: true
+    template:
+      databaseInstance: openstack
+      customServiceConfig: |
+         [DEFAULT]
+         enabled_backends = default_backend:file
+         [glance_store]
+         default_backend = default_backend
+         [default_backend]
+         filesystem_store_datadir = /var/lib/glance/images/
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          type: single
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+EOF
+
+
+
+

Note:

+
+
+

Replace in glance_nfs_patch.yaml the nfs/server ip address with the IP used +to reach the nfs-server and make sure the nfs/path points to the exported +path in the nfs-server.

+
+
+

Patch OpenStackControlPlane to deploy Glance with a NFS backend:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_nfs_patch.yaml
+
+
+
+

When GlanceAPI is active, you can see a single API instance:

+
+
+
+
$ oc get pods -l service=glance
+NAME                      READY   STATUS    RESTARTS
+glance-default-single-0   3/3     Running   0
+
+
+
+

and the description of the pod must report:

+
+
+
+
Mounts:
+...
+  nfs:
+    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
+    Server:    {{ server ip address }}
+    Path:      {{ nfs export path }}
+    ReadOnly:  false
+...
+
+
+
+

It is also possible to double check the mountpoint by running the following:

+
+
+
+
oc rsh -c glance-api glance-default-single-0
+
+sh-5.1# mount
+...
+...
+{{ ip address }}:/var/nfs on /var/lib/glance/images type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.18.0.5,local_lock=none,addr=172.18.0.5)
+...
+...
+
+
+
+

You can run an openstack image create command and double check, on the NFS +node, the uuid has been created in the exported directory.

+
+
+

For example:

+
+
+
+
$ oc rsh openstackclient
+$ openstack image list
+
+sh-5.1$  curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img
+...
+...
+
+sh-5.1$ openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img cirros
+...
+...
+
+sh-5.1$ openstack image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 634482ca-4002-4a6d-b1d5-64502ad02630 | cirros | active |
++--------------------------------------+--------+--------+
+
+
+
+

On the nfs-server node, the same uuid is in the exported /var/nfs:

+
+
+
+
$ ls /var/nfs/
+634482ca-4002-4a6d-b1d5-64502ad02630
+
+
+
+
+
Using Ceph storage backend
+
+

If a Ceph backend is used, the customServiceConfig parameter should +be used to inject the right configuration to the GlanceAPI instance.

+
+
+

Make sure the Ceph-related secret (ceph-conf-files) was created in +the openstack namespace and that the extraMounts property of the +OpenStackControlPlane CR has been configured properly. These tasks +are described in an earlier Adoption step Configuring a Ceph backend.

+
+
+
+
cat << EOF > glance_patch.yaml
+spec:
+  glance:
+    enabled: true
+    template:
+      databaseInstance: openstack
+      customServiceConfig: |
+        [DEFAULT]
+        enabled_backends=default_backend:rbd
+        [glance_store]
+        default_backend=default_backend
+        [default_backend]
+        rbd_store_ceph_conf=/etc/ceph/ceph.conf
+        rbd_store_user=openstack
+        rbd_store_pool=images
+        store_description=Ceph glance store backend.
+      storageClass: "local-storage"
+      storageRequest: 10G
+      glanceAPIs:
+        default:
+          replicas: 1
+          override:
+            service:
+              internal:
+                metadata:
+                  annotations:
+                    metallb.universe.tf/address-pool: internalapi
+                    metallb.universe.tf/allow-shared-ip: internalapi
+                    metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+          networkAttachments:
+          - storage
+EOF
+
+
+
+

If you have previously backup your OpenStack services configuration file from the old environment: +Pulling the OpenStack configuration you can use os-diff to compare and make sure the configuration is correct.

+
+
+
+
pushd os-diff
+./os-diff cdiff --service glance -c /tmp/collect_tripleo_configs/glance/etc/glance/glance-api.conf -o glance_patch.yaml
+
+
+
+

This will produce the difference between both ini configuration files.

+
+
+

Patch OpenStackControlPlane to deploy Glance with Ceph backend:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_patch.yaml
+
+
+
+
+
+

Post-checks

+
+
Test the glance service from the OpenStack CLI
+
+

You can compare and make sure the configuration has been correctly applied to the glance pods by running

+
+
+
+
./os-diff cdiff --service glance -c /etc/glance/glance.conf.d/02-config.conf  -o glance_patch.yaml --frompod -p glance-api
+
+
+
+

If no line appear, then the configuration is correctly done.

+
+
+

Inspect the resulting glance pods:

+
+
+
+
GLANCE_POD=`oc get pod |grep glance-default-external-0 | cut -f 1 -d' '`
+oc exec -t $GLANCE_POD -c glance-api -- cat /etc/glance/glance.conf.d/02-config.conf
+
+[DEFAULT]
+enabled_backends=default_backend:rbd
+[glance_store]
+default_backend=default_backend
+[default_backend]
+rbd_store_ceph_conf=/etc/ceph/ceph.conf
+rbd_store_user=openstack
+rbd_store_pool=images
+store_description=Ceph glance store backend.
+
+oc exec -t $GLANCE_POD -c glance-api -- ls /etc/ceph
+ceph.client.openstack.keyring
+ceph.conf
+
+
+
+

Ceph secrets are properly mounted, at this point let’s move to the OpenStack +CLI and check the service is active and the endpoints are properly updated.

+
+
+
+
(openstack)$ service list | grep image
+
+| fc52dbffef36434d906eeb99adfc6186 | glance    | image        |
+
+(openstack)$ endpoint list | grep image
+
+| 569ed81064f84d4a91e0d2d807e4c1f1 | regionOne | glance       | image        | True    | internal  | http://glance-internal-openstack.apps-crc.testing   |
+| 5843fae70cba4e73b29d4aff3e8b616c | regionOne | glance       | image        | True    | public    | http://glance-public-openstack.apps-crc.testing     |
+| 709859219bc24ab9ac548eab74ad4dd5 | regionOne | glance       | image        | True    | admin     | http://glance-admin-openstack.apps-crc.testing      |
+
+
+
+

Check that the images that you previously listed in the source Cloud are available in the adopted service:

+
+
+
+
(openstack)$ image list
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |
++--------------------------------------+--------+--------+
+
+
+
+
+
Image upload
+
+

You can test that an image can be created on the adopted service.

+
+
+
+
(openstack)$ alias openstack="oc exec -t openstackclient -- openstack"
+(openstack)$ curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img
+    qemu-img convert -O raw /tmp/cirros-0.5.2-x86_64-disk.img /tmp/cirros-0.5.2-x86_64-disk.img.raw
+    openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img.raw cirros2
+    openstack image list
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+100   273  100   273    0     0   1525      0 --:--:-- --:--:-- --:--:--  1533
+  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
+100 15.5M  100 15.5M    0     0  17.4M      0 --:--:-- --:--:-- --:--:-- 17.4M
+
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| Field            | Value                                                                                                                                      |
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| container_format | bare                                                                                                                                       |
+| created_at       | 2023-01-31T21:12:56Z                                                                                                                       |
+| disk_format      | raw                                                                                                                                        |
+| file             | /v2/images/46a3eac1-7224-40bc-9083-f2f0cd122ba4/file                                                                                       |
+| id               | 46a3eac1-7224-40bc-9083-f2f0cd122ba4                                                                                                       |
+| min_disk         | 0                                                                                                                                          |
+| min_ram          | 0                                                                                                                                          |
+| name             | cirros                                                                                                                                     |
+| owner            | 9f7e8fdc50f34b658cfaee9c48e5e12d                                                                                                           |
+| properties       | os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros', owner_specified.openstack.sha256='' |
+| protected        | False                                                                                                                                      |
+| schema           | /v2/schemas/image                                                                                                                          |
+| status           | queued                                                                                                                                     |
+| tags             |                                                                                                                                            |
+| updated_at       | 2023-01-31T21:12:56Z                                                                                                                       |
+| visibility       | shared                                                                                                                                     |
++------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+
++--------------------------------------+--------+--------+
+| ID                                   | Name   | Status |
++--------------------------------------+--------+--------+
+| 46a3eac1-7224-40bc-9083-f2f0cd122ba4 | cirros2| active |
+| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |
++--------------------------------------+--------+--------+
+
+
+(openstack)$ oc rsh ceph
+sh-4.4$ ceph -s
+r  cluster:
+    id:     432d9a34-9cee-4109-b705-0c59e8973983
+    health: HEALTH_OK
+
+  services:
+    mon: 1 daemons, quorum a (age 4h)
+    mgr: a(active, since 4h)
+    osd: 1 osds: 1 up (since 4h), 1 in (since 4h)
+
+  data:
+    pools:   5 pools, 160 pgs
+    objects: 46 objects, 224 MiB
+    usage:   247 MiB used, 6.8 GiB / 7.0 GiB avail
+    pgs:     160 active+clean
+
+sh-4.4$ rbd -p images ls
+46a3eac1-7224-40bc-9083-f2f0cd122ba4
+c3158cad-d50b-452f-bec1-f250562f5c1f
+
+
+
+
+
+
+

Adopting the Placement service

+
+

Prerequisites

+
+ +
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Procedure - Placement adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Placement:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  placement:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      databaseInstance: openstack
    +      secret: osp-secret
    +      override:
    +        service:
    +          internal:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +'
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Placement endpoints are defined and pointing to the +podified FQDNs and that Placement API responds.

    +
    +
    +
    alias openstack="oc exec -t openstackclient -- openstack"
    +
    +openstack endpoint list | grep placement
    +
    +
    +# Without OpenStack CLI placement plugin installed:
    +PLACEMENT_PUBLIC_URL=$(openstack endpoint list -c 'Service Name' -c 'Service Type' -c URL | grep placement | grep public | awk '{ print $6; }')
    +oc exec -t openstackclient -- curl "$PLACEMENT_PUBLIC_URL"
    +
    +# With OpenStack CLI placement plugin installed:
    +openstack resource class list
    +
    +
    +
  • +
+
+
+
+
+

Adopting the Compute service

+
+

NOTE This example scenario describes a simple single-cell setup. Real +multi-stack topology recommended for production use results in different +cells DBs layout, and should be using different naming schemes (not covered +here this time).

+
+
+

Prerequisites

+
+ +
+
+
+

Variables

+
+

Define the shell variables and aliases used in the steps below. The values are +just illustrative, use values that are correct for your environment:

+
+
+
+
alias openstack="oc exec -t openstackclient -- openstack"
+
+
+
+
+

Procedure - Nova adoption

+
+

NOTE: This procedure assumes that Nova Metadata is deployed on the top level and not on each cell level, so this example imports it the same way. If the source deployment has a per cell metadata deployment, adjust the given below patch as needed. Metadata service cannot be run in cell0.

+
+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Nova:

    +
    +
    +
    oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
    +spec:
    +  nova:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      secret: osp-secret
    +      apiServiceTemplate:
    +        override:
    +          service:
    +            internal:
    +              metadata:
    +                annotations:
    +                  metallb.universe.tf/address-pool: internalapi
    +                  metallb.universe.tf/allow-shared-ip: internalapi
    +                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +              spec:
    +                type: LoadBalancer
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      metadataServiceTemplate:
    +        enabled: true # deploy single nova metadata on the top level
    +        override:
    +          service:
    +            metadata:
    +              annotations:
    +                metallb.universe.tf/address-pool: internalapi
    +                metallb.universe.tf/allow-shared-ip: internalapi
    +                metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +            spec:
    +              type: LoadBalancer
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      schedulerServiceTemplate:
    +        customServiceConfig: |
    +          [workarounds]
    +          disable_compute_service_check_for_ffu=true
    +      cellTemplates:
    +        cell0:
    +          conductorServiceTemplate:
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +        cell1:
    +          metadataServiceTemplate:
    +            enabled: false # enable here to run it in a cell instead
    +            override:
    +                service:
    +                  metadata:
    +                    annotations:
    +                      metallb.universe.tf/address-pool: internalapi
    +                      metallb.universe.tf/allow-shared-ip: internalapi
    +                      metallb.universe.tf/loadBalancerIPs: 172.17.0.80
    +                  spec:
    +                    type: LoadBalancer
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +          conductorServiceTemplate:
    +            customServiceConfig: |
    +              [workarounds]
    +              disable_compute_service_check_for_ffu=true
    +'
    +
    +
    +
  • +
  • +

    Wait for Nova control plane services' CRs to become ready:

    +
    +
    +
    oc wait --for condition=Ready --timeout=300s Nova/nova
    +
    +
    +
    +

    The local Conductor services will be started for each cell, while the superconductor runs in cell0. +Note that disable_compute_service_check_for_ffu is mandatory for all imported Nova services, until the external data plane is imported, and until Nova Compute services fast-forward upgraded. For more information, see Adopting EDPM.

    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check that Nova endpoints are defined and pointing to the +podified FQDNs and that Nova API responds.

    +
    +
    +
    openstack endpoint list | grep nova
    +openstack server list
    +
    +
    +
  • +
+
+
+

Compare the following outputs with the topology specific configuration in Pulling the OpenStack configuration.

+
+
+
    +
  • +

    Query the superconductor for cell1 existance and compare it to pre-adoption values:

    +
    +
    +
    . ~/.source_cloud_exported_variables
    +echo $PULL_OPENSTACK_CONFIGURATION_NOVAMANAGE_CELL_MAPPINGS
    +oc rsh nova-cell0-conductor-0 nova-manage cell_v2 list_cells | grep -F '| cell1 |'
    +
    +
    +
    +

    The expected changes to happen:

    +
    +
    +
      +
    • +

      cell1’s nova DB and user name become nova_cell1.

      +
    • +
    • +

      Default cell is renamed to cell1 (in a multi-cell setup, it should become indexed as the last cell instead).

      +
    • +
    • +

      RabbitMQ transport URL no longer uses guest.

      +
    • +
    +
    +
  • +
+
+
+

NOTE At this point, Nova control plane services have yet taken control over +existing Nova compute workloads. That would become possible to verify only after +EDPM adoption is completed. For more information, see Adopting EDPM.

+
+
+
+
+

Adopting the Block Storage service

+
+

Adopting a director deployed Cinder service into OpenStack may require some +thought because it’s not always a simple process.

+
+
+

Usually the adoption process entails:

+
+
+
    +
  • +

    Checking existing limitations.

    +
  • +
  • +

    Considering the placement of the cinder services.

    +
  • +
  • +

    Preparing the OpenShift nodes where volume and backup services will run.

    +
  • +
  • +

    Crafting the manifest based on the existing cinder.conf file.

    +
  • +
  • +

    Deploying Cinder.

    +
  • +
  • +

    Validating the new deployment.

    +
  • +
+
+
+

This guide provides necessary knowledge to complete these steps in most +situations, but it still requires knowledge on how OpenStack services work and +the structure of a Cinder configuration file.

+
+
+

Limitations

+
+

There are currently some limitations that are worth highlighting; some are +related to this guideline while some to the operator:

+
+
+
    +
  • +

    There is no global nodeSelector for all cinder volumes, so it needs to be +specified per backend. This may change in the future.

    +
  • +
  • +

    There is no global customServiceConfig or customServiceConfigSecrets for +all cinder volumes, so it needs to be specified per backend. This may change in +the future.

    +
  • +
  • +

    Adoption of LVM backends, where the volume data is stored in the compute +nodes, is not currently being documented in this process. It may get documented +in the future.

    +
  • +
  • +

    Support for Cinder backends that require kernel modules not included in RHEL +has not been tested in Operator deployed OpenStack so it is not documented in +this guide.

    +
  • +
  • +

    Adoption of DCN/Edge deployment is not currently described in this guide.

    +
  • +
+
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, cinder service must have been +stopped and the service databases must already be imported into the podified +MariaDB.

    +
  • +
  • +

    Storage network has been properly configured on the OpenShift cluster.

    +
  • +
+
+
+
+

Variables

+
+

No new environmental variables need to be defined, though you use the +CONTROLLER1_SSH that was defined in a previous step for the pre-checks.

+
+
+
+

Pre-checks

+
+

You need the contents of cinder.conf file. Download the file so that you can access it locally:

+
+
+
+
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf > cinder.conf
+
+
+
+
+

Prepare OpenShift

+
+

As explained in Planning the new deployment, before deploying OpenStack in OpenShift, you must ensure that the networks are ready, that you have decided the node selection, and also make sure any necessary changes to the OpenShift nodes have been made. For Cinder volume and backup services all these 3 must be carefully considered.

+
+
+
Node Selection
+
+

You might need, or want, to restrict the OpenShift nodes where cinder volume and +backup services can run.

+
+
+

The best example of when you need to do node selection for a specific cinder +service is when you deploy Cinder with the LVM driver. In that scenario, the +LVM data where the volumes are stored only exists in a specific host, so you +need to pin the cinder-volume service to that specific OpenShift node. Running +the service on any other OpenShift node would not work. Since nodeSelector +only works on labels, you cannot use the OpenShift host node name to restrict +the LVM backend and you need to identify it using a unique label, an existing label, or new label:

+
+
+
+
$ oc label nodes worker0 lvm=cinder-volumes
+
+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  secret: osp-secret
+  storageClass: local-storage
+  cinder:
+    enabled: true
+    template:
+      cinderVolumes:
+        lvm-iscsi:
+          nodeSelector:
+            lvm: cinder-volumes
+< . . . >
+
+
+
+

As mentioned in the About node selector, an example where you need to use labels is when using FC storage and you do not have HBA cards in all your OpenShift nodes. In this scenario you need to restrict all the cinder volume backends (not only the FC one) as well as the backup services.

+
+
+

Depending on the cinder backends, their configuration, and the usage of Cinder, +you can have network intensive cinder volume services with lots of I/O as well as +cinder backup services that are not only network intensive but also memory and +CPU intensive. This may be a concern for the OpenShift human operators, and +they may want to use the nodeSelector to prevent these service from +interfering with their other OpenShift workloads. For more information about node selection, see About node selector.

+
+
+

When selecting the nodes where cinder volume is going to run please remember +that cinder-volume may also use local storage when downloading a glance image +for the create volume from image operation, and it can require a considerable +amount of space when having concurrent operations and not using cinder volume +cache.

+
+
+

If you do not have nodes with enough local disk space for the temporary images, you can use a remote NFS location for the images. You had to manually set this up in Director deployments, but with operators, you can do it +automatically using the extra volumes feature ()extraMounts.

+
+
+
+
Transport protocols
+
+

Due to the specifics of the storage transport protocols some changes may be +required on the OpenShift side, and although this is something that must be +documented by the Vendor here wer are going to provide some generic +instructions that can serve as a guide for the different transport protocols.

+
+
+

Check the backend sections in your cinder.conf file that are listed in the +enabled_backends configuration option to figure out the transport storage +protocol used by the backend.

+
+
+

Depending on the backend, you can find the transport protocol:

+
+
+
    +
  • +

    Looking at the volume_driver configuration option, as it may contain the +protocol itself: RBD, iSCSI, FC…​

    +
  • +
  • +

    Looking at the target_protocol configuration option

    +
  • +
+
+
+ + + + + +
+
Warning
+
+Any time a MachineConfig is used to make changes to OpenShift +nodes the node will reboot!! Act accordingly. +
+
+
+
NFS
+
+

There is nothing to do for NFS. OpenShift can connect to NFS backends without +any additional changes.

+
+
+
+
RBD/Ceph
+
+

There is nothing to do for RBD/Ceph in terms of preparing the nodes, OpenShift +can connect to Ceph backends without any additional changes. Credentials and +configuration files will need to be provided to the services though.

+
+
+
+
iSCSI
+
+

Connecting to iSCSI volumes requires that the iSCSI initiator is running on the +OpenShift hosts where volume and backup services are going to run, because +the Linux Open iSCSI initiator does not currently support network namespaces, so +you must only run 1 instance of the service for the normal OpenShift usage, plus +the OpenShift CSI plugins, plus the OpenStack services.

+
+
+

If you are not already running iscsid on the OpenShift nodes, then you need +to apply a MachineConfig similar to this one:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-enable-iscsid
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    systemd:
+      units:
+      - enabled: true
+        name: iscsid.service
+
+
+
+

If you are using labels to restrict the nodes where cinder services are running you need to use a MachineConfigPool as described in +the About node selector to limit the effects of the +MachineConfig to only the nodes where your services may run.

+
+
+

If you are using a toy single node deployment to test the process, you might need to replace worker with master in the MachineConfig.

+
+
+
+
FC
+
+

There is nothing to do for FC volumes to work, but the cinder volume and cinder +backup services need to run in an OpenShift host that has HBAs, so if there +are nodes that do not have HBAs then you need to use labels to restrict where +these services can run, as mentioned in the [node selection section] +(#node-selection).

+
+
+

This also means that for virtualized OpenShift clusters using FC you need to +expose the host’s HBAs inside the VM.

+
+
+
+
NVMe-oF
+
+

Connecting to NVMe-oF volumes requires that the nvme kernel modules are loaded +on the OpenShift hosts.

+
+
+

If you are not already loading the nvme-fabrics module on the OpenShift nodes +where volume and backup services are going to run then you need to apply a +MachineConfig similar to this one:

+
+
+
+
apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-load-nvme-fabrics
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    storage:
+      files:
+        - path: /etc/modules-load.d/nvme_fabrics.conf
+          overwrite: false
+          # Mode must be decimal, this is 0644
+          mode: 420
+          user:
+            name: root
+          group:
+            name: root
+          contents:
+            # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.
+            # This is the rfc2397 text/plain string format
+            source: data:,nvme-fabrics
+
+
+
+

If you are using labels to restrict the nodes where cinder +services are running, you need to use a MachineConfigPool as described in +the About node selector to limit the effects of the +MachineConfig to only the nodes where your services may run.

+
+
+

If you are using a toy single node deployment to test the process you migt need to replace worker with master in the MachineConfig.

+
+
+

You are only loading the nvme-fabrics module because it takes care of loading +the transport specific modules (tcp, rdma, fc) as needed.

+
+
+

For production deployments using NVMe-oF volumes it is recommended that you use +multipathing. For NVMe-oF volumes OpenStack uses native multipathing, called +ANA.

+
+
+

Once the OpenShift nodes have rebooted and are loading the nvme-fabrics module +you can confirm that the Operating System is configured and supports ANA by +checking on the host:

+
+
+
+
cat /sys/module/nvme_core/parameters/multipath
+
+
+
+ + + + + +
+
Important
+
+ANA doesn’t use the Linux Multipathing Device Mapper, but the +*current OpenStack +code requires multipathd on compute nodes to be running for Nova to be able to +use multipathing, so please remember to follow the multipathing part for compute +nodes on the multipathing section. +
+
+
+
+
Multipathing
+
+

For iSCSI and FC protocols, using multipathing is recommended, which +has 4 parts:

+
+
+
    +
  • +

    Prepare the OpenShift hosts

    +
  • +
  • +

    Configure the Cinder services

    +
  • +
  • +

    Prepare the Nova computes

    +
  • +
  • +

    Configure the Nova service

    +
  • +
+
+
+

To prepare the OpenShift hosts, you need to ensure that the Linux Multipath +Device Mapper is configured and running on the OpenShift hosts, and you do +that using MachineConfig like this one:

+
+
+
+
# Includes the /etc/multipathd.conf contents and the systemd unit changes
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  labels:
+    machineconfiguration.openshift.io/role: worker
+    service: cinder
+  name: 99-master-cinder-enable-multipathd
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    storage:
+      files:
+        - path: /etc/multipath.conf
+          overwrite: false
+          # Mode must be decimal, this is 0600
+          mode: 384
+          user:
+            name: root
+          group:
+            name: root
+          contents:
+            # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.
+            # This is the rfc2397 text/plain string format
+            source: data:,defaults%20%7B%0A%20%20user_friendly_names%20no%0A%20%20recheck_wwid%20yes%0A%20%20skip_kpartx%20yes%0A%20%20find_multipaths%20yes%0A%7D%0A%0Ablacklist%20%7B%0A%7D
+    systemd:
+      units:
+      - enabled: true
+        name: multipathd.service
+
+
+
+

If you are using labels to restrict the nodes where cinder +services are running you need to use a MachineConfigPool as described in +the About node selector to limit the effects of the +MachineConfig to only the nodes where your services may run.

+
+
+

If you are using a toy single node deployment to test the process you might need to replace worker with master in the MachineConfig.

+
+
+

To configure the cinder services to use multipathing you need to enable the +use_multipath_for_image_xfer configuration option in all the backend sections +and in the [DEFAULT] section for the backup service, but in Podified +deployments you do not need to worry about it, because that’s the default. So as +long as you do not override it setting use_multipath_for_image_xfer = false then multipathing will work as long as the service is running on the OpenShift host.

+
+
+
+
+
+

Configurations

+
+

As described in Planning the new deployment, Cinder is configured using +configuration snippets instead of using obscure configuration parameters +defined by the installer.

+
+
+

The recommended way to deploy Cinder volume backends has changed to remove old +limitations, add flexibility, and improve operations in general.

+
+
+

When deploying with Director you used to run a single Cinder volume service with +all your backends (each backend would run on its own process), and even though +that way of deploying is still supported, it is not recommended. It is recommended to use a volume service per backend since it is a superior deployment model.

+
+
+

So for an LVM and a Ceph backend you would have 2 entries in cinderVolume and, +as mentioned in the limitations section, you cannot set global defaults for all +volume services, so you have to define it for each of them, like this:

+
+
+
+
apiVersion: core.openstack.org/v1beta1
+kind: OpenStackControlPlane
+metadata:
+  name: openstack
+spec:
+  cinder:
+    enabled: true
+    template:
+      cinderVolume:
+        lvm:
+          customServiceConfig: |
+            [DEFAULT]
+            debug = True
+            [lvm]
+< . . . >
+        ceph:
+          customServiceConfig: |
+            [DEFAULT]
+            debug = True
+            [ceph]
+< . . . >
+
+
+
+

Reminder that for volume backends that have sensitive information using Secret +and the customServiceConfigSecrets key is the recommended way to go.

+
+
+
+

Prepare the configuration

+
+

For adoption instead of using a whole deployment manifest you use a targeted +patch, like you did with other services, and in this patch you will enable the +different cinder services with their specific configurations.

+
+
+

WARNING: Check that all configuration options are still valid for the new +OpenStack version, since configuration options may have been deprecated, +removed, or added. This applies to both backend driver specific configuration +options and other generic options.

+
+
+

There are 2 ways to prepare a cinder configuration for adoption, tailor-making +it or doing it quick and dirty. There is no difference in how Cinder will +operate with both methods, though tailor-making it is recommended whenever possible.

+
+
+

The high level explanation of the tailor-made approach is:

+
+
+
    +
  1. +

    Determine what part of the configuration is generic for all the cinder +services and remove anything that would change when deployed in OpenShift, like +the connection in the [dabase] section, the transport_url and log_dir in +[DEFAULT], the whole [coordination] section. This configuration goes into +the customServiceConfig (or a Secret and then used in +customServiceConfigSecrets) at the cinder: template: level.

    +
  2. +
  3. +

    Determine if there’s any scheduler specific configuration and add it to the +customServiceConfig section in cinder: template: cinderScheduler.

    +
  4. +
  5. +

    Determine if there’s any API specific configuration and add it to the +customServiceConfig section in cinder: template: cinderAPI.

    +
  6. +
  7. +

    If you have cinder backup deployed, then you get the cinder backup relevant +configuration options and add them to customServiceConfig (or a Secret and +then used in customServiceConfigSecrets) at the cinder: template: +cinderBackup: level. You should remove the host configuration in the +[DEFAULT] section to facilitate supporting multiple replicas in the future.

    +
  8. +
  9. +

    Determine the individual volume backend configuration for each of the +drivers. The configuration will not only be the specific driver section, it +should also include the [backend_defaults] section and FC zoning sections is +they are being used, because the cinder operator doesn’t support a +customServiceConfig section global for all volume services. Each backend +would have its own section under cinder: template: cinderVolumes and the +configuration would go in customServiceConfig (or a Secret and then used in +customServiceConfigSecrets).

    +
  10. +
  11. +

    Check if any of the cinder volume drivers being used requires a custom vendor +image. If they do, find the location of the image in the vendor’s instruction +available in the w OpenStack Cinder ecosystem +page +and add it under the specific’s driver section using the containerImage key. +For example, if you had a Pure Storage array and the driver was already certified +for OSP18, then you would have something like this:

    +
    +
    +
    spec:
    +  cinder:
    +    enabled: true
    +    template:
    +      cinderVolume:
    +        pure:
    +          containerImage: registry.connect.redhat.com/purestorage/openstack-cinder-volume-pure-rhosp-18-0'
    +          customServiceConfigSecrets:
    +            - openstack-cinder-pure-cfg
    +< . . . >
    +
    +
    +
  12. +
  13. +

    External files: Cinder services sometimes use external files, for example for +a custom policy, or to store credentials, or SSL CA bundles to connect to a +storage array, and you need to make those files available to the right +containers. To achieve this, you use Secrets or ConfigMap to store the +information in OpenShift and then the extraMounts key. For example, for the +Ceph credentials stored in a Secret called ceph-conf-files you patch +the top level extraMounts in OpenstackControlPlane:

    +
    +
    +
    spec:
    +  extraMounts:
    +  - extraVol:
    +    - extraVolType: Ceph
    +      mounts:
    +      - mountPath: /etc/ceph
    +        name: ceph
    +        readOnly: true
    +      propagation:
    +      - CinderVolume
    +      - CinderBackup
    +      - Glance
    +      volumes:
    +      - name: ceph
    +        projected:
    +          sources:
    +          - secret:
    +              name: ceph-conf-files
    +
    +
    +
    +

    But for a service specific one, like the API policy, you do it directly +on the service itself. In this example, you include the cinder API +configuration that references the policy you are adding from a ConfigMap +called my-cinder-conf that has a key policy with the contents of the +policy:

    +
    +
    +
    +
    spec:
    +  cinder:
    +    enabled: true
    +    template:
    +      cinderAPI:
    +        customServiceConfig: |
    +           [oslo_policy]
    +           policy_file=/etc/cinder/api/policy.yaml
    +      extraMounts:
    +      - extraVol:
    +        - extraVolType: Ceph
    +          mounts:
    +          - mountPath: /etc/cinder/api
    +            name: policy
    +            readOnly: true
    +          propagation:
    +          - CinderAPI
    +          volumes:
    +          - name: policy
    +            projected:
    +              sources:
    +              - configMap:
    +                  name: my-cinder-conf
    +                  items:
    +                    - key: policy
    +                      path: policy.yaml
    +
    +
    +
  14. +
+
+
+

The quick and dirty process is more straightforward:

+
+
+
    +
  1. +

    Create an agnostic configuration file removing any specifics from the old +deployment’s cinder.conf file, like the connection in the [dabase] +section, the transport_url and log_dir in [DEFAULT], the whole +[coordination] section, etc..

    +
  2. +
  3. +

    Assuming the configuration has sensitive information, drop the modified +contents of the whole file into a Secret.

    +
  4. +
  5. +

    Reference this secret in all the services, creating a cinder volumes section +for each backend and just adding the respective enabled_backends option.

    +
  6. +
  7. +

    Add external files as mentioned in the last bullet of the tailor-made +configuration explanation.

    +
  8. +
+
+
+

Example of what the quick and dirty configuration patch would look like:

+
+
+
+
   spec:
+     cinder:
+       enabled: true
+       template:
+         cinderAPI:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderScheduler:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderBackup:
+           customServiceConfigSecrets:
+             - cinder-conf
+         cinderVolume:
+           lvm1:
+             customServiceConfig: |
+               [DEFAULT]
+               enabled_backends = lvm1
+             customServiceConfigSecrets:
+               - cinder-conf
+           lvm2:
+             customServiceConfig: |
+               [DEFAULT]
+               enabled_backends = lvm2
+             customServiceConfigSecrets:
+               - cinder-conf
+
+
+
+
Configuration generation helper tool
+
+

Creating the right Cinder configuration files to deploy using Operators may +sometimes be a complicated experience, especially the first times, so you have a +helper tool that can create a draft of the files from a cinder.conf file.

+
+
+

This tool is not meant to be a automation tool. It is mostly to help you get the +gist of it, maybe point out some potential pitfalls and reminders.

+
+
+ + + + + +
+
Important
+
+The tools requires PyYAML Python package to be installed (pip +install PyYAML). +
+
+
+

This cinder-cfg.py script defaults to reading the +cinder.conf file from the current directory (unless --config option is used) +and outputs files to the current directory (unless --out-dir option is used).

+
+
+

In the output directory you always get a cinder.patch file with the Cinder +specific configuration patch to apply to the OpenStackControlPlane CR but you might also get an additional file called cinder-prereq.yaml file with some +Secrets and MachineConfigs.

+
+
+

Example of an invocation setting input and output explicitly to the defaults for +a Ceph backend:

+
+
+
+
$ python cinder-cfg.py --config cinder.conf --out-dir ./
+WARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using "extraMounts" or remove the option.
+
+WARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.
+
+WARNING:root:You were using user ['nova'] to talk to Nova, but in podified using the service keystone username is preferred in this case ['cinder']. Dropping that configuration.
+
+WARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!
+
+Output written at ./: cinder.patch
+
+
+
+

The script outputs some warnings to let you know things that you might need to do +manually -adding the custom policy, provide the ceph configuration files- and +also let you know a change in how the service_user has been removed.

+
+
+

A different example when using multiple backends, one of them being a 3PAR FC +could be:

+
+
+
+
$ python cinder-cfg.py --config cinder.conf --out-dir ./
+WARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using "extraMounts" or remove the option.
+
+ERROR:root:Backend hpe_fc requires a vendor container image, but there is no certified image available yet. Patch will use the last known image for reference, but IT WILL NOT WORK
+
+WARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.
+
+WARNING:root:You were using user ['nova'] to talk to Nova, but in podified using the service keystone username is preferred, in this case ['cinder']. Dropping that configuration.
+
+WARNING:root:Configuration is using FC, please ensure all your OpenShift nodes have HBAs or use labels to ensure that Volume and Backup services are scheduled on nodes with HBAs.
+
+WARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!
+
+Output written at ./: cinder.patch, cinder-prereq.yaml
+
+
+
+

In this case there are additional messages. The following list provides an explanation of each one:

+
+
+
    +
  • +

    There is one message mentioning how this backend driver needs external vendor +dependencies so the standard container image will not work. Unfortunately this +image is still not available, so an older image is used in the output patch file +for reference. You can then replace this image with one that you build or +with a Red Hat official image once the image is available. In this case you can see in your cinder.patch file:

    +
    +
    +
          cinderVolumes:
    +      hpe-fc:
    +        containerImage: registry.connect.redhat.com/hpe3parcinder/openstack-cinder-volume-hpe3parcinder17-0
    +
    +
    +
  • +
  • +

    The FC message reminds you that this transport protocol requires specific HBA +cards to be present on the nodes where cinder services are running.

    +
  • +
  • +

    In this case it has created the cinder-prereq.yaml file and within the file +there is one MachineConfig and one Secret. The MachineConfig is called 99-master-cinder-enable-multipathd and like the name suggests enables multipathing on all the OCP worker nodes. The Secret is +called openstackcinder-volumes-hpe_fc and contains the 3PAR backend +configuration because it has sensitive information (credentials). The +cinder.patch file uses the following configuration:

    +
    +
    +
       cinderVolumes:
    +      hpe-fc:
    +        customServiceConfigSecrets:
    +        - openstackcinder-volumes-hpe_fc
    +
    +
    +
  • +
+
+
+
+
+

Procedure - Cinder adoption

+
+

Assuming you have already stopped cinder services, prepared the OpenShift nodes, +deployed the OpenStack operators and a bare OpenStack manifest, and migrated the +database, and prepared the patch manifest with the Cinder service configuration, +you must apply the patch and wait for the operator to apply the changes and deploy the Cinder services.

+
+
+

It is recommended to write the patch manifest into a file, for example +cinder.patch and then apply it with something like:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file=cinder.patch
+
+
+
+

For example, for the RBD deployment from the Development Guide the +cinder.patch would look like this:

+
+
+
+
spec:
+  extraMounts:
+  - extraVol:
+    - extraVolType: Ceph
+      mounts:
+      - mountPath: /etc/ceph
+        name: ceph
+        readOnly: true
+      propagation:
+      - CinderVolume
+      - CinderBackup
+      - Glance
+      volumes:
+      - name: ceph
+        projected:
+          sources:
+          - secret:
+              name: ceph-conf-files
+  cinder:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      cinderAPI:
+        override:
+          service:
+            internal:
+              metadata:
+                annotations:
+                  metallb.universe.tf/address-pool: internalapi
+                  metallb.universe.tf/allow-shared-ip: internalapi
+                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+        replicas: 1
+        customServiceConfig: |
+          [DEFAULT]
+          default_volume_type=tripleo
+      cinderScheduler:
+        replicas: 1
+      cinderBackup:
+        networkAttachments:
+        - storage
+        replicas: 1
+        customServiceConfig: |
+          [DEFAULT]
+          backup_driver=cinder.backup.drivers.ceph.CephBackupDriver
+          backup_ceph_conf=/etc/ceph/ceph.conf
+          backup_ceph_user=openstack
+          backup_ceph_pool=backups
+      cinderVolumes:
+        ceph:
+          networkAttachments:
+          - storage
+          replicas: 1
+          customServiceConfig: |
+            [tripleo_ceph]
+            backend_host=hostgroup
+            volume_backend_name=tripleo_ceph
+            volume_driver=cinder.volume.drivers.rbd.RBDDriver
+            rbd_ceph_conf=/etc/ceph/ceph.conf
+            rbd_user=openstack
+            rbd_pool=volumes
+            rbd_flatten_volume_from_snapshot=False
+            report_discard_supported=True
+
+
+
+

Once the services have been deployed you need to clean up the old scheduler +and backup services which will appear as being down while you have others that +appear as being up:

+
+
+
+
openstack volume service list
+
++------------------+------------------------+------+---------+-------+----------------------------+
+| Binary           | Host                   | Zone | Status  | State | Updated At                 |
++------------------+------------------------+------+---------+-------+----------------------------+
+| cinder-backup    | standalone.localdomain | nova | enabled | down  | 2023-06-28T11:00:59.000000 |
+| cinder-scheduler | standalone.localdomain | nova | enabled | down  | 2023-06-28T11:00:29.000000 |
+| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-06-28T17:00:03.000000 |
+| cinder-scheduler | cinder-scheduler-0     | nova | enabled | up    | 2023-06-28T17:00:02.000000 |
+| cinder-backup    | cinder-backup-0        | nova | enabled | up    | 2023-06-28T17:00:01.000000 |
++------------------+------------------------+------+---------+-------+----------------------------+
+
+
+
+

In this case you need to remove services for hosts standalone.localdomain

+
+
+
+
oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-backup standalone.localdomain
+oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-scheduler standalone.localdomain
+
+
+
+

The reason why we haven’t preserved the name of the backup service is because +we have taken the opportunity to change its configuration to support +Active-Active, even though we are not doing so right now because we have 1 +replica.

+
+
+

Now that the Cinder services are running, the DB schema migration has been completed and you can proceed to apply the DB data migrations. +While it is not necessary to run these data migrations at this precise moment, +because you can run them right before the next upgrade, for adoption it is best to run them now to make sure there are no issues before running production workloads on the deployment.

+
+
+

The command to run the DB data migrations is:

+
+
+
+
oc exec -it cinder-scheduler-0 -- cinder-manage db online_data_migrations
+
+
+
+
+

Post-checks

+
+

Before you can run any checks you need to set the right cloud configuration for +the openstack command to be able to connect to your OpenShift control plane.

+
+
+

Ensure that the openstack alias is defined:

+
+
+
+
alias openstack="oc exec -t openstackclient -- openstack"
+
+
+
+

Now you can run a set of tests to confirm that the deployment is using your +old database contents:

+
+
+
    +
  • +

    See that Cinder endpoints are defined and pointing to the podified +FQDNs:

    +
    +
    +
    openstack endpoint list --service cinderv3
    +
    +
    +
  • +
  • +

    Check that the cinder services are running and up. The API won’t show but if +you get a response you know it’s up as well:

    +
    +
    +
    openstack volume service list
    +
    +
    +
  • +
  • +

    Check that your old volume types, volumes, snapshots, and backups are there:

    +
    +
    +
    openstack volume type list
    +openstack volume list
    +openstack volume snapshot list
    +openstack volume backup list
    +
    +
    +
  • +
+
+
+

To confirm that the configuration is working, the following basic operations are recommended:

+
+
+
    +
  • +

    Create a volume from an image to check that the connection to glance is +working.

    +
    +
    +
    openstack volume create --image cirros --bootable --size 1 disk_new
    +
    +
    +
  • +
  • +

    Backup the old attached volume to a new backup. Example:

    +
    +
    +
    openstack --os-volume-api-version 3.47 volume create --backup backup restored
    +
    +
    +
  • +
+
+
+

You do not boot a nova instance using the new volume from image or try to detach +the old volume because nova and cinder are still not connected.

+
+
+
+
+

Adopting the OpenStack Dashboard

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, Memcached and +keystone should be already adopted.

    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Procedure - Horizon adoption

+
+
    +
  • +

    Patch OpenStackControlPlane to deploy Horizon:

    +
    +
    +
    oc patch openstackcontrolplane openstack --type=merge --patch '
    +spec:
    +  horizon:
    +    enabled: true
    +    apiOverride:
    +      route: {}
    +    template:
    +      memcachedInstance: memcached
    +      secret: osp-secret
    +'
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    See that Horizon instance is successfully deployed and ready

    +
  • +
+
+
+
+
oc get horizon
+
+
+
+
    +
  • +

    Check that dashboard is reachable and returns status code 200

    +
  • +
+
+
+
+
PUBLIC_URL=$(oc get horizon horizon -o jsonpath='{.status.endpoint}')
+curl --silent --output /dev/stderr --head --write-out "%{http_code}" "$PUBLIC_URL/dashboard/auth/login/?next=/dashboard/" | grep 200
+
+
+
+
+
+

Adopting the Shared File Systems service

+
+

OpenStack Manila is the Shared File Systems service. It provides OpenStack +users with a self-service API to create and manage file shares. File +shares (or simply, "shares"), are built for concurrent read/write access by +any number of clients. This, coupled with the inherent elasticity of the +underlying storage makes the Shared File Systems service essential in +cloud environments with require RWX ("read write many") persistent storage.

+
+
+

Networking

+
+

File shares in OpenStack are accessed directly over a network. Hence, it is +essential to plan the networking of the cloud to create a successful and +sustainable orchestration layer for shared file systems.

+
+
+

Manila supports two levels of storage networking abstractions - one where +users can directly control the networking for their respective file shares; +and another where the storage networking is configured by the OpenStack +administrator. It is important to ensure that the networking in the Red Hat +OpenStack Platform 17.1 matches the network plans for your new cloud after +adoption. This ensures that tenant workloads remain connected to +storage through the adoption process, even as the control plane suffers a +minor interruption. Manila’s control plane services are not in the data +path; and shutting down the API, scheduler and share manager services will +not impact access to existing shared file systems.

+
+
+

Typically, storage and storage device management networks are separate. +Manila services only need access to the storage device management network. +For example, if a Ceph cluster was used in the deployment, the "storage" +network refers to the Ceph cluster’s public network, and Manila’s share +manager service needs to be able to reach it.

+
+
+
+

Prerequisites

+
+
    +
  • +

    Ensure that manila systemd services (api, cron, scheduler) are +stopped. For more information, see Stopping OpenStack services.

    +
  • +
  • +

    Ensure that manila pacemaker services ("openstack-manila-share") are +stopped. For more information, see Stopping OpenStack services.

    +
  • +
  • +

    Ensure that the database migration has completed. For more information, see Migrating databases to MariaDB instances.

    +
  • +
  • +

    Ensure that OpenShift nodes where manila-share service will be deployed +can reach the management network that the storage system is in.

    +
  • +
  • +

    Ensure that services such as keystone and memcached are available prior to +adopting manila services.

    +
  • +
  • +

    If tenant-driven networking was enabled (driver_handles_share_servers=True), +ensure that neutron has been deployed prior to +adopting manila services.

    +
  • +
+
+
+
+

Procedure - Manila adoption

+
+
Copying configuration from the RHOSP 17.1 deployment
+
+

Define the CONTROLLER1_SSH environment variable, if it hasn’t been +defined already. Then copy the +configuration file from RHOSP 17.1 for reference.

+
+
+
+
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/manila/etc/manila/manila.conf | awk '!/^ *#/ && NF' > ~/manila.conf
+
+
+
+

Review this configuration, alongside any configuration changes that were noted +since RHOSP 17.1. Not all of it makes sense to bring into the new cloud +environment:

+
+
+
    +
  • +

    The manila operator is capable of setting up database related configuration +([database]), service authentication (auth_strategy, +[keystone_authtoken]), message bus configuration +(transport_url, control_exchange), the default paste config +(api_paste_config) and inter-service communication configuration (` +, `[nova], [cinder], [glance] [oslo_messaging_*]). So +all of these can be ignored.

    +
  • +
  • +

    Ignore the osapi_share_listen configuration. In RHOSP 18, you rely on +OpenShift routes and ingress.

    +
  • +
  • +

    Pay attention to policy overrides. In RHOSP 18, manila ships with a secure +default RBAC, and overrides may not be necessary. Please review RBAC +defaults by using the Oslo policy generator +tool. If a custom policy is necessary, you must provide it as a +ConfigMap. The following sample spec illustrates how a +ConfigMap called manila-policy can be set up with the contents of a +file called policy.yaml.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        manilaAPI:
+          customServiceConfig: |
+             [oslo_policy]
+             policy_file=/etc/manila/policy.yaml
+        extraMounts:
+        - extraVol:
+          - extraVolType: Undefined
+            mounts:
+            - mountPath: /etc/manila/
+              name: policy
+              readOnly: true
+            propagation:
+            - ManilaAPI
+            volumes:
+            - name: policy
+              projected:
+                sources:
+                - configMap:
+                    name: manila-policy
+                    items:
+                      - key: policy
+                        path: policy.yaml
+
+
+
+
    +
  • +

    The Manila API service needs the enabled_share_protocols option to be +added in the customServiceConfig section in manila: template: manilaAPI.

    +
  • +
  • +

    If you had scheduler overrides, add them to the customServiceConfig +section in manila: template: manilaScheduler.

    +
  • +
  • +

    If you had multiple storage backend drivers configured with RHOSP 17.1, +you will need to split them up when deploying RHOSP 18. Each storage +backend driver needs to use its own instance of the manila-share +service.

    +
  • +
  • +

    If a storage backend driver needs a custom container image, find it on the +RHOSP Ecosystem Catalog +and set manila: template: manilaShares: <custom name> : containerImage +value. The following example illustrates multiple storage backend drivers, +using custom container images.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        manilaAPI:
+          customServiceConfig: |
+            [DEFAULT]
+            enabled_share_protocols = nfs
+          replicas: 3
+        manilaScheduler:
+          replicas: 3
+        manilaShares:
+         netapp:
+           customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends = netapp
+             [netapp]
+             driver_handles_share_servers = False
+             share_backend_name = netapp
+             share_driver = manila.share.drivers.netapp.common.NetAppDriver
+             netapp_storage_family = ontap_cluster
+             netapp_transport_type = http
+           replicas: 1
+         pure:
+            customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends=pure-1
+             [pure-1]
+             driver_handles_share_servers = False
+             share_backend_name = pure-1
+             share_driver = manila.share.drivers.purestorage.flashblade.FlashBladeShareDriver
+             flashblade_mgmt_vip = 203.0.113.15
+             flashblade_data_vip = 203.0.10.14
+            containerImage: registry.connect.redhat.com/purestorage/openstack-manila-share-pure-rhosp-18-0
+            replicas: 1
+
+
+
+
    +
  • +

    If providing sensitive information, such as passwords, hostnames and +usernames, it is recommended to use OpenShift secrets, and the +customServiceConfigSecrets key. An example:

    +
  • +
+
+
+
+
cat << __EOF__ > ~/netapp_secrets.conf
+
+[netapp]
+netapp_server_hostname = 203.0.113.10
+netapp_login = fancy_netapp_user
+netapp_password = secret_netapp_password
+netapp_vserver = mydatavserver
+__EOF__
+
+oc create secret generic osp-secret-manila-netapp --from-file=~/netapp_secrets.conf -n openstack
+
+
+
+
    +
  • +

    customConfigSecrets can be used in any service, the following is a +config example using the secret you created above.

    +
  • +
+
+
+
+
  spec:
+    manila:
+      enabled: true
+      template:
+        < . . . >
+        manilaShares:
+         netapp:
+           customServiceConfig: |
+             [DEFAULT]
+             debug = true
+             enabled_share_backends = netapp
+             [netapp]
+             driver_handles_share_servers = False
+             share_backend_name = netapp
+             share_driver = manila.share.drivers.netapp.common.NetAppDriver
+             netapp_storage_family = ontap_cluster
+             netapp_transport_type = http
+           customServiceConfigSecrets:
+             - osp-secret-manila-netapp
+           replicas: 1
+    < . . . >
+
+
+
+
    +
  • +

    If you need to present extra files to any of the services, you can use +extraMounts. For example, when using ceph, you’d need Manila’s ceph +user’s keyring file as well as the ceph.conf configuration file +available. These are mounted via extraMounts as done with the example +below.

    +
  • +
  • +

    Ensure that the names of the backends (share_backend_name) remain as they +did on RHOSP 17.1.

    +
  • +
  • +

    It is recommended to set the replica count of the manilaAPI service and +the manilaScheduler service to 3. You should ensure to set the replica +count of the manilaShares service/s to 1.

    +
  • +
  • +

    Ensure that the appropriate storage management network is specified in the +manilaShares section. The example below connects the manilaShares +instance with the CephFS backend driver to the storage network.

    +
  • +
+
+
+
+
Deploying the manila control plane
+
+

Patch OpenStackControlPlane to deploy Manila; here’s an example that uses +Native CephFS:

+
+
+
+
cat << __EOF__ > ~/manila.patch
+spec:
+  manila:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      manilaAPI:
+        replicas: 3
+        customServiceConfig: |
+          [DEFAULT]
+          enabled_share_protocols = cephfs
+        override:
+          service:
+            internal:
+              metadata:
+                annotations:
+                  metallb.universe.tf/address-pool: internalapi
+                  metallb.universe.tf/allow-shared-ip: internalapi
+                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer
+      manilaScheduler:
+        replicas: 3
+      manilaShares:
+        cephfs:
+          replicas: 1
+          customServiceConfig: |
+            [DEFAULT]
+            enabled_share_backends = tripleo_ceph
+            [tripleo_ceph]
+            driver_handles_share_servers=False
+            share_backend_name=tripleo_ceph
+            share_driver=manila.share.drivers.cephfs.driver.CephFSDriver
+            cephfs_conf_path=/etc/ceph/ceph.conf
+            cephfs_auth_id=openstack
+            cephfs_cluster_name=ceph
+            cephfs_volume_mode=0755
+            cephfs_protocol_helper_type=CEPHFS
+          networkAttachments:
+              - storage
+__EOF__
+
+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file=~/manila.patch
+
+
+
+
+
+

Post-checks

+
+
Inspect the resulting manila service pods
+
+
+
oc get pods -l service=manila
+
+
+
+
+
Check that Manila API service is registered in Keystone
+
+
+
openstack service list | grep manila
+
+
+
+
+
openstack endpoint list | grep manila
+
+| 1164c70045d34b959e889846f9959c0e | regionOne | manila       | share        | True    | internal  | http://manila-internal.openstack.svc:8786/v1/%(project_id)s        |
+| 63e89296522d4b28a9af56586641590c | regionOne | manilav2     | sharev2      | True    | public    | https://manila-public-openstack.apps-crc.testing/v2                |
+| af36c57adcdf4d50b10f484b616764cc | regionOne | manila       | share        | True    | public    | https://manila-public-openstack.apps-crc.testing/v1/%(project_id)s |
+| d655b4390d7544a29ce4ea356cc2b547 | regionOne | manilav2     | sharev2      | True    | internal  | http://manila-internal.openstack.svc:8786/v2                       |
+
+
+
+
+
Verify resources
+
+

Test the health of the service:

+
+
+
+
openstack share service list
+openstack share pool list --detail
+
+
+
+

Check on existing workloads:

+
+
+
+
openstack share list
+openstack share snapshot list
+
+
+
+

You can create further resources:

+
+
+
+
openstack share create cephfs 10 --snapshot mysharesnap --name myshareclone
+
+
+
+
+
+
+

Adopting the Bare Metal Provisioning service

+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, the service databases +must already be imported into the podified MariaDB.

    +
  • +
+
+
+
+

Variables

+
+

(There are no shell variables necessary currently.)

+
+
+
+

Pre-checks

+
+

TODO

+
+
+
+

Procedure - Ironic adoption

+
+

TODO

+
+
+
+

Post-checks

+
+

TODO

+
+
+
+
+

Adopting Heat

+
+

Adopting Heat means that an existing OpenStackControlPlane CR, where Heat +is supposed to be disabled, should be patched to start the service with the +configuration parameters provided by the source environment.

+
+
+

After the adoption process has been completed, a user can expect that they +will then have CR’s for Heat, HeatAPI, HeatEngine and HeatCFNAPI. +Additionally, a user should have endpoints created within Keystone to facilitate +the above mentioned servies.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A OpenShift environment is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. Notably, MariaDB and Keystone +should be already adopted.

    +
  • +
  • +

    In addition, if your existing Heat stacks contain resources from other services +such as Neutron, Nova, Swift, etc. Those services should be adopted first before +trying to adopt Heat.

    +
  • +
+
+
+
+

Procedure - Heat adoption

+
+

As already done for Keystone, the Heat Adoption follows a similar pattern.

+
+
+

Patch the osp-secret to update the HeatAuthEncryptionKey and HeatPassword. This needs +to match what you have configured in the existing TripleO Heat configuration.

+
+
+

You can retrieve and verify the existing auth_encryption_key and service passwords via:

+
+
+
+
[stack@rhosp17 ~]$ grep -E 'HeatPassword|HeatAuth' ~/overcloud-deploy/overcloud/overcloud-passwords.yaml
+  HeatAuthEncryptionKey: Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2
+  HeatPassword: dU2N0Vr2bdelYH7eQonAwPfI3
+
+
+
+

And verifying on one of the Controllers that this is indeed the value in use:

+
+
+
+
[stack@rhosp17 ~]$ ansible -i overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml overcloud-controller-0 -m shell -a "grep auth_encryption_key /var/lib/config-data/puppet-generated/heat/etc/heat/heat.conf | grep -Ev '^#|^$'" -b
+overcloud-controller-0 | CHANGED | rc=0 >>
+auth_encryption_key=Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2
+
+
+
+

This password needs to be base64 encoded and added to the osp-secret

+
+
+
+
❯ echo Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2 | base64
+UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzIK
+
+❯ oc patch secret osp-secret --type='json' -p='[{"op" : "replace" ,"path" : "/data/HeatAuthEncryptionKey" ,"value" : "UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzIK"}]'
+secret/osp-secret patched
+
+
+
+

Patch OpenStackControlPlane to deploy Heat:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch '
+spec:
+  heat:
+    enabled: true
+    apiOverride:
+      route: {}
+    template:
+      databaseInstance: openstack
+      secret: osp-secret
+      memcachedInstance: memcached
+      passwordSelectors:
+        authEncryptionKey: HeatAuthEncryptionKey
+        database: HeatDatabasePassword
+        service: HeatPassword
+'
+
+
+
+
+

Post-checks

+
+

Ensure all of the CR’s reach the "Setup Complete" state:

+
+
+
+
❯ oc get Heat,HeatAPI,HeatEngine,HeatCFNAPI
+NAME                           STATUS   MESSAGE
+heat.heat.openstack.org/heat   True     Setup complete
+
+NAME                                  STATUS   MESSAGE
+heatapi.heat.openstack.org/heat-api   True     Setup complete
+
+NAME                                        STATUS   MESSAGE
+heatengine.heat.openstack.org/heat-engine   True     Setup complete
+
+NAME                                        STATUS   MESSAGE
+heatcfnapi.heat.openstack.org/heat-cfnapi   True     Setup complete
+
+
+
+
Check that Heat service is registered in Keystone
+
+
+
 oc exec -it openstackclient -- openstack service list -c Name -c Type
++------------+----------------+
+| Name       | Type           |
++------------+----------------+
+| heat       | orchestration  |
+| glance     | image          |
+| heat-cfn   | cloudformation |
+| ceilometer | Ceilometer     |
+| keystone   | identity       |
+| placement  | placement      |
+| cinderv3   | volumev3       |
+| nova       | compute        |
+| neutron    | network        |
++------------+----------------+
+
+
+
+
+
❯ oc exec -it openstackclient -- openstack endpoint list --service=heat -f yaml
+- Enabled: true
+  ID: 1da7df5b25b94d1cae85e3ad736b25a5
+  Interface: public
+  Region: regionOne
+  Service Name: heat
+  Service Type: orchestration
+  URL: http://heat-api-public-openstack-operators.apps.okd.bne-shift.net/v1/%(tenant_id)s
+- Enabled: true
+  ID: 414dd03d8e9d462988113ea0e3a330b0
+  Interface: internal
+  Region: regionOne
+  Service Name: heat
+  Service Type: orchestration
+  URL: http://heat-api-internal.openstack-operators.svc:8004/v1/%(tenant_id)s
+
+
+
+
+
Check Heat engine services are up
+
+
+
 oc exec -it openstackclient -- openstack orchestration service list -f yaml
+- Binary: heat-engine
+  Engine ID: b16ad899-815a-4b0c-9f2e-e6d9c74aa200
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:01.000000'
+- Binary: heat-engine
+  Engine ID: 887ed392-0799-4310-b95c-ac2d3e6f965f
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:00.000000'
+- Binary: heat-engine
+  Engine ID: 26ed9668-b3f2-48aa-92e8-2862252485ea
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:00.000000'
+- Binary: heat-engine
+  Engine ID: 1011943b-9fea-4f53-b543-d841297245fd
+  Host: heat-engine-6d47856868-p7pzz
+  Hostname: heat-engine-6d47856868-p7pzz
+  Status: up
+  Topic: engine
+  Updated At: '2023-10-11T21:48:01.000000'
+
+
+
+
+
Verify you can now see your Heat stacks again
+
+

Test whether you can create networks, subnets, ports, or routers:

+
+
+
+
❯ openstack stack list -f yaml
+- Creation Time: '2023-10-11T22:03:20Z'
+  ID: 20f95925-7443-49cb-9561-a1ab736749ba
+  Project: 4eacd0d1cab04427bc315805c28e66c9
+  Stack Name: test-networks
+  Stack Status: CREATE_COMPLETE
+  Updated Time: null
+
+
+
+
+
+
+

Adopting Telemetry services

+
+

Adopting Telemetry means that an existing OpenStackControlPlane CR, where Telemetry services are supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. MariaDB, Keystone and EDPM should be already adopted.

    +
  • +
+
+
+
+

Procedure - Telemetry adoption

+
+

Patch OpenStackControlPlane to deploy Ceilometer services:

+
+
+
+
cat << EOF > ceilometer_patch.yaml
+spec:
+  ceilometer:
+    enabled: true
+    template:
+      centralImage: quay.io/podified-antelope-centos9/openstack-ceilometer-central:current-podified
+      computeImage: quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
+      customServiceConfig: |
+        [DEFAULT]
+        debug=true
+      ipmiImage: quay.io/podified-antelope-centos9/openstack-ceilometer-ipmi:current-podified
+      nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
+      notificationImage: quay.io/podified-antelope-centos9/openstack-ceilometer-notification:current-podified
+      secret: osp-secret
+      sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
+EOF
+
+
+
+
+
+

If you have previously backed up your OpenStack services configuration file from the old environment, you can use os-diff to compare and make sure the configuration is correct. For more information, see Pulling the OpenStack configuration.

+
+
+
+
+
+
pushd os-diff
+./os-diff cdiff --service ceilometer -c /tmp/collect_tripleo_configs/ceilometer/etc/ceilometer/ceilometer.conf -o ceilometer_patch.yaml
+
+
+
+
+
+

This will produce the difference between both ini configuration files.

+
+
+
+
+

Patch OpenStackControlPlane to deploy Ceilometer services:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file ceilometer_patch.yaml
+
+
+
+
+

Post-checks

+
+
Inspect the resulting Ceilometer pods
+
+
+
CEILOMETETR_POD=`oc get pods -l service=ceilometer | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $CEILOMETETR_POD -c ceilometer-central-agent -- cat /etc/ceilometer/ceilometer.conf
+
+
+
+
+
Inspect the resulting Ceilometer IPMI agent pod on Data Plane nodes
+
+
+
podman ps | grep ceilometer-ipmi
+
+
+
+
+
Inspecting enabled pollsters
+
+
+
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml']}"  | base64 -d
+
+
+
+
+
Enabling pollsters according to requirements
+
+
+
cat << EOF > polling.yaml
+---
+sources:
+    - name: pollsters
+      interval: 300
+      meters:
+        - volume.size
+        - image.size
+        - cpu
+        - memory
+EOF
+
+oc patch secret ceilometer-config-data  --patch="{\"data\": { \"polling.yaml\": \"$(base64 -w0 polling.yaml)\"}}"
+
+
+
+
+
+
+

Adopting autoscaling

+
+

Adopting autoscaling means that an existing OpenStackControlPlane CR, where Aodh services are supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.

+
+
+

This guide also assumes that:

+
+
+
    +
  1. +

    A TripleO environment (the source Cloud) is running on one side;

    +
  2. +
  3. +

    A SNO / CodeReadyContainers is running on the other side.

    +
  4. +
+
+
+

Prerequisites

+
+
    +
  • +

    Previous Adoption steps completed. MariaDB, Keystone, Heat and Telemetry +should be already adopted.

    +
  • +
+
+
+
+

Procedure - Autoscaling adoption

+
+

Patch OpenStackControlPlane to deploy autoscaling services:

+
+
+
+
cat << EOF > aodh_patch.yaml
+spec:
+  autoscaling:
+    enabled: true
+    prometheus:
+      deployPrometheus: false
+    aodh:
+      customServiceConfig: |
+        [DEFAULT]
+        debug=true
+      secret: osp-secret
+      apiImage: "quay.io/podified-antelope-centos9/openstack-aodh-api:current-podified"
+      evaluatorImage: "quay.io/podified-antelope-centos9/openstack-aodh-evaluator:current-podified"
+      notifierImage: "quay.io/podified-antelope-centos9/openstack-aodh-notifier:current-podified"
+      listenerImage: "quay.io/podified-antelope-centos9/openstack-aodh-listener:current-podified"
+      passwordSelectors:
+      databaseUser: aodh
+      databaseInstance: openstack
+      memcachedInstance: memcached
+EOF
+
+
+
+
+
+

If you have previously backed up your OpenStack services configuration file from the old environment, you can use os-diff to compare and make sure the configuration is correct. For more information, see Pulling the OpenStack configuration.

+
+
+
+
+
+
pushd os-diff
+./os-diff cdiff --service aodh -c /tmp/collect_tripleo_configs/aodh/etc/aodh/aodh.conf -o aodh_patch.yaml
+
+
+
+
+
+

This will producre the difference between both ini configuration files.

+
+
+
+
+

Patch OpenStackControlPlane to deploy Aodh services:

+
+
+
+
oc patch openstackcontrolplane openstack --type=merge --patch-file aodh_patch.yaml
+
+
+
+
+

Post-checks

+
+
If autoscaling services are enabled inspect Aodh pods
+
+
+
AODH_POD=`oc get pods -l service=aodh | tail -n 1 | cut -f 1 -d' '`
+oc exec -t $AODH_POD -c aodh-api -- cat /etc/aodh/aodh.conf
+
+
+
+
+
Check whether Aodh API service is registered in Keystone
+
+
+
openstack endpoint list | grep aodh
+| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | aodh      | network      | True    | public    | http://aodh-public-openstack.apps-crc.testing  |
+| b943243e596847a9a317c8ce1800fa98 | regionOne | aodh      | network      | True    | internal  | http://aodh-internal.openstack.svc:9696        |
+| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | aodh      | network      | True    | admin     | http://192.168.122.99:9696                     |
+
+
+
+
+
Create sample resources
+
+

You can test whether you can create alarms.

+
+
+
+
openstack alarm create \
+--name low_alarm \
+--type gnocchi_resources_threshold \
+--metric cpu \
+--resource-id b7ac84e4-b5ca-4f9e-a15c-ece7aaf68987 \
+--threshold 35000000000 \
+--comparison-operator lt \
+--aggregation-method rate:mean \
+--granularity 300 \
+--evaluation-periods 3 \
+--alarm-action 'log:\\' \
+--ok-action 'log:\\' \
+--resource-type instance
+
+
+
+
+
+
+

Stopping infrastructure management and Compute services

+
+

Before you start the EDPM adoption, make sure that you stop the Compute, +libvirt, load balancing, messaging, and database services on the source cloud. You also need to disable repositories for modular libvirt daemons on Compute hosts.

+
+
+

After this step, the source cloud’s control plane can be decomissioned, +which is taking down only cloud controllers, database and messaging nodes. +Nodes that must remain functional are those running the compute, storage, +or networker roles (in terms of composable roles covered by Tripleo Heat +Templates).

+
+
+

Variables

+
+

Define the shell variables used in the steps below. +Define the map of compute node name, IP pairs. +The values are just illustrative and refer to a single node standalone director deployment, use values that are correct for your environment:

+
+
+
+
EDPM_PRIVATEKEY_PATH="~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa"
+declare -A computes
+computes=(
+  ["standalone.localdomain"]="192.168.122.100"
+  # ...
+)
+
+
+
+

These ssh variables with the ssh commands are used instead of ansible to try to create instructions that are independent on where they are running. But ansible commands could be used to achieve the same result if you are in the right host, for example to stop a service:

+
+
+
+
. stackrc
+ansible -i $(which tripleo-ansible-inventory) Compute -m shell -a "sudo systemctl stop tripleo_virtqemud.service" -b
+
+
+
+
+

Stopping remaining services

+
+

Remove the conflicting repositories and packages (in case of a devsetup that +uses Standalone TripleO) from all compute hosts. That is required to install +libvirt packages, when these hosts become adopted as External DataPlane Managed +(EDPM) nodes, where modular libvirt daemons are no longer running in podman +containers.

+
+
+

These steps can be automated with a simple script that relies on the previously +defined environmental variables and function:

+
+
+
+
ComputeServicesToStop=(
+                "tripleo_nova_compute.service"
+                "tripleo_nova_libvirt.target"
+                "tripleo_nova_migration_target.service"
+                "tripleo_nova_virtlogd_wrapper.service"
+                "tripleo_nova_virtnodedevd.service"
+                "tripleo_nova_virtproxyd.service"
+                "tripleo_nova_virtqemud.service"
+                "tripleo_nova_virtsecretd.service"
+                "tripleo_nova_virtstoraged.service")
+
+PacemakerResourcesToStop=(
+                "galera-bundle"
+                "haproxy-bundle"
+                "rabbitmq-bundle")
+
+echo "Disabling systemd units and cleaning up for compute services"
+for i in "${!computes[@]}"; do
+    SSH_CMD="ssh -i $EDPM_PRIVATEKEY_PATH root@${computes[$i]}"
+    for service in ${ComputeServicesToStop[*]}; do
+        echo "Stopping the $service in compute $i"
+        if ${SSH_CMD} sudo systemctl is-active $service; then
+            ${SSH_CMD} sudo systemctl disable --now $service
+            ${SSH_CMD} test -f /etc/systemd/system/$service '||' sudo systemctl mask $service
+        fi
+    done
+done
+
+echo "Stopping pacemaker services"
+for i in {1..3}; do
+    SSH_CMD=CONTROLLER${i}_SSH
+    if [ ! -z "${!SSH_CMD}" ]; then
+        echo "Using controller $i to run pacemaker commands"
+        for resource in ${PacemakerResourcesToStop[*]}; do
+            if ${!SSH_CMD} sudo pcs resource config $resource; then
+                ${!SSH_CMD} sudo pcs resource disable $resource
+            fi
+        done
+        break
+    fi
+done
+
+
+
+
+
+

Adopting EDPM

+
+

Prerequisites

+
+ +
+
+
+
+

WARNING This step is a "point of no return" in the EDPM adoption +procedure. The source control plane and data plane services must not +be ever enabled back, after EDPM is deployed, and Podified control +plane has taken control over it.

+
+
+
+
+
+

Variables

+
+

Define the shell variables used in the Fast-forward upgrade steps below. +Set FIP to the floating IP address of the test VM pre-created earlier on the source cloud. +Define the map of compute node name, IP pairs. +The values are just illustrative, use values that are correct for your environment:

+
+
+
+
PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
+
+alias openstack="oc exec -t openstackclient -- openstack"
+FIP=192.168.122.20
+declare -A computes
+export computes=(
+  ["standalone.localdomain"]="192.168.122.100"
+  # ...
+)
+
+
+
+
+

Pre-checks

+
+
    +
  • +

    Make sure the IPAM is configured

    +
  • +
+
+
+
+
oc apply -f - <<EOF
+apiVersion: network.openstack.org/v1beta1
+kind: NetConfig
+metadata:
+  name: netconfig
+spec:
+  networks:
+  - name: ctlplane
+    dnsDomain: ctlplane.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 192.168.122.120
+        start: 192.168.122.100
+      - end: 192.168.122.200
+        start: 192.168.122.150
+      cidr: 192.168.122.0/24
+      gateway: 192.168.122.1
+  - name: internalapi
+    dnsDomain: internalapi.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.17.0.250
+        start: 172.17.0.100
+      cidr: 172.17.0.0/24
+      vlan: 20
+  - name: External
+    dnsDomain: external.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 10.0.0.250
+        start: 10.0.0.100
+      cidr: 10.0.0.0/24
+      gateway: 10.0.0.1
+  - name: storage
+    dnsDomain: storage.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.18.0.250
+        start: 172.18.0.100
+      cidr: 172.18.0.0/24
+      vlan: 21
+  - name: storagemgmt
+    dnsDomain: storagemgmt.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.20.0.250
+        start: 172.20.0.100
+      cidr: 172.20.0.0/24
+      vlan: 23
+  - name: tenant
+    dnsDomain: tenant.example.com
+    subnets:
+    - name: subnet1
+      allocationRanges:
+      - end: 172.19.0.250
+        start: 172.19.0.100
+      cidr: 172.19.0.0/24
+      vlan: 22
+EOF
+
+
+
+
+

Procedure - EDPM adoption

+
+
    +
  • +

    Temporary fix until the OSP 17 backport of the stable compute UUID feature +lands.

    +
    +

    For each compute node grab the UUID of the compute service and write it too +the stable compute_id file in /var/lib/nova/ directory.

    +
    +
    +
    +
    for name in "${!computes[@]}";
    +do
    +  uuid=$(\
    +    openstack hypervisor show $name \
    +    -f value -c 'id'\
    +  )
    +  echo "Writing $uuid to /var/lib/nova/compute_id on $name"
    +  ssh \
    +    -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa \
    +    root@"${computes[$name]}" \
    +    "echo $uuid > /var/lib/nova/compute_id"
    +done
    +
    +
    +
  • +
  • +

    Create a ssh authentication secret for the EDPM nodes:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: v1
    +kind: Secret
    +metadata:
    +    name: dataplane-adoption-secret
    +    namespace: openstack
    +data:
    +    ssh-privatekey: |
    +$(cat ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa | base64 | sed 's/^/        /')
    +EOF
    +
    +
    +
  • +
  • +

    Generate an ssh key-pair nova-migration-ssh-key secret

    +
    +
    +
    cd "$(mktemp -d)"
    +ssh-keygen -f ./id -t ecdsa-sha2-nistp521 -N ''
    +oc get secret nova-migration-ssh-key || oc create secret generic nova-migration-ssh-key \
    +  -n openstack \
    +  --from-file=ssh-privatekey=id \
    +  --from-file=ssh-publickey=id.pub \
    +  --type kubernetes.io/ssh-auth
    +rm -f id*
    +cd -
    +
    +
    +
  • +
  • +

    Create a Nova Compute Extra Config service

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: v1
    +kind: ConfigMap
    +metadata:
    +  name: nova-compute-extraconfig
    +  namespace: openstack
    +data:
    +  19-nova-compute-cell1-workarounds.conf: |
    +    [workarounds]
    +    disable_compute_service_check_for_ffu=true
    +---
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneService
    +metadata:
    +  name: nova-compute-extraconfig
    +  namespace: openstack
    +spec:
    +  label: nova.compute.extraconfig
    +  configMaps:
    +    - nova-compute-extraconfig
    +  secrets:
    +    - nova-cell1-compute-config
    +    - nova-migration-ssh-key
    +  playbook: osp.edpm.nova
    +EOF
    +
    +
    +
    +

    The secret nova-cell<X>-compute-config is auto-generated for each +cell<X>. That secret, alongside nova-migration-ssh-key, should +always be specified for each custom OpenStackDataPlaneService related to Nova.

    +
    +
  • +
  • +

    Create a repo-setup service to configure Antelope repositories

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneService
    +metadata:
    +  name: repo-setup
    +  namespace: openstack
    +spec:
    +  label: dataplane.deployment.repo.setup
    +  play: |
    +    - hosts: all
    +      strategy: linear
    +      tasks:
    +        - name: Enable podified-repos
    +          become: true
    +          ansible.builtin.shell: |
    +            # TODO: Use subscription-manager and a valid OSP18 repos instead
    +            # This is a hack to deploy RDO Delorean repos to RHEL as if it were Centos 9 Stream
    +            set -euxo pipefail
    +            curl -sL https://github.com/openstack-k8s-operators/repo-setup/archive/refs/heads/main.tar.gz | tar -xz
    +            python3 -m venv ./venv
    +            PBR_VERSION=0.0.0 ./venv/bin/pip install ./repo-setup-main
    +            # This is required for FIPS enabled until trunk.rdoproject.org
    +            # is not being served from a centos7 host, tracked by
    +            # https://issues.redhat.com/browse/RHOSZUUL-1517
    +            dnf -y install crypto-policies
    +            update-crypto-policies --set FIPS:NO-ENFORCE-EMS
    +            # FIXME: perform dnf upgrade for other packages in EDPM ansible
    +            # here we only ensuring that decontainerized libvirt can start
    +            ./venv/bin/repo-setup current-podified -b antelope -d centos9 --stream
    +            dnf -y upgrade openstack-selinux
    +            rm -f /run/virtlogd.pid
    +            rm -rf repo-setup-main
    +EOF
    +
    +
    +
  • +
  • +

    Deploy OpenStackDataPlaneNodeSet:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneNodeSet
    +metadata:
    +  name: openstack
    +spec:
    +  networkAttachments:
    +      - ctlplane
    +  preProvisioned: true
    +  services:
    +    - repo-setup
    +    - download-cache
    +    - bootstrap
    +    - configure-network
    +    - validate-network
    +    - install-os
    +    - configure-os
    +    - run-os
    +    - reboot-os
    +    - install-certs
    +    - libvirt
    +    - nova-compute-extraconfig
    +    - ovn
    +    - neutron-metadata
    +  env:
    +    - name: ANSIBLE_CALLBACKS_ENABLED
    +      value: "profile_tasks"
    +    - name: ANSIBLE_FORCE_COLOR
    +      value: "True"
    +  nodes:
    +    standalone:
    +      hostName: standalone
    +      ansible:
    +        ansibleHost: ${computes[standalone.localdomain]}
    +      networks:
    +      - defaultRoute: true
    +        fixedIP: ${computes[standalone.localdomain]}
    +        name: ctlplane
    +        subnetName: subnet1
    +      - name: internalapi
    +        subnetName: subnet1
    +      - name: storage
    +        subnetName: subnet1
    +      - name: tenant
    +        subnetName: subnet1
    +  nodeTemplate:
    +    ansibleSSHPrivateKeySecret: dataplane-adoption-secret
    +    managementNetwork: ctlplane
    +    ansible:
    +      ansibleUser: root
    +      ansiblePort: 22
    +      ansibleVars:
    +        service_net_map:
    +          nova_api_network: internalapi
    +          nova_libvirt_network: internalapi
    +
    +        # edpm_network_config
    +        # Default nic config template for a EDPM compute node
    +        # These vars are edpm_network_config role vars
    +        edpm_network_config_override: ""
    +        edpm_network_config_template: |
    +           ---
    +           {% set mtu_list = [ctlplane_mtu] %}
    +           {% for network in role_networks %}
    +           {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
    +           {%- endfor %}
    +           {% set min_viable_mtu = mtu_list | max %}
    +           network_config:
    +           - type: ovs_bridge
    +             name: {{ neutron_physical_bridge_name }}
    +             mtu: {{ min_viable_mtu }}
    +             use_dhcp: false
    +             dns_servers: {{ ctlplane_dns_nameservers }}
    +             domain: {{ dns_search_domains }}
    +             addresses:
    +             - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
    +             routes: {{ ctlplane_host_routes }}
    +             members:
    +             - type: interface
    +               name: nic1
    +               mtu: {{ min_viable_mtu }}
    +               # force the MAC address of the bridge to this interface
    +               primary: true
    +           {% for network in role_networks %}
    +             - type: vlan
    +               mtu: {{ lookup('vars', networks_lower[network] ~ '_mtu') }}
    +               vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}
    +               addresses:
    +               - ip_netmask:
    +                   {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}
    +               routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}
    +           {% endfor %}
    +
    +        edpm_network_config_hide_sensitive_logs: false
    +        #
    +        # These vars are for the network config templates themselves and are
    +        # considered EDPM network defaults.
    +        neutron_physical_bridge_name: br-ctlplane
    +        neutron_public_interface_name: eth0
    +        role_networks:
    +        - InternalApi
    +        - Storage
    +        - Tenant
    +        networks_lower:
    +          External: external
    +          InternalApi: internalapi
    +          Storage: storage
    +          Tenant: tenant
    +
    +        # edpm_nodes_validation
    +        edpm_nodes_validation_validate_controllers_icmp: false
    +        edpm_nodes_validation_validate_gateway_icmp: false
    +
    +        timesync_ntp_servers:
    +        - hostname: clock.redhat.com
    +        - hostname: clock2.redhat.com
    +
    +        edpm_ovn_controller_agent_image: quay.io/podified-antelope-centos9/openstack-ovn-controller:current-podified
    +        edpm_iscsid_image: quay.io/podified-antelope-centos9/openstack-iscsid:current-podified
    +        edpm_logrotate_crond_image: quay.io/podified-antelope-centos9/openstack-cron:current-podified
    +        edpm_nova_compute_container_image: quay.io/podified-antelope-centos9/openstack-nova-compute:current-podified
    +        edpm_nova_libvirt_container_image: quay.io/podified-antelope-centos9/openstack-nova-libvirt:current-podified
    +        edpm_ovn_metadata_agent_image: quay.io/podified-antelope-centos9/openstack-neutron-metadata-agent-ovn:current-podified
    +
    +        gather_facts: false
    +        enable_debug: false
    +        # edpm firewall, change the allowed CIDR if needed
    +        edpm_sshd_configure_firewall: true
    +        edpm_sshd_allowed_ranges: ['192.168.122.0/24']
    +        # SELinux module
    +        edpm_selinux_mode: enforcing
    +        plan: overcloud
    +
    +        # Do not attempt OVS 3.2 major upgrades here
    +        edpm_ovs_packages:
    +        - openvswitch3.1
    +EOF
    +
    +
    +
  • +
  • +

    Deploy OpenStackDataPlaneDeployment:

    +
    +
    +
    oc apply -f - <<EOF
    +apiVersion: dataplane.openstack.org/v1beta1
    +kind: OpenStackDataPlaneDeployment
    +metadata:
    +  name: openstack
    +spec:
    +  nodeSets:
    +  - openstack
    +EOF
    +
    +
    +
  • +
+
+
+
+

Post-checks

+
+
    +
  • +

    Check if all the Ansible EE pods reaches Completed status:

    +
    +
    +
      # watching the pods
    +  watch oc get pod -l app=openstackansibleee
    +
    +
    +
    +
    +
      # following the ansible logs with:
    +  oc logs -l app=openstackansibleee -f --max-log-requests 10
    +
    +
    +
  • +
  • +

    Wait for the dataplane node set to reach the Ready status:

    +
    +
    +
      oc wait --for condition=Ready osdpns/openstack --timeout=30m
    +
    +
    +
  • +
+
+
+
+

Nova compute services fast-forward upgrade from Wallaby to Antelope

+
+

Nova services rolling upgrade cannot be done during adoption, +there is in a lock-step with Nova control plane services, because those +are managed independently by EDPM ansible, and Kubernetes operators. +Nova service operator and OpenStack Dataplane operator ensure upgrading +is done independently of each other, by configuring +[upgrade_levels]compute=auto for Nova services. Nova control plane +services apply the change right after CR is patched. Nova compute EDPM +services will catch up the same config change with ansible deployment +later on.

+
+
+
+
+

NOTE: Additional orchestration happening around the FFU workarounds +configuration for Nova compute EDPM service is a subject of future changes.

+
+
+
+
+
    +
  • +

    Wait for cell1 Nova compute EDPM services version updated (it may take some time):

    +
    +
    +
      oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot -p$PODIFIED_DB_ROOT_PASSWORD \
    +      -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
    +
    +
    +
    +

    The above query should return an empty result as a completion criterion.

    +
    +
  • +
  • +

    Remove pre-FFU workarounds for Nova control plane services:

    +
    +
    +
      oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
    +  spec:
    +    nova:
    +      template:
    +        cellTemplates:
    +          cell0:
    +            conductorServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +          cell1:
    +            metadataServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +            conductorServiceTemplate:
    +              customServiceConfig: |
    +                [workarounds]
    +                disable_compute_service_check_for_ffu=false
    +        apiServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +        metadataServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +        schedulerServiceTemplate:
    +          customServiceConfig: |
    +            [workarounds]
    +            disable_compute_service_check_for_ffu=false
    +  '
    +
    +
    +
  • +
  • +

    Wait for Nova control plane services' CRs to become ready:

    +
    +
    +
      oc wait --for condition=Ready --timeout=300s Nova/nova
    +
    +
    +
  • +
  • +

    Remove pre-FFU workarounds for Nova compute EDPM services:

    +
    +
    +
      oc apply -f - <<EOF
    +  apiVersion: v1
    +  kind: ConfigMap
    +  metadata:
    +    name: nova-compute-ffu
    +    namespace: openstack
    +  data:
    +    20-nova-compute-cell1-ffu-cleanup.conf: |
    +      [workarounds]
    +      disable_compute_service_check_for_ffu=false
    +  ---
    +  apiVersion: dataplane.openstack.org/v1beta1
    +  kind: OpenStackDataPlaneService
    +  metadata:
    +    name: nova-compute-ffu
    +    namespace: openstack
    +  spec:
    +    label: nova.compute.ffu
    +    configMaps:
    +      - nova-compute-ffu
    +    secrets:
    +      - nova-cell1-compute-config
    +      - nova-migration-ssh-key
    +    playbook: osp.edpm.nova
    +  ---
    +  apiVersion: dataplane.openstack.org/v1beta1
    +  kind: OpenStackDataPlaneDeployment
    +  metadata:
    +    name: openstack-nova-compute-ffu
    +    namespace: openstack
    +  spec:
    +    nodeSets:
    +      - openstack
    +    servicesOverride:
    +      - nova-compute-ffu
    +  EOF
    +
    +
    +
  • +
  • +

    Wait for Nova compute EDPM service to become ready:

    +
    +
    +
      oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m
    +
    +
    +
  • +
  • +

    Run Nova DB online migrations to complete FFU:

    +
    +
    +
      oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations
    +  oc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations
    +
    +
    +
  • +
  • +

    Verify if Nova services can stop the existing test VM instance:

    +
    +
    +
    ${BASH_ALIASES[openstack]} server list | grep -qF '| test | ACTIVE |' && openstack server stop test
    +${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |'
    +${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test | grep "it is in power state shutdown" || echo PASS
    +
    +
    +
  • +
  • +

    Verify if Nova services can start the existing test VM instance:

    +
    +
    +
    ${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |' && openstack server start test
    +${BASH_ALIASES[openstack]} server list | grep -F '| test | ACTIVE |'
    +${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test --fit-width -f json | jq -r '.state' | grep running
    +
    +
    +
  • +
+
+
+
+
+

Troubleshooting adoption

+
+

This document contains information about various issues you might face +and how to solve them.

+
+
+

ErrImagePull due to missing authentication

+
+

The deployed containers pull the images from private containers registries that +can potentially return authentication errors like:

+
+
+
+
Failed to pull image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0":
+rpc error: code = Unknown desc = unable to retrieve auth token: invalid
+username/password: unauthorized: Please login to the Red Hat Registry using
+your Customer Portal credentials.
+
+
+
+

An example of a failed pod:

+
+
+
+
  Normal   Scheduled       3m40s                  default-scheduler  Successfully assigned openstack/rabbitmq-server-0 to worker0
+  Normal   AddedInterface  3m38s                  multus             Add eth0 [10.101.0.41/23] from ovn-kubernetes
+  Warning  Failed          2m16s (x6 over 3m38s)  kubelet            Error: ImagePullBackOff
+  Normal   Pulling         2m5s (x4 over 3m38s)   kubelet            Pulling image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0"
+  Warning  Failed          2m5s (x4 over 3m38s)   kubelet            Failed to pull image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0": rpc error: code  ... can be found here: https://access.redhat.com/RegistryAuthentication
+  Warning  Failed          2m5s (x4 over 3m38s)   kubelet            Error: ErrImagePull
+  Normal   BackOff         110s (x7 over 3m38s)   kubelet            Back-off pulling image "registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0"
+
+
+
+

To solve this issue you need a valid pull-secret from the official Red +Hat console site, +store this pull secret locally in a machine with access to the Kubernetes API +(service node), and then run:

+
+
+
+
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull_secret_location.json>
+
+
+
+

The previous command will make available the authentication information in all +the cluster’s compute nodes, then trigger a new pod deployment to pull the +container image with:

+
+
+
+
kubectl delete pod rabbitmq-server-0 -n openstack
+
+
+
+

And the pod should be able to pull the image successfully. For more +information about what container registries requires what type of +authentication, check the official +docs.

+
+
+
+
+
+
+

Ceph migration

+
+
+

Migrating Ceph RBD

+
+

In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated +Storage nodes, the daemons living in the OpenStack control plane should be +moved/migrated into the existing external RHEL nodes (typically the compute +nodes for an HCI environment or dedicated storage nodes in all the remaining +use cases).

+
+
+

Requirements

+
+
    +
  • +

    Ceph is >= 5 and managed by cephadm/orchestrator.

    +
  • +
  • +

    Ceph NFS (ganesha) migrated from a TripleO based deployment to cephadm.

    +
  • +
  • +

    Both the Ceph public and cluster networks are propagated, via TripleO, to the target nodes.

    +
  • +
  • +

    Ceph Mons need to keep their IPs (to avoid cold migration).

    +
  • +
+
+
+
+

Scenario 1: Migrate mon and mgr from controller nodes

+
+

The goal of the first POC is to prove that you are able to successfully drain a +controller node, in terms of ceph daemons, and move them to a different node. +The initial target of the POC is RBD only, which means you are going to move only +mon and mgr daemons. For the purposes of this POC, you will deploy a ceph cluster +with only mon, mgrs, and osds to simulate the environment a customer will be in +before starting the migration. +The goal of the first POC is to ensure that:

+
+
+
    +
  • +

    You can keep the mon IP addresses moving them to the Ceph Storage nodes.

    +
  • +
  • +

    You can drain the existing controller nodes and shut them down.

    +
  • +
  • +

    You can deploy additional monitors to the existing nodes, promoting them as +_admin nodes that can be used by administrators to manage the Ceph cluster +and perform day2 operations against it.

    +
  • +
  • +

    You can keep the cluster operational during the migration.

    +
  • +
+
+
+
Prerequisites
+
+

The Storage Nodes should be configured to have both storage and storage_mgmt +network to make sure that you can use both Ceph public and cluster networks.

+
+
+

This step is the only one where the interaction with TripleO is required. From +17+ you do not have to run any stack update. However, there are commands that you +should perform to run os-net-config on the bare-metal node and configure +additional networks.

+
+
+

Make sure the network is defined in metalsmith.yaml for the CephStorageNodes:

+
+
+
+
  - name: CephStorage
+    count: 2
+    instances:
+      - hostname: oc0-ceph-0
+        name: oc0-ceph-0
+      - hostname: oc0-ceph-1
+        name: oc0-ceph-1
+    defaults:
+      networks:
+        - network: ctlplane
+          vif: true
+        - network: storage_cloud_0
+            subnet: storage_cloud_0_subnet
+        - network: storage_mgmt_cloud_0
+            subnet: storage_mgmt_cloud_0_subnet
+      network_config:
+        template: templates/single_nic_vlans/single_nic_vlans_storage.j2
+
+
+
+

Then run:

+
+
+
+
openstack overcloud node provision \
+  -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \
+  --network-config -y --concurrency 2 /home/stack/metalsmith-0.yam
+
+
+
+

Verify that the storage network is running on the node:

+
+
+
+
(undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a
+Warning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+
+
+
+
+
Migrate mon(s) and mgr(s) on the two existing CephStorage nodes
+
+

Create a ceph spec based on the default roles with the mon/mgr on the +controller nodes.

+
+
+
+
openstack overcloud ceph spec -o ceph_spec.yaml -y  \
+   --stack overcloud-0     overcloud-baremetal-deployed-0.yaml
+
+
+
+

Deploy the Ceph cluster:

+
+
+
+
 openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \
+    --stack overcloud-0 -o deployed_ceph.yaml \
+    --network-data ~/oc0-network-data.yaml \
+    --ceph-spec ~/ceph_spec.yaml
+
+
+
+

Note:

+
+
+

The ceph_spec.yaml, which is the OSP-generated description of the ceph cluster, +will be used, later in the process, as the basic template required by cephadm +to update the status/info of the daemons.

+
+
+

Check the status of the cluster:

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)
+    mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk
+    osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host ls
+HOST              ADDR           LABELS          STATUS
+oc0-ceph-0        192.168.24.14  osd
+oc0-ceph-1        192.168.24.7   osd
+oc0-controller-0  192.168.24.15  _admin mgr mon
+oc0-controller-1  192.168.24.23  _admin mgr mon
+oc0-controller-2  192.168.24.13  _admin mgr mon
+
+
+
+

The goal of the next section is to migrate the oc0-controller-{1,2} daemons +into oc0-ceph-{0,1} as the very basic scenario that demonstrates that you can +actually make this kind of migration using cephadm.

+
+
+
+
Migrate oc0-controller-1 into oc0-ceph-0
+
+

ssh into controller-0, then

+
+
+
+
cephadm shell -v /home/ceph-admin/specs:/specs
+
+
+
+

ssh into ceph-0, then

+
+
+
+
sudo “watch podman ps”  # watch the new mon/mgr being deployed here
+
+
+
+

(optional) if mgr is active in the source node, then:

+
+
+
+
ceph mgr fail <mgr instance>
+
+
+
+

From the cephadm shell, remove the labels on oc0-controller-1

+
+
+
+
    for label in mon mgr _admin; do
+           ceph orch host rm label oc0-controller-1 $label;
+    done
+
+
+
+

Add the missing labels to oc0-ceph-0

+
+
+
+
[ceph: root@oc0-controller-0 /]#
+> for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done
+Added label mon to host oc0-ceph-0
+Added label mgr to host oc0-ceph-0
+Added label _admin to host oc0-ceph-0
+
+
+
+

Drain and force-remove the oc0-controller-1 node

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1
+Scheduled to remove the following daemons from host 'oc0-controller-1'
+type                 id
+-------------------- ---------------
+mon                  oc0-controller-1
+mgr                  oc0-controller-1.mtxohd
+crash                oc0-controller-1
+
+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force
+Removed  host 'oc0-controller-1'
+
+[ceph: root@oc0-controller-0 /]# ceph orch host ls
+HOST              ADDR           LABELS          STATUS
+oc0-ceph-0        192.168.24.14  osd
+oc0-ceph-1        192.168.24.7   osd
+oc0-controller-0  192.168.24.15  mgr mon _admin
+oc0-controller-2  192.168.24.13  _admin mgr mon
+
+
+
+

If you have only 3 mon nodes, and the drain of the node doesn’t work as +expected (the containers are still there), then SSH to controller-1 and +force-purge the containers in the node:

+
+
+
+
[root@oc0-controller-1 ~]# sudo podman ps
+CONTAINER ID  IMAGE                                                                                        COMMAND               CREATED         STATUS             PORTS       NAMES
+5c1ad36472bc  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
+3b14cc7bf4dd  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd
+
+[root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force
+
+[root@oc0-controller-1 ~]# sudo podman ps
+CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
+
+
+
+ + + + + +
+
Note
+
+Cephadm rm-cluster on a node that is not part of the cluster anymore has the +effect of removing all the containers and doing some cleanup on the filesystem. +
+
+
+

Before shutting the oc0-controller-1 down, move the IP address (on the same +network) to the oc0-ceph-0 node:

+
+
+
+
mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]
+
+[root@oc0-controller-1 ~]# ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-ex    inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\       valid_lft forever preferred_lft forever
+6: vlan100    inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\       valid_lft forever preferred_lft forever
+7: vlan12    inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+8: vlan11    inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+9: vlan13    inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\       valid_lft forever preferred_lft forever
+10: vlan70    inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\       valid_lft forever preferred_lft forever
+11: vlan1    inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+12: vlan14    inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\       valid_lft forever preferred_lft forever
+
+
+
+

On the oc0-ceph-0:

+
+
+
+
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+[heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11
+[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.121/32 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+
+
+
+

Poweroff oc0-controller-1.

+
+
+

Add the new mon on oc0-ceph-0 using the old IP address:

+
+
+
+
[ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121
+Deployed mon.oc0-ceph-0 on host 'oc0-ceph-0'
+
+
+
+

Check the new container in the oc0-ceph-0 node:

+
+
+
+
b581dc8bbb78  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-ceph-0...  24 seconds ago  Up 24 seconds ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0
+
+
+
+

On the cephadm shell, backup the existing ceph_spec.yaml, edit the spec +removing any oc0-controller-1 entry, and replacing it with oc0-ceph-0:

+
+
+
+
cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file
+
+[ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml
+
+--- ceph_spec.yaml.bkp  2022-07-29 15:41:34.516329643 +0000
++++ ceph_spec.yaml      2022-07-29 15:28:26.455329643 +0000
+@@ -7,14 +7,6 @@
+ - mgr
+ service_type: host
+ ---
+-addr: 192.168.24.12
+-hostname: oc0-controller-1
+-labels:
+-- _admin
+-- mon
+-- mgr
+-service_type: host
+ ----
+ addr: 192.168.24.19
+ hostname: oc0-controller-2
+ labels:
+@@ -38,7 +30,7 @@
+ placement:
+   hosts:
+   - oc0-controller-0
+-  - oc0-controller-1
++  - oc0-ceph-0
+   - oc0-controller-2
+ service_id: mon
+ service_name: mon
+@@ -47,8 +39,8 @@
+ placement:
+   hosts:
+   - oc0-controller-0
+-  - oc0-controller-1
+   - oc0-controller-2
++  - oc0-ceph-0
+ service_id: mgr
+ service_name: mgr
+ service_type: mgr
+
+
+
+

Apply the resulting spec:

+
+
+
+
ceph orch apply -i ceph_spec.yaml
+
+ The result of 12 is having a new mgr deployed on the oc0-ceph-0 node, and the spec reconciled within cephadm
+
+[ceph: root@oc0-controller-0 specs]# ceph orch ls
+NAME                     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
+crash                               4/4  5m ago     61m  *
+mgr                                 3/3  5m ago     69s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
+mon                                 3/3  5m ago     70s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
+osd.default_drive_group               8  2m ago     69s  oc0-ceph-0;oc0-ceph-1
+
+[ceph: root@oc0-controller-0 specs]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_WARN
+            1 stray host(s) with 1 daemon(s) not managed by cephadm
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)
+    mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb
+    osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+

Fix the warning by refreshing the mgr:

+
+
+
+
ceph mgr fail oc0-controller-0.xzgtvo
+
+
+
+

And at this point the cluster is clean:

+
+
+
+
[ceph: root@oc0-controller-0 specs]# ceph -s
+  cluster:
+    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)
+    mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb
+    osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+
+
+
+

oc0-controller-1 has been removed and powered off without leaving traces on the ceph cluster.

+
+
+

The same approach and the same steps can be applied to migrate oc0-controller-2 to oc0-ceph-1.

+
+
+
+
Screen Recording:
+ +
+
+
+

Useful resources

+ +
+
+
+

Migrating Ceph RGW

+
+

In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated +Storage nodes, the RGW daemons living in the OpenStack Controller nodes will be +migrated into the existing external RHEL nodes (typically the Compute nodes +for an HCI environment or CephStorage nodes in the remaining use cases).

+
+
+

Requirements

+
+
    +
  • +

    Ceph is >= 5 and managed by cephadm/orchestrator

    +
  • +
  • +

    An undercloud is still available: nodes and networks are managed by TripleO

    +
  • +
+
+
+
+

Ceph Daemon Cardinality

+
+

Ceph 5+ applies strict constraints in the way daemons can be colocated +within the same node. The resulting topology depends on the available hardware, +as well as the amount of Ceph services present in the Controller nodes which are +going to be retired. The following document describes the procedure required +to migrate the RGW component (and keep an HA model using the Ceph Ingress +daemon in a common TripleO scenario where Controller nodes represent the +spec placement where the service is deployed. As a general rule, the +number of services that can be migrated depends on the number of available +nodes in the cluster. The following diagrams cover the distribution of the Ceph +daemons on the CephStorage nodes where at least three nodes are required in a +scenario that sees only RGW and RBD (no dashboard):

+
+
+
+
|    |                     |             |
+|----|---------------------|-------------|
+| osd | mon/mgr/crash      | rgw/ingress |
+| osd | mon/mgr/crash      | rgw/ingress |
+| osd | mon/mgr/crash      | rgw/ingress |
+
+
+
+

With dashboard, and without Manila at least four nodes are required (dashboard +has no failover):

+
+
+
+
|     |                     |             |
+|-----|---------------------|-------------|
+| osd | mon/mgr/crash | rgw/ingress       |
+| osd | mon/mgr/crash | rgw/ingress       |
+| osd | mon/mgr/crash | dashboard/grafana |
+| osd | rgw/ingress   | (free)            |
+
+
+
+

With dashboard and Manila 5 nodes minimum are required (and dashboard has no +failover):

+
+
+
+
|     |                     |                         |
+|-----|---------------------|-------------------------|
+| osd | mon/mgr/crash       | rgw/ingress             |
+| osd | mon/mgr/crash       | rgw/ingress             |
+| osd | mon/mgr/crash       | mds/ganesha/ingress     |
+| osd | rgw/ingress         | mds/ganesha/ingress     |
+| osd | mds/ganesha/ingress | dashboard/grafana       |
+
+
+
+
+

Current Status

+
+
+
(undercloud) [stack@undercloud-0 ~]$ metalsmith list
+
+
+    +------------------------+    +----------------+
+    | IP Addresses           |    |  Hostname      |
+    +------------------------+    +----------------+
+    | ctlplane=192.168.24.25 |    | cephstorage-0  |
+    | ctlplane=192.168.24.10 |    | cephstorage-1  |
+    | ctlplane=192.168.24.32 |    | cephstorage-2  |
+    | ctlplane=192.168.24.28 |    | compute-0      |
+    | ctlplane=192.168.24.26 |    | compute-1      |
+    | ctlplane=192.168.24.43 |    | controller-0   |
+    | ctlplane=192.168.24.7  |    | controller-1   |
+    | ctlplane=192.168.24.41 |    | controller-2   |
+    +------------------------+    +----------------+
+
+
+
+

SSH into controller-0 and check the pacemaker status. This will help you +identify the relevant information that you need before you start the +RGW migration.

+
+
+
+
Full List of Resources:
+  * ip-192.168.24.46	(ocf:heartbeat:IPaddr2):     	Started controller-0
+  * ip-10.0.0.103   	(ocf:heartbeat:IPaddr2):     	Started controller-1
+  * ip-172.17.1.129 	(ocf:heartbeat:IPaddr2):     	Started controller-2
+  * ip-172.17.3.68  	(ocf:heartbeat:IPaddr2):     	Started controller-0
+  * ip-172.17.4.37  	(ocf:heartbeat:IPaddr2):     	Started controller-1
+  * Container bundle set: haproxy-bundle
+
+[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:
+    * haproxy-bundle-podman-0   (ocf:heartbeat:podman):  Started controller-2
+    * haproxy-bundle-podman-1   (ocf:heartbeat:podman):  Started controller-0
+    * haproxy-bundle-podman-2   (ocf:heartbeat:podman):  Started controller-1
+
+
+
+

Use the ip command to identify the ranges of the storage networks.

+
+
+
+
[heat-admin@controller-0 ~]$ ip -o -4 a
+
+1: lo	inet 127.0.0.1/8 scope host lo\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+7: br-ex	inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+8: vlan70	inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\   	valid_lft forever preferred_lft forever
+8: vlan70	inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\   	valid_lft forever preferred_lft forever
+9: vlan50	inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\   	valid_lft forever preferred_lft forever
+10: vlan30	inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+10: vlan30	inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+11: vlan20	inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\   	valid_lft forever preferred_lft forever
+12: vlan40	inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\   	valid_lft forever preferred_lft forever
+
+
+
+

In this example:

+
+
+
    +
  • +

    vlan30 represents the Storage Network, where the new RGW instances should be +started on the CephStorage nodes

    +
  • +
  • +

    br-ex represents the External Network, which is where in the current +environment, haproxy has the frontend VIP assigned

    +
  • +
+
+
+
+

Prerequisite: check the frontend network (Controller nodes)

+
+

Identify the network that you previously had in haproxy and propagate it (via +TripleO) to the CephStorage nodes. This network is used to reserve a new VIP +that will be owned by Ceph and used as the entry point for the RGW service.

+
+
+

ssh into controller-0 and check the current HaProxy configuration until you +find ceph_rgw section:

+
+
+
+
$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
+
+...
+...
+listen ceph_rgw
+  bind 10.0.0.103:8080 transparent
+  bind 172.17.3.68:8080 transparent
+  mode http
+  balance leastconn
+  http-request set-header X-Forwarded-Proto https if { ssl_fc }
+  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
+  http-request set-header X-Forwarded-Port %[dst_port]
+  option httpchk GET /swift/healthcheck
+  option httplog
+  option forwardfor
+  server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2
+  server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2
+  server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2
+
+
+
+

Double check the network used as HaProxy frontend:

+
+
+
+
[controller-0]$ ip -o -4 a
+
+...
+7: br-ex	inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+...
+
+
+
+

As described in the previous section, the check on controller-0 shows that you +are exposing the services using the external network, which is not present in +the Ceph Storage nodes, and you need to propagate it via TripleO.

+
+
+
+

Propagate the HaProxy frontend network to CephStorage nodes

+
+

Change the NIC template used to define the ceph-storage network interfaces and +add the new config section.

+
+
+
+
---
+network_config:
+- type: interface
+  name: nic1
+  use_dhcp: false
+  dns_servers: {{ ctlplane_dns_nameservers }}
+  addresses:
+  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
+  routes: {{ ctlplane_host_routes }}
+- type: vlan
+  vlan_id: {{ storage_mgmt_vlan_id }}
+  device: nic1
+  addresses:
+  - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }}
+  routes: {{ storage_mgmt_host_routes }}
+- type: interface
+  name: nic2
+  use_dhcp: false
+  defroute: false
+- type: vlan
+  vlan_id: {{ storage_vlan_id }}
+  device: nic2
+  addresses:
+  - ip_netmask: {{ storage_ip }}/{{ storage_cidr }}
+  routes: {{ storage_host_routes }}
+- type: ovs_bridge
+  name: {{ neutron_physical_bridge_name }}
+  dns_servers: {{ ctlplane_dns_nameservers }}
+  domain: {{ dns_search_domains }}
+  use_dhcp: false
+  addresses:
+  - ip_netmask: {{ external_ip }}/{{ external_cidr }}
+  routes: {{ external_host_routes }}
+  members:
+  - type: interface
+    name: nic3
+    primary: true
+
+
+
+

In addition, add the External Network to the baremetal.yaml file used by +metalsmith and run the overcloud node provision command passing the +--network-config option:

+
+
+
+
- name: CephStorage
+  count: 3
+  hostname_format: cephstorage-%index%
+  instances:
+  - hostname: cephstorage-0
+  name: ceph-0
+  - hostname: cephstorage-1
+  name: ceph-1
+  - hostname: cephstorage-2
+  name: ceph-2
+  defaults:
+  profile: ceph-storage
+  network_config:
+      template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2
+  networks:
+  - network: ctlplane
+      vif: true
+  - network: storage
+  - network: storage_mgmt
+  - network: external
+
+
+
+
+
(undercloud) [stack@undercloud-0]$
+
+openstack overcloud node provision
+   -o overcloud-baremetal-deployed-0.yaml
+   --stack overcloud
+   --network-config -y
+  $PWD/network/baremetal_deployment.yaml
+
+
+
+

Check the new network on the CephStorage nodes:

+
+
+
+
[root@cephstorage-0 ~]# ip -o -4 a
+
+1: lo	inet 127.0.0.1/8 scope host lo\   	valid_lft forever preferred_lft forever
+2: enp1s0	inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\   	valid_lft forever preferred_lft forever
+11: vlan40	inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\   	valid_lft forever preferred_lft forever
+12: vlan30	inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\   	valid_lft forever preferred_lft forever
+14: br-ex	inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\   	valid_lft forever preferred_lft forever
+
+
+
+

And now it’s time to start migrating the RGW backends and build the ingress on +top of them.

+
+
+
+

Migrate the RGW backends

+
+

To match the cardinality diagram, you use cephadm labels to refer to a group of +nodes where a given daemon type should be deployed.

+
+
+

Add the RGW label to the cephstorage nodes:

+
+
+
+
for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+
+
+
+
+
[ceph: root@controller-0 /]#
+
+for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+
+Added label rgw to host cephstorage-0
+Added label rgw to host cephstorage-1
+Added label rgw to host cephstorage-2
+
+[ceph: root@controller-0 /]# ceph orch host ls
+
+HOST       	ADDR       	LABELS      	STATUS
+cephstorage-0  192.168.24.54  osd rgw
+cephstorage-1  192.168.24.44  osd rgw
+cephstorage-2  192.168.24.30  osd rgw
+controller-0   192.168.24.45  _admin mon mgr
+controller-1   192.168.24.11  _admin mon mgr
+controller-2   192.168.24.38  _admin mon mgr
+
+6 hosts in cluster
+
+
+
+

During the overcloud deployment, RGW is applied at step2 +(external_deployment_steps), and a cephadm compatible spec is generated in +/home/ceph-admin/specs/rgw from the ceph_mkspec ansible module. +Find and patch the RGW spec, specifying the right placement using the labels +approach, and change the rgw backend port to 8090 to avoid conflicts +with the Ceph Ingress Daemon (*)

+
+
+
+
[root@controller-0 heat-admin]# cat rgw
+
+networks:
+- 172.17.3.0/24
+placement:
+  hosts:
+  - controller-0
+  - controller-1
+  - controller-2
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8080
+  rgw_realm: default
+  rgw_zone: default
+
+
+
+

Patch the spec replacing controller nodes with the label key

+
+
+
+
---
+networks:
+- 172.17.3.0/24
+placement:
+  label: rgw
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8090
+  rgw_realm: default
+  rgw_zone: default
+
+
+ +
+

Apply the new RGW spec using the orchestrator CLI:

+
+
+
+
$ cephadm shell -m /home/ceph-admin/specs/rgw
+$ cephadm shell -- ceph orch apply -i /mnt/rgw
+
+
+
+

Which triggers the redeploy:

+
+
+
+
...
+osd.9                     	cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx  cephstorage-0  172.17.3.23:8090   starting
+rgw.rgw.cephstorage-1.qynkan  cephstorage-1  172.17.3.26:8090   starting
+rgw.rgw.cephstorage-2.krycit  cephstorage-2  172.17.3.81:8090   starting
+rgw.rgw.controller-1.eyvrzw   controller-1   172.17.3.146:8080  running (5h)
+rgw.rgw.controller-2.navbxa   controller-2   172.17.3.66:8080   running (5h)
+
+...
+osd.9                     	cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx  cephstorage-0  172.17.3.23:8090  running (19s)
+rgw.rgw.cephstorage-1.qynkan  cephstorage-1  172.17.3.26:8090  running (16s)
+rgw.rgw.cephstorage-2.krycit  cephstorage-2  172.17.3.81:8090  running (13s)
+
+
+
+

At this point, you need to make sure that the new RGW backends are reachable on +the new ports, but you are going to enable an IngressDaemon on port 8080 +later in the process. For this reason, ssh on each RGW node (the CephStorage +nodes) and add the iptables rule to allow connections to both 8080 and 8090 +ports in the CephStorage nodes.

+
+
+
+
iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment "ceph rgw ingress" -j ACCEPT
+
+iptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment "ceph rgw backends" -j ACCEPT
+
+for port in 8080 8090; {
+    for i in 25 10 32; {
+       ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \
+       -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \
+       -j ACCEPT;
+   }
+}
+
+
+
+

From a Controller node (e.g. controller-0) try to reach (curl) the rgw backends:

+
+
+
+
for i in 26 23 81; do {
+    echo "---"
+    curl 172.17.3.$i:8090;
+    echo "---"
+    echo
+done
+
+
+
+

And you should observe the following:

+
+
+
+
---
+Query 172.17.3.23
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+---
+Query 172.17.3.26
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+---
+Query 172.17.3.81
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
+---
+
+
+
+
NOTE
+
+

In case RGW backends are migrated in the CephStorage nodes, there’s no +“internalAPI” network(this is not true in the case of HCI). Reconfig the RGW +keystone endpoint, pointing to the external Network that has been propagated +(see the previous section)

+
+
+
+
[ceph: root@controller-0 /]# ceph config dump | grep keystone
+global   basic rgw_keystone_url  http://172.16.1.111:5000
+
+[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000
+
+
+
+
+
+

Deploy a Ceph IngressDaemon

+
+

HaProxy is managed by TripleO via Pacemaker: the three running instances at +this point will point to the old RGW backends, resulting in a wrong, not +working configuration. +Since you are going to deploy the Ceph Ingress Daemon, the first thing to do +is remove the existing ceph_rgw config, clean up the config created by TripleO +and restart the service to make sure other services are not affected by this +change.

+
+
+

ssh on each Controller node and remove the following is the section from +/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg:

+
+
+
+
listen ceph_rgw
+  bind 10.0.0.103:8080 transparent
+  mode http
+  balance leastconn
+  http-request set-header X-Forwarded-Proto https if { ssl_fc }
+  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
+  http-request set-header X-Forwarded-Port %[dst_port]
+  option httpchk GET /swift/healthcheck
+  option httplog
+  option forwardfor
+   server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2
+  server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2
+  server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2
+
+
+
+

Restart haproxy-bundle and make sure it’s started:

+
+
+
+
[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle
+haproxy-bundle successfully restarted
+
+
+[root@controller-0 ~]# sudo pcs status | grep haproxy
+
+  * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:
+    * haproxy-bundle-podman-0   (ocf:heartbeat:podman):  Started controller-0
+    * haproxy-bundle-podman-1   (ocf:heartbeat:podman):  Started controller-1
+    * haproxy-bundle-podman-2   (ocf:heartbeat:podman):  Started controller-2
+
+
+
+

Double check no process is bound to 8080 anymore`"

+
+
+
+
[root@controller-0 ~]# ss -antop | grep 8080
+[root@controller-0 ~]#
+
+
+
+

And the swift CLI should fail at this point:

+
+
+
+
(overcloud) [root@cephstorage-0 ~]# swift list
+
+HTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc41beb0430>: Failed to establish a new connection: [Errno 111] Connection refused'))
+
+
+
+

You can start deploying the Ceph IngressDaemon on the CephStorage nodes.

+
+
+

Set the required images for both HaProxy and Keepalived

+
+
+
+
[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3
+
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5
+
+
+
+

Prepare the ingress spec and mount it to cephadm:

+
+
+
+
$ sudo vim /home/ceph-admin/specs/rgw_ingress
+
+
+
+

and paste the following content:

+
+
+
+
---
+service_type: ingress
+service_id: rgw.rgw
+placement:
+  label: rgw
+spec:
+  backend_service: rgw.rgw
+  virtual_ip: 10.0.0.89/24
+  frontend_port: 8080
+  monitor_port: 8898
+  virtual_interface_networks:
+    - 10.0.0.0/24
+
+
+
+

Mount the generated spec and apply it using the orchestrator CLI:

+
+
+
+
$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress
+$ cephadm shell -- ceph orch apply -i /mnt/rgw_ingress
+
+
+
+

Wait until the ingress is deployed and query the resulting endpoint:

+
+
+
+
[ceph: root@controller-0 /]# ceph orch ls
+
+NAME                 	PORTS            	RUNNING  REFRESHED  AGE  PLACEMENT
+crash                                         	6/6  6m ago 	3d   *
+ingress.rgw.rgw      	10.0.0.89:8080,8898  	6/6  37s ago	60s  label:rgw
+mds.mds                   3/3  6m ago 	3d   controller-0;controller-1;controller-2
+mgr                       3/3  6m ago 	3d   controller-0;controller-1;controller-2
+mon                       3/3  6m ago 	3d   controller-0;controller-1;controller-2
+osd.default_drive_group   15  37s ago	3d   cephstorage-0;cephstorage-1;cephstorage-2
+rgw.rgw   ?:8090          3/3  37s ago	4m   label:rgw
+
+
+
+
+
[ceph: root@controller-0 /]# curl  10.0.0.89:8080
+
+---
+<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>[ceph: root@controller-0 /]#
+—
+
+
+
+

The result above shows that you are able to reach the backend from the +IngressDaemon, which means you are almost ready to interact with it using the +swift CLI.

+
+
+
+

Update the object-store endpoints

+
+

The endpoints still point to the old VIP owned by pacemaker, but because it is +still used by other services and you reserved a new VIP on the same network, +before any other action you should update the object-store endpoint.

+
+
+

List the current endpoints:

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object
+
+| 1326241fb6b6494282a86768311f48d1 | regionOne | swift    	| object-store   | True	| internal  | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift    	| object-store   | True	| public	| http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s  |
+| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift    	| object-store   | True	| admin 	| http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+
+
+
+

Update the endpoints pointing to the Ingress VIP:

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url "http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s" 95596a2d92c74c15b83325a11a4f07a3
+
+(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store
+| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift    	| object-store   | True	| internal  | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift    	| object-store   | True	| public	| http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s   |
+| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift    	| object-store   | True	| admin 	| http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+
+
+
+

And repeat the same action for both internal and admin. +Test the migrated service.

+
+
+
+
(overcloud) [stack@undercloud-0 ~]$ swift list --debug
+
+DEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint
+DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens
+DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000
+DEBUG:urllib3.connectionpool:http://10.0.0.115:5000 "POST /v3/auth/tokens HTTP/1.1" 201 7795
+DEBUG:keystoneclient.auth.identity.v3.base:{"token": {"methods": ["password"], "user": {"domain": {"id": "default", "name": "Default"}, "id": "6f87c7ffdddf463bbc633980cfd02bb3", "name": "admin", "password_expires_at": null},
+
+
+...
+...
+...
+
+DEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H "X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos" -H "Accept-Encoding: gzip"
+DEBUG:swiftclient:RESP STATUS: 200 OK
+DEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'}
+DEBUG:swiftclient:RESP BODY: b'[]'
+
+
+
+

Run tempest tests against object-storage:

+
+
+
+
(overcloud) [stack@undercloud-0 tempest-dir]$  tempest run --regex tempest.api.object_storage
+...
+...
+...
+======
+Totals
+======
+Ran: 141 tests in 606.5579 sec.
+ - Passed: 128
+ - Skipped: 13
+ - Expected Fail: 0
+ - Unexpected Success: 0
+ - Failed: 0
+Sum of execute time for each test: 657.5183 sec.
+
+==============
+Worker Balance
+==============
+ - Worker 0 (1 tests) => 0:10:03.400561
+ - Worker 1 (2 tests) => 0:00:24.531916
+ - Worker 2 (4 tests) => 0:00:10.249889
+ - Worker 3 (30 tests) => 0:00:32.730095
+ - Worker 4 (51 tests) => 0:00:26.246044
+ - Worker 5 (6 tests) => 0:00:20.114803
+ - Worker 6 (20 tests) => 0:00:16.290323
+ - Worker 7 (27 tests) => 0:00:17.103827
+
+
+
+
+

Additional Resources

+
+

A screen recording is available.

+
+
+
+
+
+
+ + + + + + + \ No newline at end of file