diff --git a/openstack/edpm_adoption/index.html b/openstack/edpm_adoption/index.html index 06caac559..ea047e107 100644 --- a/openstack/edpm_adoption/index.html +++ b/openstack/edpm_adoption/index.html @@ -728,6 +728,15 @@ + + +
(There are no shell variables necessary currently.)
+Define the shell variables used in the Fast-forward upgrade steps below. +The values are just illustrative; use values that are correct for your environment:
+PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
+
oc wait --for condition=Ready osdpns/openstack --timeout=30m
+A rolling upgrade of Nova services cannot be done during adoption
+in lock-step with the Nova control plane services, because those
+are managed independently by EDPM ansible and Kubernetes operators.
+The Nova service operator and the OpenStack Dataplane operator ensure that upgrading
+is done independently of each other, by configuring
+[upgrade_levels]compute=auto
for Nova services. Nova control plane
+services apply the change right after the CR is patched. Nova compute EDPM
+services will pick up the same config change with the ansible deployment
+later on.
++NOTE: Additional orchestration happening around the FFU workarounds +configuration for Nova compute EDPM service is a subject of future changes.
+
Wait for the cell1 Nova compute EDPM services version to be updated (it may take some time):
+oc exec -it mariadb-openstack-cell1 -- mysql --user=root --password=${PODIFIED_DB_ROOT_PASSWORD} \
+ -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
+
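The wait above can be scripted as a polling loop. The following is a minimal sketch, assuming that an empty result set from the query is the signal that all service versions match; `wait_until_empty` and `query_version_mismatch` are hypothetical helper names, and the retry count and delay are illustrative only.

```shell
# Poll a command until it succeeds and prints nothing, or give up after
# a number of tries.
# Usage: wait_until_empty <tries> <delay-seconds> <command...>
wait_until_empty() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Succeed only if the command itself worked AND returned no rows,
        # so a failing 'oc exec' is not mistaken for "versions match".
        if out=$("$@") && [ -z "$out" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical wrapper around the query shown above. The mysql flags -s and -N
# only suppress table formatting and column headers.
query_version_mismatch() {
    oc exec mariadb-openstack-cell1 -- mysql --user=root \
        --password="${PODIFIED_DB_ROOT_PASSWORD}" -sN \
        -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
}

# Example (60 tries, 10 seconds apart):
#   wait_until_empty 60 10 query_version_mismatch
```

Passing the query as a command keeps the loop reusable for other checks that signal completion with empty output.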
Remove pre-FFU workarounds for Nova control plane services:
+oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
+spec:
+  nova:
+    template:
+      cellTemplates:
+        cell0:
+          conductorServiceTemplate:
+            customServiceConfig: |
+              [workarounds]
+              disable_compute_service_check_for_ffu=false
+        cell1:
+          metadataServiceTemplate:
+            customServiceConfig: |
+              [workarounds]
+              disable_compute_service_check_for_ffu=false
+          conductorServiceTemplate:
+            customServiceConfig: |
+              [workarounds]
+              disable_compute_service_check_for_ffu=false
+      apiServiceTemplate:
+        customServiceConfig: |
+          [workarounds]
+          disable_compute_service_check_for_ffu=false
+      metadataServiceTemplate:
+        customServiceConfig: |
+          [workarounds]
+          disable_compute_service_check_for_ffu=false
+      schedulerServiceTemplate:
+        customServiceConfig: |
+          [workarounds]
+          disable_compute_service_check_for_ffu=false
+'
+
Wait for Nova control plane services' CRs to become ready:
+oc wait --for condition=Ready --timeout=300s Nova/nova
+
Remove pre-FFU workarounds for Nova compute EDPM services:
+oc apply -f - <<EOF
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: nova-compute-ffu
+  namespace: openstack
+data:
+  20-nova-compute-cell1-ffu-cleanup.conf: |
+    [workarounds]
+    disable_compute_service_check_for_ffu=false
+---
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneService
+metadata:
+  name: nova-compute-ffu
+  namespace: openstack
+spec:
+  label: nova.compute.ffu
+  configMaps:
+    - nova-compute-ffu
+  secrets:
+    - nova-cell1-compute-config
+    - nova-migration-ssh-key
+  playbook: osp.edpm.nova
+---
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneDeployment
+metadata:
+  name: openstack-nova-compute-ffu
+  namespace: openstack
+spec:
+  nodeSets:
+    - openstack
+  servicesOverride:
+    - nova-compute-ffu
+EOF
+
Wait for Nova compute EDPM service to become ready:
+oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m
+
Run Nova DB online migrations to complete FFU:
+oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations
+oc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations
+
This is a procedure for adopting an OpenStack cloud.
Perform the actions from the sub-documents in the following order:
Planning the new deployment
Deploy podified backend services
Pull OpenStack configuration
Stop OpenStack services
Copy MariaDB data
OVN adoption
Keystone adoption
Neutron adoption
Ceph backend configuration (if applicable)
Glance adoption
Placement adoption
Nova adoption
Cinder adoption
Manila adoption
Horizon adoption
Dataplane adoption
Ironic adoption
If you face issues during adoption, check the Troubleshooting document for common problems and solutions.
"},{"location":"#post-openstack-ceph-adoption","title":"Post-OpenStack Ceph adoption","text":"If the environment includes Ceph and some of its services are collocated on the Controller hosts (\"internal Ceph\"), then Ceph services need to be moved out of Controller hosts as the last step of the OpenStack adoption. Follow this documentation:
For information about contributing to the docs and how to run tests, see:
Contributing to documentation - how to build docs locally, docs patterns and tips.
Development environment - how to set up a local development environment where Adoption can be executed (either manually or via the test suite).
Tests - information about the test suite and how to run it.
In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated Storage nodes, the daemons living in the OpenStack control plane should be moved/migrated into the existing external RHEL nodes (typically the compute nodes for an HCI environment or dedicated storage nodes in all the remaining use cases).
"},{"location":"ceph/ceph_rbd/#requirements","title":"Requirements","text":"The goal of the first POC is to prove we are able to successfully drain a controller node, in terms of ceph daemons, and move them to a different node. The initial target of the POC is RBD only, which means we\u2019re going to move only mon and mgr daemons. For the purposes of this POC, we'll deploy a ceph cluster with only mon, mgrs, and osds to simulate the environment a customer will be in before starting the migration. The goal of the first POC is to ensure that: - We can keep the mon IP addresses moving them to the CephStorage nodes. - We can drain the existing controller nodes and shut them down. - We can deploy additional monitors to the existing nodes, promoting them as _admin nodes that can be used by administrators to manage the ceph cluster and perform day2 operations against it. - We can keep the cluster operational during the migration.
"},{"location":"ceph/ceph_rbd/#prerequisites","title":"Prerequisites","text":"The Storage Nodes should be configured to have both storage and storage_mgmt network to make sure we can use both Ceph public and cluster networks.
This step is the only one where the interaction with TripleO is required. From 17+ we don\u2019t have to run any stack update. However, there are commands that should be performed to run os-net-config on the bare-metal node and configure additional networks.
Make sure the network is defined in metalsmith.yaml for the CephStorageNodes:
- name: CephStorage\n count: 2\n instances:\n - hostname: oc0-ceph-0\n name: oc0-ceph-0\n - hostname: oc0-ceph-1\n name: oc0-ceph-1\n defaults:\n networks:\n - network: ctlplane\n vif: true\n - network: storage_cloud_0\n subnet: storage_cloud_0_subnet\n - network: storage_mgmt_cloud_0\n subnet: storage_mgmt_cloud_0_subnet\n network_config:\n template: templates/single_nic_vlans/single_nic_vlans_storage.j2\n
Then run:
openstack overcloud node provision \\\n -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \\\n --network-config -y --concurrency 2 /home/stack/metalsmith-0.yam\n
Verify that the storage network is running on the node:
(undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a\nWarning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n
"},{"location":"ceph/ceph_rbd/#migrate-mons-and-mgrs-on-the-two-existing-cephstorage-nodes","title":"Migrate mon(s) and mgr(s) on the two existing CephStorage nodes","text":"Create a ceph spec based on the default roles with the mon/mgr on the controller nodes.
openstack overcloud ceph spec -o ceph_spec.yaml -y \\\n --stack overcloud-0 overcloud-baremetal-deployed-0.yaml\n
Deploy the Ceph cluster
openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \\\n --stack overcloud-0 -o deployed_ceph.yaml \\\n --network-data ~/oc0-network-data.yaml \\\n --ceph-spec ~/ceph_spec.yaml\n
Note:
The ceph_spec.yaml, which is the OSP-generated description of the ceph cluster, will be used, later in the process, as the basic template required by cephadm to update the status/info of the daemons.
Check the status of the cluster:
[ceph: root@oc0-controller-0 /]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_OK\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)\n mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk\n osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
[ceph: root@oc0-controller-0 /]# ceph orch host ls\nHOST ADDR LABELS STATUS\noc0-ceph-0 192.168.24.14 osd\noc0-ceph-1 192.168.24.7 osd\noc0-controller-0 192.168.24.15 _admin mgr mon\noc0-controller-1 192.168.24.23 _admin mgr mon\noc0-controller-2 192.168.24.13 _admin mgr mon\n
The goal of the next section is to migrate the oc0-controller-{1,2} daemons into oc0-ceph-{0,1} as the very basic scenario that demonstrates we can actually make this kind of migration using cephadm.
"},{"location":"ceph/ceph_rbd/#migrate-oc0-controller-1-into-oc0-ceph-0","title":"Migrate oc0-controller-1 into oc0-ceph-0","text":"ssh into controller-0, then
cephadm shell -v /home/ceph-admin/specs:/specs
ssh into ceph-0, then
sudo watch podman ps # watch the new mon/mgr being deployed here
(optional) if mgr is active in the source node, then:
ceph mgr fail <mgr instance>\n
From the cephadm shell, remove the labels on oc0-controller-1
for label in mon mgr _admin; do\n ceph orch host label rm oc0-controller-1 $label;\n done\n
Add the missing labels to oc0-ceph-0
[ceph: root@oc0-controller-0 /]#\n> for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done\nAdded label mon to host oc0-ceph-0\nAdded label mgr to host oc0-ceph-0\nAdded label _admin to host oc0-ceph-0\n
Drain and force-remove the oc0-controller-1 node
[ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1\nScheduled to remove the following daemons from host 'oc0-controller-1'\ntype id\n-------------------- ---------------\nmon oc0-controller-1\nmgr oc0-controller-1.mtxohd\ncrash oc0-controller-1\n
[ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force\nRemoved host 'oc0-controller-1'\n\n[ceph: root@oc0-controller-0 /]# ceph orch host ls\nHOST ADDR LABELS STATUS\noc0-ceph-0 192.168.24.14 osd\noc0-ceph-1 192.168.24.7 osd\noc0-controller-0 192.168.24.15 mgr mon _admin\noc0-controller-2 192.168.24.13 _admin mgr mon\n
If you have only 3 mon nodes, and the drain of the node doesn\u2019t work as expected (the containers are still there), then SSH to controller-1 and force-purge the containers in the node:
[root@oc0-controller-1 ~]# sudo podman ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n5c1ad36472bc quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mon.oc0-contro... 35 minutes ago Up 35 minutes ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1\n3b14cc7bf4dd quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mgr.oc0-contro... 35 minutes ago Up 35 minutes ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd\n\n[root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force\n\n[root@oc0-controller-1 ~]# sudo podman ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n
Note: cephadm rm-cluster on a node that is not part of the cluster anymore has the effect of removing all the containers and doing some cleanup on the filesystem.
Before shutting the oc0-controller-1 down, move the IP address (on the same network) to the oc0-ceph-0 node:
mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]\n\n[root@oc0-controller-1 ~]# ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-ex inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n6: vlan100 inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\\ valid_lft forever preferred_lft forever\n7: vlan12 inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n8: vlan11 inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n9: vlan13 inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\\ valid_lft forever preferred_lft forever\n10: vlan70 inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n11: vlan1 inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n12: vlan14 inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\\ valid_lft forever preferred_lft forever\n
On the oc0-ceph-0:
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n[heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11\n[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.121/32 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n
Power off oc0-controller-1.
Add the new mon on oc0-ceph-0 using the old IP address:
[ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121\nDeployed mon.oc0-ceph-0 on host 'oc0-ceph-0'\n
Check the new container in the oc0-ceph-0 node:
b581dc8bbb78 quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mon.oc0-ceph-0... 24 seconds ago Up 24 seconds ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0\n
In the cephadm shell, back up the existing ceph_spec.yaml, then edit the spec, removing any oc0-controller-1 entry and replacing it with oc0-ceph-0:
cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file\n\n[ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml\n\n--- ceph_spec.yaml.bkp 2022-07-29 15:41:34.516329643 +0000\n+++ ceph_spec.yaml 2022-07-29 15:28:26.455329643 +0000\n@@ -7,14 +7,6 @@\n - mgr\n service_type: host\n ---\n-addr: 192.168.24.12\n-hostname: oc0-controller-1\n-labels:\n-- _admin\n-- mon\n-- mgr\n-service_type: host\n----\n addr: 192.168.24.19\n hostname: oc0-controller-2\n labels:\n@@ -38,7 +30,7 @@\n placement:\n hosts:\n - oc0-controller-0\n- - oc0-controller-1\n+ - oc0-ceph-0\n - oc0-controller-2\n service_id: mon\n service_name: mon\n@@ -47,8 +39,8 @@\n placement:\n hosts:\n - oc0-controller-0\n- - oc0-controller-1\n - oc0-controller-2\n+ - oc0-ceph-0\n service_id: mgr\n service_name: mgr\n service_type: mgr\n
Apply the resulting spec:
ceph orch apply -i ceph_spec.yaml \n\n The result of applying the spec is a new mgr deployed on the oc0-ceph-0 node, and the spec reconciled within cephadm\n\n[ceph: root@oc0-controller-0 specs]# ceph orch ls\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT\ncrash 4/4 5m ago 61m *\nmgr 3/3 5m ago 69s oc0-controller-0;oc0-ceph-0;oc0-controller-2\nmon 3/3 5m ago 70s oc0-controller-0;oc0-ceph-0;oc0-controller-2\nosd.default_drive_group 8 2m ago 69s oc0-ceph-0;oc0-ceph-1\n\n[ceph: root@oc0-controller-0 specs]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_WARN\n 1 stray host(s) with 1 daemon(s) not managed by cephadm\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)\n mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb\n osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
Fix the warning by refreshing the mgr:
ceph mgr fail oc0-controller-0.xzgtvo\n
And at this point the cluster is clean:
[ceph: root@oc0-controller-0 specs]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_OK\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)\n mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb\n osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
oc0-controller-1 has been removed and powered off without leaving traces on the ceph cluster.
The same approach and the same steps can be applied to migrate oc0-controller-2 to oc0-ceph-1.
"},{"location":"ceph/ceph_rbd/#screen-recording","title":"Screen Recording:","text":"In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated Storage nodes, the RGW daemons living in the OpenStack Controller nodes will be migrated into the existing external RHEL nodes (typically the Compute nodes for an HCI environment or CephStorage nodes in the remaining use cases).
"},{"location":"ceph/ceph_rgw/#requirements","title":"Requirements","text":"Ceph 5+ applies strict constraints in the way daemons can be colocated within the same node. The resulting topology depends on the available hardware, as well as the amount of Ceph services present in the Controller nodes which are going to be retired. The following document describes the procedure required to migrate the RGW component (and keep an HA model using the Ceph Ingress daemon in a common TripleO scenario where Controller nodes represent the spec placement where the service is deployed. As a general rule, the number of services that can be migrated depends on the number of available nodes in the cluster. The following diagrams cover the distribution of the Ceph daemons on the CephStorage nodes where at least three nodes are required in a scenario that sees only RGW and RBD (no dashboard):
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash rgw/ingress
With dashboard, and without Manila at least four nodes are required (dashboard has no failover):
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash dashboard/grafana
osd rgw/ingress (free)
With dashboard and Manila 5 nodes minimum are required (and dashboard has no failover):
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash rgw/ingress
osd mon/mgr/crash mds/ganesha/ingress
osd rgw/ingress mds/ganesha/ingress
osd mds/ganesha/ingress dashboard/grafana
"},{"location":"ceph/ceph_rgw/#current-status","title":"Current Status","text":"(undercloud) [stack@undercloud-0 ~]$ metalsmith list\n\n\n +------------------------+ +----------------+\n | IP Addresses | | Hostname |\n +------------------------+ +----------------+\n | ctlplane=192.168.24.25 | | cephstorage-0 |\n | ctlplane=192.168.24.10 | | cephstorage-1 |\n | ctlplane=192.168.24.32 | | cephstorage-2 |\n | ctlplane=192.168.24.28 | | compute-0 |\n | ctlplane=192.168.24.26 | | compute-1 |\n | ctlplane=192.168.24.43 | | controller-0 |\n | ctlplane=192.168.24.7 | | controller-1 |\n | ctlplane=192.168.24.41 | | controller-2 |\n +------------------------+ +----------------+\n
SSH into controller-0 and check the pacemaker status: this will help identify the relevant information that we need to know before starting the RGW migration.
Full List of Resources:\n * ip-192.168.24.46 (ocf:heartbeat:IPaddr2): Started controller-0\n * ip-10.0.0.103 (ocf:heartbeat:IPaddr2): Started controller-1\n * ip-172.17.1.129 (ocf:heartbeat:IPaddr2): Started controller-2\n * ip-172.17.3.68 (ocf:heartbeat:IPaddr2): Started controller-0\n * ip-172.17.4.37 (ocf:heartbeat:IPaddr2): Started controller-1\n * Container bundle set: haproxy-bundle\n\n[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:\n * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-2\n * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-0\n * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-1\n
Use the ip command to identify the ranges of the storage networks.
[heat-admin@controller-0 ~]$ ip -o -4 a\n\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n7: br-ex inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n8: vlan70 inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n8: vlan70 inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n9: vlan50 inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\\ valid_lft forever preferred_lft forever\n10: vlan30 inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n10: vlan30 inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n11: vlan20 inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\\ valid_lft forever preferred_lft forever\n12: vlan40 inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\\ valid_lft forever preferred_lft forever\n
In this example:
Identify the network that we previously had in haproxy and propagate it (via TripleO) to the CephStorage nodes. This network is used to reserve a new VIP that will be owned by Ceph and used as the entry point for the RGW service.
SSH into controller-0 and check the current HaProxy configuration until you find the ceph_rgw section:
$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg\n\n...\n...\nlisten ceph_rgw\n bind 10.0.0.103:8080 transparent\n bind 172.17.3.68:8080 transparent\n mode http\n balance leastconn\n http-request set-header X-Forwarded-Proto https if { ssl_fc }\n http-request set-header X-Forwarded-Proto http if !{ ssl_fc }\n http-request set-header X-Forwarded-Port %[dst_port]\n option httpchk GET /swift/healthcheck\n option httplog\n option forwardfor\n server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2\n server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2\n server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2\n
Double check the network used as HaProxy frontend:
[controller-0]$ ip -o -4 a\n\n...\n7: br-ex inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n...\n
As described in the previous section, the check on controller-0 shows that we are exposing the services using the external network, which is not present in the CephStorage nodes, and we need to propagate it via TripleO.
"},{"location":"ceph/ceph_rgw/#propagate-the-haproxy-frontend-network-to-cephstorage-nodes","title":"Propagate theHaProxy
frontend network to CephStorage
nodes","text":"Change the nic template used to define the ceph-storage network interfaces and add the new config section.
---\nnetwork_config:\n- type: interface\n name: nic1\n use_dhcp: false\n dns_servers: {{ ctlplane_dns_nameservers }}\n addresses:\n - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}\n routes: {{ ctlplane_host_routes }}\n- type: vlan\n vlan_id: {{ storage_mgmt_vlan_id }}\n device: nic1\n addresses:\n - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }}\n routes: {{ storage_mgmt_host_routes }}\n- type: interface\n name: nic2\n use_dhcp: false\n defroute: false\n- type: vlan\n vlan_id: {{ storage_vlan_id }}\n device: nic2\n addresses:\n - ip_netmask: {{ storage_ip }}/{{ storage_cidr }}\n routes: {{ storage_host_routes }}\n- type: ovs_bridge\n name: {{ neutron_physical_bridge_name }}\n dns_servers: {{ ctlplane_dns_nameservers }}\n domain: {{ dns_search_domains }}\n use_dhcp: false\n addresses:\n - ip_netmask: {{ external_ip }}/{{ external_cidr }}\n routes: {{ external_host_routes }}\n members:\n - type: interface\n name: nic3\n primary: true\n
In addition, add the External Network to the baremetal.yaml file used by metalsmith and run the overcloud node provision command passing the --network-config option:
- name: CephStorage\n count: 3\n hostname_format: cephstorage-%index%\n instances:\n - hostname: cephstorage-0\n name: ceph-0\n - hostname: cephstorage-1\n name: ceph-1\n - hostname: cephstorage-2\n name: ceph-2\n defaults:\n profile: ceph-storage\n network_config:\n template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2\n networks:\n - network: ctlplane\n vif: true\n - network: storage\n - network: storage_mgmt\n - network: external\n
(undercloud) [stack@undercloud-0]$\n\nopenstack overcloud node provision \\\n -o overcloud-baremetal-deployed-0.yaml \\\n --stack overcloud \\\n --network-config -y \\\n $PWD/network/baremetal_deployment.yaml\n
Check the new network on the CephStorage nodes:
[root@cephstorage-0 ~]# ip -o -4 a\n\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n11: vlan40 inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\\ valid_lft forever preferred_lft forever\n12: vlan30 inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n14: br-ex inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n
And now it\u2019s time to start migrating the RGW backends and build the ingress on top of them.
"},{"location":"ceph/ceph_rgw/#migrate-the-rgw-backends","title":"Migrate the RGW backends","text":"To match the cardinality diagram we use cephadm labels to refer to a group of nodes where a given daemon type should be deployed.
Add the RGW label to the cephstorage nodes:
for i in 0 1 2; {\n ceph orch host label add cephstorage-$i rgw;\n}\n
[ceph: root@controller-0 /]#\n\nfor i in 0 1 2; {\n ceph orch host label add cephstorage-$i rgw;\n}\n\nAdded label rgw to host cephstorage-0\nAdded label rgw to host cephstorage-1\nAdded label rgw to host cephstorage-2\n\n[ceph: root@controller-0 /]# ceph orch host ls\n\nHOST ADDR LABELS STATUS\ncephstorage-0 192.168.24.54 osd rgw\ncephstorage-1 192.168.24.44 osd rgw\ncephstorage-2 192.168.24.30 osd rgw\ncontroller-0 192.168.24.45 _admin mon mgr\ncontroller-1 192.168.24.11 _admin mon mgr\ncontroller-2 192.168.24.38 _admin mon mgr\n\n6 hosts in cluster\n
During the overcloud deployment, RGW is applied at step2 (external_deployment_steps), and a cephadm compatible spec is generated in /home/ceph-admin/specs/rgw from the ceph_mkspec ansible module. Find and patch the RGW spec, specifying the right placement using the labels approach, and change the rgw backend port to 8090 to avoid conflicts with the Ceph Ingress Daemon (*)
[root@controller-0 heat-admin]# cat rgw\n\nnetworks:\n- 172.17.3.0/24\nplacement:\n hosts:\n - controller-0\n - controller-1\n - controller-2\nservice_id: rgw\nservice_name: rgw.rgw\nservice_type: rgw\nspec:\n rgw_frontend_port: 8080\n rgw_realm: default\n rgw_zone: default\n
Patch the spec replacing controller nodes with the label key
---\nnetworks:\n- 172.17.3.0/24\nplacement:\n label: rgw\nservice_id: rgw\nservice_name: rgw.rgw\nservice_type: rgw\nspec:\n rgw_frontend_port: 8090\n rgw_realm: default\n rgw_zone: default\n
(*) cephadm_check_port
Apply the new RGW spec using the orchestrator CLI:
$ cephadm shell -m /home/ceph-admin/specs/rgw\n$ cephadm shell -- ceph orch apply -i /mnt/rgw\n
Which triggers the redeploy:
...\nosd.9 cephstorage-2\nrgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 starting\nrgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 starting\nrgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 starting\nrgw.rgw.controller-1.eyvrzw controller-1 172.17.3.146:8080 running (5h)\nrgw.rgw.controller-2.navbxa controller-2 172.17.3.66:8080 running (5h)\n\n...\nosd.9 cephstorage-2\nrgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 running (19s)\nrgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 running (16s)\nrgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 running (13s)\n
At this point, we need to make sure that the new RGW backends are reachable on the new ports, but we\u2019re going to enable an IngressDaemon on port 8080 later in the process. For this reason, ssh on each RGW node (the CephStorage nodes) and add the iptables rule to allow connections to both 8080 and 8090 ports in the CephStorage nodes.
iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment \"ceph rgw ingress\" -j ACCEPT\n\niptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment \"ceph rgw backends\" -j ACCEPT\n\nfor port in 8080 8090; { \n for i in 25 10 32; {\n ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \\\n -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \\\n -j ACCEPT;\n }\n}\n
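Running the commands above more than once inserts the same rule again each time. A minimal sketch of an idempotent variant follows, assuming it runs as root on each CephStorage node; `allow_tcp_port` is a hypothetical helper name, and it relies on `iptables -C`, which checks whether a matching rule already exists.

```shell
# Insert an ACCEPT rule for new TCP connections to a port, unless an
# identical rule is already present in the INPUT chain.
# Usage: allow_tcp_port <port> <comment>
allow_tcp_port() {
    port=$1
    comment=$2
    # Rule specification shared by the check (-C) and the insert (-I).
    set -- INPUT -p tcp -m tcp --dport "$port" \
        -m conntrack --ctstate NEW \
        -m comment --comment "$comment" -j ACCEPT
    if ! iptables -C "$@" 2>/dev/null; then
        iptables -I "$@"
    fi
}

# Example, matching the two rules used in this section:
#   allow_tcp_port 8080 "ceph rgw ingress"
#   allow_tcp_port 8090 "ceph rgw backends"
```

Building the rule specification once keeps the check and the insert in sync, so the check cannot silently drift from the rule it guards.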
From a Controller node (e.g. controller-0) try to reach (curl) the rgw backends:
for i in 26 23 81; do\n echo \"----\"\n curl 172.17.3.$i:8090;\n echo \"----\"\n echo\ndone\n
And you should observe the following:
----\nQuery 172.17.3.23\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n\n----\nQuery 172.17.3.26\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n\n----\nQuery 172.17.3.81\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n
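Instead of reading the XML by eye, the same check can report only the HTTP status of each backend. This is a minimal sketch, assuming an anonymous request to a healthy RGW backend returns HTTP 200 (as the bucket listing above suggests); `check_backends` is a hypothetical helper name.

```shell
# Print the HTTP status returned by each RGW backend and return non-zero
# if any of them does not answer with 200.
# Usage: check_backends <port> <ip>...
check_backends() {
    port=$1
    shift
    failed=0
    for ip in "$@"; do
        # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' "http://$ip:$port") || code=000
        echo "$ip:$port -> $code"
        [ "$code" = "200" ] || failed=1
    done
    return "$failed"
}

# Example, using the backend addresses from this section:
#   check_backends 8090 172.17.3.23 172.17.3.26 172.17.3.81
```

The non-zero return value makes the check usable as a gate in a script before moving on to the ingress deployment.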
"},{"location":"ceph/ceph_rgw/#note","title":"NOTE","text":"When the RGW backends are migrated to the CephStorage nodes, there\u2019s no \u201cinternalAPI\u201d network (this is not true in the case of HCI). Reconfigure the RGW keystone endpoint to point to the external network that has been propagated (see the previous section):
[ceph: root@controller-0 /]# ceph config dump | grep keystone\nglobal basic rgw_keystone_url http://172.16.1.111:5000\n\n[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000\n
"},{"location":"ceph/ceph_rgw/#deploy-a-ceph-ingressdaemon","title":"Deploy a Ceph IngressDaemon","text":"HaProxy
is managed by TripleO via Pacemaker
: at this point the three running instances still point to the old RGW backends, which results in a broken configuration. Since we\u2019re going to deploy the Ceph Ingress Daemon, the first thing to do is remove the existing ceph_rgw
config, clean up the config created by TripleO and restart the service to make sure other services are not affected by this change.
SSH into each Controller node and remove the following section from /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
:
listen ceph_rgw\n bind 10.0.0.103:8080 transparent\n mode http\n balance leastconn\n http-request set-header X-Forwarded-Proto https if { ssl_fc }\n http-request set-header X-Forwarded-Proto http if !{ ssl_fc }\n http-request set-header X-Forwarded-Port %[dst_port]\n option httpchk GET /swift/healthcheck\n option httplog\n option forwardfor\n server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2\n server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2\n server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2\n
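Removing the section by hand on three nodes is error-prone, so the edit can be scripted. This is a sketch only: `strip_ceph_rgw` is a hypothetical helper, and it assumes that every haproxy section starts on an unindented line and that the body of `listen ceph_rgw` is indented, as in the excerpt above.

```shell
#!/bin/bash
# Print a haproxy.cfg without its "listen ceph_rgw" section.
# Assumption: a section starts at an unindented, non-comment line and
# runs until the next such line.
strip_ceph_rgw() {
    awk '
        /^listen[ \t]+ceph_rgw[ \t]*$/ { skip = 1; next }  # start skipping
        /^[^ \t#]/                     { skip = 0 }        # next section begins
        !skip                                              # print everything else
    ' "$1"
}
```

A cautious way to use it is to write the result to a temporary file, review it with `diff` against the original, and only then copy it over the original file before restarting the bundle.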
Restart haproxy-bundle
and make sure it\u2019s started:
[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle\nhaproxy-bundle successfully restarted\n\n\n[root@controller-0 ~]# sudo pcs status | grep haproxy\n\n * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:\n * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-0\n * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-1\n * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-2\n
Double check that no process is bound to port 8080 anymore:
[root@controller-0 ~]# ss -antop | grep 8080\n[root@controller-0 ~]#\n
And the swift CLI should fail at this point:
(overcloud) [root@cephstorage-0 ~]# swift list\n\nHTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc41beb0430>: Failed to establish a new connection: [Errno 111] Connection refused'))\n
Now we can start deploying the Ceph IngressDaemon on the CephStorage nodes.
Set the required images for both HaProxy and Keepalived
[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3\n\n[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5\n
Prepare the ingress spec and mount it to cephadm:
$ sudo vim /home/ceph-admin/specs/rgw_ingress\n
and paste the following content:
---\nservice_type: ingress\nservice_id: rgw.rgw\nplacement:\n label: rgw\nspec:\n backend_service: rgw.rgw\n virtual_ip: 10.0.0.89/24\n frontend_port: 8080\n monitor_port: 8898\n virtual_interface_networks:\n - 10.0.0.0/24\n
Mount the generated spec and apply it using the orchestrator CLI:
$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress\n$ cephadm shell -- ceph orch apply -i /mnt/rgw_ingress\n
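The spec can also be generated from shell variables instead of being pasted into an editor, which keeps the VIP and network in one place. The sketch below reproduces the spec shown above; `write_rgw_ingress_spec` is a hypothetical helper name and the values passed to it are the illustrative ones from this guide.

```shell
#!/bin/bash
# Write the ingress spec shown above, with the VIP and its network as
# parameters.
# Usage: write_rgw_ingress_spec <file> <vip/prefix> <network-cidr>
write_rgw_ingress_spec() {
    cat > "$1" <<EOF
---
service_type: ingress
service_id: rgw.rgw
placement:
  label: rgw
spec:
  backend_service: rgw.rgw
  virtual_ip: $2
  frontend_port: 8080
  monitor_port: 8898
  virtual_interface_networks:
    - $3
EOF
}
```

For example, `write_rgw_ingress_spec /home/ceph-admin/specs/rgw_ingress 10.0.0.89/24 10.0.0.0/24` produces the same file as the manual step.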
Wait until the ingress is deployed and query the resulting endpoint:
[ceph: root@controller-0 /]# ceph orch ls\n\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT\ncrash 6/6 6m ago 3d *\ningress.rgw.rgw 10.0.0.89:8080,8898 6/6 37s ago 60s label:rgw\nmds.mds 3/3 6m ago 3d controller-0;controller-1;controller-2\nmgr 3/3 6m ago 3d controller-0;controller-1;controller-2\nmon 3/3 6m ago 3d controller-0;controller-1;controller-2\nosd.default_drive_group 15 37s ago 3d cephstorage-0;cephstorage-1;cephstorage-2\nrgw.rgw ?:8090 3/3 37s ago 4m label:rgw\n
[ceph: root@controller-0 /]# curl 10.0.0.89:8080\n\n---\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>[ceph: root@controller-0 /]#\n---\n
The result above shows that we\u2019re able to reach the backend from the IngressDaemon, which means we\u2019re almost ready to interact with it using the swift CLI.
"},{"location":"ceph/ceph_rgw/#update-the-object-store-endpoints","title":"Update the object-store endpoints","text":"The endpoints still point to the old VIP owned by Pacemaker. Because that VIP is still used by other services and we reserved a new VIP on the same network, we should update the object-store endpoint before taking any other action.
List the current endpoints:
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object\n\n| 1326241fb6b6494282a86768311f48d1 | regionOne | swift | object-store | True | internal | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |\n| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift | object-store | True | public | http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s |\n| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift | object-store | True | admin | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |\n
Update the endpoints pointing to the Ingress VIP:
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url \"http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s\" 95596a2d92c74c15b83325a11a4f07a3\n\n(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store\n| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift | object-store | True | internal | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |\n| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift | object-store | True | public | http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s |\n| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift | object-store | True | admin | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |\n
And repeat the same action for both internal and admin. Test the migrated service.
(overcloud) [stack@undercloud-0 ~]$ swift list --debug\n\nDEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint\nDEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens\nDEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000\nDEBUG:urllib3.connectionpool:http://10.0.0.115:5000 \"POST /v3/auth/tokens HTTP/1.1\" 201 7795\nDEBUG:keystoneclient.auth.identity.v3.base:{\"token\": {\"methods\": [\"password\"], \"user\": {\"domain\": {\"id\": \"default\", \"name\": \"Default\"}, \"id\": \"6f87c7ffdddf463bbc633980cfd02bb3\", \"name\": \"admin\", \"password_expires_at\": null}, \n\n\n...\n...\n...\n\nDEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H \"X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos\" -H \"Accept-Encoding: gzip\"\nDEBUG:swiftclient:RESP STATUS: 200 OK\nDEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'}\nDEBUG:swiftclient:RESP BODY: b'[]'\n
Run tempest tests against object-storage:
(overcloud) [stack@undercloud-0 tempest-dir]$ tempest run --regex tempest.api.object_storage\n...\n...\n...\n======\nTotals\n======\nRan: 141 tests in 606.5579 sec.\n - Passed: 128\n - Skipped: 13\n - Expected Fail: 0\n - Unexpected Success: 0\n - Failed: 0\nSum of execute time for each test: 657.5183 sec.\n\n==============\nWorker Balance\n==============\n - Worker 0 (1 tests) => 0:10:03.400561\n - Worker 1 (2 tests) => 0:00:24.531916\n - Worker 2 (4 tests) => 0:00:10.249889\n - Worker 3 (30 tests) => 0:00:32.730095\n - Worker 4 (51 tests) => 0:00:26.246044\n - Worker 5 (6 tests) => 0:00:20.114803\n - Worker 6 (20 tests) => 0:00:16.290323\n - Worker 7 (27 tests) => 0:00:17.103827\n
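The endpoint update earlier in this section has to be repeated for the public, internal and admin interfaces. A minimal sketch of scripting that step follows. It assumes that all three interfaces should point to the same Ingress VIP (10.0.0.89 is the illustrative value used above); `object_store_endpoint_ids` and `update_object_store_endpoints` are hypothetical helper names.

```shell
#!/bin/bash
# Extract the IDs of all object-store endpoints from the table printed by
# `openstack endpoint list` (read from stdin). In that table the ID is the
# second "|"-separated field and the service type is the fifth.
object_store_endpoint_ids() {
    awk -F'|' '$5 ~ /object-store/ { gsub(/ /, "", $2); print $2 }'
}

# Point every object-store endpoint at the given URL.
# Assumption: public, internal and admin all use the same Ingress VIP.
update_object_store_endpoints() {
    local url=$1
    for id in $(openstack endpoint list | object_store_endpoint_ids); do
        openstack endpoint set --url "$url" "$id"
    done
}
```

It could be called as `update_object_store_endpoints 'http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s'`, followed by `openstack endpoint list | grep object-store` to confirm the result.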
"},{"location":"ceph/ceph_rgw/#additional-resources","title":"Additional Resources","text":"A screen recording is available here.
"},{"location":"contributing/development_environment/","title":"Development environment","text":"This is a guide for an install_yamls based Adoption environment with network isolation as an alternative to the CRC and Vagrant TripleO Standalone development environment guide.
The Adoption development environment utilizes install_yamls for CRC VM creation and for creation of the VM that hosts the original Wallaby OpenStack in Standalone configuration.
"},{"location":"contributing/development_environment/#environment-prep","title":"Environment prep","text":"Get install_yamls:
git clone https://github.com/openstack-k8s-operators/install_yamls.git\n
Install tools for operator development:
cd ~/install_yamls/devsetup\nmake download_tools\n
"},{"location":"contributing/development_environment/#deployment-of-crc-with-network-isolation","title":"Deployment of CRC with network isolation","text":"cd ~/install_yamls/devsetup\nPULL_SECRET=$HOME/pull-secret.txt CPUS=12 MEMORY=40000 DISK=100 make crc\n\neval $(crc oc-env)\noc login -u kubeadmin -p 12345678 https://api.crc.testing:6443\n\nmake crc_attach_default_interface\n
"},{"location":"contributing/development_environment/#development-environment-with-openstack-ironic","title":"Development environment with Openstack ironic","text":"Create the BMaaS network (crc-bmaas
) and virtual baremetal nodes controlled by a RedFish BMC emulator.
cd .. # back to install_yamls\nmake nmstate\nmake namespace\ncd devsetup # back to install_yamls/devsetup\nmake bmaas\n
A node definition YAML file to use with the openstack baremetal create <file>.yaml
command can be generated for the virtual baremetal nodes by running the bmaas_generate_nodes_yaml
make target. Store it in a temp file for later.
make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml\n
Set variables to deploy edpm Standalone with additional network (baremetal
) and compute driver ironic
.
cat << EOF > /tmp/addtional_nets.json\n[\n {\n \"type\": \"network\",\n \"name\": \"crc-bmaas\",\n \"standalone_config\": {\n \"type\": \"ovs_bridge\",\n \"name\": \"baremetal\",\n \"mtu\": 1500,\n \"vip\": true,\n \"ip_subnet\": \"172.20.1.0/24\",\n \"allocation_pools\": [\n {\n \"start\": \"172.20.1.100\",\n \"end\": \"172.20.1.150\"\n }\n ],\n \"host_routes\": [\n {\n \"destination\": \"192.168.130.0/24\",\n \"nexthop\": \"172.20.1.1\"\n }\n ]\n }\n }\n]\nEOF\nexport EDPM_COMPUTE_ADDITIONAL_NETWORKS=$(jq -c . /tmp/addtional_nets.json)\nexport STANDALONE_COMPUTE_DRIVER=ironic\nexport NTP_SERVER=pool.ntp.org # Only necessary if not on the Red Hat network ...\nexport EDPM_COMPUTE_CEPH_ENABLED=false # Optional\n
Use the install_yamls devsetup to create a virtual machine connected to the isolated networks.
Create the edpm-compute-0 virtual machine.
cd install_yamls/devsetup\nmake standalone\n
"},{"location":"contributing/development_environment/#install-the-openstack-k8s-operators-openstack-operator","title":"Install the openstack-k8s-operators (openstack-operator)","text":"cd .. # back to install_yamls\nmake crc_storage\nmake input\nmake openstack\n
"},{"location":"contributing/development_environment/#convenience-steps","title":"Convenience steps","text":"To make our life easier we can copy the deployment passwords we'll be using in the backend services deployment phase of the data plane adoption.
scp -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100:/root/tripleo-standalone-passwords.yaml ~/\n
If we want to be able to easily run openstack
commands from the host without actually installing the package and copying the configuration file from the VM we can create a simple alias:
alias openstack=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 OS_CLOUD=standalone openstack\"\n
"},{"location":"contributing/development_environment/#route-networks","title":"Route networks","text":"Route VLAN20 to have access to the MariaDB cluster:
EDPM_BRIDGE=$(sudo virsh dumpxml edpm-compute-0 | grep -oP \"(?<=bridge=').*(?=')\")\nsudo ip link add link $EDPM_BRIDGE name vlan20 type vlan id 20\nsudo ip addr add dev vlan20 172.17.0.222/24\nsudo ip link set up dev vlan20\n
"},{"location":"contributing/development_environment/#snapshotrevert","title":"Snapshot/revert","text":"When the deployment of the Standalone OpenStack is finished, it's a good time to snapshot the machine, so that multiple Adoption attempts can be done without having to deploy from scratch.
cd ~/install_yamls/devsetup\nmake standalone_snapshot\n
And when you wish to revert the Standalone deployment to the snapshotted state:
cd ~/install_yamls/devsetup\nmake standalone_revert\n
A similar snapshot could be taken for the CRC virtual machine, but the developer environment reset on the CRC side can be done sufficiently via the install_yamls *_cleanup
targets. This is further detailed in the section: Reset the environment to pre-adoption state
# Enroll baremetal nodes\nmake bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml\nscp -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa /tmp/ironic_nodes.yaml root@192.168.122.100:\nssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\n\nexport OS_CLOUD=standalone\nopenstack baremetal create /root/ironic_nodes.yaml\nexport IRONIC_PYTHON_AGENT_RAMDISK_ID=$(openstack image show deploy-ramdisk -c id -f value)\nexport IRONIC_PYTHON_AGENT_KERNEL_ID=$(openstack image show deploy-kernel -c id -f value)\nfor node in $(openstack baremetal node list -c UUID -f value); do\n openstack baremetal node set $node \\\n --driver-info deploy_ramdisk=${IRONIC_PYTHON_AGENT_RAMDISK_ID} \\\n --driver-info deploy_kernel=${IRONIC_PYTHON_AGENT_KERNEL_ID} \\\n --resource-class baremetal \\\n --property capabilities='boot_mode:uefi'\ndone\n\n# Create a baremetal flavor\nopenstack flavor create baremetal --ram 1024 --vcpus 1 --disk 15 \\\n --property resources:VCPU=0 \\\n --property resources:MEMORY_MB=0 \\\n --property resources:DISK_GB=0 \\\n --property resources:CUSTOM_BAREMETAL=1 \\\n --property capabilities:boot_mode=\"uefi\"\n\n# Create image\nIMG=Fedora-Cloud-Base-38-1.6.x86_64.qcow2\nURL=https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/$IMG\ncurl -o /tmp/${IMG} -L $URL\nDISK_FORMAT=$(qemu-img info /tmp/${IMG} | grep \"file format:\" | awk '{print $NF}')\nopenstack image create --container-format bare --disk-format ${DISK_FORMAT} Fedora-Cloud-Base-38 < /tmp/${IMG}\n\nexport BAREMETAL_NODES=$(openstack baremetal node list -c UUID -f value)\n# Manage nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal node manage $node\ndone\n\n# Wait for nodes to reach \"manageable\" state\nwatch openstack baremetal node list\n\n# Inspect baremetal nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal introspection start $node\ndone\n\n# Wait for inspection to complete\nwatch openstack baremetal 
introspection list\n\n# Provide nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal node provide $node\ndone\n\n# Wait for nodes to reach \"available\" state\nwatch openstack baremetal node list\n\n# Create an instance on baremetal\nopenstack server show baremetal-test || {\n openstack server create baremetal-test --flavor baremetal --image Fedora-Cloud-Base-38 --nic net-id=provisioning --wait\n}\n\n# Check instance status and network connectivity\nopenstack server show baremetal-test\nping -c 4 $(openstack server show baremetal-test -f json -c addresses | jq -r .addresses.provisioning[0])\n
","text":"export OS_CLOUD=standalone\nsource ~/install_yamls/devsetup/scripts/edpm-deploy-instance.sh\n
Confirm the image UUID can be seen in Ceph's images pool.
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l\n
Create a Cinder volume, a backup from it, and snapshot it.
openstack volume create --image cirros --bootable --size 1 disk\nopenstack volume backup create --name backup disk\nopenstack volume snapshot create --volume disk snapshot\n
Add volume to the test VM
openstack server add volume test disk\n
"},{"location":"contributing/development_environment/#performing-the-data-plane-adoption","title":"Performing the Data Plane Adoption","text":"The development environment is now set up. You can go to the Adoption documentation and perform adoption manually, or run the test suite against your environment.
"},{"location":"contributing/development_environment/#reset-the-environment-to-pre-adoption-state","title":"Reset the environment to pre-adoption state","text":"The development environment must be rolled back in case we want to execute another Adoption run.
Delete the data-plane and control-plane resources from the CRC vm
oc delete osdp openstack\noc delete oscp openstack\n
Revert the standalone vm to the snapshotted state
cd ~/install_yamls/devsetup\nmake standalone_revert\n
Clean up and initialize the storage PVs in CRC vm cd ..\nmake crc_storage_cleanup\nmake crc_storage\n
"},{"location":"contributing/development_environment/#experimenting-with-an-additional-compute-node","title":"Experimenting with an additional compute node","text":"The following is not on the critical path of preparing the development environment for Adoption, but it shows how to make the environment work with an additional compute node VM.
The remaining steps should be completed on the hypervisor hosting crc and edpm-compute-0.
"},{"location":"contributing/development_environment/#deploy-ng-control-plane-with-ceph","title":"Deploy NG Control Plane with Ceph","text":"Export the Ceph configuration from edpm-compute-0 into a secret.
SSH=\"ssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nKEY=$($SSH \"cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0\")\nCONF=$($SSH \"cat /etc/ceph/ceph.conf | base64 -w 0\")\n\ncat <<EOF > ceph_secret.yaml\napiVersion: v1\ndata:\n ceph.client.openstack.keyring: $KEY\n ceph.conf: $CONF\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\ntype: Opaque\nEOF\n\noc create -f ceph_secret.yaml\n
Deploy the NG control plane with Ceph as backend for Glance and Cinder. As described in the install_yamls README, use the sample config located at https://github.com/openstack-k8s-operators/openstack-operator/blob/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml but make sure to replace the _FSID_
in the sample with the one from the secret created in the previous step. curl -o /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml https://raw.githubusercontent.com/openstack-k8s-operators/openstack-operator/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\nFSID=$(oc get secret ceph-conf-files -o json | jq -r '.data.\"ceph.conf\"' | base64 -d | grep fsid | sed -e 's/fsid = //') && echo $FSID\nsed -i \"s/_FSID_/${FSID}/\" /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\noc apply -f /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\n
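The `grep fsid | sed` pipeline above relies on the line being written exactly as `fsid = <value>`. If the spacing in your `ceph.conf` differs, a slightly more tolerant extraction may help. This is a sketch under that assumption; `ceph_fsid` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the fsid value from a ceph.conf read on stdin.
# Tolerates leading whitespace and any spacing around "=".
ceph_fsid() {
    awk -F= '$1 ~ /^[ \t]*fsid[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}
```

It can replace the last two stages of the pipeline, for example `FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | ceph_fsid)`.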
An NG control plane which uses the same Ceph backend should now be functional. If you create a test image on the NG system to confirm it works from the configuration above, be sure to read the warning in the next section.
Before beginning adoption testing or development you may wish to deploy an EDPM node as described in the following section.
"},{"location":"contributing/development_environment/#warning-about-two-openstacks-and-one-ceph","title":"Warning about two OpenStacks and one Ceph","text":"Though workloads can be created in the NG deployment to test, be careful not to confuse them with workloads from the Wallaby cluster to be migrated. The following scenario is now possible.
A Glance image exists on the Wallaby OpenStack to be adopted.
[stack@standalone standalone]$ export OS_CLOUD=standalone\n[stack@standalone standalone]$ openstack image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 33a43519-a960-4cd0-a593-eca56ee553aa | cirros | active |\n+--------------------------------------+--------+--------+\n[stack@standalone standalone]$\n
If you now create an image with the NG cluster, then a Glance image will exist on the NG OpenStack, which will adopt the workloads of the Wallaby deployment. [fultonj@hamfast ng]$ export OS_CLOUD=default\n[fultonj@hamfast ng]$ export OS_PASSWORD=12345678\n[fultonj@hamfast ng]$ openstack image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 4ebccb29-193b-4d52-9ffd-034d440e073c | cirros | active |\n+--------------------------------------+--------+--------+\n[fultonj@hamfast ng]$\n
Both Glance images are stored in the same Ceph pool. ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l\nInferring fsid 7133115f-7751-5c2f-88bd-fbff2f140791\nUsing recent ceph image quay.rdoproject.org/tripleowallabycentos9/daemon@sha256:aa259dd2439dfaa60b27c9ebb4fb310cdf1e8e62aa7467df350baf22c5d992d8\nNAME SIZE PARENT FMT PROT LOCK\n33a43519-a960-4cd0-a593-eca56ee553aa 273 B 2\n33a43519-a960-4cd0-a593-eca56ee553aa@snap 273 B 2 yes\n4ebccb29-193b-4d52-9ffd-034d440e073c 112 MiB 2\n4ebccb29-193b-4d52-9ffd-034d440e073c@snap 112 MiB 2 yes\n
However, as far as each Glance service is concerned each has one image. Thus, in order to avoid confusion during adoption the test Glance image on the NG OpenStack should be deleted. openstack image delete 4ebccb29-193b-4d52-9ffd-034d440e073c\n
Connecting the NG OpenStack to the existing Ceph cluster is part of the adoption procedure, so that data migration can be minimized, but be sure to understand the implications of the above example."},{"location":"contributing/development_environment/#deploy-edpm-compute-1","title":"Deploy edpm-compute-1","text":"edpm-compute-0 is not available as a standard EDPM system to be managed by edpm-ansible or dataplane-operator, because it hosts the Wallaby deployment that will be adopted; after adoption it will only host the Ceph server.
Use the install_yamls devsetup to create additional virtual machines and be sure that the EDPM_COMPUTE_SUFFIX
is set to 1
or greater. Do not set EDPM_COMPUTE_SUFFIX
to 0
or you could delete the Wallaby system created in the previous section.
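Because a wrong suffix can destroy the Wallaby VM, it may be worth guarding against it in any wrapper script. The following is a minimal sketch; `check_compute_suffix` is a hypothetical helper and is not provided by install_yamls.

```shell
#!/bin/bash
# Refuse to continue unless the given suffix is an integer >= 1.
# Usage: check_compute_suffix "$EDPM_COMPUTE_SUFFIX"
check_compute_suffix() {
    case "${1:-}" in
        ''|*[!0-9]*)
            echo "EDPM_COMPUTE_SUFFIX must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$1" -lt 1 ]; then
        echo "EDPM_COMPUTE_SUFFIX=0 would target the Wallaby VM (edpm-compute-0)" >&2
        return 1
    fi
}
```

Calling `check_compute_suffix "$EDPM_COMPUTE_SUFFIX" || exit 1` before the devsetup target that creates the virtual machine stops the script when the variable is unset, non-numeric, or 0.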
When deploying EDPM nodes add an extraMounts
like the following in the OpenStackDataPlaneNodeSet
CR nodeTemplate
so that they will be configured to use the same Ceph cluster.
edpm-compute:\n nodeTemplate:\n extraMounts:\n - extraVolType: Ceph\n volumes:\n - name: ceph\n secret:\n secretName: ceph-conf-files\n mounts:\n - name: ceph\n mountPath: \"/etc/ceph\"\n readOnly: true\n
An NG data plane which uses the same Ceph backend should now be functional. Be careful not to confuse new workloads created to test the NG OpenStack with those of the Wallaby OpenStack, as described in the previous section.
"},{"location":"contributing/development_environment/#begin-adoption-testing-or-development","title":"Begin Adoption Testing or Development","text":"We should now have:
The environment above is assumed to be available in the Glance Adoption documentation. You may now follow other Data Plane Adoption procedures described in the documentation. The same pattern can be applied to other services.
"},{"location":"contributing/documentation/","title":"Contributing to documentation","text":""},{"location":"contributing/documentation/#rendering-documentation-locally","title":"Rendering documentation locally","text":"Install docs build requirements into virtualenv:
python3 -m venv local/docs-venv\nsource local/docs-venv/bin/activate\npip install -r docs/doc_requirements.txt\n
Serve docs site on localhost:
mkdocs serve\n
Click the link it outputs. As you save changes to files modified in your editor, the browser will automatically show the new content.
"},{"location":"contributing/documentation/#patterns-and-tips-for-contributing-to-documentation","title":"Patterns and tips for contributing to documentation","text":"Pages concerning individual components/services should make sense in the context of the broader adoption procedure. While adopting a service in isolation is an option for developers, let's write the documentation with the assumption the adoption procedure is being done in full, going step by step (one doc after another).
The procedure should be written with production use in mind. This repository could be used as a starting point for product technical documentation. We should not tie the documentation to something that wouldn't translate well from dev envs to production.
If possible, try to make code snippets copy-pastable. Use shell variables if the snippets should be parametrized. Use oc
rather than kubectl
in snippets.
Focus on the \"happy path\" in the docs as much as possible, troubleshooting info can go into the Troubleshooting page, or alternatively a troubleshooting section at the end of the document, visibly separated from the main procedure.
The full procedure will inevitably happen to be quite long, so let's try to be concise in writing to keep the docs consumable (but not to a point of making things difficult to understand or omitting important things).
A bash alias can be created for long commands. However, when implementing them in the test roles, you should transform them to avoid \"command not found\" errors. From:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n\nopenstack endpoint list | grep network\n
To: alias openstack=\"oc exec -t openstackclient -- openstack\"\n\n${BASH_ALIASES[openstack]} endpoint list | grep network\n
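The reason for this transformation is that a non-interactive bash defines aliases but does not expand them, while the `BASH_ALIASES` associative array still holds the alias body. The sketch below demonstrates the mechanism with a harmless stand-in alias (`greet` is a hypothetical name used only for illustration), so it can be tried without a cluster.

```shell
#!/bin/bash
# In a non-interactive bash, aliases are stored but not expanded, so the
# alias body is looked up in the BASH_ALIASES array instead.
alias greet='echo hello'    # stand-in for the long "oc exec ..." alias

# Left unquoted on purpose: the alias body must be word-split into the
# command and its arguments.
result=$(${BASH_ALIASES[greet]} world)
echo "$result"
```

Note that this relies on word splitting, so it only works for alias bodies that do not need quoting of their own.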
The adoption docs repository also includes a test suite for Adoption. There are targets in the Makefile which can be used to execute the test suite:
test-minimal
- a minimal test scenario, the eventual set of services in this scenario should be the \"core\" services needed to launch a VM. This scenario assumes local storage backend for services like Glance and Cinder.
test-with-ceph
- like 'minimal' but with Ceph storage backend for Glance and Cinder.
Create tests/vars.yaml
and tests/secrets.yaml
by copying the included samples (tests/vars.sample.yaml
, tests/secrets.sample.yaml
).
Walk through the tests/vars.yaml
and tests/secrets.yaml
files and see if you need to edit any values. If you are using the documented development environment, the majority of the defaults should work out of the box. The comments in the YAML files will guide you regarding the expected values. You may want to double check that these variables suit your environment:
install_yamls_path
tripleo_passwords
controller*_ssh
edpm_privatekey_path
timesync_ntp_servers
The interface between the execution infrastructure and the test suite is an Ansible inventory and variables files. Inventory and variable samples are provided. To run the tests, follow this procedure:
sudo dnf -y install python-devel\npython3 -m venv venv\nsource venv/bin/activate\npip install openstackclient osc_placement jmespath\nansible-galaxy collection install community.general\n
make test-with-ceph
(the documented development environment does include Ceph). If you are using a Ceph-less environment, you should run make test-minimal
.
Please be aware of the following when changing the test suite:
The purpose of the test suite is to verify what the user would run if they were following the docs. We don't want to loosely rewrite the docs into Ansible code following Ansible best practices. We want to test the exact same bash commands/snippets that are written in the docs. This often means that we should be using the shell
module and do a verbatim copy/paste from docs, instead of using the best Ansible module for the task at hand.
The following instructions create an OpenStackControlPlane CR with basic backend services deployed, and all the OpenStack services disabled. This will be the foundation of the podified control plane.
In subsequent steps, we'll import the original databases and then add podified OpenStack control plane services.
"},{"location":"openstack/backend_services_deployment/#prerequisites","title":"Prerequisites","text":"The cloud which we want to adopt is up and running. It runs the OpenStack Wallaby release.
The openstack-operator
is deployed, but OpenStackControlPlane
is not deployed.
For developer/CI environments, the openstack operator can be deployed by running make openstack
inside install_yamls repo.
For production environments, the deployment method will likely be different.
For developer/CI environments driven by install_yamls, make sure you've run make crc_storage
.
ADMIN_PASSWORD=SomePassword\n
To use the existing OpenStack deployment password:
ADMIN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AdminPassword:' | awk -F ': ' '{ print $2; }')\n
E.g. in developer environments with TripleO Standalone, the passwords can be extracted like this:
CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')\nGLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')\nHEAT_AUTH_ENCRYPTION_KEY=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatAuthEncryptionKey:' | awk -F ': ' '{ print $2; }')\nHEAT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatPassword:' | awk -F ': ' '{ print $2; }')\nIRONIC_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' IronicPassword:' | awk -F ': ' '{ print $2; }')\nMANILA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' ManilaPassword:' | awk -F ': ' '{ print $2; }')\nNEUTRON_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NeutronPassword:' | awk -F ': ' '{ print $2; }')\nNOVA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NovaPassword:' | awk -F ': ' '{ print $2; }')\nOCTAVIA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' OctaviaPassword:' | awk -F ': ' '{ print $2; }')\nPLACEMENT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' PlacementPassword:' | awk -F ': ' '{ print $2; }')\n
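The repeated `cat | grep | awk` pipelines above can be folded into one small function. This is a sketch, assuming the passwords file keeps the `Key: value` layout that those pipelines already rely on; `tripleo_password` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the value of one key from the TripleO passwords file.
# Usage: tripleo_password <Key> [file]
# The key must match exactly, so "AdminPassword" does not also match
# keys that merely end in "AdminPassword".
tripleo_password() {
    awk -F': ' -v key="$1" '$1 ~ "^[ \t]*" key "$" { print $2; exit }' \
        "${2:-$HOME/tripleo-standalone-passwords.yaml}"
}
```

With it, each assignment becomes a one-liner such as `CINDER_PASSWORD=$(tripleo_password CinderPassword)`.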
"},{"location":"openstack/backend_services_deployment/#pre-checks","title":"Pre-checks","text":""},{"location":"openstack/backend_services_deployment/#procedure-backend-services-deployment","title":"Procedure - backend services deployment","text":"oc project openstack\n
The procedure for this will vary, but in developer/CI environments we use install_yamls:
# in install_yamls\nmake input\n
If $ADMIN_PASSWORD
differs from the password already set in osp-secret
, amend the AdminPassword
key in osp-secret
accordingly: oc set data secret/osp-secret \"AdminPassword=$ADMIN_PASSWORD\"\n
Set the service account passwords in osp-secret
to match the service account passwords from the original deployment:oc set data secret/osp-secret \"CinderPassword=$CINDER_PASSWORD\"\noc set data secret/osp-secret \"GlancePassword=$GLANCE_PASSWORD\"\noc set data secret/osp-secret \"HeatAuthEncryptionKey=$HEAT_AUTH_ENCRYPTION_KEY\"\noc set data secret/osp-secret \"HeatPassword=$HEAT_PASSWORD\"\noc set data secret/osp-secret \"IronicPassword=$IRONIC_PASSWORD\"\noc set data secret/osp-secret \"ManilaPassword=$MANILA_PASSWORD\"\noc set data secret/osp-secret \"NeutronPassword=$NEUTRON_PASSWORD\"\noc set data secret/osp-secret \"NovaPassword=$NOVA_PASSWORD\"\noc set data secret/osp-secret \"OctaviaPassword=$OCTAVIA_PASSWORD\"\noc set data secret/osp-secret \"PlacementPassword=$PLACEMENT_PASSWORD\"\n
oc apply -f - <<EOF\napiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n\n cinder:\n enabled: false\n template:\n cinderAPI: {}\n cinderScheduler: {}\n cinderBackup: {}\n cinderVolumes: {}\n\n dns:\n template:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: ctlplane\n metallb.universe.tf/allow-shared-ip: ctlplane\n metallb.universe.tf/loadBalancerIPs: 192.168.122.80\n spec:\n type: LoadBalancer\n options:\n - key: server\n values:\n - 192.168.122.1\n replicas: 1\n\n glance:\n enabled: false\n template:\n glanceAPI: {}\n\n horizon:\n enabled: false\n template: {}\n\n ironic:\n enabled: false\n template:\n ironicConductors: []\n\n keystone:\n enabled: false\n template: {}\n\n manila:\n enabled: false\n template:\n manilaAPI: {}\n manilaScheduler: {}\n manilaShares: {}\n\n mariadb:\n templates:\n openstack:\n storageRequest: 500M\n openstack-cell1:\n storageRequest: 500M\n\n memcached:\n enabled: true\n templates:\n memcached:\n replicas: 1\n\n neutron:\n enabled: false\n template: {}\n\n nova:\n enabled: false\n template: {}\n\n ovn:\n enabled: false\n template:\n ovnDBCluster:\n ovndbcluster-nb:\n dbType: NB\n storageRequest: 10G\n networkAttachment: internalapi\n ovndbcluster-sb:\n dbType: SB\n storageRequest: 10G\n networkAttachment: internalapi\n ovnNorthd:\n networkAttachment: internalapi\n replicas: 1\n ovnController:\n networkAttachment: tenant\n\n placement:\n enabled: false\n template: {}\n\n rabbitmq:\n templates:\n rabbitmq:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.85\n spec:\n type: LoadBalancer\n rabbitmq-cell1:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.86\n spec:\n type: LoadBalancer\n\n telemetry:\n 
enabled: false\n template: {}\nEOF\n
"},{"location":"openstack/backend_services_deployment/#post-checks","title":"Post-checks","text":"oc get pod mariadb-openstack -o jsonpath='{.status.phase}{\"\\n\"}'\n
"},{"location":"openstack/ceph_backend_configuration/","title":"Ceph backend configuration (if applicable)","text":"If the original deployment uses a Ceph storage backend for any service (e.g. Glance, Cinder, Nova, Manila), the same backend must be used in the adopted deployment and CRs must be configured accordingly.
"},{"location":"openstack/ceph_backend_configuration/#prerequisites","title":"Prerequisites","text":"OpenStackControlPlane
CR must already exist.
Define the shell variables used in the steps below. The values are only illustrative; use values that are correct for your environment:
CEPH_SSH=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nCEPH_KEY=$($CEPH_SSH \"cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0\")\nCEPH_CONF=$($CEPH_SSH \"cat /etc/ceph/ceph.conf | base64 -w 0\")\n
"},{"location":"openstack/ceph_backend_configuration/#modify-capabilities-of-the-openstack-user-to-accommodate-manila","title":"Modify capabilities of the \"openstack\" user to accommodate Manila","text":"On TripleO environments, the CephFS driver in Manila is configured to use its own keypair. For convenience, let's modify the openstack
user so that we can use it across all OpenStack services.
Using the same user across the services serves two purposes:
- The capabilities the user needs to interact with the Manila service became far simpler, and hence more secure, with RHOSP 18.
- It is simpler to create a common ceph secret (keyring and ceph config file) and propagate the secret to all services that need it.
$CEPH_SSH cephadm shell\nceph auth caps client.openstack \\\n mgr 'allow *' \\\n mon 'allow r, profile rbd' \\\n osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images, allow rw pool manila_data'\n
"},{"location":"openstack/ceph_backend_configuration/#ceph-backend-configuration","title":"Ceph backend configuration","text":"Create the ceph-conf-files
secret, containing Ceph configuration:
oc apply -f - <<EOF\napiVersion: v1\ndata:\n ceph.client.openstack.keyring: $CEPH_KEY\n ceph.conf: $CEPH_CONF\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\ntype: Opaque\nEOF\n
The content of the file should look something like this:
---\napiVersion: v1\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\nstringData:\n ceph.client.openstack.keyring: |\n [client.openstack]\n key = <secret key>\n caps mgr = \"allow *\"\n caps mon = \"profile rbd\"\n caps osd = \"profile rbd pool=images\"\n ceph.conf: |\n [global]\n fsid = 7a1719e8-9c59-49e2-ae2b-d7eb08c695d4\n mon_host = 10.1.1.2,10.1.1.3,10.1.1.4\n
Configure extraMounts
within the OpenStackControlPlane
CR:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n extraMounts:\n - name: v1\n region: r1\n extraVol:\n - propagation:\n - CinderVolume\n - CinderBackup\n - GlanceAPI\n - ManilaShare\n extraVolType: Ceph\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n mounts:\n - name: ceph\n mountPath: \"/etc/ceph\"\n readOnly: true\n'\n
"},{"location":"openstack/ceph_backend_configuration/#getting-ceph-fsid","title":"Getting Ceph FSID","text":"Configuring some OpenStack services to use Ceph backend may require the FSID value. You can fetch the value from the config like so:
CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data.\"ceph.conf\"' | base64 -d | grep fsid | sed -e 's/fsid = //')\n
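A plain grep for fsid would also match any other line that happens to contain that string, such as a comment. The following is a minimal sketch of a stricter extraction; the function name ceph_fsid_from_conf is hypothetical, and it assumes the usual key = value layout of ceph.conf.

```shell
# Hypothetical helper: reads a ceph.conf on standard input and prints the fsid.
ceph_fsid_from_conf() {
    awk -F ' *= *' '
        { key = $1; sub(/^ +/, "", key) }   # ignore leading indentation
        key == "fsid" { print $2; exit }    # only an exact fsid key matches
    '
}

# Example use with the secret created earlier:
# CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | ceph_fsid_from_conf)
```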
"},{"location":"openstack/cinder_adoption/","title":"Cinder adoption","text":"Adopting a director deployed Cinder service into OpenStack may require some thought because it's not always a simple process.
Usually the adoption process entails:
cinder.conf
file.
This guide provides the knowledge necessary to complete these steps in most situations, but it still requires an understanding of how OpenStack services work and of the structure of a Cinder configuration file.
"},{"location":"openstack/cinder_adoption/#limitations","title":"Limitations","text":"There are currently some limitations that are worth highlighting; some are related to this guideline while some to the operator:
There is no global nodeSelector
for all cinder volumes, so it needs to be specified per backend. This may change in the future.
There is no global customServiceConfig
or customServiceConfigSecrets
for all cinder volumes, so it needs to be specified per backend. This may change in the future.
Adoption of LVM backends, where the volume data is stored in the compute nodes, is not currently being documented in this process. It may get documented in the future.
Support for Cinder backends that require kernel modules not included in RHEL has not been tested in Operator deployed OpenStack so it is not documented in this guide.
Adoption of DCN/Edge deployment is not currently described in this guide.
Previous Adoption steps completed. Notably, cinder service must have been stopped and the service databases must already be imported into the podified MariaDB.
Storage network has been properly configured on the OpenShift cluster.
No new environment variables need to be defined, though we use the CONTROLLER1_SSH
that was defined in a previous step for the pre-checks.
We are going to need the contents of cinder.conf
, so we may want to download it to have it locally accessible:
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf > cinder.conf\n
"},{"location":"openstack/cinder_adoption/#prepare-openshift","title":"Prepare OpenShift","text":"As explained in the planning section, before deploying OpenStack in OpenShift we need to ensure that the networks are ready, that we have decided on the node selection, and that any necessary changes to the OpenShift nodes have been made. For Cinder volume and backup services all three must be carefully considered.
"},{"location":"openstack/cinder_adoption/#node-selection","title":"Node Selection","text":"We may need, or want, to restrict the OpenShift nodes where cinder volume and backup services can run.
The best example of when we need to do node selection for a specific cinder service is when we deploy Cinder with the LVM driver. In that scenario the LVM data where the volumes are stored only exists on a specific host, so we need to pin the cinder-volume service to that specific OpenShift node. Running the service on any other OpenShift node would not work. Since nodeSelector
only works on labels, we cannot use the OpenShift host node name to restrict the LVM backend; we need to identify the node using a unique label, either an existing one or a new one:
$ oc label nodes worker0 lvm=cinder-volumes\n
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n cinder:\n enabled: true\n template:\n cinderVolumes:\n lvm-iscsi:\n nodeSelector:\n lvm: cinder-volumes\n< . . . >\n
As mentioned in the Node Selector guide, an example where we need to use labels is when using FC storage and we don't have HBA cards in all our OpenShift nodes. In this scenario we would need to restrict all the cinder volume backends (not only the FC one) as well as the backup services.
Depending on the cinder backends, their configuration, and the usage of Cinder, we can have network intensive cinder volume services with lots of I/O as well as cinder backup services that are not only network intensive but also memory and CPU intensive. This may be a concern for the OpenShift human operators, and they may want to use the nodeSelector
to prevent these services from interfering with their other OpenShift workloads.
Please make sure to read the Node Selector guide before continuing, as we'll be referring to some of the concepts explained there in the following sections.
When selecting the nodes where cinder volume is going to run please remember that cinder-volume may also use local storage when downloading a glance image for the create volume from image operation, and it can require a considerable amount of space when having concurrent operations and not using cinder volume cache.
If we don't have nodes with enough local disk space for the temporary images we can use a remote NFS location for the images. This is something that we had to set up manually in Director deployments, but with operators we can easily do it automatically using the extra volumes feature (extraMounts
).
Due to the specifics of the storage transport protocols, some changes may be required on the OpenShift side. Although this is something that must be documented by the vendor, here we are going to provide some generic instructions that can serve as a guide for the different transport protocols.
Check the backend sections in our cinder.conf
file that are listed in the enabled_backends
configuration option to figure out the transport storage protocol used by the backend.
Depending on the backend we can find the transport protocol:
Looking at the volume_driver
configuration option, as it may contain the protocol itself: RBD, iSCSI, FC...
Looking at the target_protocol
configuration option
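The lookup described above can be sketched as a small filter over cinder.conf. This is an illustration only: the function name cinder_backend_protocol_hints is hypothetical, and it assumes simple single-line key = value options without continuation lines.

```shell
# Hypothetical helper: for every backend listed in enabled_backends, print the
# volume_driver and target_protocol options found in its section.
# $1: path to cinder.conf
cinder_backend_protocol_hints() {
    awk -F ' *= *' '
        /^\[/ { section = substr($0, 2, length($0) - 2); next }
        section == "DEFAULT" && $1 == "enabled_backends" { backends = $2; next }
        $1 == "volume_driver" || $1 == "target_protocol" {
            hints[section] = hints[section] " " $1 "=" $2
        }
        END {
            n = split(backends, names, / *, */)
            for (i = 1; i <= n; i++) print names[i] ":" hints[names[i]]
        }
    ' "$1"
}

# Example use with the file downloaded earlier:
# cinder_backend_protocol_hints cinder.conf
```

The output is one line per enabled backend, which is usually enough to tell whether the backend uses RBD, iSCSI, FC, NVMe-oF, or NFS.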
Warning: Any time a MachineConfig
is used to make changes to OpenShift nodes, the nodes will reboot! Act accordingly.
There's nothing to do for NFS. OpenShift can connect to NFS backends without any additional changes.
"},{"location":"openstack/cinder_adoption/#rbdceph","title":"RBD/Ceph","text":"There's nothing to do for RBD/Ceph in terms of preparing the nodes, OpenShift can connect to Ceph backends without any additional changes. Credentials and configuration files will need to be provided to the services though.
"},{"location":"openstack/cinder_adoption/#iscsi","title":"iSCSI","text":"Connecting to iSCSI volumes requires that the iSCSI initiator is running on the OpenShift hosts where volume and backup services are going to run. Because the Linux Open iSCSI initiator doesn't currently support network namespaces, we must run only 1 instance of the service, shared by the normal OpenShift usage, the OpenShift CSI plugins, and the OpenStack services.
If we are not already running iscsid
on the OpenShift nodes then we'll need to apply a MachineConfig
similar to this one:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-enable-iscsid\nspec:\n config:\n ignition:\n version: 3.2.0\n systemd:\n units:\n - enabled: true\n name: iscsid.service\n
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
For production deployments using iSCSI volumes we always recommend setting up multipathing; see the multipathing section for how to configure it.
TODO: Add, or at least mention, the Nova eDPM side for iSCSI.
"},{"location":"openstack/cinder_adoption/#fc","title":"FC","text":"There's nothing to do for FC volumes to work, but the cinder volume and cinder backup services need to run in an OpenShift host that has HBAs, so if there are nodes that don't have HBAs then we'll need to use labels to restrict where these services can run, as mentioned in the node selection section.
This also means that for virtualized OpenShift clusters using FC we'll need to expose the host's HBAs inside the VM.
For production deployments using FC volumes we always recommend setting up multipathing; see the multipathing section for how to configure it.
"},{"location":"openstack/cinder_adoption/#nvme-of","title":"NVMe-oF","text":"Connecting to NVMe-oF volumes requires that the nvme kernel modules are loaded on the OpenShift hosts.
If we are not already loading the nvme-fabrics
module on the OpenShift nodes where volume and backup services are going to run then we'll need to apply a MachineConfig
similar to this one:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-load-nvme-fabrics\nspec:\n config:\n ignition:\n version: 3.2.0\n storage:\n files:\n - path: /etc/modules-load.d/nvme_fabrics.conf\n overwrite: false\n # Mode must be decimal, this is 0644\n mode: 420\n user:\n name: root\n group:\n name: root\n contents:\n # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.\n # This is the rfc2397 text/plain string format\n source: data:,nvme-fabrics\n
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
We are only loading the nvme-fabrics
module because it takes care of loading the transport specific modules (tcp, rdma, fc) as needed.
For production deployments using NVMe-oF volumes we always recommend using multipathing. For NVMe-oF volumes OpenStack uses native multipathing, called ANA.
Once the OpenShift nodes have rebooted and are loading the nvme-fabrics
module we can confirm that the Operating System is configured and supports ANA by checking on the host:
cat /sys/module/nvme_core/parameters/multipath\n
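The check above can be wrapped in a function that returns a shell status. This is a sketch under one assumption that is not stated in this guide: that the kernel reports this boolean module parameter as Y when native multipathing is enabled. The function name is hypothetical.

```shell
# Hypothetical helper: succeeds when the nvme_core multipath parameter reads Y.
# $1 (optional): parameter file, defaults to the sysfs path used above
nvme_native_multipath_enabled() {
    param_file=${1:-/sys/module/nvme_core/parameters/multipath}
    [ "$(cat "$param_file" 2>/dev/null)" = "Y" ]
}

# Example use on an OpenShift host:
# nvme_native_multipath_enabled && echo 'ANA multipathing is enabled'
```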
Attention: ANA doesn't use the Linux Multipathing Device Mapper, but the current OpenStack code requires multipathd
to be running on compute nodes for Nova to be able to use multipathing, so please remember to follow the multipathing part for compute nodes in the multipathing section.
TODO: Add, or at least mention, the Nova eDPM side for NVMe-oF.
"},{"location":"openstack/cinder_adoption/#multipathing","title":"Multipathing","text":"For iSCSI and FC protocols we always recommend using multipathing, which has 4 parts:
To prepare the OpenShift hosts we need to ensure that the Linux Multipath Device Mapper is configured and running on them, and we do that using a MachineConfig
like this one:
# Includes the /etc/multipathd.conf contents and the systemd unit changes\napiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-enable-multipathd\nspec:\n config:\n ignition:\n version: 3.2.0\n storage:\n files:\n - path: /etc/multipath.conf\n overwrite: false\n # Mode must be decimal, this is 0600\n mode: 384\n user:\n name: root\n group:\n name: root\n contents:\n # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.\n # This is the rfc2397 text/plain string format\n source: data:,defaults%20%7B%0A%20%20user_friendly_names%20no%0A%20%20recheck_wwid%20yes%0A%20%20skip_kpartx%20yes%0A%20%20find_multipaths%20yes%0A%7D%0A%0Ablacklist%20%7B%0A%7D\n systemd:\n units:\n - enabled: true\n name: multipathd.service\n
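The MachineConfig examples in this guide give file modes in decimal, with the octal value in a comment (0644 as 420, 0600 as 384). A minimal sketch for doing that conversion in the shell follows; the function name octal_mode_to_decimal is hypothetical.

```shell
# Hypothetical helper: convert an octal file mode to its decimal value.
# $1: octal mode, with or without the leading zero (0644 or 644)
octal_mode_to_decimal() {
    # A leading 0 makes printf read the number as octal.
    printf '%d\n' "0${1#0}"
}

# Example use:
# octal_mode_to_decimal 0600
```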
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
To configure the cinder services to use multipathing we need to enable the use_multipath_for_image_xfer
configuration option in all the backend sections and in the [DEFAULT]
section for the backup service. In Podified deployments we don't need to worry about it, because that's the default. Unless we override it by setting use_multipath_for_image_xfer = false
, multipathing will work as long as the service is running on the OpenShift host.
TODO: Add, or at least mention, the Nova eDPM side for Multipathing once it's implemented.
"},{"location":"openstack/cinder_adoption/#configurations","title":"Configurations","text":"As described in the planning section, Cinder is configured using configuration snippets instead of using obscure configuration parameters defined by the installer.
The recommended way to deploy Cinder volume backends has changed to remove old limitations, add flexibility, and improve operations in general.
When deploying with Director we used to run a single Cinder volume service with all our backends (each backend would run in its own process), and even though that way of deploying is still supported, we don't recommend it. We recommend using a volume service per backend since it's a superior deployment model.
So for an LVM and a Ceph backend we would have 2 entries in cinderVolumes
and, as mentioned in the limitations section, we cannot set global defaults for all volume services, so we would have to define it for each of them, like this:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n enabled: true\n template:\n cinderVolumes:\n lvm:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n [lvm]\n< . . . >\n ceph:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n [ceph]\n< . . . >\n
Remember that for volume backends that have sensitive information, using a Secret
and the customServiceConfigSecrets
key is the recommended way to go.
For adoption instead of using a whole deployment manifest we'll use a targeted patch, like we did with other services, and in this patch we will enable the different cinder services with their specific configurations.
WARNING: Check that all configuration options are still valid for the new OpenStack version, since configuration options may have been deprecated, removed, or added. This applies to both backend driver specific configuration options and other generic options.
There are 2 ways to prepare a cinder configuration for adoption: tailor-making it or doing it quick and dirty. There is no difference in how Cinder will operate with either method, so we are free to choose, though we recommend tailor-making it whenever possible.
The high level explanation of the tailor-made approach is:
Determine what part of the configuration is generic for all the cinder services and remove anything that would change when deployed in OpenShift, like the connection
in the [database]
section, the transport_url
and log_dir
in [DEFAULT]
, the whole [coordination]
section. This configuration goes into the customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
) at the cinder: template:
level.
Determine if there's any scheduler specific configuration and add it to the customServiceConfig
section in cinder: template: cinderScheduler
.
Determine if there's any API specific configuration and add it to the customServiceConfig
section in cinder: template: cinderAPI
.
If we have cinder backup deployed, then we'll get the cinder backup relevant configuration options and add them to customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
) at the cinder: template: cinderBackup:
level. We should remove the host
configuration in the [DEFAULT]
section to facilitate supporting multiple replicas in the future.
Determine the individual volume backend configuration for each of the drivers. The configuration will not only be the specific driver section, it should also include the [backend_defaults]
section and FC zoning sections if they are being used, because the cinder operator doesn't support a customServiceConfig
section global for all volume services. Each backend would have its own section under cinder: template: cinderVolumes
and the configuration would go in customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
).
Check if any of the cinder volume drivers being used requires a custom vendor image. If they do, find the location of the image in the vendor's instructions, available on the OpenStack Cinder ecosystem page, and add it under the specific driver's section using the containerImage
key. For example, if we had a Pure Storage array and the driver was already certified for OSP18, then we would have something like this:
spec:\n cinder:\n enabled: true\n template:\n cinderVolumes:\n pure:\n containerImage: registry.connect.redhat.com/purestorage/openstack-cinder-volume-pure-rhosp-18-0\n customServiceConfigSecrets:\n - openstack-cinder-pure-cfg\n< . . . >\n
For external files, we use Secrets
or ConfigMap
to store the information in OpenShift and then the extraMounts
key. For example, for the Ceph credentials stored in a Secret
called ceph-conf-files
we would patch the top level extraMounts
in OpenStackControlPlane:
spec:\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/ceph\n name: ceph\n readOnly: true\n propagation:\n - CinderVolume\n - CinderBackup\n - Glance\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n
But for a service-specific file, like the API policy, we would do it directly on the service itself. In this example we include the cinder API configuration that references the policy we are adding from a ConfigMap
called my-cinder-conf
that has a key policy
with the contents of the policy: spec:\n cinder:\n enabled: true\n template:\n cinderAPI:\n customServiceConfig: |\n [oslo_policy]\n policy_file=/etc/cinder/api/policy.yaml\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/cinder/api\n name: policy\n readOnly: true\n propagation:\n - CinderAPI\n volumes:\n - name: policy\n projected:\n sources:\n - configMap:\n name: my-cinder-conf\n items:\n - key: policy\n path: policy.yaml\n
The quick and dirty process is more straightforward:
Create an agnostic configuration file removing any specifics from the old deployment's cinder.conf
file, like the connection
in the [database]
section, the transport_url
and log_dir
in [DEFAULT]
, the whole [coordination]
section, etc.
Assuming the configuration has sensitive information, drop the modified contents of the whole file into a Secret
.
Reference this secret in all the services, creating a cinder volumes section for each backend and just adding the respective enabled_backends
option.
Add external files as mentioned in the last bullet of the tailor-made configuration explanation.
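The first step of the quick and dirty process, removing the deployment-specific options, can be sketched as a filter. This is an illustration rather than a complete solution: the function name strip_deployment_specifics is hypothetical, it only removes the options named in this guide, and it assumes single-line options. The result must still be reviewed by hand.

```shell
# Hypothetical helper: print cinder.conf without the options that change when
# the service is deployed in OpenShift.
# $1: path to the original cinder.conf
strip_deployment_specifics() {
    awk '
        /^\[/ { section = substr($0, 2, length($0) - 2) }
        section == "coordination" { next }           # drop the whole section
        { key = $0; sub(/[ =].*/, "", key) }         # option name of this line
        section == "database" && key == "connection" { next }
        section == "DEFAULT" && (key == "transport_url" || key == "log_dir") { next }
        { print }
    ' "$1"
}

# Example use:
# strip_deployment_specifics cinder.conf > cinder-agnostic.conf
```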
Example of what the quick and dirty configuration patch would look like:
spec:\n cinder:\n enabled: true\n template:\n cinderAPI:\n customServiceConfigSecrets:\n - cinder-conf\n cinderScheduler:\n customServiceConfigSecrets:\n - cinder-conf\n cinderBackup:\n customServiceConfigSecrets:\n - cinder-conf\n cinderVolumes:\n lvm1:\n customServiceConfig: |\n [DEFAULT]\n enabled_backends = lvm1\n customServiceConfigSecrets:\n - cinder-conf\n lvm2:\n customServiceConfig: |\n [DEFAULT]\n enabled_backends = lvm2\n customServiceConfigSecrets:\n - cinder-conf\n
"},{"location":"openstack/cinder_adoption/#configuration-generation-helper-tool","title":"Configuration generation helper tool","text":"Creating the right Cinder configuration files to deploy using Operators may sometimes be a complicated experience, especially the first times, so we have a helper tool that can create a draft of the files from a cinder.conf
file.
This tool is not meant to be an automation tool; it's mostly meant to help us get the gist of the process and to point out potential pitfalls and reminders.
Attention: The tool requires the PyYAML
Python package to be installed (pip install PyYAML
).
This cinder-cfg.py script defaults to reading the cinder.conf
file from the current directory (unless --config
option is used) and outputs files to the current directory (unless --out-dir
option is used).
In the output directory we'll always get a cinder.patch
file with the Cinder specific configuration patch to apply to the OpenStackControlPlane
CR but we may also get an additional file called cinder-prereq.yaml
with some Secrets
and MachineConfigs
.
Example of an invocation setting input and output explicitly to the defaults for a Ceph backend:
$ python cinder-cfg.py --config cinder.conf --out-dir ./\nWARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using \"extraMounts\" or remove the option.\n\nWARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.\n\nWARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.\n\nWARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!\n\nOutput written at ./: cinder.patch\n
The script outputs some warnings to let us know about things we may need to do manually (adding the custom policy, providing the Ceph configuration files) and also to let us know that the service_user
configuration has been removed.
A different example, using multiple backends with one of them being a 3PAR FC backend, could be:
$ python cinder-cfg.py --config cinder.conf --out-dir ./\nWARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using \"extraMounts\" or remove the option.\n\nERROR:root:Backend hpe_fc requires a vendor container image, but there is no certified image available yet. Patch will use the last known image for reference, but IT WILL NOT WORK\n\nWARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.\n\nWARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.\n\nWARNING:root:Configuration is using FC, please ensure all your OpenShift nodes have HBAs or use labels to ensure that Volume and Backup services are scheduled on nodes with HBAs.\n\nWARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!\n\nOutput written at ./: cinder.patch, cinder-prereq.yaml\n
In this case we can see that there are additional messages, so let's quickly go over them:
cinder.patch
file: cinderVolumes:\n hpe-fc:\n containerImage: registry.connect.redhat.com/hpe3parcinder/openstack-cinder-volume-hpe3parcinder17-0\n
The FC message reminds us that this transport protocol requires specific HBA cards to be present on the nodes where cinder services are running.
In this case we also see that it has created the cinder-prereq.yaml
file and if we look into it we'll see there is one MachineConfig
and one Secret
. The MachineConfig
is called 99-master-cinder-enable-multipathd
and, as the name suggests, enables multipathing on all the OCP worker nodes. The Secret
is called openstackcinder-volumes-hpe_fc
and contains the 3PAR backend configuration because it has sensitive information (credentials), and in the cinder.patch
file we'll see that it uses this configuration:
cinderVolumes:\n hpe-fc:\n customServiceConfigSecrets:\n - openstackcinder-volumes-hpe_fc\n
Assuming we have already stopped cinder services, prepared the OpenShift nodes, deployed the OpenStack operators and a bare OpenStack manifest, migrated the database, and prepared the patch manifest with the Cinder service configuration, all that's left is to apply the patch and wait for the operator to apply the changes and deploy the Cinder services.
Our recommendation is to write the patch manifest into a file, for example cinder.patch
and then apply it with something like:
oc patch openstackcontrolplane openstack --type=merge --patch-file=cinder.patch\n
For example, for the RBD deployment from the Development Guide the cinder.patch
would look like this:
spec:\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/ceph\n name: ceph\n readOnly: true\n propagation:\n - CinderVolume\n - CinderBackup\n - Glance\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n cinder:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n cinderAPI:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n default_volume_type=tripleo\n cinderScheduler:\n replicas: 1\n cinderBackup:\n networkAttachments:\n - storage\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n backup_driver=cinder.backup.drivers.ceph.CephBackupDriver\n backup_ceph_conf=/etc/ceph/ceph.conf\n backup_ceph_user=openstack\n backup_ceph_pool=backups\n cinderVolumes:\n ceph:\n networkAttachments:\n - storage\n replicas: 1\n customServiceConfig: |\n [tripleo_ceph]\n backend_host=hostgroup\n volume_backend_name=tripleo_ceph\n volume_driver=cinder.volume.drivers.rbd.RBDDriver\n rbd_ceph_conf=/etc/ceph/ceph.conf\n rbd_user=openstack\n rbd_pool=volumes\n rbd_flatten_volume_from_snapshot=False\n report_discard_supported=True\n
Once the services have been deployed we'll need to clean up the old scheduler and backup services which will appear as being down while we have others that appear as being up:
openstack volume service list\n\n+------------------+------------------------+------+---------+-------+----------------------------+\n| Binary | Host | Zone | Status | State | Updated At |\n+------------------+------------------------+------+---------+-------+----------------------------+\n| cinder-backup | standalone.localdomain | nova | enabled | down | 2023-06-28T11:00:59.000000 |\n| cinder-scheduler | standalone.localdomain | nova | enabled | down | 2023-06-28T11:00:29.000000 |\n| cinder-volume | hostgroup@tripleo_ceph | nova | enabled | up | 2023-06-28T17:00:03.000000 |\n| cinder-scheduler | cinder-scheduler-0 | nova | enabled | up | 2023-06-28T17:00:02.000000 |\n| cinder-backup | cinder-backup-0 | nova | enabled | up | 2023-06-28T17:00:01.000000 |\n+------------------+------------------------+------+---------+-------+----------------------------+\n
In this case we need to remove the services for host standalone.localdomain
oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-backup standalone.localdomain\noc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-scheduler standalone.localdomain\n
We haven't preserved the name of the backup service because we have taken the opportunity to change its configuration to support Active-Active, even though we are not running it that way right now because we have only 1 replica.
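When there are many stale entries, the removal commands can be generated from the service list instead of being typed by hand. The following is a minimal sketch, not part of the adoption tooling: `down_service_removals` is a hypothetical helper name, and it assumes the list is produced with `openstack volume service list -f value -c Binary -c Host -c State`, so that each input line reads `<binary> <host> <state>`. It only prints the commands, so they can be reviewed before being run.

```shell
# Hypothetical helper: read "<binary> <host> <state>" lines on stdin and
# print one removal command for every service whose state is "down".
down_service_removals() {
    awk '$3 == "down" {
        print "oc exec -it cinder-scheduler-0 -- cinder-manage service remove " $1 " " $2
    }'
}

# Possible usage (review the printed commands before running them):
# openstack volume service list -f value -c Binary -c Host -c State | down_service_removals
```

Printing instead of executing is deliberate: a volume service that is only temporarily down would also be listed, and it should not be removed.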
Now that we have the Cinder services running we know that the DB schema migration has been completed and we can proceed to apply the DB data migrations. While it is not necessary to run these data migrations at this precise moment, because we can just run them right before the next upgrade, we consider that for adoption it's best to run them now to make sure there are no issues before running production workloads on the deployment.
The command to run the DB data migrations is:
oc exec -it cinder-scheduler-0 -- cinder-manage db online_data_migrations\n
"},{"location":"openstack/cinder_adoption/#post-checks","title":"Post-checks","text":"Before we can run any checks we need to set the right cloud configuration for the openstack
command to be able to connect to our OpenShift control plane.
Just like we did in the Keystone adoption step, we ensure we have the openstack
alias defined:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n
Now we can run a set of tests to confirm that the deployment is there using our old database contents:
openstack endpoint list --service cinderv3\n
openstack volume service list\n
openstack volume type list\nopenstack volume list\nopenstack volume snapshot list\nopenstack volume backup list\n
To confirm that everything not only looks good but also works properly, we recommend doing some basic operations:
Create a volume from an image to check that the connection to glance is working.
openstack volume create --image cirros --bootable --size 1 disk_new\n
Backup the old attached volume to a new backup. Example:
openstack --os-volume-api-version 3.47 volume create --backup backup restored\n
We don't boot a nova instance using the new volume from image or try to detach the old volume because nova and cinder are still not connected.
"},{"location":"openstack/edpm_adoption/","title":"EDPM adoption","text":""},{"location":"openstack/edpm_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/edpm_adoption/#pre-checks","title":"Pre-checks","text":"oc apply -f - <<EOF\napiVersion: network.openstack.org/v1beta1\nkind: NetConfig\nmetadata:\n name: netconfig\nspec:\n networks:\n - name: CtlPlane\n dnsDomain: ctlplane.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 192.168.122.120\n start: 192.168.122.100\n - end: 192.168.122.200\n start: 192.168.122.150\n cidr: 192.168.122.0/24\n gateway: 192.168.122.1\n - name: InternalApi\n dnsDomain: internalapi.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.17.0.250\n start: 172.17.0.100\n cidr: 172.17.0.0/24\n vlan: 20\n - name: External\n dnsDomain: external.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 10.0.0.250\n start: 10.0.0.100\n cidr: 10.0.0.0/24\n gateway: 10.0.0.1\n - name: Storage\n dnsDomain: storage.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.18.0.250\n start: 172.18.0.100\n cidr: 172.18.0.0/24\n vlan: 21\n - name: StorageMgmt\n dnsDomain: storagemgmt.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.20.0.250\n start: 172.20.0.100\n cidr: 172.20.0.0/24\n vlan: 23\n - name: Tenant\n dnsDomain: tenant.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.19.0.250\n start: 172.19.0.100\n cidr: 172.19.0.0/24\n vlan: 22\nEOF\n
"},{"location":"openstack/edpm_adoption/#procedure-edpm-adoption","title":"Procedure - EDPM adoption","text":"oc apply -f - <<EOF\napiVersion: v1\nkind: Secret\nmetadata:\n name: dataplane-adoption-secret\n namespace: openstack\ndata:\n ssh-privatekey: |\n$(cat ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa | base64 | sed 's/^/ /')\nEOF\n
Generate an SSH key pair and create the nova-migration-ssh-key
secret:
cd \"$(mktemp -d)\"\nssh-keygen -f ./id -t ed25519 -N ''\noc get secret nova-migration-ssh-key || oc create secret generic nova-migration-ssh-key \\\n -n openstack \\\n --from-file=ssh-privatekey=id \\\n --from-file=ssh-publickey=id.pub \\\n --type kubernetes.io/ssh-auth\nrm -f id*\ncd -\n
oc apply -f - <<EOF\napiVersion: v1\nkind: ConfigMap\nmetadata:\n name: nova-compute-extraconfig\n namespace: openstack\ndata:\n 19-nova-compute-cell1-workarounds.conf: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n---\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneService\nmetadata:\n name: nova-compute-extraconfig\n namespace: openstack\nspec:\n label: nova.compute.extraconfig\n configMaps:\n - nova-compute-extraconfig\n secrets:\n - nova-cell1-compute-config\n - nova-migration-ssh-key\n playbook: osp.edpm.nova\nEOF\n
The secret nova-cell<X>-compute-config
is auto-generated for each cell<X>
. That secret, alongside nova-migration-ssh-key
, should always be specified for each custom OpenStackDataPlaneService
related to Nova.
oc apply -f - <<EOF\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneNodeSet\nmetadata:\n name: openstack\nspec:\n networkAttachments:\n - ctlplane\n preProvisioned: true\n services:\n - download-cache\n - configure-network\n - validate-network\n - install-os\n - configure-os\n - run-os\n - libvirt\n - nova-compute-extraconfig\n - ovn\n env:\n - name: ANSIBLE_CALLBACKS_ENABLED\n value: \"profile_tasks\"\n - name: ANSIBLE_FORCE_COLOR\n value: \"True\"\n nodes:\n standalone:\n hostName: standalone\n ansible:\n ansibleHost: 192.168.122.100\n networks:\n - defaultRoute: true\n fixedIP: 192.168.122.100\n name: CtlPlane\n subnetName: subnet1\n - name: InternalApi\n subnetName: subnet1\n - name: Storage\n subnetName: subnet1\n - name: Tenant\n subnetName: subnet1\n nodeTemplate:\n ansibleSSHPrivateKeySecret: dataplane-adoption-secret\n managementNetwork: ctlplane\n ansible:\n ansibleUser: root\n ansiblePort: 22\n ansibleVars:\n service_net_map:\n nova_api_network: internal_api\n nova_libvirt_network: internal_api\n\n # edpm_network_config\n # Default nic config template for a EDPM compute node\n # These vars are edpm_network_config role vars\n edpm_network_config_override: \"\"\n edpm_network_config_template: |\n ---\n {% set mtu_list = [ctlplane_mtu] %}\n {% for network in role_networks %}\n {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}\n {%- endfor %}\n {% set min_viable_mtu = mtu_list | max %}\n network_config:\n - type: ovs_bridge\n name: {{ neutron_physical_bridge_name }}\n mtu: {{ min_viable_mtu }}\n use_dhcp: false\n dns_servers: {{ ctlplane_dns_nameservers }}\n domain: {{ dns_search_domains }}\n addresses:\n - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}\n routes: {{ ctlplane_host_routes }}\n members:\n - type: interface\n name: nic1\n mtu: {{ min_viable_mtu }}\n # force the MAC address of the bridge to this interface\n primary: true\n {% for network in role_networks %}\n - type: vlan\n mtu: {{ 
lookup('vars', networks_lower[network] ~ '_mtu') }}\n vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}\n addresses:\n - ip_netmask:\n {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}\n routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}\n {% endfor %}\n\n edpm_network_config_hide_sensitive_logs: false\n #\n # These vars are for the network config templates themselves and are\n # considered EDPM network defaults.\n neutron_physical_bridge_name: br-ctlplane\n neutron_public_interface_name: eth0\n role_networks:\n - InternalApi\n - Storage\n - Tenant\n networks_lower:\n External: external\n InternalApi: internal_api\n Storage: storage\n Tenant: tenant\n\n # edpm_nodes_validation\n edpm_nodes_validation_validate_controllers_icmp: false\n edpm_nodes_validation_validate_gateway_icmp: false\n\n timesync_ntp_servers:\n - hostname: clock.redhat.com\n - hostname: clock2.redhat.com\n\n edpm_ovn_controller_agent_image: quay.io/podified-antelope-centos9/openstack-ovn-controller:current-podified\n edpm_iscsid_image: quay.io/podified-antelope-centos9/openstack-iscsid:current-podified\n edpm_logrotate_crond_image: quay.io/podified-antelope-centos9/openstack-cron:current-podified\n edpm_nova_compute_container_image: quay.io/podified-antelope-centos9/openstack-nova-compute:current-podified\n edpm_nova_libvirt_container_image: quay.io/podified-antelope-centos9/openstack-nova-libvirt:current-podified\n edpm_ovn_metadata_agent_image: quay.io/podified-antelope-centos9/openstack-neutron-metadata-agent-ovn:current-podified\n\n gather_facts: false\n enable_debug: false\n # edpm firewall, change the allowed CIDR if needed\n edpm_sshd_configure_firewall: true\n edpm_sshd_allowed_ranges: ['192.168.122.0/24']\n # SELinux module\n edpm_selinux_mode: enforcing\n plan: overcloud\nEOF\n
oc apply -f - <<EOF\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneDeployment\nmetadata:\n name: openstack\nspec:\n nodeSets:\n - openstack\nEOF\n
"},{"location":"openstack/edpm_adoption/#post-checks","title":"Post-checks","text":"Check that all the Ansible EE pods reach the Completed
status:
# watching the pods\nwatch oc get pod -l app=openstackansibleee\n
# following the ansible logs with:\noc logs -l app=openstackansibleee -f --max-log-requests 10\n
Wait for the dataplane node set to reach the Ready status:
oc wait --for condition=Ready osdpns/openstack --timeout=30m\n
Adopting Glance means that an existing OpenStackControlPlane
CR, where Glance is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
When the procedure is over, the expectation is to see the GlanceAPI
service up and running: the Keystone endpoints
should be updated and the same backend of the source Cloud will be available. If the conditions above are met, the adoption is considered concluded.
This guide also assumes that:
TripleO
environment (the source Cloud) is running on one side;SNO
/ CodeReadyContainers
is running on the other side;Ceph
cluster is reachable by both crc
and TripleO
As already done for Keystone, the Glance Adoption follows the same pattern.
"},{"location":"openstack/glance_adoption/#using-local-storage-backend","title":"Using local storage backend","text":"When Glance should be deployed with local storage backend (not Ceph), patch OpenStackControlPlane to deploy Glance:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n glance:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n storageClass: \"local-storage\"\n storageRequest: 10G\n glanceAPI:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n networkAttachments:\n - storage\n'\n
"},{"location":"openstack/glance_adoption/#using-ceph-storage-backend","title":"Using Ceph storage backend","text":"If a Ceph backend is used, the customServiceConfig
parameter should be used to inject the right configuration to the GlanceAPI
instance.
Make sure the Ceph-related secret (ceph-conf-files
) was created in the openstack
namespace and that the extraMounts
property of the OpenStackControlPlane
CR has been configured properly. These tasks are described in an earlier Adoption step Ceph storage backend configuration.
cat << EOF > glance_patch.yaml\nspec:\n glance:\n enabled: true\n template:\n databaseInstance: openstack\n customServiceConfig: |\n [DEFAULT]\n enabled_backends=default_backend:rbd\n [glance_store]\n default_backend=default_backend\n [default_backend]\n rbd_store_ceph_conf=/etc/ceph/ceph.conf\n rbd_store_user=openstack\n rbd_store_pool=images\n store_description=Ceph glance store backend.\n storageClass: \"local-storage\"\n storageRequest: 10G\n glanceAPI:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n networkAttachments:\n - storage\nEOF\n
If you previously backed up your OpenStack services configuration files from the old environment (see pull openstack configuration os-diff), you can use os-diff to compare them and make sure the configuration is correct.
pushd os-diff\n./os-diff cdiff --service glance -c /tmp/collect_tripleo_configs/glance/etc/glance/glance-api.conf -o glance_patch.yaml\n
This will produce the difference between both ini configuration files.
Patch OpenStackControlPlane to deploy Glance with Ceph backend:
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_patch.yaml\n
"},{"location":"openstack/glance_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/glance_adoption/#test-the-glance-service-from-the-openstack-cli","title":"Test the glance service from the OpenStack CLI","text":"You can compare and make sure the configuration has been correctly applied to the glance pods by running
./os-diff cdiff --service glance -c /etc/glance/glance.conf.d/02-config.conf -o glance_patch.yaml --frompod -p glance-api\n
If no lines appear, the configuration has been applied correctly.
Inspect the resulting glance pods:
GLANCE_POD=`oc get pod |grep glance-external-api | cut -f 1 -d' '`\noc exec -t $GLANCE_POD -c glance-api -- cat /etc/glance/glance.conf.d/02-config.conf\n\n[DEFAULT]\nenabled_backends=default_backend:rbd\n[glance_store]\ndefault_backend=default_backend\n[default_backend]\nrbd_store_ceph_conf=/etc/ceph/ceph.conf\nrbd_store_user=openstack\nrbd_store_pool=images\nstore_description=Ceph glance store backend.\n\noc exec -t $GLANCE_POD -c glance-api -- ls /etc/ceph\nceph.client.openstack.keyring\nceph.conf\n
The Ceph secrets are properly mounted. At this point, let's move to the OpenStack CLI and check that the service is active and the endpoints are properly updated.
(openstack)$ service list | grep image\n\n| fc52dbffef36434d906eeb99adfc6186 | glance | image |\n\n(openstack)$ endpoint list | grep image\n\n| 569ed81064f84d4a91e0d2d807e4c1f1 | regionOne | glance | image | True | internal | http://glance-internal-openstack.apps-crc.testing |\n| 5843fae70cba4e73b29d4aff3e8b616c | regionOne | glance | image | True | public | http://glance-public-openstack.apps-crc.testing |\n| 709859219bc24ab9ac548eab74ad4dd5 | regionOne | glance | image | True | admin | http://glance-admin-openstack.apps-crc.testing |\n
Check that the images we previously listed in the source Cloud are available in the adopted service:
(openstack)$ image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |\n+--------------------------------------+--------+--------+\n
"},{"location":"openstack/glance_adoption/#image-upload","title":"Image upload","text":"We can test that an image can be created from the adopted service.
(openstack)$ alias openstack=\"oc exec -t openstackclient -- openstack\"\n(openstack)$ curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img\n qemu-img convert -O raw /tmp/cirros-0.5.2-x86_64-disk.img /tmp/cirros-0.5.2-x86_64-disk.img.raw\n openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img.raw cirros2\n openstack image list\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 273 100 273 0 0 1525 0 --:--:-- --:--:-- --:--:-- 1533\n 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n100 15.5M 100 15.5M 0 0 17.4M 0 --:--:-- --:--:-- --:--:-- 17.4M\n\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n| Field | Value |\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n| container_format | bare |\n| created_at | 2023-01-31T21:12:56Z |\n| disk_format | raw |\n| file | /v2/images/46a3eac1-7224-40bc-9083-f2f0cd122ba4/file |\n| id | 46a3eac1-7224-40bc-9083-f2f0cd122ba4 |\n| min_disk | 0 |\n| min_ram | 0 |\n| name | cirros |\n| owner | 9f7e8fdc50f34b658cfaee9c48e5e12d |\n| properties | os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros', owner_specified.openstack.sha256='' |\n| protected | False |\n| schema | /v2/schemas/image |\n| status | queued |\n| tags | |\n| updated_at | 2023-01-31T21:12:56Z |\n| visibility | shared |\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 
46a3eac1-7224-40bc-9083-f2f0cd122ba4 | cirros2| active |\n| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |\n+--------------------------------------+--------+--------+\n\n\n(openstack)$ oc rsh ceph\nsh-4.4$ ceph -s\nr cluster:\n id: 432d9a34-9cee-4109-b705-0c59e8973983\n health: HEALTH_OK\n\n services:\n mon: 1 daemons, quorum a (age 4h)\n mgr: a(active, since 4h)\n osd: 1 osds: 1 up (since 4h), 1 in (since 4h)\n\n data:\n pools: 5 pools, 160 pgs\n objects: 46 objects, 224 MiB\n usage: 247 MiB used, 6.8 GiB / 7.0 GiB avail\n pgs: 160 active+clean\n\nsh-4.4$ rbd -p images ls\n46a3eac1-7224-40bc-9083-f2f0cd122ba4\nc3158cad-d50b-452f-bec1-f250562f5c1f\n
"},{"location":"openstack/heat_adoption/","title":"Heat adoption","text":"Adopting Heat means that an existing OpenStackControlPlane
CR, where Heat is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
After the adoption process has been completed, a user can expect to have CRs for Heat
, HeatAPI
, HeatEngine
and HeatCFNAPI
. Additionally, a user should have endpoints created within Keystone to facilitate the above-mentioned services.
This guide also assumes that:
TripleO
environment (the source Cloud) is running on one side.
As already done for Keystone, the Heat adoption follows a similar pattern.
Patch the osp-secret
to update the HeatAuthEncryptionKey
and HeatPassword
. These values need to match what you have configured in the existing TripleO Heat configuration.
You can retrieve and verify the existing auth_encryption_key
and service
passwords via:
[stack@rhosp17 ~]$ grep -E 'HeatPassword|HeatAuth' ~/overcloud-deploy/overcloud/overcloud-passwords.yaml\n HeatAuthEncryptionKey: Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2\n HeatPassword: dU2N0Vr2bdelYH7eQonAwPfI3\n
Then verify on one of the Controllers that this is indeed the value in use:
[stack@rhosp17 ~]$ ansible -i overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml overcloud-controller-0 -m shell -a \"grep auth_encryption_key /var/lib/config-data/puppet-generated/heat/etc/heat/heat.conf | grep -Ev '^#|^$'\" -b\novercloud-controller-0 | CHANGED | rc=0 >>\nauth_encryption_key=Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2\n
This key needs to be base64 encoded, without a trailing newline, and added to the osp-secret
\u276f echo -n Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2 | base64\nUTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzI=\n\n\u276f oc patch secret osp-secret --type='json' -p='[{\"op\" : \"replace\" ,\"path\" : \"/data/HeatAuthEncryptionKey\" ,\"value\" : \"UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzI=\"}]'\nsecret/osp-secret patched\n
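A plain `echo` appends a newline, and that newline becomes part of the encoded value, so it is worth checking the byte count of what was actually encoded. The following is a minimal sketch: `encode_secret_value` is a hypothetical helper name, the key is the example value shown above, and `base64 -d` assumes the GNU coreutils flavor of `base64`.

```shell
# Hypothetical helper: base64-encode a value without adding a trailing newline.
encode_secret_value() {
    printf '%s' "$1" | base64
}

KEY=Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2
ENCODED=$(encode_secret_value "$KEY")

# Count the decoded bytes: this must equal the key length (32 here).
# A value encoded with a plain "echo" would decode to one byte more.
LEN=$(printf '%s' "$ENCODED" | base64 -d | wc -c)
if [ "$LEN" -eq "${#KEY}" ]; then
    echo "encoded value decodes to ${#KEY} bytes"
else
    echo "unexpected decoded length: $LEN" >&2
fi
```

Counting bytes is used instead of comparing strings because command substitution strips trailing newlines, which would hide exactly the problem being checked for.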
Patch OpenStackControlPlane to deploy Heat:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n heat:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n memcachedInstance: memcached\n passwordSelectors:\n authEncryptionKey: HeatAuthEncryptionKey\n database: HeatDatabasePassword\n service: HeatPassword\n'\n
"},{"location":"openstack/heat_adoption/#post-checks","title":"Post-checks","text":"Ensure all of the CRs reach the \"Setup complete\" state:
\u276f oc get Heat,HeatAPI,HeatEngine,HeatCFNAPI\nNAME STATUS MESSAGE\nheat.heat.openstack.org/heat True Setup complete\n\nNAME STATUS MESSAGE\nheatapi.heat.openstack.org/heat-api True Setup complete\n\nNAME STATUS MESSAGE\nheatengine.heat.openstack.org/heat-engine True Setup complete\n\nNAME STATUS MESSAGE\nheatcfnapi.heat.openstack.org/heat-cfnapi True Setup complete\n
"},{"location":"openstack/heat_adoption/#check-that-heat-service-is-registered-in-keystone","title":"Check that Heat service is registered in Keystone","text":" oc exec -it openstackclient -- openstack service list -c Name -c Type\n+------------+----------------+\n| Name | Type |\n+------------+----------------+\n| heat | orchestration |\n| glance | image |\n| heat-cfn | cloudformation |\n| ceilometer | Ceilometer |\n| keystone | identity |\n| placement | placement |\n| cinderv3 | volumev3 |\n| nova | compute |\n| neutron | network |\n+------------+----------------+\n
\u276f oc exec -it openstackclient -- openstack endpoint list --service=heat -f yaml\n- Enabled: true\n ID: 1da7df5b25b94d1cae85e3ad736b25a5\n Interface: public\n Region: regionOne\n Service Name: heat\n Service Type: orchestration\n URL: http://heat-api-public-openstack-operators.apps.okd.bne-shift.net/v1/%(tenant_id)s\n- Enabled: true\n ID: 414dd03d8e9d462988113ea0e3a330b0\n Interface: internal\n Region: regionOne\n Service Name: heat\n Service Type: orchestration\n URL: http://heat-api-internal.openstack-operators.svc:8004/v1/%(tenant_id)s\n
"},{"location":"openstack/heat_adoption/#check-heat-engine-services-are-up","title":"Check Heat engine services are up","text":" oc exec -it openstackclient -- openstack orchestration service list -f yaml\n- Binary: heat-engine\n Engine ID: b16ad899-815a-4b0c-9f2e-e6d9c74aa200\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:01.000000'\n- Binary: heat-engine\n Engine ID: 887ed392-0799-4310-b95c-ac2d3e6f965f\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:00.000000'\n- Binary: heat-engine\n Engine ID: 26ed9668-b3f2-48aa-92e8-2862252485ea\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:00.000000'\n- Binary: heat-engine\n Engine ID: 1011943b-9fea-4f53-b543-d841297245fd\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:01.000000'\n
"},{"location":"openstack/heat_adoption/#verify-you-can-now-see-your-heat-stacks-again","title":"Verify you can now see your Heat stacks again","text":"Verify that the Heat stacks from the source cloud are listed:
\u276f openstack stack list -f yaml\n- Creation Time: '2023-10-11T22:03:20Z'\n ID: 20f95925-7443-49cb-9561-a1ab736749ba\n Project: 4eacd0d1cab04427bc315805c28e66c9\n Stack Name: test-networks\n Stack Status: CREATE_COMPLETE\n Updated Time: null\n
"},{"location":"openstack/horizon_adoption/","title":"Horizon adoption","text":""},{"location":"openstack/horizon_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/horizon_adoption/#procedure-horizon-adoption","title":"Procedure - Horizon adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n horizon:\n enabled: true\n apiOverride:\n route: {}\n template:\n memcachedInstance: memcached\n secret: osp-secret\n'\n
"},{"location":"openstack/horizon_adoption/#post-checks","title":"Post-checks","text":"oc get horizon\n
200
PUBLIC_URL=$(oc get horizon horizon -o jsonpath='{.status.endpoint}')\ncurl --silent --output /dev/stderr --head --write-out \"%{http_code}\" \"$PUBLIC_URL/dashboard/auth/login/?next=/dashboard/\" | grep 200\n
"},{"location":"openstack/ironic_adoption/","title":"Ironic adoption","text":""},{"location":"openstack/ironic_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/ironic_adoption/#pre-checks","title":"Pre-checks","text":"TODO
"},{"location":"openstack/ironic_adoption/#procedure-ironic-adoption","title":"Procedure - Ironic adoption","text":"TODO
"},{"location":"openstack/ironic_adoption/#post-checks","title":"Post-checks","text":"TODO
"},{"location":"openstack/keystone_adoption/","title":"Keystone adoption","text":""},{"location":"openstack/keystone_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/keystone_adoption/#pre-checks","title":"Pre-checks","text":""},{"location":"openstack/keystone_adoption/#procedure-keystone-adoption","title":"Procedure - Keystone adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n keystone:\n enabled: true\n apiOverride:\n route: {}\n template:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n databaseInstance: openstack\n secret: osp-secret\n'\n
openstack
command in the adopted deployment:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n
openstack endpoint list | grep keystone | awk '/admin/{ print $2; }' | xargs ${BASH_ALIASES[openstack]} endpoint delete || true\n\nfor service in cinderv3 glance manila manilav2 neutron nova placement swift; do\n openstack service list | awk \"/ $service /{ print \\$2; }\" | xargs ${BASH_ALIASES[openstack]} service delete || true\ndone\n
"},{"location":"openstack/keystone_adoption/#post-checks","title":"Post-checks","text":"openstack endpoint list | grep keystone\n
"},{"location":"openstack/manila_adoption/","title":"Manila adoption","text":"OpenStack Manila is the Shared File Systems service. It provides OpenStack users with a self-service API to create and manage file shares. File shares (or simply, \"shares\") are built for concurrent read/write access by any number of clients. This, coupled with the inherent elasticity of the underlying storage, makes the Shared File Systems service essential in cloud environments that require RWX (\"read write many\") persistent storage.
"},{"location":"openstack/manila_adoption/#networking","title":"Networking","text":"File shares in OpenStack are accessed directly over a network. Hence, it is essential to plan the networking of the cloud to create a successful and sustainable orchestration layer for shared file systems.
Manila supports two levels of storage networking abstractions - one where users can directly control the networking for their respective file shares; and another where the storage networking is configured by the OpenStack administrator. It is important to ensure that the networking in the Red Hat OpenStack Platform 17.1 matches the network plans for your new cloud after adoption. This ensures that tenant workloads remain connected to storage through the adoption process, even as the control plane suffers a minor interruption. Manila's control plane services are not in the data path; and shutting down the API, scheduler and share manager services will not impact access to existing shared file systems.
Typically, storage and storage device management networks are separate. Manila services only need access to the storage device management network. For example, if a Ceph cluster was used in the deployment, the \"storage\" network refers to the Ceph cluster's public network, and Manila's share manager service needs to be able to reach it.
"},{"location":"openstack/manila_adoption/#prerequisites","title":"Prerequisites","text":"manila-share
service will be deployed can reach the management network that the storage system is in.driver_handles_share_servers=True
), ensure that neutron has been deployed prior to adopting manila services.Define the CONTROLLER1_SSH
environment variable, if it hasn't been defined already. Then copy the configuration file from RHOSP 17.1 for reference.
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/manila/etc/manila/manila.conf | awk '!/^ *#/ && NF' > ~/manila.conf\n
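The awk expression in the command above keeps only the lines that are neither comments nor blank. When reviewing the result, it can also help to drop whole sections that will not be carried over. The following is a minimal sketch, assuming plain INI section headers such as `[database]`; `filter_manila_conf` is a hypothetical helper name and is not part of any adoption tool.

```shell
# Hypothetical helper: read an INI file on stdin, drop comments and blank
# lines, and omit every section whose name is passed as an argument.
filter_manila_conf() {
    awk -v skip="$*" '
        BEGIN {
            # Build a lookup table of the section headers to drop.
            n = split(skip, names, " ")
            for (i = 1; i <= n; i++) drop["[" names[i] "]"] = 1
        }
        /^ *#/ || !NF { next }             # same filter as the command above
        /^\[/ { dropping = ($1 in drop) }  # a new section starts here
        !dropping { print }
    '
}

# Possible usage on the copied file:
# filter_manila_conf database keystone_authtoken neutron nova < ~/manila.conf
```

The section names to pass are a judgment call for each deployment; the helper only removes what it is told to remove.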
Review this configuration, alongside any configuration changes that were noted since RHOSP 17.1. Not all of it makes sense to bring into the new cloud environment:
[database]
), service authentication (auth_strategy
, [keystone_authtoken]
), message bus configuration (transport_url
, control_exchange
), the default paste config (api_paste_config
) and inter-service communication configuration ([neutron]
, [nova]
, [cinder]
, [glance]
[oslo_messaging_*]
). So all of these can be ignored.osapi_share_listen
configuration. In RHOSP 18, we rely on OpenShift's routes and ingress.ConfigMap
. The following sample spec illustrates how a ConfigMap
called manila-policy
can be set up with the contents of a file called policy.yaml
. spec:\n manila:\n enabled: true\n template:\n manilaAPI:\n customServiceConfig: |\n [oslo_policy]\n policy_file=/etc/manila/policy.yaml\n extraMounts:\n - extraVol:\n - extraVolType: Undefined\n mounts:\n - mountPath: /etc/manila/\n name: policy\n readOnly: true\n propagation:\n - ManilaAPI\n volumes:\n - name: policy\n projected:\n sources:\n - configMap:\n name: manila-policy\n items:\n - key: policy\n path: policy.yaml\n
- The Manila API service needs the enabled_share_protocols
option to be added in the customServiceConfig
section in manila: template: manilaAPI
. - If you had scheduler overrides, add them to the customServiceConfig
section in manila: template: manilaScheduler
. - If you had multiple storage backend drivers configured with RHOSP 17.1, you will need to split them up when deploying RHOSP 18. Each storage backend driver needs to use its own instance of the manila-share
service. - If a storage backend driver needs a custom container image, find it on the RHOSP Ecosystem Catalog and set manila: template: manilaShares: <custom name> : containerImage
value. The following example illustrates multiple storage backend drivers, using custom container images. spec:\n manila:\n enabled: true\n template:\n manilaAPI:\n customServiceConfig: |\n [DEFAULT]\n enabled_share_protocols = nfs\n replicas: 3\n manilaScheduler:\n replicas: 3\n manilaShares:\n netapp:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends = netapp\n [netapp]\n driver_handles_share_servers = False\n share_backend_name = netapp\n share_driver = manila.share.drivers.netapp.common.NetAppDriver\n netapp_storage_family = ontap_cluster\n netapp_transport_type = http\n replicas: 1\n pure:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends=pure-1\n [pure-1]\n driver_handles_share_servers = False\n share_backend_name = pure-1\n share_driver = manila.share.drivers.purestorage.flashblade.FlashBladeShareDriver\n flashblade_mgmt_vip = 203.0.113.15\n flashblade_data_vip = 203.0.10.14\n containerImage: registry.connect.redhat.com/purestorage/openstack-manila-share-pure-rhosp-18-0\n replicas: 1\n
customServiceConfigSecrets
key. An example: cat << __EOF__ > ~/netapp_secrets.conf\n\n[netapp]\nnetapp_server_hostname = 203.0.113.10\nnetapp_login = fancy_netapp_user\nnetapp_password = secret_netapp_password\nnetapp_vserver = mydatavserver\n__EOF__\n\noc create secret generic osp-secret-manila-netapp --from-file=$HOME/netapp_secrets.conf -n openstack\n
customServiceConfigSecrets
can be used in any service, the following is a config example using the secret we created as above. spec:\n manila:\n enabled: true\n template:\n < . . . >\n manilaShares:\n netapp:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends = netapp\n [netapp]\n driver_handles_share_servers = False\n share_backend_name = netapp\n share_driver = manila.share.drivers.netapp.common.NetAppDriver\n netapp_storage_family = ontap_cluster\n netapp_transport_type = http\n customServiceConfigSecrets:\n - osp-secret-manila-netapp\n replicas: 1\n < . . . >\n
extraMounts
. For example, when using ceph, you'd need Manila's ceph user's keyring file as well as the ceph.conf
configuration file available. These are mounted via extraMounts
as done with the example below.share_backend_name
) remain as they did on RHOSP 17.1.manilaAPI
service and the manilaScheduler
service to 3. You should ensure to set the replica count of the manilaShares
service/s to 1. manilaShares
section. The example below connects the manilaShares
instance with the CephFS backend driver to the storage
network. Patch OpenStackControlPlane to deploy Manila; here's an example that uses Native CephFS:
cat << __EOF__ > ~/manila.patch\nspec:\n manila:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n manilaAPI:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n template:\n manilaAPI:\n replicas: 3\n customServiceConfig: |\n [DEFAULT]\n enabled_share_protocols = cephfs\n manilaScheduler:\n replicas: 3\n manilaShares:\n cephfs:\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n enabled_share_backends = tripleo_ceph\n [tripleo_ceph]\n driver_handles_share_servers=False\n share_backend_name=tripleo_ceph\n share_driver=manila.share.drivers.cephfs.driver.CephFSDriver\n cephfs_conf_path=/etc/ceph/ceph.conf\n cephfs_auth_id=openstack\n cephfs_cluster_name=ceph\n cephfs_volume_mode=0755\n cephfs_protocol_helper_type=CEPHFS\n networkAttachments:\n - storage\n__EOF__\n
oc patch openstackcontrolplane openstack --type=merge --patch-file=$HOME/manila.patch\n
"},{"location":"openstack/manila_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/manila_adoption/#inspect-the-resulting-manila-service-pods","title":"Inspect the resulting manila service pods","text":"oc get pods -l service=manila \n
"},{"location":"openstack/manila_adoption/#check-that-manila-api-service-is-registered-in-keystone","title":"Check that Manila API service is registered in Keystone","text":"openstack service list | grep manila\n
openstack endpoint list | grep manila\n\n| 1164c70045d34b959e889846f9959c0e | regionOne | manila | share | True | internal | http://manila-internal.openstack.svc:8786/v1/%(project_id)s |\n| 63e89296522d4b28a9af56586641590c | regionOne | manilav2 | sharev2 | True | public | https://manila-public-openstack.apps-crc.testing/v2 |\n| af36c57adcdf4d50b10f484b616764cc | regionOne | manila | share | True | public | https://manila-public-openstack.apps-crc.testing/v1/%(project_id)s |\n| d655b4390d7544a29ce4ea356cc2b547 | regionOne | manilav2 | sharev2 | True | internal | http://manila-internal.openstack.svc:8786/v2 |\n
"},{"location":"openstack/manila_adoption/#verify-resources","title":"Verify resources","text":"We can now test the health of the service
openstack share service list\nopenstack share pool list --detail\n
We can check on existing workloads
openstack share list\nopenstack share snapshot list\n
We can create further resources
openstack share create cephfs 10 --snapshot mysharesnap --name myshareclone\n
"},{"location":"openstack/mariadb_copy/","title":"MariaDB data copy","text":"This document describes how to move the databases from the original OpenStack deployment to the MariaDB instances in the OpenShift cluster.
NOTE: This example scenario describes a simple single-cell setup. The multi-stack topology recommended for production use results in a different layout of the cell databases and should use different naming schemes (not covered here).
"},{"location":"openstack/mariadb_copy/#prerequisites","title":"Prerequisites","text":"Make sure the previous Adoption steps have been performed successfully.
The OpenStackControlPlane resource must be already created at this point.
Podified MariaDB and RabbitMQ are running. No other podified control plane services are running.
OpenStack services have been stopped
There must be network routability between:
The adoption host and the original MariaDB.
The adoption host and the podified MariaDB.
Note that this routability requirement may change in the future, e.g. we may require routability from the original MariaDB to podified MariaDB.
Podman package is installed
CONTROLLER1_SSH
, CONTROLLER2_SSH
, and CONTROLLER3_SSH
are configured.
Define the shell variables used in the steps below. The values are just illustrative, use values that are correct for your environment:
MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified\n\nPODIFIED_MARIADB_IP=$(oc get svc --selector \"cr=mariadb-openstack\" -ojsonpath='{.items[0].spec.clusterIP}')\nPODIFIED_CELL1_MARIADB_IP=$(oc get svc --selector \"cr=mariadb-openstack-cell1\" -ojsonpath='{.items[0].spec.clusterIP}')\nPODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)\n\n# Replace with your environment's MariaDB IP:\nSOURCE_MARIADB_IP=192.168.122.100\nSOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')\n\n# The CHARACTER_SET and collation should match the source DB;\n# if they do not, it will break foreign key relationships\n# for any tables that are created in the future as part of db sync\nCHARACTER_SET=utf8\nCOLLATION=utf8_general_ci\n
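Several of these variables are filled in by `oc` or `grep` calls, so a failed lookup can silently leave one of them empty. The following is a minimal sketch of a guard to run before the pre-checks; the helper name `check_adoption_vars` is hypothetical and not part of the adoption tooling.

```shell
# Hypothetical helper: succeed only if every named variable is set and non-empty.
# The body runs in a subshell, so its working variables do not leak to the caller.
check_adoption_vars() (
    missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    [ "${missing}" -eq 0 ]
)
```

It could be called as `check_adoption_vars MARIADB_IMAGE PODIFIED_MARIADB_IP PODIFIED_CELL1_MARIADB_IP PODIFIED_DB_ROOT_PASSWORD SOURCE_MARIADB_IP SOURCE_DB_ROOT_PASSWORD`, stopping the procedure if it reports anything missing.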
"},{"location":"openstack/mariadb_copy/#pre-checks","title":"Pre-checks","text":"podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \\\n mysql -h \"$SOURCE_MARIADB_IP\" -uroot \"-p$SOURCE_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\n
podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \\\n mysqlcheck --all-databases -h $SOURCE_MARIADB_IP -u root \"-p$SOURCE_DB_ROOT_PASSWORD\" | grep -v OK\n
oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\n mysql -h \"$PODIFIED_MARIADB_IP\" -uroot \"-p$PODIFIED_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\n mysql -h \"$PODIFIED_CELL1_MARIADB_IP\" -uroot \"-p$PODIFIED_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\n
"},{"location":"openstack/mariadb_copy/#procedure-data-copy","title":"Procedure - data copy","text":"NOTE: We'll need to transition the Nova services imported later on into a superconductor architecture. For that, delete the old service records in the cell DBs, starting with cell1. New records will be registered with different hostnames provided by the Nova service operator. All Nova services, except the compute agent, have no internal state, so their service records can be safely deleted. We also need to rename the former default
cell as cell1
.
mkdir ~/adoption-db\ncd ~/adoption-db\n
podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $MARIADB_IMAGE bash <<EOF\n\n# Note we do not want to dump the information and performance schema tables so we filter them\nmysql -h ${SOURCE_MARIADB_IP} -u root \"-p${SOURCE_DB_ROOT_PASSWORD}\" -N -e 'show databases' | grep -E -v 'schema|mysql' | while read dbname; do\n echo \"Dumping \\${dbname}\"\n mysqldump -h $SOURCE_MARIADB_IP -uroot \"-p$SOURCE_DB_ROOT_PASSWORD\" \\\n --single-transaction --complete-insert --skip-lock-tables --lock-tables=0 \\\n \"\\${dbname}\" > \"\\${dbname}\".sql\ndone\n\nEOF\n
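Because each dump is written through a shell redirect, a failed `mysqldump` still leaves a (possibly empty) `.sql` file behind. As a rough sanity check before importing, the sketch below lists zero-byte dump files; the helper name `find_empty_dumps` is hypothetical, and `-maxdepth` assumes a GNU or BSD `find`.

```shell
# Hypothetical helper: print every *.sql file directly inside directory $1
# that is zero bytes long.
find_empty_dumps() {
    find "$1" -maxdepth 1 -type f -name '*.sql' -size 0
}
```

For example, `find_empty_dumps ~/adoption-db` should print nothing when all dumps contain data. It does not detect truncated dumps, only empty ones.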
# db schemas to rename on import\ndeclare -A db_name_map\ndb_name_map[\"nova\"]=\"nova_cell1\"\ndb_name_map[\"ovs_neutron\"]=\"neutron\"\n\n# db servers to import into\ndeclare -A db_server_map\ndb_server_map[\"default\"]=${PODIFIED_MARIADB_IP}\ndb_server_map[\"nova_cell1\"]=${PODIFIED_CELL1_MARIADB_IP}\n\n# db server root password map\ndeclare -A db_server_password_map\ndb_server_password_map[\"default\"]=${PODIFIED_DB_ROOT_PASSWORD}\ndb_server_password_map[\"nova_cell1\"]=${PODIFIED_DB_ROOT_PASSWORD}\n\nall_db_files=$(ls *.sql)\nfor db_file in ${all_db_files}; do\n db_name=$(echo ${db_file} | awk -F'.' '{ print $1; }')\n if [[ -v \"db_name_map[${db_name}]\" ]]; then\n echo \"renaming ${db_name} to ${db_name_map[${db_name}]}\"\n db_name=${db_name_map[${db_name}]}\n fi\n db_server=${db_server_map[\"default\"]}\n if [[ -v \"db_server_map[${db_name}]\" ]]; then\n db_server=${db_server_map[${db_name}]}\n fi\n db_password=${db_server_password_map[\"default\"]}\n if [[ -v \"db_server_password_map[${db_name}]\" ]]; then\n db_password=${db_server_password_map[${db_name}]}\n fi\n echo \"creating ${db_name} in ${db_server}\"\n container_name=$(echo \"mariadb-client-${db_name}-create\" | sed 's/_/-/g')\n oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \\\n mysql -h \"${db_server}\" -uroot \"-p${db_password}\" << EOF\nCREATE DATABASE IF NOT EXISTS ${db_name} DEFAULT CHARACTER SET ${CHARACTER_SET} DEFAULT COLLATE ${COLLATION};\nEOF\n echo \"importing ${db_name} into ${db_server}\"\n container_name=$(echo \"mariadb-client-${db_name}-restore\" | sed 's/_/-/g')\n oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \\\n mysql -h \"${db_server}\" -uroot \"-p${db_password}\" \"${db_name}\" < \"${db_file}\"\ndone\noc exec -it mariadb-openstack -- mysql --user=root --password=${db_server_password_map[\"default\"]} -e \\\n \"update nova_api.cell_mappings set name='cell1' where name='default';\"\noc exec -it mariadb-openstack-cell1 
-- mysql --user=root --password=${db_server_password_map[\"default\"]} -e \\\n \"delete from nova_cell1.services where host not like '%nova-cell1-%' and services.binary != 'nova-compute';\"\n
"},{"location":"openstack/mariadb_copy/#post-checks","title":"Post-checks","text":"oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep keystone\n# ensure neutron db is renamed from ovs_neutron\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep neutron\n# ensure nova cell1 db is extracted to a separate db server and renamed from nova to nova_cell1\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_CELL1_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep nova_cell1\n
mariadb-client
might have returned a pod security warning related to the restricted:latest
security context constraint. This is due to default security context constraints and will not prevent pod creation by the admission controller. You'll see a warning for the short-lived pod but it will not interfere with functionality. For more info visit hereAdopting Neutron means that an existing OpenStackControlPlane
CR, where Neutron is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
When the procedure is over, the expectation is to see the NeutronAPI
service up and running: the Keystone endpoints
should be updated and the same backend of the source Cloud will be available. If the conditions above are met, the adoption is considered concluded.
This guide also assumes that:
TripleO
environment (the source Cloud) is running on one side;SNO
/ CodeReadyContainers
is running on the other side.As already done for Keystone, the Neutron Adoption follows the same pattern.
Patch OpenStackControlPlane to deploy Neutron:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n neutron:\n enabled: true\n apiOverride:\n route: {}\n template:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n databaseInstance: openstack\n secret: osp-secret\n networkAttachments:\n - internalapi\n'\n
"},{"location":"openstack/neutron_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/neutron_adoption/#inspect-the-resulting-neutron-pods","title":"Inspect the resulting neutron pods","text":"NEUTRON_API_POD=`oc get pods -l service=neutron | tail -n 1 | cut -f 1 -d' '`\noc exec -t $NEUTRON_API_POD -c neutron-api -- cat /etc/neutron/neutron.conf\n
"},{"location":"openstack/neutron_adoption/#check-that-neutron-api-service-is-registered-in-keystone","title":"Check that Neutron API service is registered in Keystone","text":"openstack service list | grep network\n
openstack endpoint list | grep network\n\n| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | neutron | network | True | public | http://neutron-public-openstack.apps-crc.testing |\n| b943243e596847a9a317c8ce1800fa98 | regionOne | neutron | network | True | internal | http://neutron-internal.openstack.svc:9696 |\n| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | neutron | network | True | admin | http://192.168.122.99:9696 |\n
"},{"location":"openstack/neutron_adoption/#create-sample-resources","title":"Create sample resources","text":"We can now test that user can create networks, subnets, ports, routers etc.
openstack network create net\nopenstack subnet create --network net --subnet-range 10.0.0.0/24 subnet\nopenstack router create router\n
NOTE: this page should be expanded to include information on SR-IOV adoption.
"},{"location":"openstack/node-selector/","title":"Node Selector","text":"There are a variety of reasons why we may want to restrict the nodes where OpenStack services can be placed:
The mechanism provided by the OpenStack operators to achieve this is through the use of labels.
We would either label the OpenShift nodes or use existing labels they already have, and then use those labels in the OpenStack manifests in the nodeSelector
field.
The nodeSelector
field in the OpenStack manifests follows the standard OpenShift nodeSelector
field; please refer to the OpenShift documentation for additional information.
This field is present at all the different levels of the OpenStack manifests:
OpenStackControlPlane
object.cinder
element in the OpenStackControlPlane
.cinderVolume
element within the cinder
element in the OpenStackControlPlane
.This allows a fine grained control of the placement of the OpenStack services with minimal repetition.
Values of the nodeSelector
are propagated to the next levels unless they are overwritten. This means that a nodeSelector
value at the deployment level will affect all the OpenStack services.
For example we can add label type: openstack
to any 3 OpenShift nodes:
$ oc label nodes worker0 type=openstack\n$ oc label nodes worker1 type=openstack\n$ oc label nodes worker2 type=openstack\n
And then in our OpenStackControlPlane
we can use the label to place all the services in those 3 nodes:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n nodeSelector:\n type: openstack\n< . . . >\n
What if we don't mind where most OpenStack services run, except for the cinder volume and backup services, because we are using FC and only a subset of nodes have HBAs? Then we can use the selector for just those specific services, targeting nodes that, for the sake of this example, we'll assume have the label fc_card: true
:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n cinder:\n template:\n cinderVolumes:\n pure_fc:\n nodeSelector:\n fc_card: true\n< . . . >\n lvm-iscsi:\n nodeSelector:\n fc_card: true\n< . . . >\n cinderBackup:\n nodeSelector:\n fc_card: true\n< . . . >\n
The Cinder operator does not currently support defining the nodeSelector
in cinderVolumes
, so we need to specify it on each of the backends.
It's possible to leverage labels added by the node feature discovery operator to place OpenStack services.
"},{"location":"openstack/node-selector/#machineconfig","title":"MachineConfig","text":"Some services require us to have services or kernel modules running on the hosts where they run, for example iscsid
or multipathd
daemons, or the nvme-fabrics
kernel module.
For those cases we'll use MachineConfig
manifests, and if we are restricting the nodes we are placing the OpenStack services using the nodeSelector
then we'll also want to limit where the MachineConfig
is applied.
To define where the MachineConfig
can be applied we'll need to use a MachineConfigPool
that links the MachineConfig
to the nodes.
For example to be able to limit MachineConfig
to the 3 OpenShift nodes we marked with the type: openstack
label we would create the MachineConfigPool
like this:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfigPool\nmetadata:\n name: openstack\nspec:\n machineConfigSelector:\n matchLabels:\n machineconfiguration.openshift.io/role: openstack\n nodeSelector:\n matchLabels:\n type: openstack\n
And then we could use it in the MachineConfig
:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: openstack\n< . . . >\n
Refer to the OpenShift documentation for additional information on MachineConfig
and MachineConfigPools
WARNING: Applying a MachineConfig
to an OpenShift node will make the node reboot.
NOTE: This example scenario describes a simple single-cell setup. The multi-stack topology recommended for production use results in a different layout of the cell databases and should use different naming schemes (not covered here).
"},{"location":"openstack/nova_adoption/#prerequisites","title":"Prerequisites","text":"Define the shell variables and aliases used in the steps below. The values are just illustrative, use values that are correct for your environment:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n
"},{"location":"openstack/nova_adoption/#procedure-nova-adoption","title":"Procedure - Nova adoption","text":"NOTE: We assume Nova Metadata is deployed on the top level and not on each cell level, so this example imports it the same way. If the source deployment has a per-cell metadata deployment, adjust the patch below as needed. The Metadata service cannot be run in cell0
.
oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '\nspec:\n nova:\n enabled: true\n apiOverride:\n route: {}\n template:\n secret: osp-secret\n apiServiceTemplate:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n metadataServiceTemplate:\n enabled: true # deploy single nova metadata on the top level\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n schedulerServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n cellTemplates:\n cell0:\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n cell1:\n metadataServiceTemplate:\n enabled: false # enable here to run it in a cell instead\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n'\n
oc wait --for condition=Ready --timeout=300s Nova/nova\n
The local Conductor services will be started for each cell, while the superconductor runs in cell0
. Note that disable_compute_service_check_for_ffu
is mandatory for all imported Nova services until the external dataplane is imported and the Nova Compute services are fast-forward upgraded.
openstack endpoint list | grep nova\nopenstack server list\n
"},{"location":"openstack/ovn_adoption/","title":"OVN data migration","text":"This document describes how to move OVN northbound and southbound databases from the original OpenStack deployment to ovsdb-server instances running in the OpenShift cluster.
"},{"location":"openstack/ovn_adoption/#rationale","title":"Rationale","text":"While it may be argued that the podified Neutron ML2/OVN driver and OVN northd service will reconstruct the databases on startup, the reconstruction may be time consuming on large existing clusters. The procedure below speeds up data migration and avoids unnecessary data plane disruptions due to incomplete OpenFlow table contents.
"},{"location":"openstack/ovn_adoption/#prerequisites","title":"Prerequisites","text":"Define the shell variables used in the steps below. The values are just illustrative, use values that are correct for your environment:
OVSDB_IMAGE=quay.io/podified-antelope-centos9/openstack-ovn-base:current-podified\nSOURCE_OVSDB_IP=172.17.1.49\n\n# ssh commands to reach the original controller machines\nCONTROLLER_SSH=\"ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone\"\n\n# ssh commands to reach the original compute machines\nCOMPUTE_SSH=\"ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone\"\n
The real value of the SOURCE_OVSDB_IP
can be obtained from the puppet-generated configs:
grep -rI 'ovn_[ns]b_conn' /var/lib/config-data/puppet-generated/\n
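The `grep` output still has to be reduced to a bare IP address. The following is a minimal sketch, assuming the matching lines contain a connection string of the form `tcp:<ipv4>:<port>` (for example `ovn_sb_connection=tcp:172.17.1.49:6642`); the helper name `extract_ovsdb_ip` is hypothetical, and `ssl:` or IPv6 connection strings are not handled.

```shell
# Hypothetical helper: read grep output on stdin and print the IPv4 address
# from a tcp:<ip>:<port> connection string. Only the first matching line is
# used; if a line lists several connections, the last one on that line wins.
extract_ovsdb_ip() {
    sed -n 's/.*tcp:\([0-9][0-9.]*\):[0-9][0-9]*.*/\1/p' | head -n 1
}
```

Piping the output of the `grep` command above into `extract_ovsdb_ip` would then give a candidate value for `SOURCE_OVSDB_IP`, which should still be checked by eye.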
"},{"location":"openstack/ovn_adoption/#procedure","title":"Procedure","text":"${CONTROLLER_SSH} sudo systemctl stop tripleo_ovn_cluster_northd.service\n
client=\"podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE ovsdb-client\"\n${client} backup tcp:$SOURCE_OVSDB_IP:6641 > ovs-nb.db\n${client} backup tcp:$SOURCE_OVSDB_IP:6642 > ovs-sb.db\n
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n ovn:\n enabled: true\n template:\n ovnDBCluster:\n ovndbcluster-nb:\n dbType: NB\n storageRequest: 10G\n networkAttachment: internalapi\n ovndbcluster-sb:\n dbType: SB\n storageRequest: 10G\n networkAttachment: internalapi\n'\n
PODIFIED_OVSDB_NB_IP=$(kubectl get po ovsdbserver-nb-0 -o jsonpath='{.metadata.annotations.k8s\\.v1\\.cni\\.cncf\\.io/network-status}' | jq 'map(. | select(.name==\"openstack/internalapi\"))[0].ips[0]' | tr -d '\"')\nPODIFIED_OVSDB_SB_IP=$(kubectl get po ovsdbserver-sb-0 -o jsonpath='{.metadata.annotations.k8s\\.v1\\.cni\\.cncf\\.io/network-status}' | jq 'map(. | select(.name==\"openstack/internalapi\"))[0].ips[0]' | tr -d '\"')\n
podman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client get-schema tcp:$PODIFIED_OVSDB_NB_IP:6641 > ./ovs-nb.ovsschema && ovsdb-tool convert ovs-nb.db ./ovs-nb.ovsschema\"\npodman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client get-schema tcp:$PODIFIED_OVSDB_SB_IP:6642 > ./ovs-sb.ovsschema && ovsdb-tool convert ovs-sb.db ./ovs-sb.ovsschema\"\n
podman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client restore tcp:$PODIFIED_OVSDB_NB_IP:6641 < ovs-nb.db\"\npodman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client restore tcp:$PODIFIED_OVSDB_SB_IP:6642 < ovs-sb.db\"\n
oc exec -it ovsdbserver-nb-0 -- ovn-nbctl show\noc exec -it ovsdbserver-sb-0 -- ovn-sbctl list Chassis\n
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovs-vsctl set open . external_ids:ovn-remote=tcp:$PODIFIED_OVSDB_SB_IP:6642\n
You should now see the following warning in the ovn_controller
container logs:
2023-03-16T21:40:35Z|03095|ovsdb_cs|WARN|tcp:172.17.1.50:6642: clustered database server has stale data; trying another server\n
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovn-appctl -t ovn-controller sb-cluster-state-reset\n
This should complete connection of the controller process to the new remote. See in logs:
2023-03-16T21:42:31Z|03134|main|INFO|Resetting southbound database cluster state\n2023-03-16T21:42:33Z|03135|reconnect|INFO|tcp:172.17.1.50:6642: connected\n
${COMPUTE_SSH} sudo systemctl restart tripleo_ovn_controller.service\n
ovn-northd
service that will keep OVN northbound and southbound databases in sync. oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n ovn:\n enabled: true\n template:\n ovnNorthd:\n networkAttachment: internalapi\n'\n
"},{"location":"openstack/placement_adoption/","title":"Placement adoption","text":""},{"location":"openstack/placement_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/placement_adoption/#procedure-placement-adoption","title":"Procedure - Placement adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n placement:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n'\n
"},{"location":"openstack/placement_adoption/#post-checks","title":"Post-checks","text":"alias openstack=\"oc exec -t openstackclient -- openstack\"\n\nopenstack endpoint list | grep placement\n\n\n# Without OpenStack CLI placement plugin installed:\nPLACEMENT_PUBLIC_URL=$(openstack endpoint list -c 'Service Name' -c 'Service Type' -c URL | grep placement | grep public | awk '{ print $6; }')\noc exec -t openstackclient -- curl \"$PLACEMENT_PUBLIC_URL\"\n\n# With OpenStack CLI placement plugin installed:\nopenstack resource class list\n
"},{"location":"openstack/planning/","title":"Planning the new deployment","text":"Just like you did back when you installed your Director deployed OpenStack, the upgrade/migration to the podified OpenStack requires planning various aspects of the environment such as node roles, planning your network topology, and storage.
In this document we cover some of this planning, but it is recommended to read the whole adoption guide before actually starting the process to be sure that there is a global understanding of the whole process.
"},{"location":"openstack/planning/#configurations","title":"Configurations","text":"There is a fundamental difference between the Director and Operator deployments regarding the configuration of the services.
In Director deployments many of the service configurations are abstracted by Director specific configuration options. A single Director option may trigger changes for multiple services and support for drivers (for example Cinder's) required patches to the Director code base.
In Operator deployments this has changed to what we believe is a simpler approach: reduce the installer specific knowledge and leverage OpenShift and OpenStack service specific knowledge whenever possible.
To this effect OpenStack services will have sensible defaults for OpenShift deployments and human operators will provide configuration snippets to provide necessary configuration, such as cinder backend configuration, or to override the defaults.
This shortens the distance between a service specific configuration file (such as cinder.conf
) and what the human operator provides in the manifests.
These configuration snippets are passed to the operators in the different customServiceConfig
sections available in the manifests, and then they are layered in the services available in the following levels. To illustrate this, if we were to set a configuration at the top Cinder level (spec: cinder: template:
) then it would be applied to all the cinder services; for example to enable debug in all the cinder services we would do:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n template:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n< . . . >\n
If we only wanted to set it for one of the cinder services, for example the scheduler, then we would use the cinderScheduler
section instead:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n template:\n cinderScheduler:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n< . . . >\n
In OpenShift it is not recommended to store sensitive information, such as the credentials to the cinder storage array, in the CRs, so most OpenStack operators have a mechanism to use OpenShift's Secrets
for sensitive configuration parameters of the services and then use them by reference in the customServiceConfigSecrets
section which is analogous to the customServiceConfig
.
The contents of the Secret
references passed in the customServiceConfigSecrets
will have the same format as customServiceConfig
: a snippet with the section/s and configuration options.
When there is sensitive information in the service configuration, it becomes a matter of personal preference whether to store all the configuration in the Secret
or only the sensitive parts, but remember that if we split the configuration between Secret
and customServiceConfig
we still need the section header (eg: [DEFAULT]
) to be present in both places.
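As an illustration of the split described above, the sketch below writes a non-sensitive snippet and a sensitive snippet for the same section and checks that both carry the section header. The file names, the `[netapp]` section, and the helper `has_section` are hypothetical and only serve the example.

```shell
# Hypothetical helper: succeed if file $1 contains a line that is exactly [$2].
has_section() {
    grep -q -x -F "[$2]" "$1"
}

workdir=$(mktemp -d)

# Non-sensitive options, as they would go into customServiceConfig.
cat > "${workdir}/backend.conf" << __EOF__
[netapp]
share_backend_name = netapp
__EOF__

# Sensitive options, as they would go into the Secret.
cat > "${workdir}/backend_secrets.conf" << __EOF__
[netapp]
netapp_password = secret_netapp_password
__EOF__

# Both snippets must repeat the section header.
for snippet in "${workdir}/backend.conf" "${workdir}/backend_secrets.conf"; do
    has_section "${snippet}" netapp || echo "missing [netapp] header in ${snippet}" >&2
done
```

The check is deliberately strict: a header with trailing whitespace or different capitalisation is reported as missing.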
Attention should be paid to each service's adoption process as they may have some particularities regarding their configuration.
"},{"location":"openstack/planning/#configuration-tooling","title":"Configuration tooling","text":"To help users handle the configuration of the TripleO and OpenStack services, the tool https://github.com/openstack-k8s-operators/os-diff has been developed to compare the configuration files between the TripleO deployment and the next-gen cloud. Make sure Golang is installed and configured in your environment:
git clone https://github.com/openstack-k8s-operators/os-diff\npushd os-diff\nmake build\n
Then configure ansible.cfg and ssh-config file according to your environment:
Host *\n IdentitiesOnly yes\n\nHost virthost\n Hostname virthost\n IdentityFile ~/.ssh/id_rsa\n User root\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n\n\nHost standalone\n Hostname standalone\n IdentityFile ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa\n User root\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n\nHost crc\n Hostname crc\n IdentityFile ~/.ssh/id_rsa\n User stack\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n
And test your connection:
ssh -F ssh.config standalone\n
"},{"location":"openstack/planning/#node-roles","title":"Node Roles","text":"In Director deployments we had 4 different standard roles for the nodes: Controller
, Compute
, Ceph Storage
, Swift Storage
, but in podified OpenStack we just make a distinction based on where things are running, in OpenShift or external to it.
When adopting a Director OpenStack your Compute
nodes will directly become external nodes, so there should not be much additional planning needed there.
In many deployments being adopted the Controller
nodes will require some thought because we'll have many OpenShift nodes where the controller services could run, and we have to decide which ones we want to use, how we are going to use them, and make sure those nodes are ready to run the services.
In most deployments running OpenStack services on master
nodes can have a seriously adverse impact on the OpenShift cluster, so we recommend placing OpenStack services on non-master
nodes.
By default OpenStack Operators deploy OpenStack services on any worker node, but that is not necessarily what's best for all deployments, and there may even be services that won't work when deployed like that.
When planning a deployment it's good to remember that not all the services in an OpenStack deployment are the same, as they have very different requirements.
Looking at the Cinder component we can clearly see different requirements for its services: the cinder-scheduler is a very light service with low memory, disk, network, and CPU usage; cinder-api service has a higher network usage due to resource listing requests; the cinder-volume service will have a high disk and network usage since many of its operations are in the data path (offline volume migration, create volume from image, etc.), and then we have the cinder-backup service which has high memory, network, and CPU (to compress data) requirements.
We also have the Glance and Swift components that are in the data path, and let's not forget RabbitMQ and Galera services.
Given these requirements it may be preferable not to let these services wander all over your OpenShift worker nodes with the possibility of impacting other workloads, or maybe you don't mind the light services wandering around but you want to pin down the heavy ones to a set of infrastructure nodes.
There are also hardware restrictions to take into consideration, because if we are using a Fibre Channel (FC) Cinder backend we'll need the cinder-volume, cinder-backup, and maybe even the glance (if it's using Cinder as a backend) services to run on an OpenShift host that has an HBA.
The OpenStack Operators allow a great deal of flexibility on where to run the OpenStack services, as we can use node labels to define which OpenShift nodes are eligible to run the different OpenStack services. Refer to the Node Selector guide to learn more about using labels to define placement of the OpenStack services.
TODO: Talk about Ceph Storage and Swift Storage nodes, HCI deployments, etc.
"},{"location":"openstack/planning/#network","title":"Network","text":"TODO: Write about isolated networks, NetworkAttachmentDefinition, NetworkAttachmets, etc
"},{"location":"openstack/planning/#storage","title":"Storage","text":"When looking into the storage in an OpenStack deployment we can differentiate 2 different kinds, the storage requirements of the services themselves and the storage used for the OpenStack users that thee services will manage.
These requirements may drive our OpenShift node selection, as mentioned above, and may even require us to do some preparations on the OpenShift nodes before we can deploy the services.
TODO: Galera, RabbitMQ, Swift, Glance, etc.
"},{"location":"openstack/planning/#cinder-requirements","title":"Cinder requirements","text":"The Cinder service has both local storage used by the service and OpenStack user requirements.
Local storage is used for example when downloading a glance image for the create volume from image operation, which can become considerable when having concurrent operations and not using cinder volume cache.
In the Operator deployed OpenStack we now have an easy way to configure the location of the conversion directory to be an NFS share (using the extra volumes feature), something that needed to be done manually before.
Even if it's an adoption and it may seem that there's nothing to consider regarding the Cinder backends, because we'll just be using the same ones we are using in our current deployment, we should still evaluate it, because it may not be so straightforward.
First we need to check the transport protocol the Cinder backends are using: RBD, iSCSI, FC, NFS, NVMe-oF, etc.
Once we know all the transport protocols we are using, we can proceed to make sure we are taking them into consideration when placing the Cinder services (as mentioned above in the Node Roles section) and the right storage transport related binaries are running on the OpenShift nodes.
Detailed information about the specifics for each storage transport protocol can be found in the Cinder Adoption section. Please take a good look at that document before proceeding to be able to plan the adoption better.
"},{"location":"openstack/pull_openstack_configuration/","title":"Pull Openstack configuration","text":"Before starting to adoption workflow, we can start by pulling the configuration from the Openstack services and TripleO on our file system in order to backup the configuration files and then use it for later, during the configuration of the adopted services and for the record to compare and make sure nothing has been missed or misconfigured.
Make sure you have pull the os-diff repository and configure according to your environment: Configure os-diff
"},{"location":"openstack/pull_openstack_configuration/#pull-configuration-from-a-tripleo-deployment","title":"Pull configuration from a TripleO deployment","text":"Once you make sure the ssh connnection is confugred correctly and os-diff has been built, you can start to pull configuration from your Openstack services.
All the services are describes in an Ansible role:
collect_config vars
Once you have enabled the services you need (you can enable everything even if a service is not deployed) you can start to pull the OpenStack services configuration files:
pushd os-diff\n./os-diff pull --cloud_engine=podman\n
The configuration will be pulled and stored in:
/tmp/collect_tripleo_configs\n
You can provide another path with:
./os-diff pull --cloud_engine=podman -e local_working_dir=$HOME\n
Once the ansible playbook has been run, you should have one directory per service in your local working directory:
\u25be tmp/\n \u25be collect_tripleo_configs/\n \u25be glance/\n
"},{"location":"openstack/stop_openstack_services/","title":"Stop OpenStack services","text":"Before we can start with the adoption we need to make sure that the OpenStack services have been stopped.
This is an important step to avoid inconsistencies in the data migrated for the data-plane adoption procedure caused by resource changes after the DB has been copied to the new deployment.
Some services are easy to stop because they only perform short asynchronous operations, but other services are a bit more complex to gracefully stop because they perform synchronous or long running operations that we may want to complete instead of aborting them.
Since gracefully stopping all services is non-trivial and beyond the scope of this guide we'll proceed with the force method but present a couple of recommendations on how to check some things in the services.
"},{"location":"openstack/stop_openstack_services/#variables","title":"Variables","text":"Define the shell variables used in the steps below. The values are just illustrative and refer to a single node standalone director deployment, use values that are correct for your environment:
CONTROLLER1_SSH=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nCONTROLLER2_SSH=\"\"\nCONTROLLER3_SSH=\"\"\n
We chose to use these ssh variables with the ssh commands instead of using ansible to try to create instructions that are independent of where they are running, but ansible commands could be used to achieve the same result if we are on the right host, for example to stop a service:
. stackrc\nansible -i $(which tripleo-ansible-inventory) Controller -m shell -a \"sudo systemctl stop tripleo_horizon.service\" -b\n
NOTE: Nova compute services in this guide are running on the same controller hosts. Adjust CONTROLLER${i}_SSH
commands and ServicesToStop
given below to your source environment specific topology.
We can stop OpenStack services at any moment, but we may leave things in an undesired state, so at the very least we should have a look to confirm that there are no long running operations that require other services.
Ensure that there are no ongoing instance live migrations, volume migrations (online or offline), volume creation, backup restore, attaching, detaching, etc.
openstack server list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'\nopenstack volume list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'| grep -vi error\nopenstack volume backup list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|' | grep -vi error\nopenstack share list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'| grep -vi error\nopenstack image list -c ID -c Status |grep -E '\\| .+ing \\|'\n
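The checks above all rely on the same idea: a status cell ending in "ing" means an operation is still in flight. A minimal sketch of that filter as a reusable helper follows; the in_progress_rows name and the sample table are made up for illustration, and the helper only reads CLI table text from stdin.

```shell
#!/usr/bin/env bash
# Print rows whose Status cell ends in "ing" (e.g. migrating, attaching),
# skipping rows that mention an error, as most of the commands above do.
# Reads an openstack CLI table on stdin; returns non-zero when nothing matches.
in_progress_rows() {
    grep -E '\| .+ing \|' | grep -vi error
}

# Made-up sample resembling `openstack volume list -c ID -c Status` output.
sample='| ID | Status |
| v-01 | available |
| v-02 | attaching |
| v-03 | error_deleting |'

if rows=$(printf '%s\n' "$sample" | in_progress_rows); then
    echo "Operations still in progress:"
    echo "$rows"
else
    echo "No operations in progress"
fi
```

Against a live cloud the same helper could be fed from a pipe, for example `openstack volume list --all-projects -c ID -c Status | in_progress_rows`.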
"},{"location":"openstack/stop_openstack_services/#stopping-control-plane-services","title":"Stopping control plane services","text":"We can stop OpenStack services at any moment, but we may leave things in an undesired state, so at the very least we should have a look to confirm that there are no ongoing operations.
1- Connect to all the controller nodes.
2- Stop the services.
3- Make sure all the services are stopped.
4- Repeat steps 1-3 for compute hosts (workloads running on dataplane will not be affected).
The cinder-backup service on OSP 17.1 could be running as Active-Passive under pacemaker or as Active-Active, so we'll have to check how it's running and stop it.
These steps can be automated with a simple script that relies on the previously defined environment variables:
# Update the services list to be stopped\nServicesToStop=(\"tripleo_horizon.service\"\n \"tripleo_keystone.service\"\n \"tripleo_cinder_api.service\"\n \"tripleo_cinder_api_cron.service\"\n \"tripleo_cinder_scheduler.service\"\n \"tripleo_cinder_backup.service\"\n \"tripleo_glance_api.service\"\n \"tripleo_manila_api.service\"\n \"tripleo_manila_api_cron.service\"\n \"tripleo_manila_scheduler.service\"\n \"tripleo_neutron_api.service\"\n \"tripleo_nova_api.service\"\n \"tripleo_placement_api.service\"\n \"tripleo_nova_api_cron.service\"\n \"tripleo_nova_api.service\"\n \"tripleo_nova_conductor.service\"\n \"tripleo_nova_metadata.service\"\n \"tripleo_nova_scheduler.service\"\n \"tripleo_nova_vnc_proxy.service\"\n # Compute services on dataplane\n \"tripleo_nova_compute.service\"\n \"tripleo_nova_libvirt.target\"\n \"tripleo_nova_migration_target.service\"\n \"tripleo_nova_virtlogd_wrapper.service\"\n \"tripleo_nova_virtnodedevd.service\"\n \"tripleo_nova_virtproxyd.service\"\n \"tripleo_nova_virtqemud.service\"\n \"tripleo_nova_virtsecretd.service\"\n \"tripleo_nova_virtstoraged.service\")\n\nPacemakerResourcesToStop=(\"openstack-cinder-volume\"\n \"openstack-cinder-backup\"\n \"openstack-manila-share\")\n\necho \"Stopping systemd OpenStack services\"\nfor service in ${ServicesToStop[*]}; do\n for i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Stopping the $service in controller $i\"\n if ${!SSH_CMD} sudo systemctl is-active $service; then\n ${!SSH_CMD} sudo systemctl stop $service\n fi\n fi\n done\ndone\n\necho \"Checking systemd OpenStack services\"\nfor service in ${ServicesToStop[*]}; do\n for i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Checking status of $service in controller $i\"\n if ! 
${!SSH_CMD} systemctl show $service | grep ActiveState=inactive >/dev/null; then\n echo \"ERROR: Service $service still running on controller $i\"\n fi\n fi\n done\ndone\n\necho \"Stopping pacemaker OpenStack services\"\nfor i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Using controller $i to run pacemaker commands\"\n for resource in ${PacemakerResourcesToStop[*]}; do\n if ${!SSH_CMD} sudo pcs resource config $resource; then\n ${!SSH_CMD} sudo pcs resource disable $resource\n fi\n done\n break\n fi\ndone\n
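The script above depends on bash indirect expansion: SSH_CMD holds the name of a variable, and ${!SSH_CMD} expands to that variable's value, so empty CONTROLLER variables are skipped. The following sketch isolates that mechanism; `echo` stands in for the real ssh commands, and run_on_controllers is a hypothetical helper, not part of the procedure.

```shell
#!/usr/bin/env bash
# Placeholder commands: `echo` stands in for the real ssh invocations.
CONTROLLER1_SSH="echo ssh-to-controller-1"
CONTROLLER2_SSH=""
CONTROLLER3_SSH="echo ssh-to-controller-3"

# Run the given command through every controller that has an SSH command set.
run_on_controllers() {
    local i SSH_CMD
    for i in {1..3}; do
        SSH_CMD=CONTROLLER${i}_SSH       # holds a variable *name*
        if [ -n "${!SSH_CMD}" ]; then    # ${!SSH_CMD} is that variable's value
            # Unquoted on purpose so the value splits into command + arguments.
            ${!SSH_CMD} "$@"
        else
            echo "controller $i: skipped (CONTROLLER${i}_SSH is empty)"
        fi
    done
}

run_on_controllers hostname
```

Because the expansion is left unquoted, values containing spaces inside a single argument (for example a key path with spaces) would be split incorrectly, which is worth keeping in mind when filling in the variables.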
"},{"location":"openstack/troubleshooting/","title":"Troubleshooting","text":"This document contains information about various issues you might face and how to solve them.
"},{"location":"openstack/troubleshooting/#errimagepull-due-to-missing-authentication","title":"ErrImagePull due to missing authentication","text":"The deployed containers pull the images from private containers registries that can potentially return authentication errors like:
Failed to pull image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\":\nrpc error: code = Unknown desc = unable to retrieve auth token: invalid\nusername/password: unauthorized: Please login to the Red Hat Registry using\nyour Customer Portal credentials.\n
An example of a failed pod:
Normal Scheduled 3m40s default-scheduler Successfully assigned openstack/rabbitmq-server-0 to worker0\n Normal AddedInterface 3m38s multus Add eth0 [10.101.0.41/23] from ovn-kubernetes\n Warning Failed 2m16s (x6 over 3m38s) kubelet Error: ImagePullBackOff\n Normal Pulling 2m5s (x4 over 3m38s) kubelet Pulling image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\"\n Warning Failed 2m5s (x4 over 3m38s) kubelet Failed to pull image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\": rpc error: code ... can be found here: https://access.redhat.com/RegistryAuthentication\n Warning Failed 2m5s (x4 over 3m38s) kubelet Error: ErrImagePull\n Normal BackOff 110s (x7 over 3m38s) kubelet Back-off pulling image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\"\n
To solve this issue we need to get a valid pull-secret from the official Red Hat console site, store this pull secret locally in a machine with access to the Kubernetes API (service node), and then run:
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull_secret_location.json>\n
The previous command makes the authentication information available on all the cluster's compute nodes. Then trigger a new pod deployment to pull the container image with:
kubectl delete pod rabbitmq-server-0 -n openstack\n
The pod should then be able to pull the image successfully. For more information about which container registries require which type of authentication, check the official docs.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Data Plane Adoption procedure","text":""},{"location":"#openstack-adoption","title":"OpenStack adoption","text":"This is a procedure for adopting an OpenStack cloud.
Perform the actions from the sub-documents in the following order:
Planning the new deployment
Deploy podified backend services
Pull Openstack configuration
Stop OpenStack services
Copy MariaDB data
OVN adoption
Keystone adoption
Neutron adoption
Ceph backend configuration (if applicable)
Glance adoption
Placement adoption
Nova adoption
Cinder adoption
Manila adoption
Horizon adoption
Dataplane adoption
Ironic adoption
If you face issues during adoption, check the Troubleshooting document for common problems and solutions.
"},{"location":"#post-openstack-ceph-adoption","title":"Post-OpenStack Ceph adoption","text":"If the environment includes Ceph and some of its services are collocated on the Controller hosts (\"internal Ceph\"), then Ceph services need to be moved out of Controller hosts as the last step of the OpenStack adoption. Follow this documentation:
For information about contributing to the docs and how to run tests, see:
Contributing to documentation - how to build docs locally, docs patterns and tips.
Development environment - how to set up a local development environment where Adoption can be executed (either manually or via the test suite).
Tests - information about the test suite and how to run it.
In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated Storage nodes, the daemons living in the OpenStack control plane should be moved/migrated into the existing external RHEL nodes (typically the compute nodes for an HCI environment or dedicated storage nodes in all the remaining use cases).
"},{"location":"ceph/ceph_rbd/#requirements","title":"Requirements","text":"The goal of the first POC is to prove we are able to successfully drain a controller node, in terms of ceph daemons, and move them to a different node. The initial target of the POC is RBD only, which means we\u2019re going to move only mon and mgr daemons. For the purposes of this POC, we'll deploy a ceph cluster with only mon, mgrs, and osds to simulate the environment a customer will be in before starting the migration. The goal of the first POC is to ensure that: - We can keep the mon IP addresses moving them to the CephStorage nodes. - We can drain the existing controller nodes and shut them down. - We can deploy additional monitors to the existing nodes, promoting them as _admin nodes that can be used by administrators to manage the ceph cluster and perform day2 operations against it. - We can keep the cluster operational during the migration.
"},{"location":"ceph/ceph_rbd/#prerequisites","title":"Prerequisites","text":"The Storage Nodes should be configured to have both storage and storage_mgmt network to make sure we can use both Ceph public and cluster networks.
This is the only step where interaction with TripleO is required. From 17+ we don\u2019t have to run any stack update. However, there are commands that must be run to execute os-net-config on the bare-metal node and configure additional networks.
Make sure the network is defined in metalsmith.yaml for the CephStorageNodes:
- name: CephStorage\n count: 2\n instances:\n - hostname: oc0-ceph-0\n name: oc0-ceph-0\n - hostname: oc0-ceph-1\n name: oc0-ceph-1\n defaults:\n networks:\n - network: ctlplane\n vif: true\n - network: storage_cloud_0\n subnet: storage_cloud_0_subnet\n - network: storage_mgmt_cloud_0\n subnet: storage_mgmt_cloud_0_subnet\n network_config:\n template: templates/single_nic_vlans/single_nic_vlans_storage.j2\n
Then run:
openstack overcloud node provision \\\n -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \\\n --network-config -y --concurrency 2 /home/stack/metalsmith-0.yam\n
Verify that the storage network is running on the node:
(undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a\nWarning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n
"},{"location":"ceph/ceph_rbd/#migrate-mons-and-mgrs-on-the-two-existing-cephstorage-nodes","title":"Migrate mon(s) and mgr(s) on the two existing CephStorage nodes","text":"Create a ceph spec based on the default roles with the mon/mgr on the controller nodes.
openstack overcloud ceph spec -o ceph_spec.yaml -y \\\n --stack overcloud-0 overcloud-baremetal-deployed-0.yaml\n
Deploy the Ceph cluster
openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \\\n --stack overcloud-0 -o deployed_ceph.yaml \\\n --network-data ~/oc0-network-data.yaml \\\n --ceph-spec ~/ceph_spec.yaml\n
Note:
The ceph_spec.yaml, which is the OSP-generated description of the ceph cluster, will be used, later in the process, as the basic template required by cephadm to update the status/info of the daemons.
Check the status of the cluster:
[ceph: root@oc0-controller-0 /]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_OK\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)\n mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk\n osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
[ceph: root@oc0-controller-0 /]# ceph orch host ls\nHOST ADDR LABELS STATUS\noc0-ceph-0 192.168.24.14 osd\noc0-ceph-1 192.168.24.7 osd\noc0-controller-0 192.168.24.15 _admin mgr mon\noc0-controller-1 192.168.24.23 _admin mgr mon\noc0-controller-2 192.168.24.13 _admin mgr mon\n
The goal of the next section is to migrate the oc0-controller-{1,2} daemons into oc0-ceph-{0,1} as the very basic scenario that demonstrates we can actually make this kind of migration using cephadm.
"},{"location":"ceph/ceph_rbd/#migrate-oc0-controller-1-into-oc0-ceph-0","title":"Migrate oc0-controller-1 into oc0-ceph-0","text":"ssh into controller-0, then
cephadm shell -v /home/ceph-admin/specs:/specs
ssh into ceph-0, then
sudo watch podman ps # watch the new mon/mgr being deployed here
(optional) if mgr is active in the source node, then:
ceph mgr fail <mgr instance>\n
From the cephadm shell, remove the labels on oc0-controller-1
for label in mon mgr _admin; do\n ceph orch host label rm oc0-controller-1 $label;\n done\n
Add the missing labels to oc0-ceph-0
[ceph: root@oc0-controller-0 /]#\n> for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done\nAdded label mon to host oc0-ceph-0\nAdded label mgr to host oc0-ceph-0\nAdded label _admin to host oc0-ceph-0\n
Drain and force-remove the oc0-controller-1 node
[ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1\nScheduled to remove the following daemons from host 'oc0-controller-1'\ntype id\n-------------------- ---------------\nmon oc0-controller-1\nmgr oc0-controller-1.mtxohd\ncrash oc0-controller-1\n
[ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force\nRemoved host 'oc0-controller-1'\n\n[ceph: root@oc0-controller-0 /]# ceph orch host ls\nHOST ADDR LABELS STATUS\noc0-ceph-0 192.168.24.14 osd\noc0-ceph-1 192.168.24.7 osd\noc0-controller-0 192.168.24.15 mgr mon _admin\noc0-controller-2 192.168.24.13 _admin mgr mon\n
If you have only 3 mon nodes, and the drain of the node doesn\u2019t work as expected (the containers are still there), then SSH to controller-1 and force-purge the containers in the node:
[root@oc0-controller-1 ~]# sudo podman ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n5c1ad36472bc quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mon.oc0-contro... 35 minutes ago Up 35 minutes ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1\n3b14cc7bf4dd quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mgr.oc0-contro... 35 minutes ago Up 35 minutes ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd\n\n[root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force\n\n[root@oc0-controller-1 ~]# sudo podman ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n
Note: cephadm rm-cluster on a node that is not part of the cluster anymore has the effect of removing all the containers and doing some cleanup on the filesystem.
Before shutting the oc0-controller-1 down, move the IP address (on the same network) to the oc0-ceph-0 node:
mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]\n\n[root@oc0-controller-1 ~]# ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-ex inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n6: vlan100 inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\\ valid_lft forever preferred_lft forever\n7: vlan12 inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n8: vlan11 inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n9: vlan13 inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\\ valid_lft forever preferred_lft forever\n10: vlan70 inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n11: vlan1 inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n12: vlan14 inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\\ valid_lft forever preferred_lft forever\n
On the oc0-ceph-0:
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n[heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11\n[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n5: br-storage inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\\ valid_lft forever preferred_lft forever\n6: vlan1 inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\\ valid_lft forever preferred_lft forever\n7: vlan11 inet 172.16.11.121/32 scope global vlan11\\ valid_lft forever preferred_lft forever\n8: vlan12 inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\\ valid_lft forever preferred_lft forever\n
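Before moving an address it can help to confirm which mon_host entry is actually configured on the node being retired. The sketch below extracts the IPv4 addresses from a mon_host value and matches them against `ip -o -4 a` output read from stdin; the function names are hypothetical and the sample lines are copied from the output shown above.

```shell
#!/usr/bin/env bash
# Extract the unique IPv4 addresses from a ceph.conf style mon_host value.
mon_ips() {
    grep -oE '[0-9]+(\.[0-9]+){3}' <<<"$1" | sort -u
}

# Print "<device> <address>" for every mon address that appears in the
# `ip -o -4 a` output supplied on stdin.
local_mon_addresses() {
    local mon_host=$1 ip_output addr
    ip_output=$(cat)
    for addr in $(mon_ips "$mon_host"); do
        awk -v a="$addr" \
            '$3 == "inet" { split($4, p, "/"); if (p[1] == a) print $2, a }' \
            <<<"$ip_output"
    done
}

mon_host='[v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]'

# Two lines taken from the oc0-controller-1 output above.
sample='7: vlan12 inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12
8: vlan11 inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11'

printf '%s\n' "$sample" | local_mon_addresses "$mon_host"
```

On a live node the sample would be replaced by a pipe such as `ip -o -4 a | local_mon_addresses "$mon_host"`; the device it reports is the one the address has to be added to on the target node.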
Power off oc0-controller-1.
Add the new mon on oc0-ceph-0 using the old IP address:
[ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121\nDeployed mon.oc0-ceph-0 on host 'oc0-ceph-0'\n
Check the new container in the oc0-ceph-0 node:
b581dc8bbb78 quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mon.oc0-ceph-0... 24 seconds ago Up 24 seconds ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0\n
On the cephadm shell, back up the existing ceph_spec.yaml, then edit the spec to remove any oc0-controller-1 entry and replace it with oc0-ceph-0:
cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file\n\n[ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml\n\n--- ceph_spec.yaml.bkp 2022-07-29 15:41:34.516329643 +0000\n+++ ceph_spec.yaml 2022-07-29 15:28:26.455329643 +0000\n@@ -7,14 +7,6 @@\n - mgr\n service_type: host\n ---\n-addr: 192.168.24.12\n-hostname: oc0-controller-1\n-labels:\n-- _admin\n-- mon\n-- mgr\n-service_type: host\n----\n addr: 192.168.24.19\n hostname: oc0-controller-2\n labels:\n@@ -38,7 +30,7 @@\n placement:\n hosts:\n - oc0-controller-0\n- - oc0-controller-1\n+ - oc0-ceph-0\n - oc0-controller-2\n service_id: mon\n service_name: mon\n@@ -47,8 +39,8 @@\n placement:\n hosts:\n - oc0-controller-0\n- - oc0-controller-1\n - oc0-controller-2\n+ - oc0-ceph-0\n service_id: mgr\n service_name: mgr\n service_type: mgr\n
Apply the resulting spec:
ceph orch apply -i ceph_spec.yaml\n\n# Applying the spec results in a new mgr deployed on the oc0-ceph-0 node, and the spec reconciled within cephadm\n\n[ceph: root@oc0-controller-0 specs]# ceph orch ls\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT\ncrash 4/4 5m ago 61m *\nmgr 3/3 5m ago 69s oc0-controller-0;oc0-ceph-0;oc0-controller-2\nmon 3/3 5m ago 70s oc0-controller-0;oc0-ceph-0;oc0-controller-2\nosd.default_drive_group 8 2m ago 69s oc0-ceph-0;oc0-ceph-1\n\n[ceph: root@oc0-controller-0 specs]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_WARN\n 1 stray host(s) with 1 daemon(s) not managed by cephadm\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)\n mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb\n osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
Fix the warning by refreshing the mgr:
ceph mgr fail oc0-controller-0.xzgtvo\n
And at this point the cluster is clean:
[ceph: root@oc0-controller-0 specs]# ceph -s\n cluster:\n id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3\n health: HEALTH_OK\n\n services:\n mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)\n mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb\n osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs\n\n data:\n pools: 1 pools, 1 pgs\n objects: 0 objects, 0 B\n usage: 43 MiB used, 400 GiB / 400 GiB avail\n pgs: 1 active+clean\n
oc0-controller-1 has been removed and powered off without leaving traces on the ceph cluster.
The same approach and the same steps can be applied to migrate oc0-controller-2 to oc0-ceph-1.
"},{"location":"ceph/ceph_rbd/#screen-recording","title":"Screen Recording:","text":"In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated Storage nodes, the RGW daemons living in the OpenStack Controller nodes will be migrated into the existing external RHEL nodes (typically the Compute nodes for an HCI environment or CephStorage nodes in the remaining use cases).
"},{"location":"ceph/ceph_rgw/#requirements","title":"Requirements","text":"Ceph 5+ applies strict constraints in the way daemons can be colocated within the same node. The resulting topology depends on the available hardware, as well as the amount of Ceph services present in the Controller nodes which are going to be retired. The following document describes the procedure required to migrate the RGW component (and keep an HA model using the Ceph Ingress daemon in a common TripleO scenario where Controller nodes represent the spec placement where the service is deployed. As a general rule, the number of services that can be migrated depends on the number of available nodes in the cluster. The following diagrams cover the distribution of the Ceph daemons on the CephStorage nodes where at least three nodes are required in a scenario that sees only RGW and RBD (no dashboard):
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | rgw/ingress
With dashboard, and without Manila at least four nodes are required (dashboard has no failover):
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | dashboard/grafana
osd | rgw/ingress | (free)
With dashboard and Manila 5 nodes minimum are required (and dashboard has no failover):
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | rgw/ingress
osd | mon/mgr/crash | mds/ganesha/ingress
osd | rgw/ingress | mds/ganesha/ingress
osd | mds/ganesha/ingress | dashboard/grafana
"},{"location":"ceph/ceph_rgw/#current-status","title":"Current Status","text":"(undercloud) [stack@undercloud-0 ~]$ metalsmith list\n\n\n +------------------------+ +----------------+\n | IP Addresses | | Hostname |\n +------------------------+ +----------------+\n | ctlplane=192.168.24.25 | | cephstorage-0 |\n | ctlplane=192.168.24.10 | | cephstorage-1 |\n | ctlplane=192.168.24.32 | | cephstorage-2 |\n | ctlplane=192.168.24.28 | | compute-0 |\n | ctlplane=192.168.24.26 | | compute-1 |\n | ctlplane=192.168.24.43 | | controller-0 |\n | ctlplane=192.168.24.7 | | controller-1 |\n | ctlplane=192.168.24.41 | | controller-2 |\n +------------------------+ +----------------+\n
SSH into controller-0
and check the pacemaker
status: this will help identify the relevant information that we need to know before starting the RGW migration.
Full List of Resources:\n * ip-192.168.24.46 (ocf:heartbeat:IPaddr2): Started controller-0\n * ip-10.0.0.103 (ocf:heartbeat:IPaddr2): Started controller-1\n * ip-172.17.1.129 (ocf:heartbeat:IPaddr2): Started controller-2\n * ip-172.17.3.68 (ocf:heartbeat:IPaddr2): Started controller-0\n * ip-172.17.4.37 (ocf:heartbeat:IPaddr2): Started controller-1\n * Container bundle set: haproxy-bundle\n\n[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:\n * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-2\n * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-0\n * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-1\n
Use the ip
command to identify the ranges of the storage networks.
[heat-admin@controller-0 ~]$ ip -o -4 a\n\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n7: br-ex inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n8: vlan70 inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n8: vlan70 inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\\ valid_lft forever preferred_lft forever\n9: vlan50 inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\\ valid_lft forever preferred_lft forever\n10: vlan30 inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n10: vlan30 inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n11: vlan20 inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\\ valid_lft forever preferred_lft forever\n12: vlan40 inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\\ valid_lft forever preferred_lft forever\n
In this example:
Identify the network that we previously had in haproxy and propagate it (via TripleO) to the CephStorage nodes. This network is used to reserve a new VIP that will be owned by Ceph and used as the entry point for the RGW service.
SSH into controller-0
and inspect the current HaProxy configuration to locate the ceph_rgw
section:
$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg\n\n...\n...\nlisten ceph_rgw\n bind 10.0.0.103:8080 transparent\n bind 172.17.3.68:8080 transparent\n mode http\n balance leastconn\n http-request set-header X-Forwarded-Proto https if { ssl_fc }\n http-request set-header X-Forwarded-Proto http if !{ ssl_fc }\n http-request set-header X-Forwarded-Port %[dst_port]\n option httpchk GET /swift/healthcheck\n option httplog\n option forwardfor\n server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2\n server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2\n server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2\n
Double check the network used as HaProxy frontend:
[controller-0]$ ip -o -4 a\n\n...\n7: br-ex inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n...\n
As described in the previous section, the check on controller-0 shows that we are exposing the services using the external network, which is not present in the CephStorage nodes, and we need to propagate it via TripleO.
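The check above can be sketched with awk, which pulls the frontend addresses out of the ceph_rgw section. This is only a minimal illustration: the file name haproxy.sample.cfg, the other_service section and the rgw_binds helper are assumptions made for the example, and on a Controller the input would be the haproxy.cfg path shown earlier.

```shell
# Illustrative excerpt standing in for the real haproxy.cfg (assumption).
cat > haproxy.sample.cfg <<'EOF'
listen ceph_rgw
  bind 10.0.0.103:8080 transparent
  bind 172.17.3.68:8080 transparent
  mode http
  server controller-0.storage.redhat.local 172.17.3.73:8080 check
listen other_service
  bind 10.0.0.103:9999 transparent
EOF

# rgw_binds (hypothetical helper): read a haproxy config on stdin and print
# the address:port of every bind line inside the ceph_rgw listen section.
rgw_binds() {
  awk '
    /^listen / { in_rgw = ($2 ~ /^ceph_rgw$/) }  # track the current section
    in_rgw && $1 ~ /^bind$/ { print $2 }          # emit only ceph_rgw binds
  '
}

rgw_binds < haproxy.sample.cfg
```

Any address printed here that belongs to a network missing on the CephStorage nodes is a candidate for the propagation described in the next section.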
"},{"location":"ceph/ceph_rgw/#propagate-the-haproxy-frontend-network-to-cephstorage-nodes","title":"Propagate theHaProxy
frontend network to CephStorage
nodes","text":"Change the nic template used to define the ceph-storage network interfaces and add the new config section.
---\nnetwork_config:\n- type: interface\n name: nic1\n use_dhcp: false\n dns_servers: {{ ctlplane_dns_nameservers }}\n addresses:\n - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}\n routes: {{ ctlplane_host_routes }}\n- type: vlan\n vlan_id: {{ storage_mgmt_vlan_id }}\n device: nic1\n addresses:\n - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }}\n routes: {{ storage_mgmt_host_routes }}\n- type: interface\n name: nic2\n use_dhcp: false\n defroute: false\n- type: vlan\n vlan_id: {{ storage_vlan_id }}\n device: nic2\n addresses:\n - ip_netmask: {{ storage_ip }}/{{ storage_cidr }}\n routes: {{ storage_host_routes }}\n- type: ovs_bridge\n name: {{ neutron_physical_bridge_name }}\n dns_servers: {{ ctlplane_dns_nameservers }}\n domain: {{ dns_search_domains }}\n use_dhcp: false\n addresses:\n - ip_netmask: {{ external_ip }}/{{ external_cidr }}\n routes: {{ external_host_routes }}\n members:\n - type: interface\n name: nic3\n primary: true\n
In addition, add the External Network to the baremetal.yaml
file used by metalsmith and run the overcloud node provision
command passing the --network-config
option:
- name: CephStorage\n count: 3\n hostname_format: cephstorage-%index%\n instances:\n - hostname: cephstorage-0\n name: ceph-0\n - hostname: cephstorage-1\n name: ceph-1\n - hostname: cephstorage-2\n name: ceph-2\n defaults:\n profile: ceph-storage\n network_config:\n template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2\n networks:\n - network: ctlplane\n vif: true\n - network: storage\n - network: storage_mgmt\n - network: external\n
(undercloud) [stack@undercloud-0]$\n\nopenstack overcloud node provision \\\n -o overcloud-baremetal-deployed-0.yaml \\\n --stack overcloud \\\n --network-config -y \\\n $PWD/network/baremetal_deployment.yaml\n
Check the new network on the CephStorage
nodes:
[root@cephstorage-0 ~]# ip -o -4 a\n\n1: lo inet 127.0.0.1/8 scope host lo\\ valid_lft forever preferred_lft forever\n2: enp1s0 inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\\ valid_lft forever preferred_lft forever\n11: vlan40 inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\\ valid_lft forever preferred_lft forever\n12: vlan30 inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\\ valid_lft forever preferred_lft forever\n14: br-ex inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\\ valid_lft forever preferred_lft forever\n
And now it\u2019s time to start migrating the RGW backends and build the ingress on top of them.
"},{"location":"ceph/ceph_rgw/#migrate-the-rgw-backends","title":"Migrate the RGW backends","text":"To match the cardinality diagram we use cephadm labels to refer to a group of nodes where a given daemon type should be deployed.
Add the RGW label to the cephstorage nodes:
for i in 0 1 2; {\n ceph orch host label add cephstorage-$i rgw;\n}\n
[ceph: root@controller-0 /]#\n\nfor i in 0 1 2; {\n ceph orch host label add cephstorage-$i rgw;\n}\n\nAdded label rgw to host cephstorage-0\nAdded label rgw to host cephstorage-1\nAdded label rgw to host cephstorage-2\n\n[ceph: root@controller-0 /]# ceph orch host ls\n\nHOST ADDR LABELS STATUS\ncephstorage-0 192.168.24.54 osd rgw\ncephstorage-1 192.168.24.44 osd rgw\ncephstorage-2 192.168.24.30 osd rgw\ncontroller-0 192.168.24.45 _admin mon mgr\ncontroller-1 192.168.24.11 _admin mon mgr\ncontroller-2 192.168.24.38 _admin mon mgr\n\n6 hosts in cluster\n
During the overcloud deployment, RGW is applied at step 2 (external_deployment_steps), and a cephadm-compatible spec is generated in /home/ceph-admin/specs/rgw
by the ceph_mkspec Ansible module. Find and patch the RGW spec so that the placement is expressed with labels, and change the RGW backend port to 8090 to avoid conflicts with the Ceph Ingress Daemon (*):
[root@controller-0 heat-admin]# cat rgw\n\nnetworks:\n- 172.17.3.0/24\nplacement:\n hosts:\n - controller-0\n - controller-1\n - controller-2\nservice_id: rgw\nservice_name: rgw.rgw\nservice_type: rgw\nspec:\n rgw_frontend_port: 8080\n rgw_realm: default\n rgw_zone: default\n
Patch the spec replacing controller nodes with the label key
---\nnetworks:\n- 172.17.3.0/24\nplacement:\n label: rgw\nservice_id: rgw\nservice_name: rgw.rgw\nservice_type: rgw\nspec:\n rgw_frontend_port: 8090\n rgw_realm: default\n rgw_zone: default\n
(*) cephadm_check_port
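The patch described above can be sketched as a small sed filter. This is a hedged example and not the procedure used by TripleO: the file names rgw.sample.yaml and rgw.patched.yaml and the patch_rgw_spec helper are hypothetical, and the sample assumes two-space YAML indentation and controller-N host names as in the spec shown earlier.

```shell
# Illustrative copy of the generated spec (indentation is an assumption).
cat > rgw.sample.yaml <<'EOF'
networks:
- 172.17.3.0/24
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
service_id: rgw
service_name: rgw.rgw
service_type: rgw
spec:
  rgw_frontend_port: 8080
  rgw_realm: default
  rgw_zone: default
EOF

# patch_rgw_spec (hypothetical helper): filter a spec from stdin to stdout.
#   1. turn the hosts: key into a label-based placement
#   2. drop the controller-N entries that belonged to hosts:
#   3. move the backend port from 8080 to 8090
patch_rgw_spec() {
  sed -E '
    s/hosts:$/label: rgw/
    /^ *- controller-[0-9]+$/d
    s/rgw_frontend_port: 8080$/rgw_frontend_port: 8090/
  '
}

patch_rgw_spec < rgw.sample.yaml > rgw.patched.yaml
cat rgw.patched.yaml
```

Writing to a separate file keeps the generated spec untouched, so the result can be compared with the original before it is applied.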
Apply the new RGW spec using the orchestrator CLI:
$ cephadm shell -m /home/ceph-admin/specs/rgw -- ceph orch apply -i /mnt/rgw\n
Which triggers the redeploy:
...\nosd.9 cephstorage-2\nrgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 starting\nrgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 starting\nrgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 starting\nrgw.rgw.controller-1.eyvrzw controller-1 172.17.3.146:8080 running (5h)\nrgw.rgw.controller-2.navbxa controller-2 172.17.3.66:8080 running (5h)\n\n...\nosd.9 cephstorage-2\nrgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 running (19s)\nrgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 running (16s)\nrgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 running (13s)\n
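While the redeploy is in progress, a small filter can list the RGW daemons that have not reached the running state yet. The sketch below is an assumption-laden illustration: it parses a saved listing whose columns follow the excerpt above (name, host, address, status), the file name orch_ps.sample.txt and the rgw_not_running helper are hypothetical, and the mixed state in the sample is made up for the example.

```shell
# Illustrative listing in the same column layout as the excerpt above.
cat > orch_ps.sample.txt <<'EOF'
osd.9 cephstorage-2
rgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 running (19s)
rgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 starting
rgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 starting
EOF

# rgw_not_running (hypothetical helper): print the name of every rgw daemon
# whose fourth column (status) is anything other than running.
rgw_not_running() {
  awk '$1 ~ /^rgw[.]/ && $4 !~ /^running$/ { print $1 }'
}

rgw_not_running < orch_ps.sample.txt
```

An empty result would indicate that every listed RGW daemon reports running.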
At this point the new RGW backends must be reachable on the new port (8090), and an IngressDaemon will be enabled on port 8080 later in the process. For this reason, SSH into each RGW node (the CephStorage nodes) and add iptables rules that allow connections to both ports 8080 and 8090:
iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment \"ceph rgw ingress\" -j ACCEPT\n\niptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment \"ceph rgw backends\" -j ACCEPT\n\nfor port in 8080 8090; { \n for i in 25 10 32; {\n ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \\\n -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \\\n -j ACCEPT;\n }\n}\n
From a Controller node (e.g. controller-0) try to reach (curl) the rgw backends:
for i in 26 23 81; do\n echo \"----\"\n echo \"Query 172.17.3.$i\"\n curl 172.17.3.$i:8090\n echo \"----\"\n echo\ndone\n
And you should observe the following:
----\nQuery 172.17.3.23\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n\n----\nQuery 172.17.3.26\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n\n----\nQuery 172.17.3.81\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\n---\n
"},{"location":"ceph/ceph_rgw/#note","title":"NOTE","text":"In case RGW backends are migrated in the CephStorage nodes, there\u2019s no \u201cinternalAPI\u201d network(this is not true in the case of HCI). Reconfig the RGW keystone endpoint, pointing to the external Network that has been propagated (see the previous section)
[ceph: root@controller-0 /]# ceph config dump | grep keystone\nglobal basic rgw_keystone_url http://172.16.1.111:5000\n\n[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000\n
"},{"location":"ceph/ceph_rgw/#deploy-a-ceph-ingressdaemon","title":"Deploy a Ceph IngressDaemon","text":"HaProxy
is managed by TripleO via Pacemaker
: at this point the three running instances still point to the old RGW backends, which results in a broken configuration. Since we\u2019re going to deploy the Ceph Ingress Daemon, the first thing to do is remove the existing ceph_rgw
config, clean up the config created by TripleO and restart the service to make sure other services are not affected by this change.
SSH into each Controller node and remove the following section from /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg
:
listen ceph_rgw\n bind 10.0.0.103:8080 transparent\n mode http\n balance leastconn\n http-request set-header X-Forwarded-Proto https if { ssl_fc }\n http-request set-header X-Forwarded-Proto http if !{ ssl_fc }\n http-request set-header X-Forwarded-Port %[dst_port]\n option httpchk GET /swift/healthcheck\n option httplog\n option forwardfor\n server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2\n server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2\n server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2\n
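Removing the section by hand on three nodes is error prone, so the edit can be rehearsed on a copy first. The following is a minimal sketch, assuming that every haproxy section starts in column 0 and that the section is named ceph_rgw as shown above; the file names, the service_a and service_b sections and the drop_rgw_section helper are hypothetical.

```shell
# Illustrative excerpt; on a Controller, work on a backup copy of the real file.
cat > haproxy.before.cfg <<'EOF'
listen service_a
  bind 10.0.0.103:9998 transparent
listen ceph_rgw
  bind 10.0.0.103:8080 transparent
  mode http
  server controller-0.storage.redhat.local 172.17.3.73:8080 check
listen service_b
  bind 10.0.0.103:9999 transparent
EOF

# drop_rgw_section (hypothetical helper): copy stdin to stdout, skipping the
# listen ceph_rgw header and every line that belongs to that section.
drop_rgw_section() {
  awk '
    # A line starting in column 0 (and not a comment) opens a new section.
    /^[^[:space:]#]/ { drop = ($1 ~ /^listen$/ && $2 ~ /^ceph_rgw$/) }
    !drop
  '
}

drop_rgw_section < haproxy.before.cfg > haproxy.after.cfg
cat haproxy.after.cfg
```

Comparing haproxy.before.cfg with haproxy.after.cfg (for example with diff) shows exactly which lines would be removed before the real file is touched.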
Restart haproxy-bundle
and make sure it\u2019s started:
[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle\nhaproxy-bundle successfully restarted\n\n\n[root@controller-0 ~]# sudo pcs status | grep haproxy\n\n * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:\n * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-0\n * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-1\n * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-2\n
Double check that no process is bound to port 8080 anymore:
[root@controller-0 ~]# ss -antop | grep 8080\n[root@controller-0 ~]#\n
And the swift CLI should fail at this point:
(overcloud) [root@cephstorage-0 ~]# swift list\n\nHTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc41beb0430>: Failed to establish a new connection: [Errno 111] Connection refused'))\n
Now we can start deploying the Ceph IngressDaemon on the CephStorage nodes.
Set the required images for both HaProxy and Keepalived
[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3\n\n[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5\n
Prepare the ingress spec and mount it to cephadm:
$ sudo vim /home/ceph-admin/specs/rgw_ingress\n
and paste the following content:
---\nservice_type: ingress\nservice_id: rgw.rgw\nplacement:\n label: rgw\nspec:\n backend_service: rgw.rgw\n virtual_ip: 10.0.0.89/24\n frontend_port: 8080\n monitor_port: 8898\n virtual_interface_networks:\n - 10.0.0.0/24\n
Mount the generated spec and apply it using the orchestrator CLI:
$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress -- ceph orch apply -i /mnt/rgw_ingress\n
Wait until the ingress is deployed and query the resulting endpoint:
[ceph: root@controller-0 /]# ceph orch ls\n\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT\ncrash 6/6 6m ago 3d *\ningress.rgw.rgw 10.0.0.89:8080,8898 6/6 37s ago 60s label:rgw\nmds.mds 3/3 6m ago 3d controller-0;controller-1;controller-2\nmgr 3/3 6m ago 3d controller-0;controller-1;controller-2\nmon 3/3 6m ago 3d controller-0;controller-1;controller-2\nosd.default_drive_group 15 37s ago 3d cephstorage-0;cephstorage-1;cephstorage-2\nrgw.rgw ?:8090 3/3 37s ago 4m label:rgw\n
[ceph: root@controller-0 /]# curl 10.0.0.89:8080\n\n---\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><ListAllMyBucketsResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>[ceph: root@controller-0 /]#\n---\n
The result above shows that we\u2019re able to reach the backend from the IngressDaemon, which means we\u2019re almost ready to interact with it using the swift CLI.
"},{"location":"ceph/ceph_rgw/#update-the-object-store-endpoints","title":"Update the object-store endpoints","text":"The endpoints still point to the old VIP owned by pacemaker, but given it\u2019s still used by other services and we reserved a new VIP on the same network, before any other action we should update the object-store endpoint.
List the current endpoints:
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object\n\n| 1326241fb6b6494282a86768311f48d1 | regionOne | swift | object-store | True | internal | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |\n| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift | object-store | True | public | http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s |\n| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift | object-store | True | admin | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |\n
Update the endpoints pointing to the Ingress VIP:
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url \"http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s\" 95596a2d92c74c15b83325a11a4f07a3\n\n(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store\n| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift | object-store | True | internal | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |\n| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift | object-store | True | public | http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s |\n| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift | object-store | True | admin | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |\n
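Since the update is applied per endpoint ID, it can help to collect the IDs and interfaces first. The sketch below parses a saved copy of the table printed by the listing command; the file name endpoints.sample.txt and the object_store_endpoints helper are hypothetical, and the rows are the ones from the first listing in this section.

```shell
# Saved table rows, as printed by the endpoint listing shown earlier.
cat > endpoints.sample.txt <<'EOF'
| 1326241fb6b6494282a86768311f48d1 | regionOne | swift | object-store | True | internal | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift | object-store | True | public | http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s |
| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift | object-store | True | admin | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
EOF

# object_store_endpoints (hypothetical helper): print one id|interface pair
# per object-store row. Field 1 is empty because each row starts with a pipe,
# so the ID is field 2 and the interface is field 7.
object_store_endpoints() {
  grep object-store | cut -d'|' -f2,7 | tr -d ' '
}

object_store_endpoints < endpoints.sample.txt
```

Each ID printed here is what the endpoint set command shown above expects as its last argument.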
Repeat the openstack endpoint set command for the internal and admin endpoints, then test the migrated service:
(overcloud) [stack@undercloud-0 ~]$ swift list --debug\n\nDEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint\nDEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens\nDEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000\nDEBUG:urllib3.connectionpool:http://10.0.0.115:5000 \"POST /v3/auth/tokens HTTP/1.1\" 201 7795\nDEBUG:keystoneclient.auth.identity.v3.base:{\"token\": {\"methods\": [\"password\"], \"user\": {\"domain\": {\"id\": \"default\", \"name\": \"Default\"}, \"id\": \"6f87c7ffdddf463bbc633980cfd02bb3\", \"name\": \"admin\", \"password_expires_at\": null}, \n\n\n...\n...\n...\n\nDEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H \"X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos\" -H \"Accept-Encoding: gzip\"\nDEBUG:swiftclient:RESP STATUS: 200 OK\nDEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'}\nDEBUG:swiftclient:RESP BODY: b'[]'\n
Run tempest tests against object-storage:
(overcloud) [stack@undercloud-0 tempest-dir]$ tempest run --regex tempest.api.object_storage\n...\n...\n...\n======\nTotals\n======\nRan: 141 tests in 606.5579 sec.\n - Passed: 128\n - Skipped: 13\n - Expected Fail: 0\n - Unexpected Success: 0\n - Failed: 0\nSum of execute time for each test: 657.5183 sec.\n\n==============\nWorker Balance\n==============\n - Worker 0 (1 tests) => 0:10:03.400561\n - Worker 1 (2 tests) => 0:00:24.531916\n - Worker 2 (4 tests) => 0:00:10.249889\n - Worker 3 (30 tests) => 0:00:32.730095\n - Worker 4 (51 tests) => 0:00:26.246044\n - Worker 5 (6 tests) => 0:00:20.114803\n - Worker 6 (20 tests) => 0:00:16.290323\n - Worker 7 (27 tests) => 0:00:17.103827\n
"},{"location":"ceph/ceph_rgw/#additional-resources","title":"Additional Resources","text":"A screen recording is available here.
"},{"location":"contributing/development_environment/","title":"Development environment","text":"This is a guide for an install_yamls based Adoption environment with network isolation as an alternative to the CRC and Vagrant TripleO Standalone development environment guide.
The Adoption development environment utilizes install_yamls for CRC VM creation and for creation of the VM that hosts the original Wallaby OpenStack in Standalone configuration.
"},{"location":"contributing/development_environment/#environment-prep","title":"Environment prep","text":"Get install_yamls:
git clone https://github.com/openstack-k8s-operators/install_yamls.git\n
Install tools for operator development:
cd ~/install_yamls/devsetup\nmake download_tools\n
"},{"location":"contributing/development_environment/#deployment-of-crc-with-network-isolation","title":"Deployment of CRC with network isolation","text":"cd ~/install_yamls/devsetup\nPULL_SECRET=$HOME/pull-secret.txt CPUS=12 MEMORY=40000 DISK=100 make crc\n\neval $(crc oc-env)\noc login -u kubeadmin -p 12345678 https://api.crc.testing:6443\n\nmake crc_attach_default_interface\n
"},{"location":"contributing/development_environment/#development-environment-with-openstack-ironic","title":"Development environment with Openstack ironic","text":"Create the BMaaS network (crc-bmaas
) and virtual baremetal nodes controlled by a RedFish BMC emulator.
cd .. # back to install_yamls\nmake nmstate\nmake namespace\ncd devsetup # back to install_yamls/devsetup\nmake bmaas\n
A node definition YAML file to use with the openstack baremetal create <file>.yaml
command can be generated for the virtual baremetal nodes by running the bmaas_generate_nodes_yaml
make target. Store it in a temp file for later.
make bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml\n
Set variables to deploy edpm Standalone with additional network (baremetal
) and compute driver ironic
.
cat << EOF > /tmp/addtional_nets.json\n[\n {\n \"type\": \"network\",\n \"name\": \"crc-bmaas\",\n \"standalone_config\": {\n \"type\": \"ovs_bridge\",\n \"name\": \"baremetal\",\n \"mtu\": 1500,\n \"vip\": true,\n \"ip_subnet\": \"172.20.1.0/24\",\n \"allocation_pools\": [\n {\n \"start\": \"172.20.1.100\",\n \"end\": \"172.20.1.150\"\n }\n ],\n \"host_routes\": [\n {\n \"destination\": \"192.168.130.0/24\",\n \"nexthop\": \"172.20.1.1\"\n }\n ]\n }\n }\n]\nEOF\nexport EDPM_COMPUTE_ADDITIONAL_NETWORKS=$(jq -c . /tmp/addtional_nets.json)\nexport STANDALONE_COMPUTE_DRIVER=ironic\nexport NTP_SERVER=pool.ntp.org # Only necessary if not on the Red Hat network ...\nexport EDPM_COMPUTE_CEPH_ENABLED=false # Optional\n
Use the install_yamls devsetup to create a virtual machine connected to the isolated networks.
Create the edpm-compute-0 virtual machine.
cd install_yamls/devsetup\nmake standalone\n
"},{"location":"contributing/development_environment/#install-the-openstack-k8s-operators-openstack-operator","title":"Install the openstack-k8s-operators (openstack-operator)","text":"cd .. # back to install_yamls\nmake crc_storage\nmake input\nmake openstack\n
"},{"location":"contributing/development_environment/#convenience-steps","title":"Convenience steps","text":"To make our life easier we can copy the deployment passwords we'll be using in the backend services deployment phase of the data plane adoption.
scp -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100:/root/tripleo-standalone-passwords.yaml ~/\n
If we want to be able to easily run openstack
commands from the host without actually installing the package and copying the configuration file from the VM we can create a simple alias:
alias openstack=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 OS_CLOUD=standalone openstack\"\n
"},{"location":"contributing/development_environment/#route-networks","title":"Route networks","text":"Route VLAN20 to have access to the MariaDB cluster:
EDPM_BRIDGE=$(sudo virsh dumpxml edpm-compute-0 | grep -oP \"(?<=bridge=').*(?=')\")\nsudo ip link add link $EDPM_BRIDGE name vlan20 type vlan id 20\nsudo ip addr add dev vlan20 172.17.0.222/24\nsudo ip link set up dev vlan20\n
"},{"location":"contributing/development_environment/#snapshotrevert","title":"Snapshot/revert","text":"When the deployment of the Standalone OpenStack is finished, it's a good time to snapshot the machine, so that multiple Adoption attempts can be done without having to deploy from scratch.
cd ~/install_yamls/devsetup\nmake standalone_snapshot\n
And when you wish to revert the Standalone deployment to the snapshotted state:
cd ~/install_yamls/devsetup\nmake standalone_revert\n
A similar snapshot could be taken of the CRC virtual machine, but on the CRC side the developer environment can be reset sufficiently via the install_yamls *_cleanup
targets. This is further detailed in the section: Reset the environment to pre-adoption state
# Enroll baremetal nodes\nmake bmaas_generate_nodes_yaml | tail -n +2 | tee /tmp/ironic_nodes.yaml\nscp -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa /tmp/ironic_nodes.yaml root@192.168.122.100:\nssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\n\nexport OS_CLOUD=standalone\nopenstack baremetal create /root/ironic_nodes.yaml\nexport IRONIC_PYTHON_AGENT_RAMDISK_ID=$(openstack image show deploy-ramdisk -c id -f value)\nexport IRONIC_PYTHON_AGENT_KERNEL_ID=$(openstack image show deploy-kernel -c id -f value)\nfor node in $(openstack baremetal node list -c UUID -f value); do\n openstack baremetal node set $node \\\n --driver-info deploy_ramdisk=${IRONIC_PYTHON_AGENT_RAMDISK_ID} \\\n --driver-info deploy_kernel=${IRONIC_PYTHON_AGENT_KERNEL_ID} \\\n --resource-class baremetal \\\n --property capabilities='boot_mode:uefi'\ndone\n\n# Create a baremetal flavor\nopenstack flavor create baremetal --ram 1024 --vcpus 1 --disk 15 \\\n --property resources:VCPU=0 \\\n --property resources:MEMORY_MB=0 \\\n --property resources:DISK_GB=0 \\\n --property resources:CUSTOM_BAREMETAL=1 \\\n --property capabilities:boot_mode=\"uefi\"\n\n# Create image\nIMG=Fedora-Cloud-Base-38-1.6.x86_64.qcow2\nURL=https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/$IMG\ncurl -o /tmp/${IMG} -L $URL\nDISK_FORMAT=$(qemu-img info /tmp/${IMG} | grep \"file format:\" | awk '{print $NF}')\nopenstack image create --container-format bare --disk-format ${DISK_FORMAT} Fedora-Cloud-Base-38 < /tmp/${IMG}\n\nexport BAREMETAL_NODES=$(openstack baremetal node list -c UUID -f value)\n# Manage nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal node manage $node\ndone\n\n# Wait for nodes to reach \"manageable\" state\nwatch openstack baremetal node list\n\n# Inspect baremetal nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal introspection start $node\ndone\n\n# Wait for inspection to complete\nwatch openstack baremetal 
introspection list\n\n# Provide nodes\nfor node in $BAREMETAL_NODES; do\n openstack baremetal node provide $node\ndone\n\n# Wait for nodes to reach \"available\" state\nwatch openstack baremetal node list\n\n# Create an instance on baremetal\nopenstack server show baremetal-test || {\n openstack server create baremetal-test --flavor baremetal --image Fedora-Cloud-Base-38 --nic net-id=provisioning --wait\n}\n\n# Check instance status and network connectivity\nopenstack server show baremetal-test\nping -c 4 $(openstack server show baremetal-test -f json -c addresses | jq -r .addresses.provisioning[0])\n
","text":"export OS_CLOUD=standalone\nsource ~/install_yamls/devsetup/scripts/edpm-deploy-instance.sh\n
Confirm the image UUID can be seen in Ceph's images pool.
ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l\n
Create a Cinder volume, a backup from it, and snapshot it.
openstack volume create --image cirros --bootable --size 1 disk\nopenstack volume backup create --name backup disk\nopenstack volume snapshot create --volume disk snapshot\n
Add volume to the test VM
openstack server add volume test disk\n
"},{"location":"contributing/development_environment/#performing-the-data-plane-adoption","title":"Performing the Data Plane Adoption","text":"The development environment is now set up, you can go to the Adoption documentation and perform adoption manually, or run the test suite against your environment.
"},{"location":"contributing/development_environment/#reset-the-environment-to-pre-adoption-state","title":"Reset the environment to pre-adoption state","text":"The development environment must be rolled back in case we want to execute another Adoption run.
Delete the data-plane and control-plane resources from the CRC vm
oc delete osdp openstack\noc delete oscp openstack\n
Revert the standalone vm to the snapshotted state
cd ~/install_yamls/devsetup\nmake standalone_revert\n
Clean up and initialize the storage PVs in CRC vm
cd ..\nmake crc_storage_cleanup\nmake crc_storage\n
"},{"location":"contributing/development_environment/#experimenting-with-an-additional-compute-node","title":"Experimenting with an additional compute node","text":"The following is not on the critical path of preparing the development environment for Adoption, but it shows how to make the environment work with an additional compute node VM.
The remaining steps should be completed on the hypervisor hosting crc and edpm-compute-0.
"},{"location":"contributing/development_environment/#deploy-ng-control-plane-with-ceph","title":"Deploy NG Control Plane with Ceph","text":"Export the Ceph configuration from edpm-compute-0 into a secret.
SSH=\"ssh -i $HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nKEY=$($SSH \"cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0\")\nCONF=$($SSH \"cat /etc/ceph/ceph.conf | base64 -w 0\")\n\ncat <<EOF > ceph_secret.yaml\napiVersion: v1\ndata:\n ceph.client.openstack.keyring: $KEY\n ceph.conf: $CONF\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\ntype: Opaque\nEOF\n\noc create -f ceph_secret.yaml\n
Deploy the NG control plane with Ceph as backend for Glance and Cinder. As described in the install_yamls README, use the sample config located at https://github.com/openstack-k8s-operators/openstack-operator/blob/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml but make sure to replace the _FSID_
in the sample with the one from the secret created in the previous step.
curl -o /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml https://raw.githubusercontent.com/openstack-k8s-operators/openstack-operator/main/config/samples/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\nFSID=$(oc get secret ceph-conf-files -o json | jq -r '.data.\"ceph.conf\"' | base64 -d | grep fsid | sed -e 's/fsid = //') && echo $FSID\nsed -i \"s/_FSID_/${FSID}/\" /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\noc apply -f /tmp/core_v1beta1_openstackcontrolplane_network_isolation_ceph.yaml\n
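The FSID substitution can be tried offline before touching the cluster. This is a minimal sketch under stated assumptions: ceph.sample.conf and controlplane.sample.yaml are hypothetical stand-ins (the one-line template and the mon_host value are made up for the example), and the sed expression tolerates variable spacing around the equals sign.

```shell
# Illustrative ceph.conf; only the fsid line matters for this sketch.
cat > ceph.sample.conf <<'EOF'
[global]
fsid = 7133115f-7751-5c2f-88bd-fbff2f140791
mon_host = 192.168.122.100
EOF

# Hypothetical one-line template carrying the _FSID_ placeholder.
echo 'fsid: _FSID_' > controlplane.sample.yaml

# Print only the value that follows the fsid key.
FSID=$(sed -n 's/^[[:space:]]*fsid[[:space:]]*=[[:space:]]*//p' ceph.sample.conf)

# Left unquoted on purpose: an FSID is a UUID and contains no whitespace.
sed s/_FSID_/$FSID/ controlplane.sample.yaml
```

The result is written to standard output instead of being edited in place, so the template stays unchanged until the output has been checked.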
A NG control plane which uses the same Ceph backend should now be functional. If you create a test image on the NG system to confirm it works from the configuration above, be sure to read the warning in the next section.
Before beginning adoption testing or development you may wish to deploy an EDPM node as described in the following section.
"},{"location":"contributing/development_environment/#warning-about-two-openstacks-and-one-ceph","title":"Warning about two OpenStacks and one Ceph","text":"Though workloads can be created in the NG deployment to test, be careful not to confuse them with workloads from the Wallaby cluster to be migrated. The following scenario is now possible.
A Glance image exists on the Wallaby OpenStack to be adopted.
[stack@standalone standalone]$ export OS_CLOUD=standalone\n[stack@standalone standalone]$ openstack image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 33a43519-a960-4cd0-a593-eca56ee553aa | cirros | active |\n+--------------------------------------+--------+--------+\n[stack@standalone standalone]$\n
If you now create an image with the NG cluster, then a Glance image will exist on the NG OpenStack, which will adopt the workloads of the Wallaby deployment. [fultonj@hamfast ng]$ export OS_CLOUD=default\n[fultonj@hamfast ng]$ export OS_PASSWORD=12345678\n[fultonj@hamfast ng]$ openstack image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 4ebccb29-193b-4d52-9ffd-034d440e073c | cirros | active |\n+--------------------------------------+--------+--------+\n[fultonj@hamfast ng]$\n
Both Glance images are stored in the same Ceph pool. ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100 sudo cephadm shell -- rbd -p images ls -l\nInferring fsid 7133115f-7751-5c2f-88bd-fbff2f140791\nUsing recent ceph image quay.rdoproject.org/tripleowallabycentos9/daemon@sha256:aa259dd2439dfaa60b27c9ebb4fb310cdf1e8e62aa7467df350baf22c5d992d8\nNAME SIZE PARENT FMT PROT LOCK\n33a43519-a960-4cd0-a593-eca56ee553aa 273 B 2\n33a43519-a960-4cd0-a593-eca56ee553aa@snap 273 B 2 yes\n4ebccb29-193b-4d52-9ffd-034d440e073c 112 MiB 2\n4ebccb29-193b-4d52-9ffd-034d440e073c@snap 112 MiB 2 yes\n
However, as far as each Glance service is concerned each has one image. Thus, in order to avoid confusion during adoption the test Glance image on the NG OpenStack should be deleted. openstack image delete 4ebccb29-193b-4d52-9ffd-034d440e073c\n
Connecting the NG OpenStack to the existing Ceph cluster is part of the adoption procedure so that data migration can be minimized, but make sure you understand the implications of the above example."},{"location":"contributing/development_environment/#deploy-edpm-compute-1","title":"Deploy edpm-compute-1","text":"edpm-compute-0 is not available as a standard EDPM system to be managed by edpm-ansible or dataplane-operator because it hosts the Wallaby deployment which will be adopted; after adoption it will only host the Ceph server.
Use the install_yamls devsetup to create additional virtual machines and be sure that the EDPM_COMPUTE_SUFFIX
is set to 1
or greater. Do not set EDPM_COMPUTE_SUFFIX
to 0
or you could delete the Wallaby system created in the previous section.
When deploying EDPM nodes add an extraMounts
like the following in the OpenStackDataPlaneNodeSet
CR nodeTemplate
so that they will be configured to use the same Ceph cluster.
edpm-compute:\n nodeTemplate:\n extraMounts:\n - extraVolType: Ceph\n volumes:\n - name: ceph\n secret:\n secretName: ceph-conf-files\n mounts:\n - name: ceph\n mountPath: \"/etc/ceph\"\n readOnly: true\n
A NG data plane which uses the same Ceph backend should now be functional. Be careful not to confuse new workloads created to test the NG OpenStack with those of the Wallaby OpenStack, as described in the previous section.
"},{"location":"contributing/development_environment/#begin-adoption-testing-or-development","title":"Begin Adoption Testing or Development","text":"We should now have:
The environment above is assumed to be available in the Glance Adoption documentation. You may now follow other Data Plane Adoption procedures described in the documentation. The same pattern can be applied to other services.
"},{"location":"contributing/documentation/","title":"Contributing to documentation","text":""},{"location":"contributing/documentation/#rendering-documentation-locally","title":"Rendering documentation locally","text":"Install docs build requirements into virtualenv:
python3 -m venv local/docs-venv\nsource local/docs-venv/bin/activate\npip install -r docs/doc_requirements.txt\n
Serve docs site on localhost:
mkdocs serve\n
Click the link it outputs. As you save changes to files modified in your editor, the browser will automatically show the new content.
"},{"location":"contributing/documentation/#patterns-and-tips-for-contributing-to-documentation","title":"Patterns and tips for contributing to documentation","text":"Pages concerning individual components/services should make sense in the context of the broader adoption procedure. While adopting a service in isolation is an option for developers, let's write the documentation with the assumption the adoption procedure is being done in full, going step by step (one doc after another).
The procedure should be written with production use in mind. This repository could be used as a starting point for product technical documentation. We should not tie the documentation to something that wouldn't translate well from dev envs to production.
If possible, try to make code snippets copy-pastable. Use shell variables if the snippets should be parametrized. Use oc
rather than kubectl
in snippets.
Focus on the \"happy path\" in the docs as much as possible, troubleshooting info can go into the Troubleshooting page, or alternatively a troubleshooting section at the end of the document, visibly separated from the main procedure.
The full procedure will inevitably happen to be quite long, so let's try to be concise in writing to keep the docs consumable (but not to a point of making things difficult to understand or omitting important things).
A bash alias can be created for long commands; however, when implementing them in the test roles you should transform them to avoid command not found errors. From:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n\nopenstack endpoint list | grep network\n
To: alias openstack=\"oc exec -t openstackclient -- openstack\"\n\n${BASH_ALIASES[openstack]} endpoint list | grep network\n
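The transformation above is needed because non-interactive bash (which is what runs the snippets in the test roles) does not expand aliases by default, while the alias text is still recorded in the `BASH_ALIASES` associative array. The following is a minimal sketch of that behaviour, not part of the test suite; the `greet` alias and the `run_alias` helper are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Stand-in alias for illustration only; the docs alias wraps
# "oc exec -t openstackclient -- openstack" instead.
alias greet="echo hello"

# In a non-interactive shell, calling "greet" directly is not expanded.
# The alias body is still stored in BASH_ALIASES, so we expand it explicitly.
# The expansion is left unquoted on purpose so the body splits into words.
run_alias() {
    local name="$1"
    shift
    ${BASH_ALIASES[$name]} "$@"
}

run_alias greet world
```

Using `${BASH_ALIASES[name]}` keeps the documented command text and the tested command text identical, which is the stated goal of the test suite.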
The adoption docs repository also includes a test suite for Adoption. There are targets in the Makefile which can be used to execute the test suite:
test-minimal
- a minimal test scenario, the eventual set of services in this scenario should be the \"core\" services needed to launch a VM. This scenario assumes local storage backend for services like Glance and Cinder.
test-with-ceph
- like 'minimal' but with Ceph storage backend for Glance and Cinder.
Create tests/vars.yaml
and tests/secrets.yaml
by copying the included samples (tests/vars.sample.yaml
, tests/secrets.sample.yaml
).
Walk through the tests/vars.yaml
and tests/secrets.yaml
files and see if you need to edit any values. If you are using the documented development environment, the majority of the defaults should work out of the box. The comments in the YAML files will guide you regarding the expected values. You may want to double check that these variables suit your environment:
install_yamls_path
tripleo_passwords
controller*_ssh
edpm_privatekey_path
timesync_ntp_servers
The interface between the execution infrastructure and the test suite is an Ansible inventory and variables files. Inventory and variable samples are provided. To run the tests, follow this procedure:
sudo dnf -y install python-devel\npython3 -m venv venv\nsource venv/bin/activate\npip install openstackclient osc_placement jmespath\nansible-galaxy collection install community.general\n
make test-with-ceph
(the documented development environment does include Ceph). If you are using a Ceph-less environment, you should run make test-minimal
.
Please be aware of the following when changing the test suite:
The purpose of the test suite is to verify what the user would run if they were following the docs. We don't want to loosely rewrite the docs into Ansible code following Ansible best practices. We want to test the exact same bash commands/snippets that are written in the docs. This often means that we should be using the shell
module and do a verbatim copy/paste from docs, instead of using the best Ansible module for the task at hand.
The following instructions create OpenStackControlPlane CR with basic backend services deployed, and all the OpenStack services disabled. This will be the foundation of the podified control plane.
In subsequent steps, we'll import the original databases and then add podified OpenStack control plane services.
"},{"location":"openstack/backend_services_deployment/#prerequisites","title":"Prerequisites","text":"The cloud which we want to adopt is up and running. It's on OpenStack Wallaby release.
The openstack-operator
is deployed, but OpenStackControlPlane
is not deployed.
For developer/CI environments, the openstack operator can be deployed by running make openstack
inside install_yamls repo.
For production environments, the deployment method will likely be different.
For developer/CI environments driven by install_yamls, make sure you've run make crc_storage
.
ADMIN_PASSWORD=SomePassword\n
To use the existing OpenStack deployment password:
ADMIN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AdminPassword:' | awk -F ': ' '{ print $2; }')\n
E.g. in developer environments with TripleO Standalone, the passwords can be extracted like this:
CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')\nGLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')\nHEAT_AUTH_ENCRYPTION_KEY=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatAuthEncryptionKey:' | awk -F ': ' '{ print $2; }')\nHEAT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' HeatPassword:' | awk -F ': ' '{ print $2; }')\nIRONIC_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' IronicPassword:' | awk -F ': ' '{ print $2; }')\nMANILA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' ManilaPassword:' | awk -F ': ' '{ print $2; }')\nNEUTRON_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NeutronPassword:' | awk -F ': ' '{ print $2; }')\nNOVA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' NovaPassword:' | awk -F ': ' '{ print $2; }')\nOCTAVIA_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' OctaviaPassword:' | awk -F ': ' '{ print $2; }')\nPLACEMENT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' PlacementPassword:' | awk -F ': ' '{ print $2; }')\n
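The repeated `cat | grep | awk` pipelines above can be folded into one helper. This is only a sketch under the assumption that the passwords file keeps the indented `Key: value` layout those pipelines rely on; `get_tripleo_password` and `PASSWORDS_FILE` are hypothetical names, not something provided by TripleO or install_yamls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value stored under a given key in a
# TripleO passwords file. $1 is the key name, $2 is the file path.
get_tripleo_password() {
    awk -F ': ' -v key="$1" '$1 ~ ("^ *" key "$") { print $2; exit }' "$2"
}

# Assumed location, taken from the snippets above.
PASSWORDS_FILE=~/tripleo-standalone-passwords.yaml

# Only read the file when it exists, so the sketch is safe to run anywhere.
if [ -r "$PASSWORDS_FILE" ]; then
    CINDER_PASSWORD=$(get_tripleo_password CinderPassword "$PASSWORDS_FILE")
    GLANCE_PASSWORD=$(get_tripleo_password GlancePassword "$PASSWORDS_FILE")
fi
```

Matching the whole key (rather than a substring) avoids picking up a longer key that merely contains the requested name.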
"},{"location":"openstack/backend_services_deployment/#pre-checks","title":"Pre-checks","text":""},{"location":"openstack/backend_services_deployment/#procedure-backend-services-deployment","title":"Procedure - backend services deployment","text":"oc project openstack\n
The procedure for this will vary, but in developer/CI environments we use install_yamls:
# in install_yamls\nmake input\n
$ADMIN_PASSWORD
is different than the already set password in osp-secret
, amend the AdminPassword
key in the osp-secret
correspondingly:oc set data secret/osp-secret \"AdminPassword=$ADMIN_PASSWORD\"\n
osp-secret
to match the service account passwords from the original deployment:oc set data secret/osp-secret \"CinderPassword=$CINDER_PASSWORD\"\noc set data secret/osp-secret \"GlancePassword=$GLANCE_PASSWORD\"\noc set data secret/osp-secret \"HeatAuthEncryptionKey=$HEAT_AUTH_ENCRYPTION_KEY\"\noc set data secret/osp-secret \"HeatPassword=$HEAT_PASSWORD\"\noc set data secret/osp-secret \"IronicPassword=$IRONIC_PASSWORD\"\noc set data secret/osp-secret \"ManilaPassword=$MANILA_PASSWORD\"\noc set data secret/osp-secret \"NeutronPassword=$NEUTRON_PASSWORD\"\noc set data secret/osp-secret \"NovaPassword=$NOVA_PASSWORD\"\noc set data secret/osp-secret \"OctaviaPassword=$OCTAVIA_PASSWORD\"\noc set data secret/osp-secret \"PlacementPassword=$PLACEMENT_PASSWORD\"\n
oc apply -f - <<EOF\napiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n\n cinder:\n enabled: false\n template:\n cinderAPI: {}\n cinderScheduler: {}\n cinderBackup: {}\n cinderVolumes: {}\n\n dns:\n template:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: ctlplane\n metallb.universe.tf/allow-shared-ip: ctlplane\n metallb.universe.tf/loadBalancerIPs: 192.168.122.80\n spec:\n type: LoadBalancer\n options:\n - key: server\n values:\n - 192.168.122.1\n replicas: 1\n\n glance:\n enabled: false\n template:\n glanceAPI: {}\n\n horizon:\n enabled: false\n template: {}\n\n ironic:\n enabled: false\n template:\n ironicConductors: []\n\n keystone:\n enabled: false\n template: {}\n\n manila:\n enabled: false\n template:\n manilaAPI: {}\n manilaScheduler: {}\n manilaShares: {}\n\n mariadb:\n templates:\n openstack:\n storageRequest: 500M\n openstack-cell1:\n storageRequest: 500M\n\n memcached:\n enabled: true\n templates:\n memcached:\n replicas: 1\n\n neutron:\n enabled: false\n template: {}\n\n nova:\n enabled: false\n template: {}\n\n ovn:\n enabled: false\n template:\n ovnDBCluster:\n ovndbcluster-nb:\n dbType: NB\n storageRequest: 10G\n networkAttachment: internalapi\n ovndbcluster-sb:\n dbType: SB\n storageRequest: 10G\n networkAttachment: internalapi\n ovnNorthd:\n networkAttachment: internalapi\n replicas: 1\n ovnController:\n networkAttachment: tenant\n\n placement:\n enabled: false\n template: {}\n\n rabbitmq:\n templates:\n rabbitmq:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.85\n spec:\n type: LoadBalancer\n rabbitmq-cell1:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.86\n spec:\n type: LoadBalancer\n\n telemetry:\n 
enabled: false\n template: {}\nEOF\n
"},{"location":"openstack/backend_services_deployment/#post-checks","title":"Post-checks","text":"oc get pod mariadb-openstack -o jsonpath='{.status.phase}{\"\\n\"}'\n
"},{"location":"openstack/ceph_backend_configuration/","title":"Ceph backend configuration (if applicable)","text":"If the original deployment uses a Ceph storage backend for any service (e.g. Glance, Cinder, Nova, Manila), the same backend must be used in the adopted deployment and CRs must be configured accordingly.
"},{"location":"openstack/ceph_backend_configuration/#prerequisites","title":"Prerequisites","text":"OpenStackControlPlane
CR must already exist. Define the shell variables used in the steps below. The values are just illustrative; use values that are correct for your environment:
CEPH_SSH=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nCEPH_KEY=$($CEPH_SSH \"cat /etc/ceph/ceph.client.openstack.keyring | base64 -w 0\")\nCEPH_CONF=$($CEPH_SSH \"cat /etc/ceph/ceph.conf | base64 -w 0\")\n
"},{"location":"openstack/ceph_backend_configuration/#modify-capabilities-of-the-openstack-user-to-accommodate-manila","title":"Modify capabilities of the \"openstack\" user to accommodate Manila","text":"On TripleO environments, the CephFS driver in Manila is configured to use its own keypair. For convenience, let's modify the openstack
user so that we can use it across all OpenStack services.
Using the same user across the services serves two purposes: - The capabilities of the user required to interact with the Manila service became far simpler, and hence more secure, with RHOSP 18. - It is simpler to create a common ceph secret (keyring and ceph config file) and propagate the secret to all services that need it.
$CEPH_SSH cephadm shell\nceph auth caps client.openstack \\\n mgr 'allow *' \\\n mon 'allow r, profile rbd' \\\n osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images, allow rw pool manila_data'\n
"},{"location":"openstack/ceph_backend_configuration/#ceph-backend-configuration","title":"Ceph backend configuration","text":"Create the ceph-conf-files
secret, containing Ceph configuration:
oc apply -f - <<EOF\napiVersion: v1\ndata:\n ceph.client.openstack.keyring: $CEPH_KEY\n ceph.conf: $CEPH_CONF\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\ntype: Opaque\nEOF\n
The content of the file should look something like this:
---\napiVersion: v1\nkind: Secret\nmetadata:\n name: ceph-conf-files\n namespace: openstack\nstringData:\n ceph.client.openstack.keyring: |\n [client.openstack]\n key = <secret key>\n caps mgr = \"allow *\"\n caps mon = \"profile rbd\"\n caps osd = \"profile rbd pool=images\"\n ceph.conf: |\n [global]\n fsid = 7a1719e8-9c59-49e2-ae2b-d7eb08c695d4\n mon_host = 10.1.1.2,10.1.1.3,10.1.1.4\n
Configure extraMounts
within the OpenStackControlPlane
CR:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n extraMounts:\n - name: v1\n region: r1\n extraVol:\n - propagation:\n - CinderVolume\n - CinderBackup\n - GlanceAPI\n - ManilaShare\n extraVolType: Ceph\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n mounts:\n - name: ceph\n mountPath: \"/etc/ceph\"\n readOnly: true\n'\n
"},{"location":"openstack/ceph_backend_configuration/#getting-ceph-fsid","title":"Getting Ceph FSID","text":"Configuring some OpenStack services to use Ceph backend may require the FSID value. You can fetch the value from the config like so:
CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data.\"ceph.conf\"' | base64 -d | grep fsid | sed -e 's/fsid = //')\n
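The pipeline above assumes the line is spelled exactly `fsid = <value>`. As a sketch only, a whitespace-tolerant variant could look like the following; `extract_fsid` is a hypothetical helper, and the sample configuration reuses the illustrative values from the secret example earlier on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ceph.conf content on stdin and print the fsid,
# regardless of the spacing around "=".
extract_fsid() {
    awk -F '=' '/^[ \t]*fsid[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Illustrative ceph.conf content (same example values as shown earlier).
sample_conf='[global]
fsid = 7a1719e8-9c59-49e2-ae2b-d7eb08c695d4
mon_host = 10.1.1.2,10.1.1.3,10.1.1.4'

printf '%s\n' "$sample_conf" | extract_fsid
```

In a real environment the input would come from the decoded secret, for example by piping the `base64 -d` output from the command above into `extract_fsid`.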
"},{"location":"openstack/cinder_adoption/","title":"Cinder adoption","text":"Adopting a director deployed Cinder service into OpenStack may require some thought because it's not always a simple process.
Usually the adoption process entails:
cinder.conf
file. This guide provides the knowledge necessary to complete these steps in most situations, but it still requires knowledge of how OpenStack services work and of the structure of a Cinder configuration file.
"},{"location":"openstack/cinder_adoption/#limitations","title":"Limitations","text":"There are currently some limitations that are worth highlighting; some are related to this guideline while some to the operator:
There is no global nodeSelector
for all cinder volumes, so it needs to be specified per backend. This may change in the future.
There is no global customServiceConfig
or customServiceConfigSecrets
for all cinder volumes, so it needs to be specified per backend. This may change in the future.
Adoption of LVM backends, where the volume data is stored in the compute nodes, is not currently being documented in this process. It may get documented in the future.
Support for Cinder backends that require kernel modules not included in RHEL has not been tested in Operator deployed OpenStack so it is not documented in this guide.
Adoption of DCN/Edge deployment is not currently described in this guide.
Previous Adoption steps completed. Notably, cinder service must have been stopped and the service databases must already be imported into the podified MariaDB.
Storage network has been properly configured on the OpenShift cluster.
No new environment variables need to be defined, though we use the CONTROLLER1_SSH
that was defined in a previous step for the pre-checks.
We are going to need the contents of cinder.conf
, so we may want to download it to have it locally accessible:
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf > cinder.conf\n
"},{"location":"openstack/cinder_adoption/#prepare-openshift","title":"Prepare OpenShift","text":"As explained the planning section before deploying OpenStack in OpenShift we need to ensure that the networks are ready, that we have decided the node selection, and also make sure any necessary changes to the OpenShift nodes have been made. For Cinder volume and backup services all these 3 must be carefully considered.
"},{"location":"openstack/cinder_adoption/#node-selection","title":"Node Selection","text":"We may need, or want, to restrict the OpenShift nodes where cinder volume and backup services can run.
The best example of when we need to do node selection for a specific cinder service is when we deploy Cinder with the LVM driver. In that scenario the LVM data where the volumes are stored only exists in a specific host, so we need to pin the cinder-volume service to that specific OpenShift node. Running the service on any other OpenShift node would not work. Since nodeSelector
only works on labels we cannot use the OpenShift host node name to restrict the LVM backend and we'll need to identify it using a unique label, an existing or new one:
$ oc label nodes worker0 lvm=cinder-volumes\n
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n cinder:\n enabled: true\n template:\n cinderVolumes:\n lvm-iscsi:\n nodeSelector:\n lvm: cinder-volumes\n< . . . >\n
As mentioned in the Node Selector guide, an example where we need to use labels is when using FC storage and we don't have HBA cards in all our OpenShift nodes. In this scenario we would need to restrict all the cinder volume backends (not only the FC one) as well as the backup services.
Depending on the cinder backends, their configuration, and the usage of Cinder, we can have network intensive cinder volume services with lots of I/O as well as cinder backup services that are not only network intensive but also memory and CPU intensive. This may be a concern for the OpenShift human operators, and they may want to use the nodeSelector
to prevent these services from interfering with their other OpenShift workloads.
Please make sure to read the Nodes Selector guide before continuing, as we'll be referring to some of the concepts explained there in the following sections.
When selecting the nodes where cinder volume is going to run please remember that cinder-volume may also use local storage when downloading a glance image for the create volume from image operation, and it can require a considerable amount of space when having concurrent operations and not using cinder volume cache.
If we don't have nodes with enough local disk space for the temporary images we can use a remote NFS location for the images. This is something that we had to manually set up in Director deployments, but with operators we can easily do it automatically using the extra volumes feature (extraMounts
).
Due to the specifics of the storage transport protocols some changes may be required on the OpenShift side, and although this is something that must be documented by the Vendor, here we are going to provide some generic instructions that can serve as a guide for the different transport protocols.
Check the backend sections in our cinder.conf
file that are listed in the enabled_backends
configuration option to figure out the transport storage protocol used by the backend.
Depending on the backend we can find the transport protocol:
Looking at the volume_driver
configuration option, as it may contain the protocol itself: RBD, iSCSI, FC...
Looking at the target_protocol
configuration option
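The lookup described above can be sketched as a small script that prints, for each backend named in `enabled_backends`, its `volume_driver` and `target_protocol`. This is an illustration under stated assumptions, not an official tool: `list_backend_protocols` is a hypothetical helper, it expects the `cinder.conf` downloaded earlier in the current directory, and a `-` in the output simply means the option is not set in that backend section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every backend listed in enabled_backends, print
# "<backend> <volume_driver> <target_protocol>" ("-" when an option is unset).
list_backend_protocols() {
    awk '
        # Track the current [section] name.
        /^\[/ { section = substr($0, 2, index($0, "]") - 2); next }
        # Skip comments and blank lines.
        /^[ \t]*(#|$)/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (section == "DEFAULT" && key == "enabled_backends") backends = val
            if (key == "volume_driver") driver[section] = val
            if (key == "target_protocol") proto[section] = val
        }
        END {
            n = split(backends, names, /[ \t]*,[ \t]*/)
            for (i = 1; i <= n; i++) {
                d = (names[i] in driver) ? driver[names[i]] : "-"
                p = (names[i] in proto) ? proto[names[i]] : "-"
                print names[i], d, p
            }
        }
    ' "$1"
}

# Only run against the downloaded file when it is present.
if [ -r cinder.conf ]; then
    list_backend_protocols cinder.conf
fi
```

The driver class name usually hints at the protocol (for example an RBD or iSCSI driver), and `target_protocol`, when present, states it explicitly.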
Warning: Any time a MachineConfig
is used to make changes to OpenShift nodes the node will reboot!! Act accordingly.
There's nothing to do for NFS. OpenShift can connect to NFS backends without any additional changes.
"},{"location":"openstack/cinder_adoption/#rbdceph","title":"RBD/Ceph","text":"There's nothing to do for RBD/Ceph in terms of preparing the nodes, OpenShift can connect to Ceph backends without any additional changes. Credentials and configuration files will need to be provided to the services though.
"},{"location":"openstack/cinder_adoption/#iscsi","title":"iSCSI","text":"Connecting to iSCSI volumes requires that the iSCSI initiator is running on the OpenShift hosts hosts where volume and backup services are going to run, because the Linux Open iSCSI initiator doesn't currently support network namespaces, so we must only run 1 instance of the service for the normal OpenShift usage, plus the OpenShift CSI plugins, plus the OpenStack services.
If we are not already running iscsid
on the OpenShift nodes then we'll need to apply a MachineConfig
similar to this one:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-enable-iscsid\nspec:\n config:\n ignition:\n version: 3.2.0\n systemd:\n units:\n - enabled: true\n name: iscsid.service\n
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
For production deployments using iSCSI volumes we always recommend setting up multipathing; please look at the multipathing section to see how to configure it.
TODO: Add, or at least mention, the Nova eDPM side for iSCSI.
"},{"location":"openstack/cinder_adoption/#fc","title":"FC","text":"There's nothing to do for FC volumes to work, but the cinder volume and cinder backup services need to run in an OpenShift host that has HBAs, so if there are nodes that don't have HBAs then we'll need to use labels to restrict where these services can run, as mentioned in the [node selection section] (#node-selection).
This also means that for virtualized OpenShift clusters using FC we'll need to expose the host's HBAs inside the VM.
For production deployments using FC volumes we always recommend setting up multipathing; please look at the multipathing section to see how to configure it.
"},{"location":"openstack/cinder_adoption/#nvme-of","title":"NVMe-oF","text":"Connecting to NVMe-oF volumes requires that the nvme kernel modules are loaded on the OpenShift hosts.
If we are not already loading the nvme-fabrics
module on the OpenShift nodes where volume and backup services are going to run then we'll need to apply a MachineConfig
similar to this one:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-load-nvme-fabrics\nspec:\n config:\n ignition:\n version: 3.2.0\n storage:\n files:\n - path: /etc/modules-load.d/nvme_fabrics.conf\n overwrite: false\n # Mode must be decimal, this is 0644\n mode: 420\n user:\n name: root\n group:\n name: root\n contents:\n # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.\n # This is the rfc2397 text/plain string format\n source: data:,nvme-fabrics\n
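The `mode` comment in the MachineConfig above notes that Ignition expects the file mode as a decimal number. A quick way to do the conversion, shown here as a small sketch with a hypothetical `octal_mode_to_decimal` helper, is to let `printf` interpret the value as an octal constant:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an octal permission string (e.g. 644) into the
# decimal value used in the MachineConfig "mode" field.
# The leading 0 makes printf treat the argument as an octal constant.
octal_mode_to_decimal() {
    printf '%d\n' "0$1"
}

octal_mode_to_decimal 644   # prints 420
octal_mode_to_decimal 600   # prints 384
```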
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
We are only loading the nvme-fabrics
module because it takes care of loading the transport specific modules (tcp, rdma, fc) as needed.
For production deployments using NVMe-oF volumes we always recommend using multipathing. For NVMe-oF volumes OpenStack uses native multipathing, called ANA.
Once the OpenShift nodes have rebooted and are loading the nvme-fabrics
module we can confirm that the Operating System is configured and supports ANA by checking on the host:
cat /sys/module/nvme_core/parameters/multipath\n
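The kernel reports this parameter as `Y` when native NVMe multipath is enabled and `N` otherwise. A minimal sketch of a check that could be run on the host follows; `ana_multipath_enabled` is a hypothetical helper, and the optional path argument exists only so the function can be exercised against a test file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the nvme_core "multipath" parameter
# reads "Y". An alternative file can be passed as $1 for testing.
ana_multipath_enabled() {
    local param_file="${1:-/sys/module/nvme_core/parameters/multipath}"
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "Y" ]
}

if ana_multipath_enabled; then
    echo "NVMe native multipath (ANA) is enabled"
else
    echo "NVMe native multipath is disabled or nvme_core is not loaded"
fi
```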
Attention: ANA doesn't use the Linux Multipathing Device Mapper, but the current OpenStack code requires multipathd
on compute nodes to be running for Nova to be able to use multipathing, so please remember to follow the multipathing part for compute nodes in the multipathing section.
TODO: Add, or at least mention, the Nova eDPM side for NVMe-oF.
"},{"location":"openstack/cinder_adoption/#multipathing","title":"Multipathing","text":"For iSCSI and FC protocols we always recommend using multipathing, which has 4 parts:
To prepare the OpenShift hosts we need to ensure that the Linux Multipath Device Mapper is configured and running on the OpenShift hosts, and we do that using MachineConfig
like this one:
# Includes the /etc/multipath.conf contents and the systemd unit changes\napiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: worker\n service: cinder\n name: 99-master-cinder-enable-multipathd\nspec:\n config:\n ignition:\n version: 3.2.0\n storage:\n files:\n - path: /etc/multipath.conf\n overwrite: false\n # Mode must be decimal, this is 0600\n mode: 384\n user:\n name: root\n group:\n name: root\n contents:\n # Source can be a http, https, tftp, s3, gs, or data as defined in rfc2397.\n # This is the rfc2397 text/plain string format\n source: data:,defaults%20%7B%0A%20%20user_friendly_names%20no%0A%20%20recheck_wwid%20yes%0A%20%20skip_kpartx%20yes%0A%20%20find_multipaths%20yes%0A%7D%0A%0Ablacklist%20%7B%0A%7D\n systemd:\n units:\n - enabled: true\n name: multipathd.service\n
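The `source` value in the MachineConfig above is the multipath configuration percent-encoded as an rfc2397 `data:` URL, which is hard to review by eye. As a sketch only, a bash helper such as the hypothetical `decode_data_url` below can turn such a string back into plain text before applying the MachineConfig; it assumes the plain `data:,` form with no media type or base64 marker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a percent-encoded "data:," URL to plain text.
decode_data_url() {
    # Drop the "data:," prefix, then turn every %XX into \xXX for printf %b.
    local payload="${1#data:,}"
    printf '%b' "${payload//%/\\x}"
}

# Short illustrative input: a "defaults" section with one option.
decode_data_url 'data:,defaults%20%7B%0A%20%20find_multipaths%20yes%0A%7D%0A'
```

Passing the full `source` string from the MachineConfig to the same function shows the exact file content that will be written to the node.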
Remember that if we are using labels to restrict the nodes where cinder services are running we'll need to use a MachineConfigPool
as described in the nodes selector guide to limit the effects of the MachineConfig
to only the nodes where our services may run.
If we are using a toy single node deployment to test the process we may need to replace worker
with master
in the MachineConfig
.
To configure the cinder services to use multipathing we need to enable the use_multipath_for_image_xfer
configuration option in all the backend sections and in the [DEFAULT]
section for the backup service, but in Podified deployments we don't need to worry about it, because that's the default. So as long as we don't override it by setting use_multipath_for_image_xfer = false
then multipathing will work as long as the service is running on the OpenShift host.
TODO: Add, or at least mention, the Nova eDPM side for Multipathing once it's implemented.
"},{"location":"openstack/cinder_adoption/#configurations","title":"Configurations","text":"As described in the planning Cinder is configured using configuration snippets instead of using obscure configuration parameters defined by the installer.
The recommended way to deploy Cinder volume backends has changed to remove old limitations, add flexibility, and improve operations in general.
When deploying with Director we used to run a single Cinder volume service with all our backends (each backend would run on its own process), and even though that way of deploying is still supported, we don't recommend it. We recommend using a volume service per backend since it's a superior deployment model.
So for an LVM and a Ceph backend we would have 2 entries in cinderVolumes
and, as mentioned in the limitations section, we cannot set global defaults for all volume services, so we would have to define it for each of them, like this:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n enabled: true\n template:\n cinderVolumes:\n lvm:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n [lvm]\n< . . . >\n ceph:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n [ceph]\n< . . . >\n
Reminder that for volume backends that have sensitive information using Secret
and the customServiceConfigSecrets
key is the recommended way to go.
For adoption instead of using a whole deployment manifest we'll use a targeted patch, like we did with other services, and in this patch we will enable the different cinder services with their specific configurations.
WARNING: Check that all configuration options are still valid for the new OpenStack version, since configuration options may have been deprecated, removed, or added. This applies to both backend driver specific configuration options and other generic options.
There are 2 ways to prepare a cinder configuration for adoption: tailor-making it or doing it quick and dirty. There is no difference in how Cinder will operate with either method, so we are free to choose, though we recommend tailor-making it whenever possible.
The high level explanation of the tailor-made approach is:
Determine what part of the configuration is generic for all the cinder services and remove anything that would change when deployed in OpenShift, like the connection
in the [database]
section, the transport_url
and log_dir
in [DEFAULT]
, the whole [coordination]
section. This configuration goes into the customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
) at the cinder: template:
level.
Determine if there's any scheduler specific configuration and add it to the customServiceConfig
section in cinder: template: cinderScheduler
.
Determine if there's any API specific configuration and add it to the customServiceConfig
section in cinder: template: cinderAPI
.
If we have cinder backup deployed, then we'll get the cinder backup relevant configuration options and add them to customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
) at the cinder: template: cinderBackup:
level. We should remove the host
configuration in the [DEFAULT]
section to facilitate supporting multiple replicas in the future.
Determine the individual volume backend configuration for each of the drivers. The configuration will not only be the specific driver section, it should also include the [backend_defaults]
section and FC zoning sections if they are being used, because the cinder operator doesn't support a customServiceConfig
section global for all volume services. Each backend would have its own section under cinder: template: cinderVolumes
and the configuration would go in customServiceConfig
(or a Secret
and then used in customServiceConfigSecrets
).
Check if any of the cinder volume drivers being used requires a custom vendor image. If they do, find the location of the image in the vendor's instructions available in the OpenStack Cinder ecosystem page and add it under the specific driver's section using the containerImage
key. For example, if we had a Pure Storage array and the driver was already certified for OSP18, then we would have something like this:
spec:\n cinder:\n enabled: true\n template:\n cinderVolumes:\n pure:\n containerImage: registry.connect.redhat.com/purestorage/openstack-cinder-volume-pure-rhosp-18-0\n customServiceConfigSecrets:\n - openstack-cinder-pure-cfg\n< . . . >\n
Secrets
or ConfigMap
to store the information in OpenShift and then the extraMounts
key. For example, for the Ceph credentials stored in a Secret
called ceph-conf-files
we would patch the top level extraMounts
in OpenstackControlPlane
:spec:\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/ceph\n name: ceph\n readOnly: true\n propagation:\n - CinderVolume\n - CinderBackup\n - Glance\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n
But for a service-specific one, like the API policy, we would do it directly on the service itself. In this example we include the cinder API configuration that references the policy we are adding from a ConfigMap
called my-cinder-conf
that has a key policy
with the contents of the policy: spec:\n cinder:\n enabled: true\n template:\n cinderAPI:\n customServiceConfig: |\n [oslo_policy]\n policy_file=/etc/cinder/api/policy.yaml\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/cinder/api\n name: policy\n readOnly: true\n propagation:\n - CinderAPI\n volumes:\n - name: policy\n projected:\n sources:\n - configMap:\n name: my-cinder-conf\n items:\n - key: policy\n path: policy.yaml\n
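The ConfigMap referenced above has to exist before the patch is applied. A minimal sketch of how it could be created, assuming the policy lives in a local file and the services run in the openstack namespace; the helper name create_policy_configmap is hypothetical and only wraps the standard oc create configmap command:

```shell
# Hypothetical helper: create the my-cinder-conf ConfigMap with a single
# key named "policy" holding the contents of the given local policy file.
# Assumes the namespace is "openstack", as used elsewhere in this guide.
create_policy_configmap() {
    # $1: path to the local policy file
    oc create configmap my-cinder-conf -n openstack --from-file=policy="$1"
}
```

It would be called as create_policy_configmap ./policy.yaml, after which the projected volume in the example can resolve the policy key.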
The quick and dirty process is more straightforward:
Create an agnostic configuration file removing any specifics from the old deployment's cinder.conf
file, like the connection
in the [database]
section, the transport_url
and log_dir
in [DEFAULT]
, the whole [coordination]
section, etc.
Assuming the configuration has sensitive information, drop the modified contents of the whole file into a Secret
.
Reference this secret in all the services, creating a cinder volumes section for each backend and just adding the respective enabled_backends
option.
Add external files as mentioned in the last bullet of the tailor-made configuration explanation.
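The first step of the quick and dirty process can be sketched with a small filter. This is only an illustration under stated assumptions: sanitize_cinder_conf is a hypothetical helper (it is not part of cinder-cfg.py or any other tool), it handles simple single-line key = value options, and it removes only the options named above, so the result still needs a manual review:

```shell
# Hypothetical helper: print a copy of an old cinder.conf without the
# deployment-specific options listed in the steps above.
sanitize_cinder_conf() {
    # $1: path to the old cinder.conf
    awk '
        # Track the current INI section name.
        /^[ \t]*\[/ {
            section = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
            # Drop the [coordination] header itself.
            if (section == "coordination") next
        }
        # Drop everything inside [coordination].
        section == "coordination" { next }
        # Drop the DB connection string.
        section == "database" && /^[ \t]*connection[ \t]*=/ { next }
        # Drop the messaging URL and log directory.
        section == "DEFAULT" && /^[ \t]*(transport_url|log_dir)[ \t]*=/ { next }
        { print }
    ' "$1"
}
```

For example, sanitize_cinder_conf cinder.conf > cinder-agnostic.conf would write the filtered file, which can then be reviewed and stored in a Secret.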
Example of what the quick and dirty configuration patch would look like:
spec:\n cinder:\n enabled: true\n template:\n cinderAPI:\n customServiceConfigSecrets:\n - cinder-conf\n cinderScheduler:\n customServiceConfigSecrets:\n - cinder-conf\n cinderBackup:\n customServiceConfigSecrets:\n - cinder-conf\n cinderVolumes:\n lvm1:\n customServiceConfig: |\n [DEFAULT]\n enabled_backends = lvm1\n customServiceConfigSecrets:\n - cinder-conf\n lvm2:\n customServiceConfig: |\n [DEFAULT]\n enabled_backends = lvm2\n customServiceConfigSecrets:\n - cinder-conf\n
"},{"location":"openstack/cinder_adoption/#configuration-generation-helper-tool","title":"Configuration generation helper tool","text":"Creating the right Cinder configuration files to deploy using Operators may sometimes be a complicated experience, especially the first times, so we have a helper tool that can create a draft of the files from a cinder.conf
file.
This tool is not meant to be an automation tool; it's mostly meant to help us get the gist of it, and to point out some potential pitfalls and reminders.
Attention: The tool requires the PyYAML
Python package to be installed (pip install PyYAML
).
This cinder-cfg.py script defaults to reading the cinder.conf
file from the current directory (unless --config
option is used) and outputs files to the current directory (unless --out-dir
option is used).
In the output directory we'll always get a cinder.patch
file with the Cinder specific configuration patch to apply to the OpenStackControlPlane
CR but we may also get an additional file called cinder-prereq.yaml
with some Secrets
and MachineConfigs
.
Example of an invocation setting input and output explicitly to the defaults for a Ceph backend:
$ python cinder-cfg.py --config cinder.conf --out-dir ./\nWARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using \"extraMounts\" or remove the option.\n\nWARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.\n\nWARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.\n\nWARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!\n\nOutput written at ./: cinder.patch\n
The script outputs some warnings to let us know about things we may need to do manually (adding the custom policy, providing the Ceph configuration files) and also to let us know that the service_user
configuration has been removed.
A different example when using multiple backends, one of them being a 3PAR FC could be:
$ python cinder-cfg.py --config cinder.conf --out-dir ./\nWARNING:root:Cinder is configured to use ['/etc/cinder/policy.yaml'] as policy file, please ensure this file is available for the podified cinder services using \"extraMounts\" or remove the option.\n\nERROR:root:Backend hpe_fc requires a vendor container image, but there is no certified image available yet. Patch will use the last known image for reference, but IT WILL NOT WORK\n\nWARNING:root:Deployment uses Ceph, so make sure the Ceph credentials and configuration are present in OpenShift as a asecret and then use the extra volumes to make them available in all the services that would need them.\n\nWARNING:root:You were using user ['nova'] to talk to Nova, but in podified we prefer using the service keystone username, in this case ['cinder']. Dropping that configuration.\n\nWARNING:root:Configuration is using FC, please ensure all your OpenShift nodes have HBAs or use labels to ensure that Volume and Backup services are scheduled on nodes with HBAs.\n\nWARNING:root:ALWAYS REVIEW RESULTS, OUTPUT IS JUST A ROUGH DRAFT!!\n\nOutput written at ./: cinder.patch, cinder-prereq.yaml\n
In this case we can see that there are additional messages, so let's quickly go over them:
cinder.patch
file: cinderVolumes:\n hpe-fc:\n containerImage: registry.connect.redhat.com/hpe3parcinder/openstack-cinder-volume-hpe3parcinder17-0\n
The FC message reminds us that this transport protocol requires specific HBA cards to be present on the nodes where cinder services are running.
In this case we also see that it has created the cinder-prereq.yaml
file and if we look into it we'll see there is one MachineConfig
and one Secret
. The MachineConfig
is called 99-master-cinder-enable-multipathd
and like the name suggests enables multipathing on all the OCP worker nodes. The Secret
is called openstackcinder-volumes-hpe_fc
and contains the 3PAR backend configuration because it has sensitive information (credentials), and in the cinder.patch
file we'll see that it uses this configuration:
cinderVolumes:\n hpe-fc:\n customServiceConfigSecrets:\n - openstackcinder-volumes-hpe_fc\n
Assuming we have already stopped the cinder services, prepared the OpenShift nodes, deployed the OpenStack operators and a bare OpenStack manifest, migrated the database, and prepared the patch manifest with the Cinder service configuration, all that's left is to apply the patch and wait for the operator to apply the changes and deploy the Cinder services.
Our recommendation is to write the patch manifest into a file, for example cinder.patch
and then apply it with something like:
oc patch openstackcontrolplane openstack --type=merge --patch-file=cinder.patch\n
For example, for the RBD deployment from the Development Guide the cinder.patch
would look like this:
spec:\n extraMounts:\n - extraVol:\n - extraVolType: Ceph\n mounts:\n - mountPath: /etc/ceph\n name: ceph\n readOnly: true\n propagation:\n - CinderVolume\n - CinderBackup\n - Glance\n volumes:\n - name: ceph\n projected:\n sources:\n - secret:\n name: ceph-conf-files\n cinder:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n cinderAPI:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n default_volume_type=tripleo\n cinderScheduler:\n replicas: 1\n cinderBackup:\n networkAttachments:\n - storage\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n backup_driver=cinder.backup.drivers.ceph.CephBackupDriver\n backup_ceph_conf=/etc/ceph/ceph.conf\n backup_ceph_user=openstack\n backup_ceph_pool=backups\n cinderVolumes:\n ceph:\n networkAttachments:\n - storage\n replicas: 1\n customServiceConfig: |\n [tripleo_ceph]\n backend_host=hostgroup\n volume_backend_name=tripleo_ceph\n volume_driver=cinder.volume.drivers.rbd.RBDDriver\n rbd_ceph_conf=/etc/ceph/ceph.conf\n rbd_user=openstack\n rbd_pool=volumes\n rbd_flatten_volume_from_snapshot=False\n report_discard_supported=True\n
Once the services have been deployed we'll need to clean up the old scheduler and backup services which will appear as being down while we have others that appear as being up:
openstack volume service list\n\n+------------------+------------------------+------+---------+-------+----------------------------+\n| Binary | Host | Zone | Status | State | Updated At |\n+------------------+------------------------+------+---------+-------+----------------------------+\n| cinder-backup | standalone.localdomain | nova | enabled | down | 2023-06-28T11:00:59.000000 |\n| cinder-scheduler | standalone.localdomain | nova | enabled | down | 2023-06-28T11:00:29.000000 |\n| cinder-volume | hostgroup@tripleo_ceph | nova | enabled | up | 2023-06-28T17:00:03.000000 |\n| cinder-scheduler | cinder-scheduler-0 | nova | enabled | up | 2023-06-28T17:00:02.000000 |\n| cinder-backup | cinder-backup-0 | nova | enabled | up | 2023-06-28T17:00:01.000000 |\n+------------------+------------------------+------+---------+-------+----------------------------+\n
In this case we need to remove services for hosts standalone.localdomain
oc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-backup standalone.localdomain\noc exec -it cinder-scheduler-0 -- cinder-manage service remove cinder-scheduler standalone.localdomain\n
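When there are several stale entries, the removal commands can be generated from the service list instead of being typed by hand. A minimal sketch, assuming the list is produced with the openstack client's value formatter so that each row is "Binary Host State"; print_stale_service_removals is a hypothetical helper that only prints commands and does not run them:

```shell
# Hypothetical helper: read "Binary Host State" rows on stdin and print
# one cinder-manage removal command for every service reported as down.
print_stale_service_removals() {
    awk '$3 == "down" {
        printf "oc exec -it cinder-scheduler-0 -- cinder-manage service remove %s %s\n", $1, $2
    }'
}

# Example usage (not executed here):
#   openstack volume service list -f value -c Binary -c Host -c State \
#       | print_stale_service_removals
```

The output should be reviewed before running it, since a service of the new deployment that happens to be down for another reason would be listed as well.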
We haven't preserved the name of the backup service because we have taken the opportunity to change its configuration to support Active-Active, even though we are not doing so right now because we have 1 replica.
Now that we have the Cinder services running we know that the DB schema migration has been completed and we can proceed to apply the DB data migrations. While it is not necessary to run these data migrations at this precise moment, because we can just run them right before the next upgrade, we consider that for adoption it's best to run them now to make sure there are no issues before running production workloads on the deployment.
The command to run the DB data migrations is:
oc exec -it cinder-scheduler-0 -- cinder-manage db online_data_migrations\n
"},{"location":"openstack/cinder_adoption/#post-checks","title":"Post-checks","text":"Before we can run any checks we need to set the right cloud configuration for the openstack
command to be able to connect to our OpenShift control plane.
Just like we did in the Keystone adoption step we ensure we have the openstack
alias defined:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n
Now we can run a set of tests to confirm that the deployment is there using our old database contents:
openstack endpoint list --service cinderv3\n
openstack volume service list\n
openstack volume type list\nopenstack volume list\nopenstack volume snapshot list\nopenstack volume backup list\n
To confirm that everything not only looks good but it's also properly working we recommend doing some basic operations:
Create a volume from an image to check that the connection to glance is working.
openstack volume create --image cirros --bootable --size 1 disk_new\n
Backup the old attached volume to a new backup. Example:
openstack --os-volume-api-version 3.47 volume create --backup backup restored\n
We don't boot a nova instance using the new volume from image or try to detach the old volume because nova and cinder are still not connected.
"},{"location":"openstack/edpm_adoption/","title":"EDPM adoption","text":""},{"location":"openstack/edpm_adoption/#prerequisites","title":"Prerequisites","text":"Define the shell variables used in the Fast-forward upgrade steps below. The values are just illustrative, use values that are correct for your environment:
PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)\n
"},{"location":"openstack/edpm_adoption/#pre-checks","title":"Pre-checks","text":"oc apply -f - <<EOF\napiVersion: network.openstack.org/v1beta1\nkind: NetConfig\nmetadata:\n name: netconfig\nspec:\n networks:\n - name: CtlPlane\n dnsDomain: ctlplane.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 192.168.122.120\n start: 192.168.122.100\n - end: 192.168.122.200\n start: 192.168.122.150\n cidr: 192.168.122.0/24\n gateway: 192.168.122.1\n - name: InternalApi\n dnsDomain: internalapi.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.17.0.250\n start: 172.17.0.100\n cidr: 172.17.0.0/24\n vlan: 20\n - name: External\n dnsDomain: external.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 10.0.0.250\n start: 10.0.0.100\n cidr: 10.0.0.0/24\n gateway: 10.0.0.1\n - name: Storage\n dnsDomain: storage.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.18.0.250\n start: 172.18.0.100\n cidr: 172.18.0.0/24\n vlan: 21\n - name: StorageMgmt\n dnsDomain: storagemgmt.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.20.0.250\n start: 172.20.0.100\n cidr: 172.20.0.0/24\n vlan: 23\n - name: Tenant\n dnsDomain: tenant.example.com\n subnets:\n - name: subnet1\n allocationRanges:\n - end: 172.19.0.250\n start: 172.19.0.100\n cidr: 172.19.0.0/24\n vlan: 22\nEOF\n
"},{"location":"openstack/edpm_adoption/#procedure-edpm-adoption","title":"Procedure - EDPM adoption","text":"oc apply -f - <<EOF\napiVersion: v1\nkind: Secret\nmetadata:\n name: dataplane-adoption-secret\n namespace: openstack\ndata:\n ssh-privatekey: |\n$(cat ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa | base64 | sed 's/^/ /')\nEOF\n
nova-migration-ssh-key
secretcd \"$(mktemp -d)\"\nssh-keygen -f ./id -t ed25519 -N ''\noc get secret nova-migration-ssh-key || oc create secret generic nova-migration-ssh-key \\\n -n openstack \\\n --from-file=ssh-privatekey=id \\\n --from-file=ssh-publickey=id.pub \\\n --type kubernetes.io/ssh-auth\nrm -f id*\ncd -\n
oc apply -f - <<EOF\napiVersion: v1\nkind: ConfigMap\nmetadata:\n name: nova-compute-extraconfig\n namespace: openstack\ndata:\n 19-nova-compute-cell1-workarounds.conf: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n---\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneService\nmetadata:\n name: nova-compute-extraconfig\n namespace: openstack\nspec:\n label: nova.compute.extraconfig\n configMaps:\n - nova-compute-extraconfig\n secrets:\n - nova-cell1-compute-config\n - nova-migration-ssh-key\n playbook: osp.edpm.nova\nEOF\n
The secret nova-cell<X>-compute-config
is auto-generated for each cell<X>
. That secret, alongside nova-migration-ssh-key
, should always be specified for each custom OpenStackDataPlaneService
related to Nova.
oc apply -f - <<EOF\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneNodeSet\nmetadata:\n name: openstack\nspec:\n networkAttachments:\n - ctlplane\n preProvisioned: true\n services:\n - download-cache\n - configure-network\n - validate-network\n - install-os\n - configure-os\n - run-os\n - libvirt\n - nova-compute-extraconfig\n - ovn\n env:\n - name: ANSIBLE_CALLBACKS_ENABLED\n value: \"profile_tasks\"\n - name: ANSIBLE_FORCE_COLOR\n value: \"True\"\n nodes:\n standalone:\n hostName: standalone\n ansible:\n ansibleHost: 192.168.122.100\n networks:\n - defaultRoute: true\n fixedIP: 192.168.122.100\n name: CtlPlane\n subnetName: subnet1\n - name: InternalApi\n subnetName: subnet1\n - name: Storage\n subnetName: subnet1\n - name: Tenant\n subnetName: subnet1\n nodeTemplate:\n ansibleSSHPrivateKeySecret: dataplane-adoption-secret\n managementNetwork: ctlplane\n ansible:\n ansibleUser: root\n ansiblePort: 22\n ansibleVars:\n service_net_map:\n nova_api_network: internal_api\n nova_libvirt_network: internal_api\n\n # edpm_network_config\n # Default nic config template for a EDPM compute node\n # These vars are edpm_network_config role vars\n edpm_network_config_override: \"\"\n edpm_network_config_template: |\n ---\n {% set mtu_list = [ctlplane_mtu] %}\n {% for network in role_networks %}\n {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}\n {%- endfor %}\n {% set min_viable_mtu = mtu_list | max %}\n network_config:\n - type: ovs_bridge\n name: {{ neutron_physical_bridge_name }}\n mtu: {{ min_viable_mtu }}\n use_dhcp: false\n dns_servers: {{ ctlplane_dns_nameservers }}\n domain: {{ dns_search_domains }}\n addresses:\n - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}\n routes: {{ ctlplane_host_routes }}\n members:\n - type: interface\n name: nic1\n mtu: {{ min_viable_mtu }}\n # force the MAC address of the bridge to this interface\n primary: true\n {% for network in role_networks %}\n - type: vlan\n mtu: {{ 
lookup('vars', networks_lower[network] ~ '_mtu') }}\n vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}\n addresses:\n - ip_netmask:\n {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}\n routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}\n {% endfor %}\n\n edpm_network_config_hide_sensitive_logs: false\n #\n # These vars are for the network config templates themselves and are\n # considered EDPM network defaults.\n neutron_physical_bridge_name: br-ctlplane\n neutron_public_interface_name: eth0\n role_networks:\n - InternalApi\n - Storage\n - Tenant\n networks_lower:\n External: external\n InternalApi: internal_api\n Storage: storage\n Tenant: tenant\n\n # edpm_nodes_validation\n edpm_nodes_validation_validate_controllers_icmp: false\n edpm_nodes_validation_validate_gateway_icmp: false\n\n timesync_ntp_servers:\n - hostname: clock.redhat.com\n - hostname: clock2.redhat.com\n\n edpm_ovn_controller_agent_image: quay.io/podified-antelope-centos9/openstack-ovn-controller:current-podified\n edpm_iscsid_image: quay.io/podified-antelope-centos9/openstack-iscsid:current-podified\n edpm_logrotate_crond_image: quay.io/podified-antelope-centos9/openstack-cron:current-podified\n edpm_nova_compute_container_image: quay.io/podified-antelope-centos9/openstack-nova-compute:current-podified\n edpm_nova_libvirt_container_image: quay.io/podified-antelope-centos9/openstack-nova-libvirt:current-podified\n edpm_ovn_metadata_agent_image: quay.io/podified-antelope-centos9/openstack-neutron-metadata-agent-ovn:current-podified\n\n gather_facts: false\n enable_debug: false\n # edpm firewall, change the allowed CIDR if needed\n edpm_sshd_configure_firewall: true\n edpm_sshd_allowed_ranges: ['192.168.122.0/24']\n # SELinux module\n edpm_selinux_mode: enforcing\n plan: overcloud\nEOF\n
oc apply -f - <<EOF\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneDeployment\nmetadata:\n name: openstack\nspec:\n nodeSets:\n - openstack\nEOF\n
"},{"location":"openstack/edpm_adoption/#post-checks","title":"Post-checks","text":"Check if all the Ansible EE pods reaches Completed
status:
# watching the pods\nwatch oc get pod -l app=openstackansibleee\n
# following the ansible logs with:\noc logs -l app=openstackansibleee -f --max-log-requests 10\n
Wait for the dataplane node set to reach the Ready status:
oc wait --for condition=Ready osdpns/openstack --timeout=30m\n
A rolling upgrade of the Nova EDPM services cannot be done in lock-step with the Nova control plane services during adoption, because the two are managed independently, by EDPM Ansible and by Kubernetes operators respectively. The Nova service operator and the OpenStack Dataplane operator ensure that upgrading is done independently of each other by configuring [upgrade_levels]compute=auto
for Nova services. Nova control plane services apply the change right after the CR is patched. Nova compute EDPM services pick up the same config change later on, with the Ansible deployment.
NOTE: Additional orchestration around the FFU workarounds configuration for the Nova compute EDPM service is subject to future changes.
Wait for the cell1 Nova compute EDPM services version to be updated (it may take some time):
oc exec -it mariadb-openstack-cell1 -- mysql --user=root --password=${PODIFIED_DB_ROOT_PASSWORD} \\\n -e \"select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';\"\n
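Since the version update may take a while, the query above can be polled rather than re-run by hand. A minimal sketch, assuming a generic retry helper; wait_for_empty_result and its ATTEMPTS and DELAY variables are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper: re-run the given command until it succeeds and
# prints nothing, or until the attempts are exhausted.
# ATTEMPTS (default 60) and DELAY in seconds (default 10) are assumptions.
wait_for_empty_result() {
    attempts=${ATTEMPTS:-60}
    delay=${DELAY:-10}
    while [ "$attempts" -gt 0 ]; do
        # Done only when the command itself succeeded and returned no rows.
        if out=$("$@") && [ -z "$out" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (not executed here); -it is dropped because no terminal
# is needed when the output is captured:
#   wait_for_empty_result oc exec mariadb-openstack-cell1 -- mysql --user=root \
#       --password="${PODIFIED_DB_ROOT_PASSWORD}" \
#       -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"
```

The helper returns non-zero on timeout, so it can gate the next step in a script.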
The above query should return an empty result as a completion criterion. Remove pre-FFU workarounds for Nova control plane services:
oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '\nspec:\n nova:\n template:\n cellTemplates:\n cell0:\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n cell1:\n metadataServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n apiServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n metadataServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n schedulerServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n'\n
Wait for Nova control plane services' CRs to become ready:
oc wait --for condition=Ready --timeout=300s Nova/nova\n
Remove pre-FFU workarounds for Nova compute EDPM services:
oc apply -f - <<EOF\napiVersion: v1\nkind: ConfigMap\nmetadata:\n name: nova-compute-ffu\n namespace: openstack\ndata:\n 20-nova-compute-cell1-ffu-cleanup.conf: |\n [workarounds]\n disable_compute_service_check_for_ffu=false\n---\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneService\nmetadata:\n name: nova-compute-ffu\n namespace: openstack\nspec:\n label: nova.compute.ffu\n configMaps:\n - nova-compute-ffu\n secrets:\n - nova-cell1-compute-config\n - nova-migration-ssh-key\n playbook: osp.edpm.nova\n---\napiVersion: dataplane.openstack.org/v1beta1\nkind: OpenStackDataPlaneDeployment\nmetadata:\n name: openstack-nova-compute-ffu\n namespace: openstack\nspec:\n nodeSets:\n - openstack\n servicesOverride:\n - nova-compute-ffu\nEOF\n
Wait for Nova compute EDPM service to become ready:
oc wait --for condition=Ready osdpd/openstack-nova-compute-ffu --timeout=5m\n
Run Nova DB online migrations to complete FFU:
oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations\noc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations\n
Adopting Glance means that an existing OpenStackControlPlane
CR, where Glance is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
When the procedure is over, the expectation is to see the GlanceAPI
service up and running: the Keystone endpoints
should be updated and the same backend of the source Cloud will be available. If the conditions above are met, the adoption is considered concluded.
This guide also assumes that:
TripleO environment (the source Cloud) is running on one side;
SNO / CodeReadyContainers is running on the other side;
Ceph cluster is reachable by both crc and TripleO.
As already done for Keystone, the Glance Adoption follows the same pattern.
"},{"location":"openstack/glance_adoption/#using-local-storage-backend","title":"Using local storage backend","text":"When Glance should be deployed with local storage backend (not Ceph), patch OpenStackControlPlane to deploy Glance:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n glance:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n storageClass: \"local-storage\"\n storageRequest: 10G\n glanceAPI:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n networkAttachments:\n - storage\n'\n
"},{"location":"openstack/glance_adoption/#using-ceph-storage-backend","title":"Using Ceph storage backend","text":"If a Ceph backend is used, the customServiceConfig
parameter should be used to inject the right configuration to the GlanceAPI
instance.
Make sure the Ceph-related secret (ceph-conf-files
) was created in the openstack
namespace and that the extraMounts
property of the OpenStackControlPlane
CR has been configured properly. These tasks are described in an earlier Adoption step Ceph storage backend configuration.
cat << EOF > glance_patch.yaml\nspec:\n glance:\n enabled: true\n template:\n databaseInstance: openstack\n customServiceConfig: |\n [DEFAULT]\n enabled_backends=default_backend:rbd\n [glance_store]\n default_backend=default_backend\n [default_backend]\n rbd_store_ceph_conf=/etc/ceph/ceph.conf\n rbd_store_user=openstack\n rbd_store_pool=images\n store_description=Ceph glance store backend.\n storageClass: \"local-storage\"\n storageRequest: 10G\n glanceAPI:\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n networkAttachments:\n - storage\nEOF\n
If you have previously backed up your OpenStack services configuration files from the old environment (see the "pull openstack configuration os-diff" step), you can use os-diff to compare and make sure the configuration is correct.
pushd os-diff\n./os-diff cdiff --service glance -c /tmp/collect_tripleo_configs/glance/etc/glance/glance-api.conf -o glance_patch.yaml\n
This will produce the difference between both ini configuration files.
Patch OpenStackControlPlane to deploy Glance with Ceph backend:
oc patch openstackcontrolplane openstack --type=merge --patch-file glance_patch.yaml\n
"},{"location":"openstack/glance_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/glance_adoption/#test-the-glance-service-from-the-openstack-cli","title":"Test the glance service from the OpenStack CLI","text":"You can compare and make sure the configuration has been correctly applied to the glance pods by running
./os-diff cdiff --service glance -c /etc/glance/glance.conf.d/02-config.conf -o glance_patch.yaml --frompod -p glance-api\n
If no lines appear, then the configuration has been applied correctly.
Inspect the resulting glance pods:
GLANCE_POD=`oc get pod |grep glance-external-api | cut -f 1 -d' '`\noc exec -t $GLANCE_POD -c glance-api -- cat /etc/glance/glance.conf.d/02-config.conf\n\n[DEFAULT]\nenabled_backends=default_backend:rbd\n[glance_store]\ndefault_backend=default_backend\n[default_backend]\nrbd_store_ceph_conf=/etc/ceph/ceph.conf\nrbd_store_user=openstack\nrbd_store_pool=images\nstore_description=Ceph glance store backend.\n\noc exec -t $GLANCE_POD -c glance-api -- ls /etc/ceph\nceph.client.openstack.keyring\nceph.conf\n
The Ceph secrets are properly mounted. At this point, let's move to the OpenStack CLI and check that the service is active and the endpoints are properly updated.
(openstack)$ service list | grep image\n\n| fc52dbffef36434d906eeb99adfc6186 | glance | image |\n\n(openstack)$ endpoint list | grep image\n\n| 569ed81064f84d4a91e0d2d807e4c1f1 | regionOne | glance | image | True | internal | http://glance-internal-openstack.apps-crc.testing |\n| 5843fae70cba4e73b29d4aff3e8b616c | regionOne | glance | image | True | public | http://glance-public-openstack.apps-crc.testing |\n| 709859219bc24ab9ac548eab74ad4dd5 | regionOne | glance | image | True | admin | http://glance-admin-openstack.apps-crc.testing |\n
Check the images that we previously listed in the source Cloud are available in the adopted service:
(openstack)$ image list\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |\n+--------------------------------------+--------+--------+\n
"},{"location":"openstack/glance_adoption/#image-upload","title":"Image upload","text":"We can test that an image can be created on from the adopted service.
(openstack)$ alias openstack=\"oc exec -t openstackclient -- openstack\"\n(openstack)$ curl -L -o /tmp/cirros-0.5.2-x86_64-disk.img http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img\n qemu-img convert -O raw /tmp/cirros-0.5.2-x86_64-disk.img /tmp/cirros-0.5.2-x86_64-disk.img.raw\n openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img.raw cirros2\n openstack image list\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 273 100 273 0 0 1525 0 --:--:-- --:--:-- --:--:-- 1533\n 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n100 15.5M 100 15.5M 0 0 17.4M 0 --:--:-- --:--:-- --:--:-- 17.4M\n\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n| Field | Value |\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n| container_format | bare |\n| created_at | 2023-01-31T21:12:56Z |\n| disk_format | raw |\n| file | /v2/images/46a3eac1-7224-40bc-9083-f2f0cd122ba4/file |\n| id | 46a3eac1-7224-40bc-9083-f2f0cd122ba4 |\n| min_disk | 0 |\n| min_ram | 0 |\n| name | cirros |\n| owner | 9f7e8fdc50f34b658cfaee9c48e5e12d |\n| properties | os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros', owner_specified.openstack.sha256='' |\n| protected | False |\n| schema | /v2/schemas/image |\n| status | queued |\n| tags | |\n| updated_at | 2023-01-31T21:12:56Z |\n| visibility | shared |\n+------------------+--------------------------------------------------------------------------------------------------------------------------------------------+\n\n+--------------------------------------+--------+--------+\n| ID | Name | Status |\n+--------------------------------------+--------+--------+\n| 
46a3eac1-7224-40bc-9083-f2f0cd122ba4 | cirros2| active |\n| c3158cad-d50b-452f-bec1-f250562f5c1f | cirros | active |\n+--------------------------------------+--------+--------+\n\n\n(openstack)$ oc rsh ceph\nsh-4.4$ ceph -s\nr cluster:\n id: 432d9a34-9cee-4109-b705-0c59e8973983\n health: HEALTH_OK\n\n services:\n mon: 1 daemons, quorum a (age 4h)\n mgr: a(active, since 4h)\n osd: 1 osds: 1 up (since 4h), 1 in (since 4h)\n\n data:\n pools: 5 pools, 160 pgs\n objects: 46 objects, 224 MiB\n usage: 247 MiB used, 6.8 GiB / 7.0 GiB avail\n pgs: 160 active+clean\n\nsh-4.4$ rbd -p images ls\n46a3eac1-7224-40bc-9083-f2f0cd122ba4\nc3158cad-d50b-452f-bec1-f250562f5c1f\n
"},{"location":"openstack/heat_adoption/","title":"Heat adoption","text":"Adopting Heat means that an existing OpenStackControlPlane
CR, where Heat is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
After the adoption process has been completed, a user can expect to have CRs for Heat
, HeatAPI
, HeatEngine
and HeatCFNAPI
. Additionally, a user should have endpoints created within Keystone to facilitate the above-mentioned services.
This guide also assumes that:
TripleO
environment (the source Cloud) is running on one side;As already done for Keystone, the Heat Adoption follows a similar pattern.
Patch the osp-secret
to update the HeatAuthEncryptionKey
and HeatPassword
. This needs to match what you have configured in the existing TripleO Heat configuration.
You can retrieve and verify the existing auth_encryption_key
and service
passwords via:
[stack@rhosp17 ~]$ grep -E 'HeatPassword|HeatAuth' ~/overcloud-deploy/overcloud/overcloud-passwords.yaml\n HeatAuthEncryptionKey: Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2\n HeatPassword: dU2N0Vr2bdelYH7eQonAwPfI3\n
Verify on one of the Controllers that this is indeed the value in use:
[stack@rhosp17 ~]$ ansible -i overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml overcloud-controller-0 -m shell -a \"grep auth_encryption_key /var/lib/config-data/puppet-generated/heat/etc/heat/heat.conf | grep -Ev '^#|^$'\" -b\novercloud-controller-0 | CHANGED | rc=0 >>\nauth_encryption_key=Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2\n
This key needs to be base64 encoded and added to the osp-secret
\u276f echo -n Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2 | base64\nUTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzI=\n\n\u276f oc patch secret osp-secret --type='json' -p='[{\"op\" : \"replace\" ,\"path\" : \"/data/HeatAuthEncryptionKey\" ,\"value\" : \"UTYwSGo4UHFickROdTJkRENieUlRRTJkaWJwUVVQZzI=\"}]'\nsecret/osp-secret patched\n
Patch OpenStackControlPlane to deploy Heat:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n  heat:\n    enabled: true\n    apiOverride:\n      route: {}\n    template:\n      databaseInstance: openstack\n      secret: osp-secret\n      memcachedInstance: memcached\n      passwordSelectors:\n        authEncryptionKey: HeatAuthEncryptionKey\n        database: HeatDatabasePassword\n        service: HeatPassword\n'\n
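When encoding the key for the secret, keep in mind that a trailing newline becomes part of the encoded value. As a minimal sketch (reusing the sample key shown earlier; the variable names are illustrative), the key can be encoded without a newline and the round trip checked before patching the secret:

```shell
# Sample key from the example above; substitute the value from your environment.
key='Q60Hj8PqbrDNu2dDCbyIQE2dibpQUPg2'

# printf '%s' writes the key without the trailing newline that plain echo adds.
encoded=$(printf '%s' "$key" | base64)

# Decode again and count bytes to confirm nothing extra was encoded.
decoded_len=$(printf '%s' "$encoded" | base64 -d | wc -c | tr -d ' ')

echo "encoded key: $encoded"
echo "decoded length: $decoded_len, original length: ${#key}"
```

If the two lengths differ, the encoded value contains extra bytes and will not match the key configured in the source environment.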
"},{"location":"openstack/heat_adoption/#post-checks","title":"Post-checks","text":"Ensure all of the CRs reach the \"Setup complete\" state:
\u276f oc get Heat,HeatAPI,HeatEngine,HeatCFNAPI\nNAME STATUS MESSAGE\nheat.heat.openstack.org/heat True Setup complete\n\nNAME STATUS MESSAGE\nheatapi.heat.openstack.org/heat-api True Setup complete\n\nNAME STATUS MESSAGE\nheatengine.heat.openstack.org/heat-engine True Setup complete\n\nNAME STATUS MESSAGE\nheatcfnapi.heat.openstack.org/heat-cfnapi True Setup complete\n
"},{"location":"openstack/heat_adoption/#check-that-heat-service-is-registered-in-keystone","title":"Check that Heat service is registered in Keystone","text":" oc exec -it openstackclient -- openstack service list -c Name -c Type\n+------------+----------------+\n| Name | Type |\n+------------+----------------+\n| heat | orchestration |\n| glance | image |\n| heat-cfn | cloudformation |\n| ceilometer | Ceilometer |\n| keystone | identity |\n| placement | placement |\n| cinderv3 | volumev3 |\n| nova | compute |\n| neutron | network |\n+------------+----------------+\n
\u276f oc exec -it openstackclient -- openstack endpoint list --service=heat -f yaml\n- Enabled: true\n ID: 1da7df5b25b94d1cae85e3ad736b25a5\n Interface: public\n Region: regionOne\n Service Name: heat\n Service Type: orchestration\n URL: http://heat-api-public-openstack-operators.apps.okd.bne-shift.net/v1/%(tenant_id)s\n- Enabled: true\n ID: 414dd03d8e9d462988113ea0e3a330b0\n Interface: internal\n Region: regionOne\n Service Name: heat\n Service Type: orchestration\n URL: http://heat-api-internal.openstack-operators.svc:8004/v1/%(tenant_id)s\n
"},{"location":"openstack/heat_adoption/#check-heat-engine-services-are-up","title":"Check Heat engine services are up","text":" oc exec -it openstackclient -- openstack orchestration service list -f yaml\n- Binary: heat-engine\n Engine ID: b16ad899-815a-4b0c-9f2e-e6d9c74aa200\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:01.000000'\n- Binary: heat-engine\n Engine ID: 887ed392-0799-4310-b95c-ac2d3e6f965f\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:00.000000'\n- Binary: heat-engine\n Engine ID: 26ed9668-b3f2-48aa-92e8-2862252485ea\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:00.000000'\n- Binary: heat-engine\n Engine ID: 1011943b-9fea-4f53-b543-d841297245fd\n Host: heat-engine-6d47856868-p7pzz\n Hostname: heat-engine-6d47856868-p7pzz\n Status: up\n Topic: engine\n Updated At: '2023-10-11T21:48:01.000000'\n
"},{"location":"openstack/heat_adoption/#verify-you-can-now-see-your-heat-stacks-again","title":"Verify you can now see your Heat stacks again","text":"We can now verify that the existing Heat stacks are visible again:
\u276f openstack stack list -f yaml\n- Creation Time: '2023-10-11T22:03:20Z'\n ID: 20f95925-7443-49cb-9561-a1ab736749ba\n Project: 4eacd0d1cab04427bc315805c28e66c9\n Stack Name: test-networks\n Stack Status: CREATE_COMPLETE\n Updated Time: null\n
"},{"location":"openstack/horizon_adoption/","title":"Horizon adoption","text":""},{"location":"openstack/horizon_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/horizon_adoption/#procedure-horizon-adoption","title":"Procedure - Horizon adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n  horizon:\n    enabled: true\n    apiOverride:\n      route: {}\n    template:\n      memcachedInstance: memcached\n      secret: osp-secret\n'\n
"},{"location":"openstack/horizon_adoption/#post-checks","title":"Post-checks","text":"oc get horizon\n
200
PUBLIC_URL=$(oc get horizon horizon -o jsonpath='{.status.endpoint}')\ncurl --silent --output /dev/stderr --head --write-out \"%{http_code}\" \"$PUBLIC_URL/dashboard/auth/login/?next=/dashboard/\" | grep 200\n
"},{"location":"openstack/ironic_adoption/","title":"Ironic adoption","text":""},{"location":"openstack/ironic_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/ironic_adoption/#pre-checks","title":"Pre-checks","text":"TODO
"},{"location":"openstack/ironic_adoption/#procedure-ironic-adoption","title":"Procedure - Ironic adoption","text":"TODO
"},{"location":"openstack/ironic_adoption/#post-checks","title":"Post-checks","text":"TODO
"},{"location":"openstack/keystone_adoption/","title":"Keystone adoption","text":""},{"location":"openstack/keystone_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/keystone_adoption/#pre-checks","title":"Pre-checks","text":""},{"location":"openstack/keystone_adoption/#procedure-keystone-adoption","title":"Procedure - Keystone adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n keystone:\n enabled: true\n apiOverride:\n route: {}\n template:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n databaseInstance: openstack\n secret: osp-secret\n'\n
openstack
command in the adopted deployment:alias openstack=\"oc exec -t openstackclient -- openstack\"\n
openstack endpoint list | grep keystone | awk '/admin/{ print $2; }' | xargs ${BASH_ALIASES[openstack]} endpoint delete || true\n\nfor service in cinderv3 glance manila manilav2 neutron nova placement swift; do\n openstack service list | awk \"/ $service /{ print \\$2; }\" | xargs ${BASH_ALIASES[openstack]} service delete || true\ndone\n
"},{"location":"openstack/keystone_adoption/#post-checks","title":"Post-checks","text":"openstack endpoint list | grep keystone\n
"},{"location":"openstack/manila_adoption/","title":"Manila adoption","text":"OpenStack Manila is the Shared File Systems service. It provides OpenStack users with a self-service API to create and manage file shares. File shares (or simply, \"shares\") are built for concurrent read/write access by any number of clients. This, coupled with the inherent elasticity of the underlying storage, makes the Shared File Systems service essential in cloud environments that require RWX (\"read write many\") persistent storage.
"},{"location":"openstack/manila_adoption/#networking","title":"Networking","text":"File shares in OpenStack are accessed directly over a network. Hence, it is essential to plan the networking of the cloud to create a successful and sustainable orchestration layer for shared file systems.
Manila supports two levels of storage networking abstractions - one where users can directly control the networking for their respective file shares; and another where the storage networking is configured by the OpenStack administrator. It is important to ensure that the networking in the Red Hat OpenStack Platform 17.1 matches the network plans for your new cloud after adoption. This ensures that tenant workloads remain connected to storage through the adoption process, even as the control plane suffers a minor interruption. Manila's control plane services are not in the data path; and shutting down the API, scheduler and share manager services will not impact access to existing shared file systems.
Typically, storage and storage device management networks are separate. Manila services only need access to the storage device management network. For example, if a Ceph cluster was used in the deployment, the \"storage\" network refers to the Ceph cluster's public network, and Manila's share manager service needs to be able to reach it.
"},{"location":"openstack/manila_adoption/#prerequisites","title":"Prerequisites","text":"manila-share
service will be deployed can reach the management network that the storage system is in.driver_handles_share_servers=True
), ensure that neutron has been deployed prior to adopting manila services.Define the CONTROLLER1_SSH
environment variable, if it hasn't been defined already. Then copy the configuration file from RHOSP 17.1 for reference.
$CONTROLLER1_SSH cat /var/lib/config-data/puppet-generated/manila/etc/manila/manila.conf | awk '!/^ *#/ && NF' > ~/manila.conf\n
Review this configuration, alongside any configuration changes that were noted since RHOSP 17.1. Not all of it makes sense to bring into the new cloud environment:
[database]
), service authentication (auth_strategy
, [keystone_authtoken]
), message bus configuration (transport_url
, control_exchange
), the default paste config (api_paste_config
) and inter-service communication configuration ([neutron]
, [nova]
, [cinder]
, [glance]
[oslo_messaging_*]
). So all of these can be ignored.osapi_share_listen
configuration. In RHOSP 18, we rely on OpenShift's routes and ingress.ConfigMap
. The following sample spec illustrates how a ConfigMap
called manila-policy
can be set up with the contents of a file called policy.yaml
. spec:\n manila:\n enabled: true\n template:\n manilaAPI:\n customServiceConfig: |\n [oslo_policy]\n policy_file=/etc/manila/policy.yaml\n extraMounts:\n - extraVol:\n - extraVolType: Undefined\n mounts:\n - mountPath: /etc/manila/\n name: policy\n readOnly: true\n propagation:\n - ManilaAPI\n volumes:\n - name: policy\n projected:\n sources:\n - configMap:\n name: manila-policy\n items:\n - key: policy\n path: policy.yaml\n
- The Manila API service needs the enabled_share_protocols
option to be added in the customServiceConfig
section in manila: template: manilaAPI
. - If you had scheduler overrides, add them to the customServiceConfig
section in manila: template: manilaScheduler
. - If you had multiple storage backend drivers configured with RHOSP 17.1, you will need to split them up when deploying RHOSP 18. Each storage backend driver needs to use its own instance of the manila-share
service. - If a storage backend driver needs a custom container image, find it on the RHOSP Ecosystem Catalog and set manila: template: manilaShares: <custom name> : containerImage
value. The following example illustrates multiple storage backend drivers, using custom container images. spec:\n manila:\n enabled: true\n template:\n manilaAPI:\n customServiceConfig: |\n [DEFAULT]\n enabled_share_protocols = nfs\n replicas: 3\n manilaScheduler:\n replicas: 3\n manilaShares:\n netapp:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends = netapp\n [netapp]\n driver_handles_share_servers = False\n share_backend_name = netapp\n share_driver = manila.share.drivers.netapp.common.NetAppDriver\n netapp_storage_family = ontap_cluster\n netapp_transport_type = http\n replicas: 1\n pure:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends=pure-1\n [pure-1]\n driver_handles_share_servers = False\n share_backend_name = pure-1\n share_driver = manila.share.drivers.purestorage.flashblade.FlashBladeShareDriver\n flashblade_mgmt_vip = 203.0.113.15\n flashblade_data_vip = 203.0.10.14\n containerImage: registry.connect.redhat.com/purestorage/openstack-manila-share-pure-rhosp-18-0\n replicas: 1\n
customServiceConfigSecrets
key. An example:cat << __EOF__ > ~/netapp_secrets.conf\n\n[netapp]\nnetapp_server_hostname = 203.0.113.10\nnetapp_login = fancy_netapp_user\nnetapp_password = secret_netapp_password\nnetapp_vserver = mydatavserver\n__EOF__\n\noc create secret generic osp-secret-manila-netapp --from-file ~/netapp_secrets.conf -n openstack\n
customServiceConfigSecrets
can be used in any service, the following is a config example using the secret we created as above. spec:\n manila:\n enabled: true\n template:\n < . . . >\n manilaShares:\n netapp:\n customServiceConfig: |\n [DEFAULT]\n debug = true\n enabled_share_backends = netapp\n [netapp]\n driver_handles_share_servers = False\n share_backend_name = netapp\n share_driver = manila.share.drivers.netapp.common.NetAppDriver\n netapp_storage_family = ontap_cluster\n netapp_transport_type = http\n customServiceConfigSecrets:\n - osp-secret-manila-netapp\n replicas: 1\n < . . . >\n
extraMounts
. For example, when using ceph, you'd need Manila's ceph user's keyring file as well as the ceph.conf
configuration file available. These are mounted via extraMounts
as done with the example below.share_backend_name
) remain as they did on RHOSP 17.1.manilaAPI
service and the manilaScheduler
service to 3. You should ensure to set the replica count of the manilaShares
service/s to 1. manilaShares
section. The example below connects the manilaShares
instance with the CephFS backend driver to the storage
network. Patch OpenStackControlPlane to deploy Manila; here's an example that uses Native CephFS:
cat << __EOF__ > ~/manila.patch\nspec:\n manila:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n manilaAPI:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n template:\n manilaAPI:\n replicas: 3\n customServiceConfig: |\n [DEFAULT]\n enabled_share_protocols = cephfs\n manilaScheduler:\n replicas: 3\n manilaShares:\n cephfs:\n replicas: 1\n customServiceConfig: |\n [DEFAULT]\n enabled_share_backends = tripleo_ceph\n [tripleo_ceph]\n driver_handles_share_servers=False\n share_backend_name=tripleo_ceph\n share_driver=manila.share.drivers.cephfs.driver.CephFSDriver\n cephfs_conf_path=/etc/ceph/ceph.conf\n cephfs_auth_id=openstack\n cephfs_cluster_name=ceph\n cephfs_volume_mode=0755\n cephfs_protocol_helper_type=CEPHFS\n networkAttachments:\n - storage\n__EOF__\n
oc patch openstackcontrolplane openstack --type=merge --patch-file ~/manila.patch\n
"},{"location":"openstack/manila_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/manila_adoption/#inspect-the-resulting-manila-service-pods","title":"Inspect the resulting manila service pods","text":"oc get pods -l service=manila \n
"},{"location":"openstack/manila_adoption/#check-that-manila-api-service-is-registered-in-keystone","title":"Check that Manila API service is registered in Keystone","text":"openstack service list | grep manila\n
openstack endpoint list | grep manila\n\n| 1164c70045d34b959e889846f9959c0e | regionOne | manila | share | True | internal | http://manila-internal.openstack.svc:8786/v1/%(project_id)s |\n| 63e89296522d4b28a9af56586641590c | regionOne | manilav2 | sharev2 | True | public | https://manila-public-openstack.apps-crc.testing/v2 |\n| af36c57adcdf4d50b10f484b616764cc | regionOne | manila | share | True | public | https://manila-public-openstack.apps-crc.testing/v1/%(project_id)s |\n| d655b4390d7544a29ce4ea356cc2b547 | regionOne | manilav2 | sharev2 | True | internal | http://manila-internal.openstack.svc:8786/v2 |\n
"},{"location":"openstack/manila_adoption/#verify-resources","title":"Verify resources","text":"We can now test the health of the service
openstack share service list\nopenstack share pool list --detail\n
We can check on existing workloads
openstack share list\nopenstack share snapshot list\n
We can create further resources
openstack share create cephfs 10 --snapshot mysharesnap --name myshareclone\n
"},{"location":"openstack/mariadb_copy/","title":"MariaDB data copy","text":"This document describes how to move the databases from the original OpenStack deployment to the MariaDB instances in the OpenShift cluster.
NOTE: This example scenario describes a simple single-cell setup. A real multi-stack topology, as recommended for production use, results in a different cell database layout and should use different naming schemes (not covered here).
"},{"location":"openstack/mariadb_copy/#prerequisites","title":"Prerequisites","text":"Make sure the previous Adoption steps have been performed successfully.
The OpenStackControlPlane resource must be already created at this point.
Podified MariaDB and RabbitMQ are running. No other podified control plane services are running.
OpenStack services have been stopped
There must be network routability between:
The adoption host and the original MariaDB.
The adoption host and the podified MariaDB.
Note that this routability requirement may change in the future, e.g. we may require routability from the original MariaDB to podified MariaDB.
Podman package is installed
CONTROLLER1_SSH
, CONTROLLER2_SSH
, and CONTROLLER3_SSH
are configured.
Define the shell variables used in the steps below. The values are illustrative; use values that are correct for your environment:
MARIADB_IMAGE=quay.io/podified-antelope-centos9/openstack-mariadb:current-podified\n\nPODIFIED_MARIADB_IP=$(oc get svc --selector \"cr=mariadb-openstack\" -ojsonpath='{.items[0].spec.clusterIP}')\nPODIFIED_CELL1_MARIADB_IP=$(oc get svc --selector \"cr=mariadb-openstack-cell1\" -ojsonpath='{.items[0].spec.clusterIP}')\nPODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)\n\n# Replace with your environment's MariaDB IP:\nSOURCE_MARIADB_IP=192.168.122.100\nSOURCE_DB_ROOT_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' MysqlRootPassword:' | awk -F ': ' '{ print $2; }')\n\n# The CHARACTER_SET and collation should match the source DB;\n# if they do not, foreign key relationships will break\n# for any tables that are created in the future as part of db sync\nCHARACTER_SET=utf8\nCOLLATION=utf8_general_ci\n
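Because several of these variables are filled in by command substitution, a failed lookup can leave one of them empty without any visible error. The following is a minimal sketch of a guard that could be run before continuing; the function name `check_vars_set` is illustrative and not part of the adoption tooling, and the values assigned in the example are placeholders:

```shell
# Report every variable from the argument list that is unset or empty.
# Returns non-zero if at least one is missing.
check_vars_set() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder values; in practice these come from the definitions above.
SOURCE_MARIADB_IP=192.168.122.100
CHARACTER_SET=utf8
COLLATION=utf8_general_ci

check_vars_set SOURCE_MARIADB_IP CHARACTER_SET COLLATION && echo "all variables set"
```

The same call can be extended with the other variable names, for example `PODIFIED_MARIADB_IP` and `SOURCE_DB_ROOT_PASSWORD`.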
"},{"location":"openstack/mariadb_copy/#pre-checks","title":"Pre-checks","text":"podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \\\n mysql -h \"$SOURCE_MARIADB_IP\" -uroot \"-p$SOURCE_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\n
podman run -i --rm --userns=keep-id -u $UID $MARIADB_IMAGE \\\n mysqlcheck --all-databases -h $SOURCE_MARIADB_IP -u root \"-p$SOURCE_DB_ROOT_PASSWORD\" | grep -v OK\n
oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\n mysql -h \"$PODIFIED_MARIADB_IP\" -uroot \"-p$PODIFIED_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\n mysql -h \"$PODIFIED_CELL1_MARIADB_IP\" -uroot \"-p$PODIFIED_DB_ROOT_PASSWORD\" -e 'SHOW databases;'\n
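The table check above prints only the lines that are not reported as OK, so empty output means a clean result. A hedged sketch of turning that into an explicit pass/fail result follows; the function name is hypothetical, and the sample input only approximates the `db.table  status` shape of mysqlcheck output:

```shell
# Read mysqlcheck-style output on stdin and fail if any line does not end in "OK".
# The function name and the assumed input format are illustrative.
tables_all_ok() {
    # grep exits non-zero when nothing is left after filtering; that is the good case.
    not_ok=$(grep -v 'OK$' || true)
    if [ -n "$not_ok" ]; then
        echo "tables needing attention:" >&2
        echo "$not_ok" >&2
        return 1
    fi
    echo "all tables OK"
}

# Hypothetical sample input standing in for real mysqlcheck output.
printf '%s\n' 'keystone.user    OK' 'nova.instances    OK' | tables_all_ok
```

In a real run, the output of the mysqlcheck command would be piped into the function instead of the sample lines.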
"},{"location":"openstack/mariadb_copy/#procedure-data-copy","title":"Procedure - data copy","text":"NOTE: The Nova services imported later on need to transition into a superconductor architecture. For that, delete the old service records in the cell databases, starting from cell1. New records will be registered with different hostnames provided by the Nova service operator. All Nova services, except the compute agent, have no internal state, so their service records can be safely deleted. We also need to rename the former default
cell as cell1
.
mkdir ~/adoption-db\ncd ~/adoption-db\n
podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $MARIADB_IMAGE bash <<EOF\n\n# Note we do not want to dump the information and performance schema tables so we filter them\nmysql -h ${SOURCE_MARIADB_IP} -u root \"-p${SOURCE_DB_ROOT_PASSWORD}\" -N -e 'show databases' | grep -E -v 'schema|mysql' | while read dbname; do\n echo \"Dumping \\${dbname}\"\n mysqldump -h $SOURCE_MARIADB_IP -uroot \"-p$SOURCE_DB_ROOT_PASSWORD\" \\\n --single-transaction --complete-insert --skip-lock-tables --lock-tables=0 \\\n \"\\${dbname}\" > \"\\${dbname}\".sql\ndone\n\nEOF\n
# db schemas to rename on import\ndeclare -A db_name_map\ndb_name_map[\"nova\"]=\"nova_cell1\"\ndb_name_map[\"ovs_neutron\"]=\"neutron\"\n\n# db servers to import into\ndeclare -A db_server_map\ndb_server_map[\"default\"]=${PODIFIED_MARIADB_IP}\ndb_server_map[\"nova_cell1\"]=${PODIFIED_CELL1_MARIADB_IP}\n\n# db server root password map\ndeclare -A db_server_password_map\ndb_server_password_map[\"default\"]=${PODIFIED_DB_ROOT_PASSWORD}\ndb_server_password_map[\"nova_cell1\"]=${PODIFIED_DB_ROOT_PASSWORD}\n\nall_db_files=$(ls *.sql)\nfor db_file in ${all_db_files}; do\n db_name=$(echo ${db_file} | awk -F'.' '{ print $1; }')\n if [[ -v \"db_name_map[${db_name}]\" ]]; then\n echo \"renaming ${db_name} to ${db_name_map[${db_name}]}\"\n db_name=${db_name_map[${db_name}]}\n fi\n db_server=${db_server_map[\"default\"]}\n if [[ -v \"db_server_map[${db_name}]\" ]]; then\n db_server=${db_server_map[${db_name}]}\n fi\n db_password=${db_server_password_map[\"default\"]}\n if [[ -v \"db_server_password_map[${db_name}]\" ]]; then\n db_password=${db_server_password_map[${db_name}]}\n fi\n echo \"creating ${db_name} in ${db_server}\"\n container_name=$(echo \"mariadb-client-${db_name}-create\" | sed 's/_/-/g')\n oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \\\n mysql -h \"${db_server}\" -uroot \"-p${db_password}\" << EOF\nCREATE DATABASE IF NOT EXISTS ${db_name} DEFAULT CHARACTER SET ${CHARACTER_SET} DEFAULT COLLATE ${COLLATION};\nEOF\n echo \"importing ${db_name} into ${db_server}\"\n container_name=$(echo \"mariadb-client-${db_name}-restore\" | sed 's/_/-/g')\n oc run ${container_name} --image ${MARIADB_IMAGE} -i --rm --restart=Never -- \\\n mysql -h \"${db_server}\" -uroot \"-p${db_password}\" \"${db_name}\" < \"${db_file}\"\ndone\noc exec -it mariadb-openstack -- mysql --user=root --password=${db_server_password_map[\"default\"]} -e \\\n \"update nova_api.cell_mappings set name='cell1' where name='default';\"\noc exec -it mariadb-openstack-cell1 
-- mysql --user=root --password=${db_server_password_map[\"default\"]} -e \\\n \"delete from nova_cell1.services where host not like '%nova-cell1-%' and services.binary != 'nova-compute';\"\n
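The rename step in the import loop maps a dump file name to its target database name and falls back to the original name when no mapping exists. A minimal sketch of that lookup, written as a standalone function for illustration (the function name `target_db_name` is hypothetical; the mappings are the ones from `db_name_map` above):

```shell
# Resolve the target database name for a dump file such as "nova.sql".
target_db_name() {
    db_name=${1%.sql}            # strip the .sql suffix
    case "$db_name" in
        nova)        echo "nova_cell1" ;;   # cell database is renamed on import
        ovs_neutron) echo "neutron" ;;      # neutron database is renamed on import
        *)           echo "$db_name" ;;     # everything else keeps its name
    esac
}

for f in nova.sql ovs_neutron.sql keystone.sql; do
    echo "$f -> $(target_db_name "$f")"
done
```

Unlike the `awk -F'.'` call in the script, this sketch strips only a trailing `.sql`, so the two differ for file names that contain additional dots.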
"},{"location":"openstack/mariadb_copy/#post-checks","title":"Post-checks","text":"oc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep keystone\n# ensure neutron db is renamed from ovs_neutron\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep neutron\n# ensure nova cell1 db is extracted to a separate db server and renamed from nova to nova_cell1\noc run mariadb-client --image $MARIADB_IMAGE -i --rm --restart=Never -- \\\nmysql -h \"${PODIFIED_CELL1_MARIADB_IP}\" -uroot \"-p${PODIFIED_DB_ROOT_PASSWORD}\" -e 'SHOW databases;' \\\n | grep nova_cell1\n
mariadb-client
might have returned a pod security warning related to the restricted:latest
security context constraint. This is due to default security context constraints and will not prevent pod creation by the admission controller. You'll see a warning for the short-lived pod but it will not interfere with functionality. For more info visit hereAdopting Neutron means that an existing OpenStackControlPlane
CR, where Neutron is supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.
When the procedure is over, the expectation is to see the NeutronAPI
service up and running: the Keystone endpoints
should be updated and the same backend of the source Cloud will be available. If the conditions above are met, the adoption is considered concluded.
This guide also assumes that:
TripleO
environment (the source Cloud) is running on one side;SNO
/ CodeReadyContainers
is running on the other side.As already done for Keystone, the Neutron Adoption follows the same pattern.
Patch OpenStackControlPlane to deploy Neutron:
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n neutron:\n enabled: true\n apiOverride:\n route: {}\n template:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n databaseInstance: openstack\n secret: osp-secret\n networkAttachments:\n - internalapi\n'\n
"},{"location":"openstack/neutron_adoption/#post-checks","title":"Post-checks","text":""},{"location":"openstack/neutron_adoption/#inspect-the-resulting-neutron-pods","title":"Inspect the resulting neutron pods","text":"NEUTRON_API_POD=`oc get pods -l service=neutron | tail -n 1 | cut -f 1 -d' '`\noc exec -t $NEUTRON_API_POD -c neutron-api -- cat /etc/neutron/neutron.conf\n
"},{"location":"openstack/neutron_adoption/#check-that-neutron-api-service-is-registered-in-keystone","title":"Check that Neutron API service is registered in Keystone","text":"openstack service list | grep network\n
openstack endpoint list | grep network\n\n| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | neutron | network | True | public | http://neutron-public-openstack.apps-crc.testing |\n| b943243e596847a9a317c8ce1800fa98 | regionOne | neutron | network | True | internal | http://neutron-internal.openstack.svc:9696 |\n| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | neutron | network | True | admin | http://192.168.122.99:9696 |\n
"},{"location":"openstack/neutron_adoption/#create-sample-resources","title":"Create sample resources","text":"We can now test that a user can create networks, subnets, ports, routers, etc.
openstack network create net\nopenstack subnet create --network net --subnet-range 10.0.0.0/24 subnet\nopenstack router create router\n
NOTE: this page should be expanded to include information on SR-IOV adoption.
"},{"location":"openstack/node-selector/","title":"Node Selector","text":"There are a variety of reasons why we may want to restrict the nodes where OpenStack services can be placed:
The mechanism provided by the OpenStack operators to achieve this is through the use of labels.
We would either label the OpenShift nodes or use existing labels they already have, and then use those labels in the OpenStack manifests in the nodeSelector
field.
The nodeSelector
field in the OpenStack manifests follows the standard OpenShift nodeSelector
field; refer to the OpenShift documentation on the matter for additional information.
This field is present at all the different levels of the OpenStack manifests:
OpenStackControlPlane
object.cinder
element in the OpenStackControlPlane
.cinderVolume
element within the cinder
element in the OpenStackControlPlane
.This allows fine-grained control of the placement of the OpenStack services with minimal repetition.
Values of the nodeSelector
are propagated to the next levels unless they are overwritten. This means that a nodeSelector
value at the deployment level will affect all the OpenStack services.
For example we can add label type: openstack
to any 3 OpenShift nodes:
$ oc label nodes worker0 type=openstack\n$ oc label nodes worker1 type=openstack\n$ oc label nodes worker2 type=openstack\n
And then in our OpenStackControlPlane
we can use the label to place all the services in those 3 nodes:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n secret: osp-secret\n storageClass: local-storage\n nodeSelector:\n type: openstack\n< . . . >\n
What if we don't mind where most OpenStack services run, but the cinder volume and backup services must run on specific nodes because we are using FC and only have HBAs on a subset of nodes? Then we can use the selector for just those services. For the sake of this example, we'll assume those nodes have the label fc_card: true
:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n  name: openstack\nspec:\n  secret: osp-secret\n  storageClass: local-storage\n  cinder:\n    template:\n      cinderVolumes:\n        pure_fc:\n          nodeSelector:\n            fc_card: \"true\"\n< . . . >\n        lvm-iscsi:\n          nodeSelector:\n            fc_card: \"true\"\n< . . . >\n      cinderBackup:\n        nodeSelector:\n          fc_card: \"true\"\n< . . . >\n
The Cinder operator does not currently have the possibility of defining the nodeSelector
in cinderVolumes
, so we need to specify it on each of the backends.
It's possible to leverage labels added by the node feature discovery operator to place OpenStack services.
"},{"location":"openstack/node-selector/#machineconfig","title":"MachineConfig","text":"Some services require us to have services or kernel modules running on the hosts where they run, for example iscsid
or multipathd
daemons, or the nvme-fabrics
kernel module.
For those cases we'll use MachineConfig
manifests, and if we are restricting the nodes we are placing the OpenStack services using the nodeSelector
then we'll also want to limit where the MachineConfig
is applied.
To define where the MachineConfig
can be applied we'll need to use a MachineConfigPool
that links the MachineConfig
to the nodes.
For example to be able to limit MachineConfig
to the 3 OpenShift nodes we marked with the type: openstack
label we would create the MachineConfigPool
like this:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfigPool\nmetadata:\n name: openstack\nspec:\n machineConfigSelector:\n matchLabels:\n machineconfiguration.openshift.io/role: openstack\n nodeSelector:\n matchLabels:\n type: openstack\n
And then we could use it in the MachineConfig
:
apiVersion: machineconfiguration.openshift.io/v1\nkind: MachineConfig\nmetadata:\n labels:\n machineconfiguration.openshift.io/role: openstack\n< . . . >\n
Refer to the OpenShift documentation for additional information on MachineConfig
and MachineConfigPools
WARNING: Applying a MachineConfig
to an OpenShift node will make the node reboot.
NOTE: This example scenario describes a simple single-cell setup. A real multi-stack topology, as recommended for production use, results in a different cell database layout and should use different naming schemes (not covered here).
"},{"location":"openstack/nova_adoption/#prerequisites","title":"Prerequisites","text":"Define the shell variables and aliases used in the steps below. The values are just illustrative; use values that are correct for your environment:
alias openstack=\"oc exec -t openstackclient -- openstack\"\n
"},{"location":"openstack/nova_adoption/#procedure-nova-adoption","title":"Procedure - Nova adoption","text":"NOTE: We assume Nova Metadata is deployed at the top level and not at each cell level, so this example imports it the same way. If the source deployment has a per-cell metadata deployment, adjust the patch given below as needed. The Metadata service cannot run in cell0
.
oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '\nspec:\n nova:\n enabled: true\n apiOverride:\n route: {}\n template:\n secret: osp-secret\n apiServiceTemplate:\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n metadataServiceTemplate:\n enabled: true # deploy single nova metadata on the top level\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n schedulerServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n cellTemplates:\n cell0:\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n cell1:\n metadataServiceTemplate:\n enabled: false # enable here to run it in a cell instead\n override:\n service:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n conductorServiceTemplate:\n customServiceConfig: |\n [workarounds]\n disable_compute_service_check_for_ffu=true\n'\n
oc wait --for condition=Ready --timeout=300s Nova/nova\n
The local Conductor services will be started for each cell, while the superconductor runs in cell0
. Note that disable_compute_service_check_for_ffu
is mandatory for all imported Nova services until the external dataplane is imported and the Nova Compute services are fast-forward upgraded.
openstack endpoint list | grep nova\nopenstack server list\n
"},{"location":"openstack/ovn_adoption/","title":"OVN data migration","text":"This document describes how to move OVN northbound and southbound databases from the original OpenStack deployment to ovsdb-server instances running in the OpenShift cluster.
"},{"location":"openstack/ovn_adoption/#rationale","title":"Rationale","text":"While it may be argued that the podified Neutron ML2/OVN driver and OVN northd service will reconstruct the databases on startup, the reconstruction may be time consuming on large existing clusters. The procedure below speeds up data migration and avoids unnecessary data plane disruptions due to incomplete OpenFlow table contents.
"},{"location":"openstack/ovn_adoption/#prerequisites","title":"Prerequisites","text":"Define the shell variables used in the steps below. The values are just illustrative; use values that are correct for your environment:
OVSDB_IMAGE=quay.io/podified-antelope-centos9/openstack-ovn-base:current-podified\nSOURCE_OVSDB_IP=172.17.1.49\n\n# ssh commands to reach the original controller machines\nCONTROLLER_SSH=\"ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone\"\n\n# ssh commands to reach the original compute machines\nCOMPUTE_SSH=\"ssh -F ~/director_standalone/vagrant_ssh_config vagrant@standalone\"\n
The real value of the SOURCE_OVSDB_IP
can be obtained from the puppet-generated configs:
grep -rI 'ovn_[ns]b_conn' /var/lib/config-data/puppet-generated/\n
"},{"location":"openstack/ovn_adoption/#procedure","title":"Procedure","text":"${CONTROLLER_SSH} sudo systemctl stop tripleo_ovn_cluster_northd.service\n
client=\"podman run -i --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE ovsdb-client\"\n${client} backup tcp:$SOURCE_OVSDB_IP:6641 > ovs-nb.db\n${client} backup tcp:$SOURCE_OVSDB_IP:6642 > ovs-sb.db\n
oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n ovn:\n enabled: true\n template:\n ovnDBCluster:\n ovndbcluster-nb:\n dbType: NB\n storageRequest: 10G\n networkAttachment: internalapi\n ovndbcluster-sb:\n dbType: SB\n storageRequest: 10G\n networkAttachment: internalapi\n'\n
PODIFIED_OVSDB_NB_IP=$(kubectl get po ovsdbserver-nb-0 -o jsonpath='{.metadata.annotations.k8s\\.v1\\.cni\\.cncf\\.io/network-status}' | jq 'map(. | select(.name==\"openstack/internalapi\"))[0].ips[0]' | tr -d '\"')\nPODIFIED_OVSDB_SB_IP=$(kubectl get po ovsdbserver-sb-0 -o jsonpath='{.metadata.annotations.k8s\\.v1\\.cni\\.cncf\\.io/network-status}' | jq 'map(. | select(.name==\"openstack/internalapi\"))[0].ips[0]' | tr -d '\"')\n
podman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client get-schema tcp:$PODIFIED_OVSDB_NB_IP:6641 > ./ovs-nb.ovsschema && ovsdb-tool convert ovs-nb.db ./ovs-nb.ovsschema\"\npodman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client get-schema tcp:$PODIFIED_OVSDB_SB_IP:6642 > ./ovs-sb.ovsschema && ovsdb-tool convert ovs-sb.db ./ovs-sb.ovsschema\"\n
podman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client restore tcp:$PODIFIED_OVSDB_NB_IP:6641 < ovs-nb.db\"\npodman run -it --rm --userns=keep-id -u $UID -v $PWD:$PWD:z,rw -w $PWD $OVSDB_IMAGE bash -c \"ovsdb-client restore tcp:$PODIFIED_OVSDB_SB_IP:6642 < ovs-sb.db\"\n
oc exec -it ovsdbserver-nb-0 -- ovn-nbctl show\noc exec -it ovsdbserver-sb-0 -- ovn-sbctl list Chassis\n
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovs-vsctl set open . external_ids:ovn-remote=tcp:$PODIFIED_OVSDB_SB_IP:6642\n
You should now see the following warning in the ovn_controller
container logs:
2023-03-16T21:40:35Z|03095|ovsdb_cs|WARN|tcp:172.17.1.50:6642: clustered database server has stale data; trying another server\n
${COMPUTE_SSH} sudo podman exec -it ovn_controller ovn-appctl -t ovn-controller sb-cluster-state-reset\n
This should complete the connection of the controller process to the new remote. The logs should show:
2023-03-16T21:42:31Z|03134|main|INFO|Resetting southbound database cluster state\n2023-03-16T21:42:33Z|03135|reconnect|INFO|tcp:172.17.1.50:6642: connected\n
$ ${COMPUTE_SSH} sudo systemctl restart tripleo_ovn_controller.service\n
ovn-northd
service that will keep OVN northbound and southbound databases in sync. oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n ovn:\n enabled: true\n template:\n ovnNorthd:\n networkAttachment: internalapi\n'\n
"},{"location":"openstack/placement_adoption/","title":"Placement adoption","text":""},{"location":"openstack/placement_adoption/#prerequisites","title":"Prerequisites","text":"(There are no shell variables necessary currently.)
"},{"location":"openstack/placement_adoption/#procedure-placement-adoption","title":"Procedure - Placement adoption","text":"oc patch openstackcontrolplane openstack --type=merge --patch '\nspec:\n placement:\n enabled: true\n apiOverride:\n route: {}\n template:\n databaseInstance: openstack\n secret: osp-secret\n override:\n service:\n internal:\n metadata:\n annotations:\n metallb.universe.tf/address-pool: internalapi\n metallb.universe.tf/allow-shared-ip: internalapi\n metallb.universe.tf/loadBalancerIPs: 172.17.0.80\n spec:\n type: LoadBalancer\n'\n
"},{"location":"openstack/placement_adoption/#post-checks","title":"Post-checks","text":"alias openstack=\"oc exec -t openstackclient -- openstack\"\n\nopenstack endpoint list | grep placement\n\n\n# Without OpenStack CLI placement plugin installed:\nPLACEMENT_PUBLIC_URL=$(openstack endpoint list -c 'Service Name' -c 'Service Type' -c URL | grep placement | grep public | awk '{ print $6; }')\noc exec -t openstackclient -- curl \"$PLACEMENT_PUBLIC_URL\"\n\n# With OpenStack CLI placement plugin installed:\nopenstack resource class list\n
"},{"location":"openstack/planning/","title":"Planning the new deployment","text":"Just like you did back when you installed your Director deployed OpenStack, the upgrade/migration to the podified OpenStack requires planning various aspects of the environment such as node roles, planning your network topology, and storage.
In this document we cover some of this planning, but it is recommended to read the whole adoption guide before actually starting the process to be sure that there is a global understanding of the whole process.
"},{"location":"openstack/planning/#configurations","title":"Configurations","text":"There is a fundamental difference between the Director and Operator deployments regarding the configuration of the services.
In Director deployments many of the service configurations are abstracted by Director specific configuration options. A single Director option may trigger changes for multiple services and support for drivers (for example Cinder's) required patches to the Director code base.
In Operator deployments this has changed to what we believe is a simpler approach: reduce the installer specific knowledge and leverage OpenShift and OpenStack service specific knowledge whenever possible.
To this effect OpenStack services will have sensible defaults for OpenShift deployments and human operators will provide configuration snippets to provide necessary configuration, such as cinder backend configuration, or to override the defaults.
This shortens the distance between a service specific configuration file (such as cinder.conf
) and what the human operator provides in the manifests.
These configuration snippets are passed to the operators in the different customServiceConfig
sections available in the manifests, and they are then layered by level: a snippet set at a given level applies to all the services under that level. To illustrate this, if we were to set a configuration at the top Cinder level (spec: cinder: template:
) then it would be applied to all the cinder services; for example to enable debug in all the cinder services we would do:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n template:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n< . . . >\n
If we only wanted to set it for one of the cinder services, for example the scheduler, then we would use the cinderScheduler
section instead:
apiVersion: core.openstack.org/v1beta1\nkind: OpenStackControlPlane\nmetadata:\n name: openstack\nspec:\n cinder:\n template:\n cinderScheduler:\n customServiceConfig: |\n [DEFAULT]\n debug = True\n< . . . >\n
In OpenShift it is not recommended to store sensitive information, such as the credentials to the cinder storage array, in the CRs, so most OpenStack operators have a mechanism to use OpenShift's Secrets
for sensitive configuration parameters of the services and then use them by reference in the customServiceConfigSecrets
section which is analogous to the customServiceConfig
.
The contents of the Secret
references passed in the customServiceConfigSecrets
will have the same format as customServiceConfig
: a snippet with the section/s and configuration options.
When there is sensitive information in the service configuration, it becomes a matter of personal preference whether to store all the configuration in the Secret
or only the sensitive parts, but remember that if we split the configuration between Secret
and customServiceConfig
we still need the section header (e.g. [DEFAULT]
) to be present in both places.
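The split described above can be sketched as follows. This is a minimal illustration rather than a recommended layout: the Secret name, the backend name, the key name and every option value are hypothetical placeholders, and we assume that customServiceConfigSecrets takes a list of Secret names. The sketch only writes two files, so that the repeated section header can be reviewed before anything is applied.

```shell
# Sketch only: the Secret name, the backend name and all option values are
# hypothetical placeholders, not taken from the guide.

# Sensitive part of the backend configuration, stored in a Secret.
cat > cinder-volume-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: cinder-volume-example-secret
  namespace: openstack
type: Opaque
stringData:
  example-secrets.conf: |
    [example_backend]
    san_login = admin
    san_password = changeme
EOF

# Non-sensitive part, kept in the CR. Note that the same section header
# appears again, and that the Secret is referenced by name.
cat > cinder-volume-patch.yaml <<'EOF'
spec:
  cinder:
    template:
      cinderVolumes:
        example-backend:
          customServiceConfig: |
            [example_backend]
            volume_backend_name = example_backend
          customServiceConfigSecrets:
            - cinder-volume-example-secret
EOF
```

After reviewing the files, they could be applied with oc apply -f cinder-volume-secret.yaml followed by oc patch openstackcontrolplane openstack --type=merge --patch-file cinder-volume-patch.yaml.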
Attention should be paid to each service's adoption process, as each may have some particularities regarding its configuration.
"},{"location":"openstack/planning/#configuration-tooling","title":"Configuration tooling","text":"To help users handle the configuration of the TripleO and OpenStack services, the os-diff tool (https://github.com/openstack-k8s-operators/os-diff) has been developed to compare the configuration files between the TripleO deployment and the next-gen cloud. Make sure Golang is installed and configured in your environment:
git clone https://github.com/openstack-k8s-operators/os-diff\npushd os-diff\nmake build\n
Then configure the ansible.cfg and ssh-config files according to your environment:
Host *\n IdentitiesOnly yes\n\nHost virthost\n Hostname virthost\n IdentityFile ~/.ssh/id_rsa\n User root\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n\n\nHost standalone\n Hostname standalone\n IdentityFile ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa\n User root\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n\nHost crc\n Hostname crc\n IdentityFile ~/.ssh/id_rsa\n User stack\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n
And test your connection:
ssh -F ssh.config standalone\n
"},{"location":"openstack/planning/#node-roles","title":"Node Roles","text":"In Director deployments we had 4 different standard roles for the nodes: Controller
, Compute
, Ceph Storage
, Swift Storage
, but in podified OpenStack we just make a distinction based on where things are running, in OpenShift or external to it.
When adopting a Director OpenStack your Compute
nodes will directly become external nodes, so there should not be much additional planning needed there.
In many deployments being adopted the Controller
nodes will require some thought because we'll have many OpenShift nodes where the controller services could run, and we have to decide which ones we want to use, how we are going to use them, and make sure those nodes are ready to run the services.
In most deployments running OpenStack services on master
nodes can have a seriously adverse impact on the OpenShift cluster, so we recommend placing OpenStack services on non master
nodes.
By default OpenStack Operators deploy OpenStack services on any worker node, but that is not necessarily what's best for all deployments, and there may even be services that won't work when deployed like that.
When planning a deployment it's good to remember that not all the services in an OpenStack deployment are the same, as they have very different requirements.
Looking at the Cinder component we can clearly see different requirements for its services: the cinder-scheduler is a very light service with low memory, disk, network, and CPU usage; cinder-api service has a higher network usage due to resource listing requests; the cinder-volume service will have a high disk and network usage since many of its operations are in the data path (offline volume migration, create volume from image, etc.), and then we have the cinder-backup service which has high memory, network, and CPU (to compress data) requirements.
We also have the Glance and Swift components that are in the data path, and let's not forget RabbitMQ and Galera services.
Given these requirements it may be preferable not to let these services wander all over your OpenShift worker nodes with the possibility of impacting other workloads, or maybe you don't mind the light services wandering around but you want to pin down the heavy ones to a set of infrastructure nodes.
There are also hardware restrictions to take into consideration, because if we are using a Fibre Channel (FC) Cinder backend we'll need the cinder-volume, cinder-backup, and maybe even the glance (if it's using Cinder as a backend) services to run on an OpenShift host that has an HBA.
The OpenStack Operators allow a great deal of flexibility on where to run the OpenStack services, as we can use node labels to define which OpenShift nodes are eligible to run the different OpenStack services. Refer to the Node Selector guide to learn more about using labels to define placement of the OpenStack services.
TODO: Talk about Ceph Storage and Swift Storage nodes, HCI deployments, etc.
"},{"location":"openstack/planning/#network","title":"Network","text":"TODO: Write about isolated networks, NetworkAttachmentDefinition, NetworkAttachments, etc.
"},{"location":"openstack/planning/#storage","title":"Storage","text":"When looking into the storage in an OpenStack deployment we can differentiate 2 kinds: the storage requirements of the services themselves and the storage used for the OpenStack users that these services will manage.
These requirements may drive our OpenShift node selection, as mentioned above, and may even require us to do some preparations on the OpenShift nodes before we can deploy the services.
TODO: Galera, RabbitMQ, Swift, Glance, etc.
"},{"location":"openstack/planning/#cinder-requirements","title":"Cinder requirements","text":"The Cinder service has both local storage used by the service and OpenStack user requirements.
Local storage is used for example when downloading a glance image for the create volume from image operation, which can become considerable when having concurrent operations and not using cinder volume cache.
In the Operator deployed OpenStack we now have an easy way to configure the location of the conversion directory to be an NFS share (using the extra volumes feature), something that needed to be done manually before.
Even if it's an adoption and it may seem that there's nothing to consider regarding the Cinder backends, because we'll just be using the same ones we are using in our current deployment, we should still evaluate it, because it may not be so straightforward.
First we need to check the transport protocol the Cinder backends are using: RBD, iSCSI, FC, NFS, NVMe-oF, etc.
Once we know all the transport protocols we are using, we can proceed to make sure we are taking them into consideration when placing the Cinder services (as mentioned above in the Node Roles section) and the right storage transport related binaries are running on the OpenShift nodes.
Detailed information about the specifics for each storage transport protocol can be found in the Cinder Adoption section. Please take a good look at that document before proceeding to be able to plan the adoption better.
"},{"location":"openstack/pull_openstack_configuration/","title":"Pull Openstack configuration","text":"Before starting the adoption workflow, we can pull the configuration of the OpenStack services and TripleO onto our file system. This backs up the configuration files so that we can use them later, during the configuration of the adopted services, and keeps a record to compare against to make sure nothing has been missed or misconfigured.
Make sure you have pulled the os-diff repository and configured it according to your environment: Configure os-diff
"},{"location":"openstack/pull_openstack_configuration/#pull-configuration-from-a-tripleo-deployment","title":"Pull configuration from a TripleO deployment","text":"Once you have made sure that the ssh connection is configured correctly and os-diff has been built, you can start to pull the configuration from your OpenStack services.
All the services are described in an Ansible role:
collect_config vars
Once you have enabled the services you need (you can enable everything even if a service is not deployed), you can start to pull the OpenStack services configuration files:
pushd os-diff\n./os-diff pull --cloud_engine=podman\n
The configuration will be pulled and stored in:
/tmp/collect_tripleo_configs\n
You can provide another path with:
./os-diff pull --cloud_engine=podman -e local_working_dir=$HOME\n
Once the ansible playbook has run, your local directory should contain one directory per service:
\u25be tmp/\n \u25be collect_tripleo_configs/\n \u25be glance/\n
"},{"location":"openstack/stop_openstack_services/","title":"Stop OpenStack services","text":"Before we can start with the adoption we need to make sure that the OpenStack services have been stopped.
This is an important step to avoid inconsistencies in the data migrated for the data-plane adoption procedure caused by resource changes after the DB has been copied to the new deployment.
Some services are easy to stop because they only perform short asynchronous operations, but other services are a bit more complex to gracefully stop because they perform synchronous or long running operations that we may want to complete instead of aborting them.
Since gracefully stopping all services is non-trivial and beyond the scope of this guide, we'll proceed with the force method but present a couple of recommendations on how to check some things in the services.
"},{"location":"openstack/stop_openstack_services/#variables","title":"Variables","text":"Define the shell variables used in the steps below. The values are just illustrative and refer to a single node standalone director deployment; use values that are correct for your environment:
CONTROLLER1_SSH=\"ssh -i ~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa root@192.168.122.100\"\nCONTROLLER2_SSH=\"\"\nCONTROLLER3_SSH=\"\"\n
We chose to use these ssh variables with the ssh commands instead of using ansible in order to create instructions that are independent of where they are run, but ansible commands could be used to achieve the same result if we are on the right host, for example to stop a service:
. stackrc\nansible -i $(which tripleo-ansible-inventory) Controller -m shell -a \"sudo systemctl stop tripleo_horizon.service\" -b\n
NOTE: Nova compute services in this guide run on the same controller hosts. Adjust the CONTROLLER${i}_SSH
commands and ServicesToStop
given below to the specific topology of your source environment.
We can stop OpenStack services at any moment, but we may leave things in an undesired state, so at the very least we should have a look to confirm that there are no long running operations that require other services.
Ensure that there are no ongoing instance live migrations, volume migrations (online or offline), volume creation, backup restore, attaching, detaching, etc.
openstack server list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'\nopenstack volume list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'| grep -vi error\nopenstack volume backup list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|' | grep -vi error\nopenstack share list --all-projects -c ID -c Status |grep -E '\\| .+ing \\|'| grep -vi error\nopenstack image list -c ID -c Status |grep -E '\\| .+ing \\|'\n
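The checks above all apply the same filter to a table of IDs and statuses. As a minimal sketch, assuming the default table output of the openstack client, that filter can be wrapped in a small helper. The function name in_progress_rows is hypothetical and not part of the guide.

```shell
# Hypothetical helper (not part of the original guide): reads the table
# printed by an 'openstack ... list -c ID -c Status' command on stdin and
# prints only the rows whose last column ends in 'ing' (for example
# migrating or attaching), skipping rows that mention an error state.
in_progress_rows() {
    grep -E '[|] +[A-Za-z_-]*ing +[|] *$' | grep -vi error
    # No matching row is the desired outcome, so never report a failure.
    return 0
}
```

For example: openstack server list --all-projects -c ID -c Status | in_progress_rows. Empty output suggests that no operation is in progress for that resource type. This variant also tolerates column padding after the status, which is a design choice of the sketch rather than something the guide requires.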
"},{"location":"openstack/stop_openstack_services/#stopping-control-plane-services","title":"Stopping control plane services","text":"We can stop OpenStack services at any moment, but we may leave things in an undesired state, so at the very least we should have a look to confirm that there are no ongoing operations.
1- Connect to all the controller nodes. 2- Stop the services. 3- Make sure all the services are stopped. 4- Repeat steps 1-3 for compute hosts (workloads running on dataplane will not be affected)
The cinder-backup service on OSP 17.1 could be running as Active-Passive under pacemaker or as Active-Active, so we'll have to check how it's running and stop it.
These steps can be automated with a simple script that relies on the previously defined environment variables:
# Update the services list to be stopped\nServicesToStop=(\"tripleo_horizon.service\"\n \"tripleo_keystone.service\"\n \"tripleo_cinder_api.service\"\n \"tripleo_cinder_api_cron.service\"\n \"tripleo_cinder_scheduler.service\"\n \"tripleo_cinder_backup.service\"\n \"tripleo_glance_api.service\"\n \"tripleo_manila_api.service\"\n \"tripleo_manila_api_cron.service\"\n \"tripleo_manila_scheduler.service\"\n \"tripleo_neutron_api.service\"\n \"tripleo_nova_api.service\"\n \"tripleo_placement_api.service\"\n \"tripleo_nova_api_cron.service\"\n \"tripleo_nova_api.service\"\n \"tripleo_nova_conductor.service\"\n \"tripleo_nova_metadata.service\"\n \"tripleo_nova_scheduler.service\"\n \"tripleo_nova_vnc_proxy.service\"\n # Compute services on dataplane\n \"tripleo_nova_compute.service\"\n \"tripleo_nova_libvirt.target\"\n \"tripleo_nova_migration_target.service\"\n \"tripleo_nova_virtlogd_wrapper.service\"\n \"tripleo_nova_virtnodedevd.service\"\n \"tripleo_nova_virtproxyd.service\"\n \"tripleo_nova_virtqemud.service\"\n \"tripleo_nova_virtsecretd.service\"\n \"tripleo_nova_virtstoraged.service\")\n\nPacemakerResourcesToStop=(\"openstack-cinder-volume\"\n \"openstack-cinder-backup\"\n \"openstack-manila-share\")\n\necho \"Stopping systemd OpenStack services\"\nfor service in ${ServicesToStop[*]}; do\n for i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Stopping the $service in controller $i\"\n if ${!SSH_CMD} sudo systemctl is-active $service; then\n ${!SSH_CMD} sudo systemctl stop $service\n fi\n fi\n done\ndone\n\necho \"Checking systemd OpenStack services\"\nfor service in ${ServicesToStop[*]}; do\n for i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Checking status of $service in controller $i\"\n if ! 
${!SSH_CMD} systemctl show $service | grep ActiveState=inactive >/dev/null; then\n echo \"ERROR: Service $service still running on controller $i\"\n fi\n fi\n done\ndone\n\necho \"Stopping pacemaker OpenStack services\"\nfor i in {1..3}; do\n SSH_CMD=CONTROLLER${i}_SSH\n if [ ! -z \"${!SSH_CMD}\" ]; then\n echo \"Using controller $i to run pacemaker commands\"\n for resource in ${PacemakerResourcesToStop[*]}; do\n if ${!SSH_CMD} sudo pcs resource config $resource; then\n ${!SSH_CMD} sudo pcs resource disable $resource\n fi\n done\n break\n fi\ndone\n
"},{"location":"openstack/troubleshooting/","title":"Troubleshooting","text":"This document contains information about various issues you might face and how to solve them.
"},{"location":"openstack/troubleshooting/#errimagepull-due-to-missing-authentication","title":"ErrImagePull due to missing authentication","text":"The deployed containers pull the images from private containers registries that can potentially return authentication errors like:
Failed to pull image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\":\nrpc error: code = Unknown desc = unable to retrieve auth token: invalid\nusername/password: unauthorized: Please login to the Red Hat Registry using\nyour Customer Portal credentials.\n
An example of a failed pod:
Normal Scheduled 3m40s default-scheduler Successfully assigned openstack/rabbitmq-server-0 to worker0\n Normal AddedInterface 3m38s multus Add eth0 [10.101.0.41/23] from ovn-kubernetes\n Warning Failed 2m16s (x6 over 3m38s) kubelet Error: ImagePullBackOff\n Normal Pulling 2m5s (x4 over 3m38s) kubelet Pulling image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\"\n Warning Failed 2m5s (x4 over 3m38s) kubelet Failed to pull image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\": rpc error: code ... can be found here: https://access.redhat.com/RegistryAuthentication\n Warning Failed 2m5s (x4 over 3m38s) kubelet Error: ErrImagePull\n Normal BackOff 110s (x7 over 3m38s) kubelet Back-off pulling image \"registry.redhat.io/rhosp-rhel9/openstack-rabbitmq:17.0\"\n
To solve this issue we need to get a valid pull-secret from the official Red Hat console site, store this pull secret locally in a machine with access to the Kubernetes API (service node), and then run:
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull_secret_location.json>\n
The previous command makes the authentication information available on all the cluster's compute nodes. Then trigger a new pod deployment to pull the container image with:
kubectl delete pod rabbitmq-server-0 -n openstack\n
And the pod should be able to pull the image successfully. For more information about which container registries require which type of authentication, check the official docs.
"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index c92a83522..cd774cb19 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,132 +2,132 @@