diff --git a/docs_user/adoption-attributes.adoc b/docs_user/adoption-attributes.adoc index f77d44eff..8051cf772 100644 --- a/docs_user/adoption-attributes.adoc +++ b/docs_user/adoption-attributes.adoc @@ -12,6 +12,8 @@ ifeval::["{build}" == "upstream"] :rhos_curr_ver: Antelope :rhos_prev_ver: Wallaby :OpenStackPreviousInstaller: TripleO +:Ceph: Ceph +:CephCluster: Ceph Storage //Components and services @@ -74,6 +76,8 @@ ifeval::["{build}" == "downstream"] :OpenShift: Red Hat OpenShift Container Platform :OpenShiftShort: RHOCP :OpenStackPreviousInstaller: director +:Ceph: Red Hat Ceph Storage +:CephCluster: Red Hat Ceph Storage //Components and services diff --git a/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc b/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc index a5332ab78..1b5cffd46 100644 --- a/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc +++ b/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc @@ -2,15 +2,15 @@ :context: migrating-ceph-rbd -= Migrating Red Hat Ceph Storage RBD += Migrating Red Hat Ceph Storage RBD to external RHEL nodes -For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running version 6 or later, you must migrate the daemons that are included in the {rhos_prev_long} control plane into the existing external RHEL nodes. The external RHEL nodes typically include the Compute nodes for an HCI environment or dedicated storage nodes. +For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running {Ceph} version 6 or later, you must migrate the daemons that are included in the {rhos_prev_long} control plane into the existing external Red Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include the Compute nodes for an HCI environment or dedicated storage nodes. To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must meet the following requirements: -* Red Hat Ceph Storage is running version 6 or later and is managed by cephadm/orchestrator. -* NFS (ganesha) is migrated from a {OpenStackPreviousInstaller}-based deployment to cephadm.For more information, see xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha cluster]. -* Both the Red Hat Ceph Storage public and cluster networks are propagated, with {OpenStackPreviousInstaller}, to the target nodes. +* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator. +* NFS (ganesha) is migrated from a {OpenStackPreviousInstaller}-based deployment to cephadm. For more information, see xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha cluster]. +* Both the {Ceph} public and cluster networks are propagated, with {OpenStackPreviousInstaller}, to the target nodes. * Ceph Monitors need to keep their IPs to avoid cold migration. include::../modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc[leveloffset=+1] diff --git a/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc b/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc new file mode 100644 index 000000000..0a176452d --- /dev/null +++ b/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc @@ -0,0 +1,23 @@ +[id="migrating-ceph-rgw_{context}"] + +:context: migrating-ceph-rgw + += Migrating {Ceph} RGW to external RHEL nodes + +For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running {Ceph} version 6 or later, you must migrate the RGW daemons that are included in the {rhos_prev_long} Controller nodes into the existing external Red Hat Enterprise Linux (RHEL) nodes. 
The existing external RHEL nodes typically include the Compute nodes for an HCI environment or {Ceph} nodes. + +To migrate Ceph Object Gateway (RGW), your environment must meet the following requirements: + +* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator. +* An undercloud is still available, and the nodes and networks are managed by {OpenStackPreviousInstaller}. + +include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1] + +include::../modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc[leveloffset=+1] + +include::../modules/proc_migrating-the-rgw-backends.adoc[leveloffset=+1] + +include::../modules/proc_deploying-a-ceph-ingress-daemon.adoc[leveloffset=+1] + +include::../modules/proc_updating-the-object-storage-endpoints.adoc[leveloffset=+1] + diff --git a/docs_user/assemblies/ceph_migration.adoc b/docs_user/assemblies/ceph_migration.adoc index f1762a778..75e63f4e6 100644 --- a/docs_user/assemblies/ceph_migration.adoc +++ b/docs_user/assemblies/ceph_migration.adoc @@ -9,7 +9,6 @@ ifdef::context[:parent-context: {context}] :toc: left :toclevels: 3 -include::../modules/ceph-rgw_migration.adoc[leveloffset=+1] include::../modules/ceph-mds_migration.adoc[leveloffset=+1] include::../modules/ceph-monitoring_migration.adoc[leveloffset=+1] diff --git a/docs_user/main.adoc b/docs_user/main.adoc index 23ff4cbbb..7b5ad13a1 100644 --- a/docs_user/main.adoc +++ b/docs_user/main.adoc @@ -24,6 +24,8 @@ include::assemblies/assembly_adopting-the-data-plane.adoc[leveloffset=+1] include::assemblies/assembly_migrating-ceph-rbd.adoc[leveloffset=+1] +include::assemblies/assembly_migrating-ceph-rgw.adoc[leveloffset=+1] + include::assemblies/ceph_migration.adoc[leveloffset=+1] include::assemblies/swift_migration.adoc[leveloffset=+1] diff --git a/docs_user/modules/ceph-rgw_migration.adoc b/docs_user/modules/ceph-rgw_migration.adoc deleted file mode 100644 index 189ab7446..000000000 --- a/docs_user/modules/ceph-rgw_migration.adoc +++ /dev/null @@ -1,660 +0,0 @@ -[id="migrating-ceph-rgw_{context}"] - - -//:context: migrating-ceph-rgw -//kgilliga: This module might be converted to an assembly. - -= Migrating Ceph RGW - -In this scenario, assuming Ceph is already >= 5, either for HCI or dedicated -Storage nodes, the RGW daemons living in the OpenStack Controller nodes will be -migrated into the existing external RHEL nodes (typically the Compute nodes -for an HCI environment or CephStorage nodes in the remaining use cases). - -== Requirements - -* Ceph is >= 5 and managed by cephadm/orchestrator -* An undercloud is still available: nodes and networks are managed by TripleO - -== Ceph Daemon Cardinality - -*Ceph 5+* applies https://access.redhat.com/articles/1548993[strict constraints] in the way daemons can be colocated -within the same node. The resulting topology depends on the available hardware, -as well as the amount of Ceph services present in the Controller nodes which are -going to be retired. The following document describes the procedure required -to migrate the RGW component (and keep an HA model using the https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw[Ceph Ingress -daemon] in a common TripleO scenario where Controller nodes represent the -https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[spec placement] where the service is deployed. 
As a general rule, the -number of services that can be migrated depends on the number of available -nodes in the cluster. The following diagrams cover the distribution of the Ceph -daemons on the CephStorage nodes where at least three nodes are required in a -scenario that sees only RGW and RBD (no dashboard): - ----- -| | | | -|----|---------------------|-------------| -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | rgw/ingress | ----- - -With dashboard, and without Manila at least four nodes are required (dashboard -has no failover): - ----- -| | | | -|-----|---------------------|-------------| -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | dashboard/grafana | -| osd | rgw/ingress | (free) | ----- - -With dashboard and Manila 5 nodes minimum are required (and dashboard has no -failover): - ----- -| | | | -|-----|---------------------|-------------------------| -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | rgw/ingress | -| osd | mon/mgr/crash | mds/ganesha/ingress | -| osd | rgw/ingress | mds/ganesha/ingress | -| osd | mds/ganesha/ingress | dashboard/grafana | ----- - -== Current Status - ----- -(undercloud) [stack@undercloud-0 ~]$ metalsmith list - - - +------------------------+ +----------------+ - | IP Addresses | | Hostname | - +------------------------+ +----------------+ - | ctlplane=192.168.24.25 | | cephstorage-0 | - | ctlplane=192.168.24.10 | | cephstorage-1 | - | ctlplane=192.168.24.32 | | cephstorage-2 | - | ctlplane=192.168.24.28 | | compute-0 | - | ctlplane=192.168.24.26 | | compute-1 | - | ctlplane=192.168.24.43 | | controller-0 | - | ctlplane=192.168.24.7 | | controller-1 | - | ctlplane=192.168.24.41 | | controller-2 | - +------------------------+ +----------------+ ----- - -SSH into `controller-0` and check the `pacemaker` status. This will help you -identify the relevant information that you need before you start the -RGW migration. - ----- -Full List of Resources: - * ip-192.168.24.46 (ocf:heartbeat:IPaddr2): Started controller-0 - * ip-10.0.0.103 (ocf:heartbeat:IPaddr2): Started controller-1 - * ip-172.17.1.129 (ocf:heartbeat:IPaddr2): Started controller-2 - * ip-172.17.3.68 (ocf:heartbeat:IPaddr2): Started controller-0 - * ip-172.17.4.37 (ocf:heartbeat:IPaddr2): Started controller-1 - * Container bundle set: haproxy-bundle - -[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]: - * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-2 - * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-0 - * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-1 ----- - -Use the `ip` command to identify the ranges of the storage networks. 
- ----- -[heat-admin@controller-0 ~]$ ip -o -4 a - -1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever -2: enp1s0 inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever -2: enp1s0 inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever -7: br-ex inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever -8: vlan70 inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever -8: vlan70 inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever -9: vlan50 inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\ valid_lft forever preferred_lft forever -10: vlan30 inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever -10: vlan30 inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever -11: vlan20 inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\ valid_lft forever preferred_lft forever -12: vlan40 inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\ valid_lft forever preferred_lft forever ----- - -In this example: - -* vlan30 represents the Storage Network, where the new RGW instances should be -started on the CephStorage nodes -* br-ex represents the External Network, which is where in the current -environment, haproxy has the frontend VIP assigned - -== Prerequisite: check the frontend network (Controller nodes) - -Identify the network that you previously had in haproxy and propagate it (via -TripleO) to the CephStorage nodes. This network is used to reserve a new VIP -that will be owned by Ceph and used as the entry point for the RGW service. - -ssh into `controller-0` and check the current HaProxy configuration until you -find `ceph_rgw` section: - ----- -$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg - -... -... -listen ceph_rgw - bind 10.0.0.103:8080 transparent - bind 172.17.3.68:8080 transparent - mode http - balance leastconn - http-request set-header X-Forwarded-Proto https if { ssl_fc } - http-request set-header X-Forwarded-Proto http if !{ ssl_fc } - http-request set-header X-Forwarded-Port %[dst_port] - option httpchk GET /swift/healthcheck - option httplog - option forwardfor - server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2 - server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2 - server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2 ----- - -Double check the network used as HaProxy frontend: - ----- -[controller-0]$ ip -o -4 a - -... -7: br-ex inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever -... ----- - -As described in the previous section, the check on controller-0 shows that you -are exposing the services using the external network, which is not present in -the Ceph Storage nodes, and you need to propagate it via TripleO. - -== Propagate the HaProxy frontend network to CephStorage nodes - -Change the NIC template used to define the ceph-storage network interfaces and -add the new config section. 
- -[source,yaml] ----- ---- -network_config: -- type: interface - name: nic1 - use_dhcp: false - dns_servers: {{ ctlplane_dns_nameservers }} - addresses: - - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }} - routes: {{ ctlplane_host_routes }} -- type: vlan - vlan_id: {{ storage_mgmt_vlan_id }} - device: nic1 - addresses: - - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }} - routes: {{ storage_mgmt_host_routes }} -- type: interface - name: nic2 - use_dhcp: false - defroute: false -- type: vlan - vlan_id: {{ storage_vlan_id }} - device: nic2 - addresses: - - ip_netmask: {{ storage_ip }}/{{ storage_cidr }} - routes: {{ storage_host_routes }} -- type: ovs_bridge - name: {{ neutron_physical_bridge_name }} - dns_servers: {{ ctlplane_dns_nameservers }} - domain: {{ dns_search_domains }} - use_dhcp: false - addresses: - - ip_netmask: {{ external_ip }}/{{ external_cidr }} - routes: {{ external_host_routes }} - members: - - type: interface - name: nic3 - primary: true ----- - -In addition, add the *External* Network to the `baremetal.yaml` file used by -metalsmith and run the `overcloud node provision` command passing the -`--network-config` option: - -[source,yaml] ----- -- name: CephStorage - count: 3 - hostname_format: cephstorage-%index% - instances: - - hostname: cephstorage-0 - name: ceph-0 - - hostname: cephstorage-1 - name: ceph-1 - - hostname: cephstorage-2 - name: ceph-2 - defaults: - profile: ceph-storage - network_config: - template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2 - networks: - - network: ctlplane - vif: true - - network: storage - - network: storage_mgmt - - network: external ----- - ----- -(undercloud) [stack@undercloud-0]$ - -openstack overcloud node provision - -o overcloud-baremetal-deployed-0.yaml - --stack overcloud - --network-config -y - $PWD/network/baremetal_deployment.yaml ----- - -Check the new network on the `CephStorage` nodes: - ----- -[root@cephstorage-0 ~]# ip -o -4 a - -1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever -2: enp1s0 inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever -11: vlan40 inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\ valid_lft forever preferred_lft forever -12: vlan30 inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever -14: br-ex inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever ----- - -And now it's time to start migrating the RGW backends and build the ingress on -top of them. - -== Migrate the RGW backends - -To match the cardinality diagram, you use cephadm labels to refer to a group of -nodes where a given daemon type should be deployed. 
- -Add the RGW label to the cephstorage nodes: - ----- -for i in 0 1 2; { - ceph orch host label add cephstorage-$i rgw; -} ----- - ----- -[ceph: root@controller-0 /]# - -for i in 0 1 2; { - ceph orch host label add cephstorage-$i rgw; -} - -Added label rgw to host cephstorage-0 -Added label rgw to host cephstorage-1 -Added label rgw to host cephstorage-2 - -[ceph: root@controller-0 /]# ceph orch host ls - -HOST ADDR LABELS STATUS -cephstorage-0 192.168.24.54 osd rgw -cephstorage-1 192.168.24.44 osd rgw -cephstorage-2 192.168.24.30 osd rgw -controller-0 192.168.24.45 _admin mon mgr -controller-1 192.168.24.11 _admin mon mgr -controller-2 192.168.24.38 _admin mon mgr - -6 hosts in cluster ----- - -During the overcloud deployment, RGW is applied at step2 -(external_deployment_steps), and a cephadm compatible spec is generated in -`/home/ceph-admin/specs/rgw` from the https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/ceph_mkspec.py[ceph_mkspec] ansible module. -Find and patch the RGW spec, specifying the right placement using the labels -approach, and change the rgw backend port to *8090* to avoid conflicts -with the https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[Ceph Ingress Daemon] (*) - ----- -[root@controller-0 heat-admin]# cat rgw - -networks: -- 172.17.3.0/24 -placement: - hosts: - - controller-0 - - controller-1 - - controller-2 -service_id: rgw -service_name: rgw.rgw -service_type: rgw -spec: - rgw_frontend_port: 8080 - rgw_realm: default - rgw_zone: default ----- - -Patch the spec replacing controller nodes with the label key - ----- ---- -networks: -- 172.17.3.0/24 -placement: - label: rgw -service_id: rgw -service_name: rgw.rgw -service_type: rgw -spec: - rgw_frontend_port: 8090 - rgw_realm: default - rgw_zone: default ----- - -(*) https://github.com/ceph/ceph/blob/main/src/cephadm/cephadm.py#L1423-L1446[cephadm_check_port] - -Apply the new RGW spec using the orchestrator CLI: - ----- -$ cephadm shell -m /home/ceph-admin/specs/rgw -$ cephadm shell -- ceph orch apply -i /mnt/rgw ----- - -Which triggers the redeploy: - ----- -... -osd.9 cephstorage-2 -rgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 starting -rgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 starting -rgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 starting -rgw.rgw.controller-1.eyvrzw controller-1 172.17.3.146:8080 running (5h) -rgw.rgw.controller-2.navbxa controller-2 172.17.3.66:8080 running (5h) - -... -osd.9 cephstorage-2 -rgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 running (19s) -rgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 running (16s) -rgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 running (13s) ----- - -At this point, you need to make sure that the new RGW backends are reachable on -the new ports, but you are going to enable an *IngressDaemon* on port *8080* -later in the process. For this reason, ssh on each RGW node (the _CephStorage_ -nodes) and add the iptables rule to allow connections to both 8080 and 8090 -ports in the CephStorage nodes. 
- ----- -iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment "ceph rgw ingress" -j ACCEPT - -iptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment "ceph rgw backends" -j ACCEPT - -for port in 8080 8090; { - for i in 25 10 32; { - ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \ - -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \ - -j ACCEPT; - } -} ----- - -From a Controller node (e.g. controller-0) try to reach (curl) the rgw backends: - ----- -for i in 26 23 81; do { - echo "---" - curl 172.17.3.$i:8090; - echo "---" - echo -done ----- - -And you should observe the following: - ----- ---- -Query 172.17.3.23 -anonymous ---- - ---- -Query 172.17.3.26 -anonymous ---- - ---- -Query 172.17.3.81 -anonymous ---- ----- - -=== NOTE - -In case RGW backends are migrated in the CephStorage nodes, there's no -"`internalAPI`" network(this is not true in the case of HCI). Reconfig the RGW -keystone endpoint, pointing to the external Network that has been propagated -(see the previous section) - ----- -[ceph: root@controller-0 /]# ceph config dump | grep keystone -global basic rgw_keystone_url http://172.16.1.111:5000 - -[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000 ----- - -== Deploy a Ceph IngressDaemon - -`HaProxy` is managed by TripleO via `Pacemaker`: the three running instances at -this point will point to the old RGW backends, resulting in a wrong, not -working configuration. -Since you are going to deploy the https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/ceph_mkspec.py[Ceph Ingress Daemon], the first thing to do -is remove the existing `ceph_rgw` config, clean up the config created by TripleO -and restart the service to make sure other services are not affected by this -change. 
- -ssh on each Controller node and remove the following is the section from -`/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg`: - ----- -listen ceph_rgw - bind 10.0.0.103:8080 transparent - mode http - balance leastconn - http-request set-header X-Forwarded-Proto https if { ssl_fc } - http-request set-header X-Forwarded-Proto http if !{ ssl_fc } - http-request set-header X-Forwarded-Port %[dst_port] - option httpchk GET /swift/healthcheck - option httplog - option forwardfor - server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2 - server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2 - server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2 ----- - -Restart `haproxy-bundle` and make sure it's started: - ----- -[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle -haproxy-bundle successfully restarted - - -[root@controller-0 ~]# sudo pcs status | grep haproxy - - * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]: - * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-0 - * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-1 - * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-2 ----- - -Double check no process is bound to 8080 anymore`" - ----- -[root@controller-0 ~]# ss -antop | grep 8080 -[root@controller-0 ~]# ----- - -And the swift CLI should fail at this point: - ----- -(overcloud) [root@cephstorage-0 ~]# swift list - -HTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) ----- - -You can start deploying the Ceph IngressDaemon on the CephStorage nodes. 
- -Set the required images for both HaProxy and Keepalived - ----- -ifeval::["{build}" != "downstream"] -[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3 -[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5 -endif::[] -ifeval::["{build}" == "downstream"] -[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy registry.redhat.io/rhceph/rhceph-haproxy-rhel9:latest -[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived registry.redhat.io/rhceph/keepalived-rhel9:latest -endif::[] - ----- - -Prepare the ingress spec and mount it to cephadm: - ----- -$ sudo vim /home/ceph-admin/specs/rgw_ingress ----- - -and paste the following content: - -[source,yaml] ----- ---- -service_type: ingress -service_id: rgw.rgw -placement: - label: rgw -spec: - backend_service: rgw.rgw - virtual_ip: 10.0.0.89/24 - frontend_port: 8080 - monitor_port: 8898 - virtual_interface_networks: - - 10.0.0.0/24 ----- - -Mount the generated spec and apply it using the orchestrator CLI: - ----- -$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress -$ cephadm shell -- ceph orch apply -i /mnt/rgw_ingress ----- - -Wait until the ingress is deployed and query the resulting endpoint: - ----- -[ceph: root@controller-0 /]# ceph orch ls - -NAME PORTS RUNNING REFRESHED AGE PLACEMENT -crash 6/6 6m ago 3d * -ingress.rgw.rgw 10.0.0.89:8080,8898 6/6 37s ago 60s label:rgw -mds.mds 3/3 6m ago 3d controller-0;controller-1;controller-2 -mgr 3/3 6m ago 3d controller-0;controller-1;controller-2 -mon 3/3 6m ago 3d controller-0;controller-1;controller-2 -osd.default_drive_group 15 37s ago 3d cephstorage-0;cephstorage-1;cephstorage-2 -rgw.rgw ?:8090 3/3 37s ago 4m label:rgw ----- - ----- -[ceph: root@controller-0 /]# curl 10.0.0.89:8080 - ---- -anonymous[ceph: root@controller-0 /]# -— ----- - -The result above shows that you are able to reach the backend from the -IngressDaemon, which means you are almost ready to interact with it using the -swift CLI. - -== Update the object-store endpoints - -The endpoints still point to the old VIP owned by pacemaker, but because it is -still used by other services and you reserved a new VIP on the same network, -before any other action you should update the object-store endpoint. 
- -List the current endpoints: - ----- -(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object - -| 1326241fb6b6494282a86768311f48d1 | regionOne | swift | object-store | True | internal | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s | -| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift | object-store | True | public | http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s | -| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift | object-store | True | admin | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s | ----- - -Update the endpoints pointing to the Ingress VIP: - ----- -(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url "http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s" 95596a2d92c74c15b83325a11a4f07a3 - -(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store -| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift | object-store | True | internal | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s | -| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift | object-store | True | public | http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s | -| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift | object-store | True | admin | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s | ----- - -And repeat the same action for both internal and admin. -Test the migrated service. - ----- -(overcloud) [stack@undercloud-0 ~]$ swift list --debug - -DEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint -DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens -DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000 -DEBUG:urllib3.connectionpool:http://10.0.0.115:5000 "POST /v3/auth/tokens HTTP/1.1" 201 7795 -DEBUG:keystoneclient.auth.identity.v3.base:{"token": {"methods": ["password"], "user": {"domain": {"id": "default", "name": "Default"}, "id": "6f87c7ffdddf463bbc633980cfd02bb3", "name": "admin", "password_expires_at": null}, - - -... -... -... - -DEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H "X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos" -H "Accept-Encoding: gzip" -DEBUG:swiftclient:RESP STATUS: 200 OK -DEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'} -DEBUG:swiftclient:RESP BODY: b'[]' ----- - -Run tempest tests against object-storage: - ----- -(overcloud) [stack@undercloud-0 tempest-dir]$ tempest run --regex tempest.api.object_storage -... -... -... -====== -Totals -====== -Ran: 141 tests in 606.5579 sec. 
- - Passed: 128
- - Skipped: 13
- - Expected Fail: 0
- - Unexpected Success: 0
- - Failed: 0
-Sum of execute time for each test: 657.5183 sec.
-
-==============
-Worker Balance
-==============
- - Worker 0 (1 tests) => 0:10:03.400561
- - Worker 1 (2 tests) => 0:00:24.531916
- - Worker 2 (4 tests) => 0:00:10.249889
- - Worker 3 (30 tests) => 0:00:32.730095
- - Worker 4 (51 tests) => 0:00:26.246044
- - Worker 5 (6 tests) => 0:00:20.114803
- - Worker 6 (20 tests) => 0:00:16.290323
- - Worker 7 (27 tests) => 0:00:17.103827
-----
-
-== Additional Resources
-
-A https://asciinema.org/a/560091[screen recording] is available.
diff --git a/docs_user/modules/con_ceph-daemon-cardinality.adoc b/docs_user/modules/con_ceph-daemon-cardinality.adoc
new file mode 100644
index 000000000..a3c4e8457
--- /dev/null
+++ b/docs_user/modules/con_ceph-daemon-cardinality.adoc
@@ -0,0 +1,48 @@
+[id="ceph-daemon-cardinality_{context}"]
+
+= {Ceph} daemon cardinality
+
+{Ceph} 6 and later applies strict constraints to the way daemons can be colocated within the same node.
+ifeval::["{build}" != "upstream"]
+For more information, see link:https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations].
+endif::[]
+The resulting topology depends on the available hardware, as well as the number of {Ceph} services that are present on the Controller nodes that are going to be retired.
+ifeval::["{build}" != "upstream"]
+For more information about the procedure that is required to migrate the RGW component and keep an HA model using the Ceph ingress daemon, see link:{defaultCephURL}/object_gateway_guide/index#high-availability-for-the-ceph-object-gateway[High availability for the Ceph Object Gateway] in _Object Gateway Guide_.
+endif::[]
+ifeval::["{build}" != "downstream"]
+The following sections describe the procedure that is required to migrate the RGW component, and to keep an HA model by using the https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw[Ceph Ingress daemon], in a common {OpenStackPreviousInstaller} scenario where the Controller nodes represent the
+https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[spec placement] where the service is deployed.
+endif::[]
+As a general rule, the number of services that can be migrated depends on the number of available nodes in the cluster. The following diagrams cover the distribution of the {Ceph} daemons on the {Ceph} nodes, where at least three nodes are required in a scenario that includes only RGW and RBD, without the {dashboard_first_ref}:
+
+----
+|     |                     |             |
+|-----|---------------------|-------------|
+| osd | mon/mgr/crash       | rgw/ingress |
+| osd | mon/mgr/crash       | rgw/ingress |
+| osd | mon/mgr/crash       | rgw/ingress |
+----
+
+With the {dashboard}, and without {rhos_component_storage_file_first_ref}, at least four nodes are required.
The {dashboard} has no failover: + +---- +| | | | +|-----|---------------------|-------------| +| osd | mon/mgr/crash | rgw/ingress | +| osd | mon/mgr/crash | rgw/ingress | +| osd | mon/mgr/crash | dashboard/grafana | +| osd | rgw/ingress | (free) | +---- + +With the {dashboard} and the {rhos_component_storage_file}, 5 nodes minimum are required, and the {dashboard} has no failover: + +---- +| | | | +|-----|---------------------|-------------------------| +| osd | mon/mgr/crash | rgw/ingress | +| osd | mon/mgr/crash | rgw/ingress | +| osd | mon/mgr/crash | mds/ganesha/ingress | +| osd | rgw/ingress | mds/ganesha/ingress | +| osd | mds/ganesha/ingress | dashboard/grafana | +---- \ No newline at end of file diff --git a/docs_user/modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc b/docs_user/modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc new file mode 100644 index 000000000..3e1fe7e5f --- /dev/null +++ b/docs_user/modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc @@ -0,0 +1,205 @@ +[id="completing-prerequisites-for-migrating-ceph-rgw_{context}"] + += Completing prerequisites for migrating {Ceph} RGW + +You must complete the following prerequisites before you begin the {Ceph} RGW migration. + +.Procedure + +. Check the current status of the {Ceph} nodes: ++ +---- +(undercloud) [stack@undercloud-0 ~]$ metalsmith list + + + +------------------------+ +----------------+ + | IP Addresses | | Hostname | + +------------------------+ +----------------+ + | ctlplane=192.168.24.25 | | cephstorage-0 | + | ctlplane=192.168.24.10 | | cephstorage-1 | + | ctlplane=192.168.24.32 | | cephstorage-2 | + | ctlplane=192.168.24.28 | | compute-0 | + | ctlplane=192.168.24.26 | | compute-1 | + | ctlplane=192.168.24.43 | | controller-0 | + | ctlplane=192.168.24.7 | | controller-1 | + | ctlplane=192.168.24.41 | | controller-2 | + +------------------------+ +----------------+ +---- + +. Log in to `controller-0` and check the `pacemaker` status to help you +identify the information that you need before you start the RGW migration. ++ +---- +Full List of Resources: + * ip-192.168.24.46 (ocf:heartbeat:IPaddr2): Started controller-0 + * ip-10.0.0.103 (ocf:heartbeat:IPaddr2): Started controller-1 + * ip-172.17.1.129 (ocf:heartbeat:IPaddr2): Started controller-2 + * ip-172.17.3.68 (ocf:heartbeat:IPaddr2): Started controller-0 + * ip-172.17.4.37 (ocf:heartbeat:IPaddr2): Started controller-1 + * Container bundle set: haproxy-bundle + +[undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]: + * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-2 + * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-0 + * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-1 +---- + +. Use the `ip` command to identify the ranges of the storage networks. 
++ +---- +[heat-admin@controller-0 ~]$ ip -o -4 a + +1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever +2: enp1s0 inet 192.168.24.45/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever +2: enp1s0 inet 192.168.24.46/32 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever +7: br-ex inet 10.0.0.122/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever +8: vlan70 inet 172.17.5.22/24 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever +8: vlan70 inet 172.17.5.94/32 brd 172.17.5.255 scope global vlan70\ valid_lft forever preferred_lft forever +9: vlan50 inet 172.17.2.140/24 brd 172.17.2.255 scope global vlan50\ valid_lft forever preferred_lft forever +10: vlan30 inet 172.17.3.73/24 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever +10: vlan30 inet 172.17.3.68/32 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever +11: vlan20 inet 172.17.1.88/24 brd 172.17.1.255 scope global vlan20\ valid_lft forever preferred_lft forever +12: vlan40 inet 172.17.4.24/24 brd 172.17.4.255 scope global vlan40\ valid_lft forever preferred_lft forever +---- ++ +* vlan30 represents the Storage Network, where the new RGW instances should be +started on the {CephCluster} nodes. +* br-ex represents the External Network, which is where in the current +environment, haproxy has the frontend Virtual IP (VIP) assigned. + +. Identify the network that you previously had in haproxy and propagate it through +{OpenStackPreviousInstaller} to the {CephCluster} nodes. This network is used to reserve a new VIP +that is owned by {Ceph} and used as the entry point for the RGW service. + +.. Log into `controller-0` and check the current HAProxy configuration until you +find `ceph_rgw` section: ++ +---- +$ less /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg + +... +... +listen ceph_rgw + bind 10.0.0.103:8080 transparent + bind 172.17.3.68:8080 transparent + mode http + balance leastconn + http-request set-header X-Forwarded-Proto https if { ssl_fc } + http-request set-header X-Forwarded-Proto http if !{ ssl_fc } + http-request set-header X-Forwarded-Port %[dst_port] + option httpchk GET /swift/healthcheck + option httplog + option forwardfor + server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2 + server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2 + server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2 +---- + +.. Confirm that the network is used as an HAProxy frontend: ++ +---- +[controller-0]$ ip -o -4 a + +... +7: br-ex inet 10.0.0.106/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever +... +---- ++ +This example shows that `controller-0` is exposing the services by using the external network, which is not present in +the {Ceph} nodes, and you need to propagate it through {OpenStackPreviousInstaller}. + +. Propagate the HAProxy frontend network to {CephCluster} nodes. + +.. 
Change the NIC template used to define the `ceph-storage` network interfaces and add the new config section: ++ +[source,yaml] +---- +--- +network_config: +- type: interface + name: nic1 + use_dhcp: false + dns_servers: {{ ctlplane_dns_nameservers }} + addresses: + - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }} + routes: {{ ctlplane_host_routes }} +- type: vlan + vlan_id: {{ storage_mgmt_vlan_id }} + device: nic1 + addresses: + - ip_netmask: {{ storage_mgmt_ip }}/{{ storage_mgmt_cidr }} + routes: {{ storage_mgmt_host_routes }} +- type: interface + name: nic2 + use_dhcp: false + defroute: false +- type: vlan + vlan_id: {{ storage_vlan_id }} + device: nic2 + addresses: + - ip_netmask: {{ storage_ip }}/{{ storage_cidr }} + routes: {{ storage_host_routes }} +- type: ovs_bridge + name: {{ neutron_physical_bridge_name }} + dns_servers: {{ ctlplane_dns_nameservers }} + domain: {{ dns_search_domains }} + use_dhcp: false + addresses: + - ip_netmask: {{ external_ip }}/{{ external_cidr }} + routes: {{ external_host_routes }} + members: + - type: interface + name: nic3 + primary: true +---- + +.. In addition, add the External Network to the `baremetal.yaml` file used by +metalsmith: ++ +[source,yaml] +---- +- name: CephStorage + count: 3 + hostname_format: cephstorage-%index% + instances: + - hostname: cephstorage-0 + name: ceph-0 + - hostname: cephstorage-1 + name: ceph-1 + - hostname: cephstorage-2 + name: ceph-2 + defaults: + profile: ceph-storage + network_config: + template: /home/stack/composable_roles/network/nic-configs/ceph-storage.j2 + networks: + - network: ctlplane + vif: true + - network: storage + - network: storage_mgmt + - network: external +---- + +.. Run the `overcloud node provision` command passing the `--network-config` option: ++ +---- +(undercloud) [stack@undercloud-0]$ + +openstack overcloud node provision + -o overcloud-baremetal-deployed-0.yaml + --stack overcloud + --network-config -y + $PWD/network/baremetal_deployment.yaml +---- + +.. Check the new network on the {CephCluster} nodes: ++ +---- +[root@cephstorage-0 ~]# ip -o -4 a + +1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever +2: enp1s0 inet 192.168.24.54/24 brd 192.168.24.255 scope global enp1s0\ valid_lft forever preferred_lft forever +11: vlan40 inet 172.17.4.43/24 brd 172.17.4.255 scope global vlan40\ valid_lft forever preferred_lft forever +12: vlan30 inet 172.17.3.23/24 brd 172.17.3.255 scope global vlan30\ valid_lft forever preferred_lft forever +14: br-ex inet 10.0.0.133/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever +---- diff --git a/docs_user/modules/proc_creating-a-ceph-nfs-cluster.adoc b/docs_user/modules/proc_creating-a-ceph-nfs-cluster.adoc index c1f3193fa..752f20b5b 100644 --- a/docs_user/modules/proc_creating-a-ceph-nfs-cluster.adoc +++ b/docs_user/modules/proc_creating-a-ceph-nfs-cluster.adoc @@ -18,7 +18,7 @@ deployed via {OpenStackPreviousInstaller}. .. Identify the node definition file used in the environment. This is the input file associated with the `openstack overcloud node provision` command. For example, this file may be called `overcloud-baremetal-deploy.yaml` -.. Edit the networks associated with the `CephStorage` nodes to include the +.. Edit the networks associated with the {CephCluster} nodes to include the `StorageNFS` network: + [source,yaml] @@ -45,7 +45,7 @@ command. For example, this file may be called `overcloud-baremetal-deploy.yaml` - network: storage_mgmt - network: storage_nfs ---- -.. 
Edit the network configuration template file for the `CephStorage` nodes +.. Edit the network configuration template file for the {CephCluster} nodes to include an interface connecting to the `StorageNFS` network. In the example above, the path to the network configuration template file is `/home/stack/network/nic-configs/ceph-storage.j2`. This file is modified @@ -61,7 +61,7 @@ to include the following NIC template: routes: {{ storage_nfs_host_routes }} ---- .. Re-run the `openstack overcloud node provision` command to update the -`CephStorage` nodes. +{CephCluster} nodes. + ---- openstack overcloud node provision \ @@ -72,7 +72,7 @@ openstack overcloud node provision \ /home/stack/network/baremetal_deployment.yaml ---- + -When the update is complete, ensure that the `CephStorage` nodes have a +When the update is complete, ensure that the {CephCluster} nodes have a new interface created and tagged with the appropriate VLAN associated with `StorageNFS`. diff --git a/docs_user/modules/proc_deploying-a-ceph-ingress-daemon.adoc b/docs_user/modules/proc_deploying-a-ceph-ingress-daemon.adoc new file mode 100644 index 000000000..64de4f374 --- /dev/null +++ b/docs_user/modules/proc_deploying-a-ceph-ingress-daemon.adoc @@ -0,0 +1,134 @@ +[id="deploying-a-ceph-ingress-daemon_{context}"] + += Deploying a {Ceph} ingress daemon + +To match the cardinality diagram, you use cephadm labels to refer to a group of nodes where a given daemon type should be deployed. For more information about the cardinality diagram, see xref:ceph-daemon-cardinality_{context}[{Ceph} daemon cardinality]. +`HAProxy` is managed by {OpenStackPreviousInstaller} through `Pacemaker`: the three running instances at this point will point to the old RGW backends, resulting in a broken configuration. +ifeval::["{build}" != "upstream"] +Since you are going to deploy the Ceph ingress daemon, the first thing to do is remove the existing `ceph_rgw` config, clean up the config created by {OpenStackPreviousInstaller} and restart the service to make sure other services are not affected by this change. +endif::[] +ifeval::["{build}" != "downstream"] +Since you are going to deploy the https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/ansible_plugins/modules/ceph_mkspec.py[Ceph ingress daemon], the first thing to do is remove the existing `ceph_rgw` config, clean up the config created by {OpenStackPreviousInstaller} and restart the service to make sure other services are not affected by this change. +endif::[] +After you complete this procedure, you can reach the RGW backend from the ingress daemon and use RGW through the {object_storage} command line interface (CLI). +//kgilliga: I will rewrite this intro for GA. + +.Procedure + +. 
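Optionally, back up the current HAProxy configuration on each Controller node before you change it. The destination path in this example is arbitrary; choose any location that suits your environment:
+
+----
+[root@controller-0 ~]# cp /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg /root/haproxy.cfg.rgw-backup
+----
+
+.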
Log in to each Controller node and remove the following configuration from the `/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg` file:
++
+----
+listen ceph_rgw
+ bind 10.0.0.103:8080 transparent
+ mode http
+ balance leastconn
+ http-request set-header X-Forwarded-Proto https if { ssl_fc }
+ http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
+ http-request set-header X-Forwarded-Port %[dst_port]
+ option httpchk GET /swift/healthcheck
+ option httplog
+ option forwardfor
+ server controller-0.storage.redhat.local 172.17.3.73:8080 check fall 5 inter 2000 rise 2
+ server controller-1.storage.redhat.local 172.17.3.146:8080 check fall 5 inter 2000 rise 2
+ server controller-2.storage.redhat.local 172.17.3.156:8080 check fall 5 inter 2000 rise 2
+----
+
+. Restart `haproxy-bundle` and ensure that it is started:
++
+----
+[root@controller-0 ~]# sudo pcs resource restart haproxy-bundle
+haproxy-bundle successfully restarted
+
+
+[root@controller-0 ~]# sudo pcs status | grep haproxy
+
+ * Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-haproxy:pcmklatest]:
+ * haproxy-bundle-podman-0 (ocf:heartbeat:podman): Started controller-0
+ * haproxy-bundle-podman-1 (ocf:heartbeat:podman): Started controller-1
+ * haproxy-bundle-podman-2 (ocf:heartbeat:podman): Started controller-2
+----
+
+. Confirm that no process is bound to 8080:
++
+----
+[root@controller-0 ~]# ss -antop | grep 8080
+[root@controller-0 ~]#
+----
++
+The {object_storage_first_ref} CLI fails at this point:
++
+----
+(overcloud) [root@cephstorage-0 ~]# swift list
+
+HTTPConnectionPool(host='10.0.0.103', port=8080): Max retries exceeded with url: /swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
+----
+
+. Set the required images for both HAProxy and Keepalived:
++
+----
+ifeval::["{build}" != "downstream"]
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy quay.io/ceph/haproxy:2.3
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived quay.io/ceph/keepalived:2.1.5
+endif::[]
+ifeval::["{build}" == "downstream"]
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_haproxy registry.redhat.io/rhceph/rhceph-haproxy-rhel9:latest
+[ceph: root@controller-0 /]# ceph config set mgr mgr/cephadm/container_image_keepalived registry.redhat.io/rhceph/keepalived-rhel9:latest
+endif::[]
+----
+
+. Create a file called `rgw_ingress` in the `/home/ceph-admin/specs/` directory in `controller-0`:
++
+----
+$ sudo vim /home/ceph-admin/specs/rgw_ingress
+----
+
+. Paste the following content into the `rgw_ingress` file:
++
+[source,yaml]
+----
+---
+service_type: ingress
+service_id: rgw.rgw
+placement:
+  label: rgw
+spec:
+  backend_service: rgw.rgw
+  virtual_ip: 10.0.0.89/24
+  frontend_port: 8080
+  monitor_port: 8898
+  virtual_interface_networks:
+  - <external_network>
+----
++
+* Replace `<external_network>` with your external network, for example, `10.0.0.0/24`. For more information, see xref:completing-prerequisites-for-migrating-ceph-rgw_{context}[Completing prerequisites for migrating {Ceph} RGW].
+
+. Apply the `rgw_ingress` spec by using the Ceph orchestrator CLI:
++
+----
+$ cephadm shell -m /home/ceph-admin/specs/rgw_ingress
+$ cephadm shell -- ceph orch apply -i /mnt/rgw_ingress
+----
+
+.
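Optionally, watch the ingress daemons start. The ingress service schedules HAProxy and Keepalived daemons on the hosts that match the `rgw` label; the exact daemon names in the output are environment-specific:
+
+----
+[ceph: root@controller-0 /]# ceph orch ps | grep -iE 'haproxy|keepalived'
+----
+
+.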
Wait until the ingress is deployed and query the resulting endpoint: ++ +---- +[ceph: root@controller-0 /]# ceph orch ls + +NAME PORTS RUNNING REFRESHED AGE PLACEMENT +crash 6/6 6m ago 3d * +ingress.rgw.rgw 10.0.0.89:8080,8898 6/6 37s ago 60s label:rgw +mds.mds 3/3 6m ago 3d controller-0;controller-1;controller-2 +mgr 3/3 6m ago 3d controller-0;controller-1;controller-2 +mon 3/3 6m ago 3d controller-0;controller-1;controller-2 +osd.default_drive_group 15 37s ago 3d cephstorage-0;cephstorage-1;cephstorage-2 +rgw.rgw ?:8090 3/3 37s ago 4m label:rgw +---- ++ +---- +[ceph: root@controller-0 /]# curl 10.0.0.89:8080 + +--- +anonymous[ceph: root@controller-0 /]# +— +---- + diff --git a/docs_user/modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc b/docs_user/modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc index e09e5fec6..b2827a58a 100644 --- a/docs_user/modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc +++ b/docs_user/modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc @@ -1,21 +1,21 @@ [id="migrating-mon-and-mgr-from-controller-nodes_{context}"] -= Migrating Ceph Monitor and Ceph Manager daemons to Red Hat Ceph Storage nodes -//kgilliga: I'm trying to understand the purpose of this procedure. Is this procedure a prescriptive way for customers to migrate Ceph Monitor and Ceph manager daemons from controller nodes to Red Hat Ceph Storage nodes? Or are we recommending that customers create a proof of concept before doing the actual migration? And are oc0-controller-1 and oc0-ceph-0 just examples of the names of nodes for the purposes of this procedure? Note: The SME addressed these questions in the PR. This procedure needs more work. It should not be a POC. -Migrate your Ceph Monitor daemons, Ceph Manager daemons, and object storage daemons (OSDs) from your {rhos_prev_long} Controller nodes to existing Red Hat Ceph Storage nodes. During the migration,ensure that you can do the following actions: += Migrating Ceph Monitor and Ceph Manager daemons to {Ceph} nodes +//kgilliga: This procedure needs to be revisited. It should not be a POC. +Migrate your Ceph Monitor daemons, Ceph Manager daemons, and object storage daemons (OSDs) from your {rhos_prev_long} Controller nodes to existing {Ceph} nodes. During the migration, ensure that you can do the following actions: -* Keep the mon IP addresses by moving them to the Red Hat Ceph Storage nodes. +* Keep the mon IP addresses by moving them to the {Ceph} nodes. * Drain the existing Controller nodes and shut them down. * Deploy additional monitors to the existing nodes, and promote them as -_admin nodes that administrators can use to manage the Red Hat Ceph Storage cluster and perform day 2 operations against it. -* Keep the cluster operational during the migration. +_admin nodes that administrators can use to manage the {CephCluster} cluster and perform day 2 operations against it. +* Keep the {CephCluster} cluster operational during the migration. -The following procedure shows an example migration from a Controller node (`oc0-controller-1`) and a Red Hat Ceph Storage node (`oc0-ceph-0`). Use the names of the nodes in your environment. +The following procedure shows an example migration from a Controller node (`oc0-controller-1`) and a {Ceph} node (`oc0-ceph-0`). Use the names of the nodes in your environment. .Prerequisites * Configure the Storage nodes to have both storage and storage_mgmt -network to ensure that you can use both Red Hat Ceph Storage public and cluster networks. 
This step requires you to interact with {OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and later you do not have to run a stack update. However, there are commands that you must perform to run `os-net-config` on the bare metal node and configure additional networks. +network to ensure that you can use both {Ceph} public and cluster networks. This step requires you to interact with {OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and later you do not have to run a stack update. However, there are commands that you must perform to run `os-net-config` on the bare metal node and configure additional networks. .. Ensure that the network is defined in the `metalsmith.yaml` for the CephStorageNodes: + @@ -62,14 +62,14 @@ Warning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts. .Procedure -. To migrate mon(s) and mgr(s) on the two existing Red Hat Ceph Storage nodes, create a Red Hat Ceph Storage spec based on the default roles with the mon/mgr on the controller nodes. +. To migrate mon(s) and mgr(s) on the two existing {Ceph} nodes, create a {Ceph} spec based on the default roles with the mon/mgr on the controller nodes. + ---- openstack overcloud ceph spec -o ceph_spec.yaml -y \ --stack overcloud-0 overcloud-baremetal-deployed-0.yaml ---- -. Deploy the Red Hat Ceph Storage cluster: +. Deploy the {CephCluster} cluster: + ---- openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \ @@ -79,10 +79,10 @@ openstack overcloud ceph spec -o ceph_spec.yaml -y \ ---- + [NOTE] -The `ceph_spec.yaml`, which is the OSP-generated description of the Red Hat Ceph Storage cluster, +The `ceph_spec.yaml`, which is the OSP-generated description of the {CephCluster} cluster, will be used, later in the process, as the basic template required by cephadm to update the status/info of the daemons. -. Check the status of the cluster: +. Check the status of the {CephCluster} cluster: + ---- [ceph: root@oc0-controller-0 /]# ceph -s @@ -348,7 +348,7 @@ osd.default_drive_group 8 2m ago 69s oc0-ceph-0;oc0-ceph-1 ceph mgr fail oc0-controller-0.xzgtvo ---- + -At this point the cluster is clean: +At this point the {CephCluster} cluster is clean: + ---- [ceph: root@oc0-controller-0 specs]# ceph -s @@ -368,7 +368,7 @@ At this point the cluster is clean: pgs: 1 active+clean ---- + -The `oc0-controller-1` is removed and powered off without leaving traces on the Red Hat Ceph Storage cluster. +The `oc0-controller-1` is removed and powered off without leaving traces on the {CephCluster} cluster. . Repeat this procedure for additional Controller nodes in your environment until you have migrated all the Ceph Mon and Ceph Manager daemons to the target nodes. diff --git a/docs_user/modules/proc_migrating-the-rgw-backends.adoc b/docs_user/modules/proc_migrating-the-rgw-backends.adoc new file mode 100644 index 000000000..9ac5e8887 --- /dev/null +++ b/docs_user/modules/proc_migrating-the-rgw-backends.adoc @@ -0,0 +1,175 @@ +[id="migrating-the-rgw-backends_{context}"] + += Migrating the {Ceph} RGW backends + +To match the cardinality diagram, you use cephadm labels to refer to a group of nodes where a given daemon type should be deployed. For more information about the cardinality diagram, see xref:ceph-daemon-cardinality_{context}[{Ceph} daemon cardinality]. + +.Procedure + +. 
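Run the following commands from inside a `cephadm shell` on a node that holds the admin keyring, for example `controller-0`. If you are not already inside the shell, start it first:
+
+----
+[root@controller-0 ~]# cephadm shell
+[ceph: root@controller-0 /]#
+----
+
+.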
Add the RGW label to the {Ceph} nodes:
++
+----
+for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+----
++
+----
+[ceph: root@controller-0 /]#
+
+for i in 0 1 2; {
+    ceph orch host label add cephstorage-$i rgw;
+}
+
+Added label rgw to host cephstorage-0
+Added label rgw to host cephstorage-1
+Added label rgw to host cephstorage-2
+
+[ceph: root@controller-0 /]# ceph orch host ls
+
+HOST ADDR LABELS STATUS
+cephstorage-0 192.168.24.54 osd rgw
+cephstorage-1 192.168.24.44 osd rgw
+cephstorage-2 192.168.24.30 osd rgw
+controller-0 192.168.24.45 _admin mon mgr
+controller-1 192.168.24.11 _admin mon mgr
+controller-2 192.168.24.38 _admin mon mgr
+
+6 hosts in cluster
+----
+
+ifeval::["{build}" != "downstream"]
+. During the overcloud deployment, RGW is applied at step 2 (external_deployment_steps), and a cephadm-compatible spec is generated in `/home/ceph-admin/specs/rgw` from the https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/ceph_mkspec.py[ceph_mkspec] ansible module. Find and patch the RGW spec, specify the correct placement by using labels, and change the RGW backend port to 8090 to avoid conflicts with the https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[Ceph Ingress daemon]:
+endif::[]
+ifeval::["{build}" != "upstream"]
+. During the overcloud deployment, RGW is applied at step 2 (external_deployment_steps), and a cephadm-compatible spec is generated in `/home/ceph-admin/specs/rgw` from director. Find the RGW spec:
+endif::[]
++
+----
+[root@controller-0 heat-admin]# cat rgw
+
+networks:
+- 172.17.3.0/24
+placement:
+  hosts:
+  - controller-0
+  - controller-1
+  - controller-2
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8080
+  rgw_realm: default
+  rgw_zone: default
+----
+
+. In the `placement` section, replace the following values:
+* Replace the Controller nodes with the `label: rgw` label.
+* Change the `rgw_frontend_port` value to `8090` to avoid conflicts with the Ceph ingress daemon.
++
+----
+---
+networks:
+- 172.17.3.0/24
+placement:
+  label: rgw
+service_id: rgw
+service_name: rgw.rgw
+service_type: rgw
+spec:
+  rgw_frontend_port: 8090
+  rgw_realm: default
+  rgw_zone: default
+----
+
+. Apply the new RGW spec by using the orchestrator CLI:
++
+----
+$ cephadm shell -m /home/ceph-admin/specs/rgw
+$ cephadm shell -- ceph orch apply -i /mnt/rgw
+----
++
+This command triggers the redeploy:
++
+----
+...
+osd.9 cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 starting
+rgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 starting
+rgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 starting
+rgw.rgw.controller-1.eyvrzw controller-1 172.17.3.146:8080 running (5h)
+rgw.rgw.controller-2.navbxa controller-2 172.17.3.66:8080 running (5h)
+
+...
+osd.9 cephstorage-2
+rgw.rgw.cephstorage-0.wsjlgx cephstorage-0 172.17.3.23:8090 running (19s)
+rgw.rgw.cephstorage-1.qynkan cephstorage-1 172.17.3.26:8090 running (16s)
+rgw.rgw.cephstorage-2.krycit cephstorage-2 172.17.3.81:8090 running (13s)
+----
+
+. Ensure that the new RGW backends are reachable on the new ports, because you are going to enable an ingress daemon on port 8080 later in the process. Log in to each RGW node (the {CephCluster} nodes) and add the iptables rules that allow connections to both the 8080 and 8090 ports:
++
+----
+iptables -I INPUT -p tcp -m tcp --dport 8080 -m conntrack --ctstate NEW -m comment --comment "ceph rgw ingress" -j ACCEPT
+
+iptables -I INPUT -p tcp -m tcp --dport 8090 -m conntrack --ctstate NEW -m comment --comment "ceph rgw backends" -j ACCEPT
+
+for port in 8080 8090; {
+    for i in 25 10 32; {
+        ssh heat-admin@192.168.24.$i sudo iptables -I INPUT \
+        -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \
+        -j ACCEPT;
+    }
+}
+----
+
+. From a Controller node, for example `controller-0`, use `curl` to reach the RGW backends:
++
+----
+for i in 26 23 81; do
+    echo "---"
+    curl 172.17.3.$i:8090;
+    echo "---"
+    echo
+done
+----
++
+You should observe the following output:
++
+----
+---
+Query 172.17.3.23
+anonymous
+---
+
+---
+Query 172.17.3.26
+anonymous
+---
+
+---
+Query 172.17.3.81
+anonymous
+---
+----
+
+. If the RGW backends are migrated to the {Ceph} nodes, there is no "`internalAPI`" network (this is not the case for HCI nodes). Reconfigure the RGW keystone endpoint to point to the external network that has been propagated. For more information about propagating the external network, see xref:completing-prerequisites-for-migrating-ceph-rgw_{context}[Completing prerequisites for migrating {Ceph} RGW].
++
+----
+[ceph: root@controller-0 /]# ceph config dump | grep keystone
+global basic rgw_keystone_url http://172.16.1.111:5000
+
+[ceph: root@controller-0 /]# ceph config set global rgw_keystone_url http://10.0.0.103:5000
+----
diff --git a/docs_user/modules/proc_updating-the-object-storage-endpoints.adoc b/docs_user/modules/proc_updating-the-object-storage-endpoints.adoc
new file mode 100644
index 000000000..74b1c94d8
--- /dev/null
+++ b/docs_user/modules/proc_updating-the-object-storage-endpoints.adoc
@@ -0,0 +1,83 @@
+[id="updating-the-object-storage-endpoints_{context}"]
+
+= Updating the object-store endpoints
+
+The object-store endpoints still point to the original virtual IP address (VIP) that is owned by Pacemaker. Because other services still use the original VIP, and you reserved a new VIP for RGW on the same network, you must update the object-store endpoints before you take any other action.
+
+.Procedure
+
+. List the current endpoints:
++
+----
+(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object
+
+| 1326241fb6b6494282a86768311f48d1 | regionOne | swift | object-store | True | internal | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+| 8a34817a9d3443e2af55e108d63bb02b | regionOne | swift | object-store | True | public | http://10.0.0.103:8080/swift/v1/AUTH_%(project_id)s |
+| fa72f8b8b24e448a8d4d1caaeaa7ac58 | regionOne | swift | object-store | True | admin | http://172.17.3.68:8080/swift/v1/AUTH_%(project_id)s |
+----
+
+. Update the endpoints so that they point to the new ingress VIP:
++
+----
+(overcloud) [stack@undercloud-0 ~]$ openstack endpoint set --url "http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s" 95596a2d92c74c15b83325a11a4f07a3
+
+(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list | grep object-store
+| 6c7244cc8928448d88ebfad864fdd5ca | regionOne | swift | object-store | True | internal | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+| 95596a2d92c74c15b83325a11a4f07a3 | regionOne | swift | object-store | True | public | http://10.0.0.89:8080/swift/v1/AUTH_%(project_id)s |
+| e6d0599c5bf24a0fb1ddf6ecac00de2d | regionOne | swift | object-store | True | admin | http://172.17.3.79:8080/swift/v1/AUTH_%(project_id)s |
+----
++
+Repeat this step for both the internal and admin endpoints.
+
+.
Test the migrated service: ++ +---- +(overcloud) [stack@undercloud-0 ~]$ swift list --debug + +DEBUG:swiftclient:Versionless auth_url - using http://10.0.0.115:5000/v3 as endpoint +DEBUG:keystoneclient.auth.identity.v3.base:Making authentication request to http://10.0.0.115:5000/v3/auth/tokens +DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.0.115:5000 +DEBUG:urllib3.connectionpool:http://10.0.0.115:5000 "POST /v3/auth/tokens HTTP/1.1" 201 7795 +DEBUG:keystoneclient.auth.identity.v3.base:{"token": {"methods": ["password"], "user": {"domain": {"id": "default", "name": "Default"}, "id": "6f87c7ffdddf463bbc633980cfd02bb3", "name": "admin", "password_expires_at": null}, + + +... +... +... + +DEBUG:swiftclient:REQ: curl -i http://10.0.0.89:8080/swift/v1/AUTH_852f24425bb54fa896476af48cbe35d3?format=json -X GET -H "X-Auth-Token: gAAAAABj7KHdjZ95syP4c8v5a2zfXckPwxFQZYg0pgWR42JnUs83CcKhYGY6PFNF5Cg5g2WuiYwMIXHm8xftyWf08zwTycJLLMeEwoxLkcByXPZr7kT92ApT-36wTfpi-zbYXd1tI5R00xtAzDjO3RH1kmeLXDgIQEVp0jMRAxoVH4zb-DVHUos" -H "Accept-Encoding: gzip" +DEBUG:swiftclient:RESP STATUS: 200 OK +DEBUG:swiftclient:RESP HEADERS: {'content-length': '2', 'x-timestamp': '1676452317.72866', 'x-account-container-count': '0', 'x-account-object-count': '0', 'x-account-bytes-used': '0', 'x-account-bytes-used-actual': '0', 'x-account-storage-policy-default-placement-container-count': '0', 'x-account-storage-policy-default-placement-object-count': '0', 'x-account-storage-policy-default-placement-bytes-used': '0', 'x-account-storage-policy-default-placement-bytes-used-actual': '0', 'x-trans-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'x-openstack-request-id': 'tx00000765c4b04f1130018-0063eca1dd-1dcba-default', 'accept-ranges': 'bytes', 'content-type': 'application/json; charset=utf-8', 'date': 'Wed, 15 Feb 2023 09:11:57 GMT'} +DEBUG:swiftclient:RESP BODY: b'[]' +---- + +. Run tempest tests against object-storage: ++ +---- +(overcloud) [stack@undercloud-0 tempest-dir]$ tempest run --regex tempest.api.object_storage +... +... +... +====== +Totals +====== +Ran: 141 tests in 606.5579 sec. + - Passed: 128 + - Skipped: 13 + - Expected Fail: 0 + - Unexpected Success: 0 + - Failed: 0 +Sum of execute time for each test: 657.5183 sec. + +============== +Worker Balance +============== + - Worker 0 (1 tests) => 0:10:03.400561 + - Worker 1 (2 tests) => 0:00:24.531916 + - Worker 2 (4 tests) => 0:00:10.249889 + - Worker 3 (30 tests) => 0:00:32.730095 + - Worker 4 (51 tests) => 0:00:26.246044 + - Worker 5 (6 tests) => 0:00:20.114803 + - Worker 6 (20 tests) => 0:00:16.290323 + - Worker 7 (27 tests) => 0:00:17.103827 +----
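+
+. Optionally, run a quick smoke test of basic object operations through the new endpoint by using the {object_storage} CLI. The container name and the uploaded file in this example are arbitrary:
++
+----
+(overcloud) [stack@undercloud-0 ~]$ swift post smoke-test
+(overcloud) [stack@undercloud-0 ~]$ swift upload smoke-test /etc/hosts
+(overcloud) [stack@undercloud-0 ~]$ swift list smoke-test
+(overcloud) [stack@undercloud-0 ~]$ swift delete smoke-test
+----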