restructured migrating ceph MDS procedure
klgill committed Apr 19, 2024
1 parent c1ed442 commit 4ea461d
Showing 4 changed files with 51 additions and 97 deletions.
2 changes: 2 additions & 0 deletions docs_user/adoption-attributes.adoc
@@ -14,6 +14,7 @@ ifeval::["{build}" == "upstream"]
:OpenStackPreviousInstaller: TripleO
:Ceph: Ceph
:CephCluster: Ceph Storage
:CephRelease: Reef

//Components and services

@@ -78,6 +79,7 @@ ifeval::["{build}" == "downstream"]
:OpenStackPreviousInstaller: director
:Ceph: Red Hat Ceph Storage
:CephCluster: Red Hat Ceph Storage
:CephRelease: 7

//Components and services

1 change: 0 additions & 1 deletion docs_user/assemblies/ceph_migration.adoc
@@ -9,7 +9,6 @@ ifdef::context[:parent-context: {context}]
:toc: left
:toclevels: 3

include::../modules/ceph-mds_migration.adoc[leveloffset=+1]
include::../modules/ceph-monitoring_migration.adoc[leveloffset=+1]

ifdef::parent-context[:context: {parent-context}]
2 changes: 2 additions & 0 deletions docs_user/main.adoc
@@ -26,6 +26,8 @@ include::assemblies/assembly_migrating-ceph-rbd.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-rgw.adoc[leveloffset=+1]

include::modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]

include::assemblies/ceph_migration.adoc[leveloffset=+1]

include::assemblies/swift_migration.adoc[leveloffset=+1]
@@ -1,39 +1,23 @@
[id="migrating-ceph-mds_{context}"]

//:context: migrating-ceph-mds
//kgilliga: This module might be converted to an assembly.

= Migrating Ceph MDS

In the context of data plane adoption, where the OpenStack services are
redeployed in OpenShift, a TripleO-deployed Ceph cluster will undergo a
migration in a process we are calling “externalizing” the Ceph cluster.
There are two deployment topologies, broadly, that include an “internal” Ceph
cluster today: one is where OpenStack includes dedicated Storage nodes to host
OSDs, and the other is Hyperconverged Infrastructure (HCI) where Compute nodes
double up as Storage nodes. In either scenario, there are some Ceph processes
that are deployed on OpenStack Controller nodes: Ceph monitors, rgw, rbd, mds,
ceph dashboard and nfs-ganesha.
This document describes how to migrate the MDS daemon in case Manila (deployed
with either a cephfs-native or ceph-nfs backend) is part of the overcloud
deployment.

== Requirements

For this procedure, we assume that we are beginning with an OpenStack based on
Wallaby and a Ceph Reef deployment managed by TripleO.
= Migrating {Ceph} MDS to an external cluster
//kgilliga: Can you please verify the accuracy of the title?

In the context of data plane adoption, where the {rhos_prev_long} ({OpenStackShort}) services are
redeployed in {OpenShift}, a {OpenStackPreviousInstaller}-deployed {CephCluster} cluster will undergo a migration in a process we are calling “externalizing” the {CephCluster} cluster.
There are two deployment topologies, broadly, that include an “internal” {CephCluster} cluster today: one is where {OpenStackShort} includes dedicated {CephCluster} nodes to host object storage daemons (OSDs), and the other is Hyperconverged Infrastructure (HCI) where Compute nodes double as {CephCluster} nodes. In either scenario, there are some {Ceph} processes that are deployed on {OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW), RADOS Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha.
This document describes how to migrate the MDS daemon in case {rhos_component_storage_file_first_ref} (deployed with either a cephfs-native or ceph-nfs backend) is part of the overcloud deployment.

For this procedure, we assume that we are beginning with a {OpenStackShort} based on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment managed by {OpenStackPreviousInstaller}.
We assume that:

* Ceph has been upgraded to Reef and is managed by cephadm/orchestrator
* Both the Ceph public and cluster networks are propagated, via TripleO, to the
target nodes
* {Ceph} is upgraded to {Ceph} {CephRelease} and is managed by cephadm/orchestrator.
* Both the {Ceph} public and cluster networks are propagated, through {OpenStackPreviousInstaller}, to the target nodes.

== Gather the current status of the MDS daemon
.Prerequisites

Before starting the MDS migration, verify the Ceph cluster is healthy and gather
some information of the MDS status.
* Verify that the {CephCluster} cluster is healthy and check the MDS status:

[source,bash]
----
[ceph: root@controller-0 /]# ceph fs ls
name: cephfs, metadata pool: manila_metadata, data pools: [manila_data ]
@@ -56,10 +40,8 @@ mds.controller-1.cwzhog
MDS version: ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)
----

Eventually, using the `ceph fs dump` command, we can retrieve more detailed
information of the cephfs MDS status:
* Retrieve more detailed information on the Ceph File System (CephFS) MDS status:

[source,bash]
----
[ceph: root@controller-0 /]# ceph fs dump
@@ -104,20 +86,8 @@ Standby daemons:
dumped fsmap epoch 8
----

== Check the OSD blocklist
* Check the OSD blocklist and clean up the client list:

When a file system client is unresponsive or misbehaving, it may happen that
the access to the file system is forcibly terminated. This process is called
eviction. Evicting a CephFS client prevents it from communicating further with
MDS daemons and OSD daemons.
Ordinarily, a blocklisted client may not reconnect to the servers: it must be
unmounted and then remounted. However, in some situations it may be useful to
permit a client that was evicted to attempt to reconnect. Because CephFS
uses the RADOS OSD blocklist to control client eviction, CephFS clients can be
permitted to reconnect by removing them from the blocklist.
Check the current OSD blocklist and clean up the client list:

[source,bash]
----
[ceph: root@controller-0 /]# ceph osd blocklist ls
..
@@ -127,17 +97,17 @@ for item in $(ceph osd blocklist ls | awk '{print $0}'); do
done
----

== Migrate MDS to the target nodes
[NOTE]
When a file system client is unresponsive or misbehaving, its access to the
file system might be forcibly terminated. This process is called
eviction. Evicting a CephFS client prevents it from communicating further with MDS daemons and OSD daemons.
Ordinarily, a blocklisted client cannot reconnect to the servers: it must be unmounted and then remounted. However, in some situations it is useful to permit an evicted client to attempt to reconnect. Because CephFS uses the RADOS OSD blocklist to control client eviction, you can permit CephFS clients to reconnect by removing them from the blocklist.
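
The blocklist cleanup described above can be sketched as a dry run that only prints the removal commands, so that you can review them before running anything. This is a minimal sketch, not the loop from the original procedure: the function name `blocklist_rm_commands` is hypothetical, and it assumes that each entry printed by `ceph osd blocklist ls` starts with the client address in `ADDR:PORT/NONCE` form, followed by the expiry time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Reads `ceph osd blocklist ls` output on stdin and prints one
# `ceph osd blocklist rm` command per blocklisted address.
# Assumption: the first field of each entry is ADDR:PORT/NONCE.
blocklist_rm_commands() {
    # Keep only lines whose first field ends in "/<nonce>"; this skips
    # blank lines and summary lines such as "listed N entries".
    awk 'NF > 0 && $1 ~ /\/[0-9]+$/ { print "ceph osd blocklist rm " $1 }'
}

# Example (inside the cephadm shell):
#   ceph osd blocklist ls 2>/dev/null | blocklist_rm_commands
```

Printing the commands instead of executing them keeps the cleanup reviewable. After checking the list, the output can be piped to `sh` to apply it.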

The MDS migration is performed by cephadm, and as done for the other daemons,
the general idea is to move the daemons placement from a "hosts" based approach
to a "label" based one. This ensures that the human operator can easily visualize
the status of the cluster and where daemons are placed using the `ceph orch host`
command, and have a general view of how the daemons are co-located within a
given host, according to the https://access.redhat.com/articles/1548993[cardinality matrix]
described in the associated article.
.Procedure

[source,bash]
. The MDS migration is performed by cephadm. As with the other daemons, the general idea is to move the daemon placement from a "hosts" based approach to a "label" based one. This lets the operator see the status of the cluster and where daemons are placed by using the `ceph orch host ls` command, and get a general view of how the daemons are co-located within a given host, according to the https://access.redhat.com/articles/1548993[cardinality matrix] described in the associated article.
//kgilliga: Is this a step? Can we rewrite it to something like: "Check the status of the Ceph cluster and export the MDS metadata"?
+
----
[ceph: root@controller-0 /]# ceph orch host ls
HOST ADDR LABELS STATUS
@@ -160,18 +130,16 @@ placement:
- controller-2.redhat.local
----

Extend the MDS labels to the target nodes:

[source,bash]
. Extend the MDS labels to the target nodes:
+
----
for item in $(sudo cephadm shell -- ceph orch host ls --format json | jq -r '.[].hostname'); do
sudo cephadm shell -- ceph orch host label add $item mds;
done
----

Verify all the hosts have the MDS label:

[source,bash]
. Verify all the hosts have the MDS label:
+
----
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls
@@ -184,17 +152,15 @@ controller-1.redhat.local 192.168.24.53 mon _admin mgr mds
controller-2.redhat.local 192.168.24.10 mon _admin mgr mds
----

Dump the current MDS spec:

[source,bash]
. Dump the current MDS spec:
+
----
[ceph: root@controller-0 /]# ceph orch ls --export mds > mds.yaml
----

Edit the retrieved spec and replace the `placement.hosts` section with
. Edit the retrieved spec and replace the `placement.hosts` section with
`placement.label`:

[source,bash]
+
----
service_type: mds
service_id: mds
@@ -203,18 +169,16 @@ placement:
label: mds
----

Use the `ceph orchestrator` to apply the new MDS spec: it results in an
. Use the Ceph orchestrator to apply the new MDS spec, which results in an
increased number of mds daemons:

[source,bash]
+
----
$ sudo cephadm shell -m mds.yaml -- ceph orch apply -i /mnt/mds.yaml
Scheduling new mds deployment …
----

Check the new standby daemons temporarily added to the cephfs fs:

[source,bash]
. Check the new standby daemons that are temporarily added to the CephFS file system:
+
----
$ ceph fs dump
@@ -233,39 +197,31 @@ Standby daemons:
[mds.mds.controller-1.tyiziq{-1:499136} state up:standby seq 1 addr [v2:172.17.3.43:6800/3615018301,v1:172.17.3.43:6801/3615018301] compat {c=[1],r=[1],i=[7ff]}]
----

It is possible to elect as "active" a dedicated MDS for a particular file system.
To configure this preference, `CephFS` provides a configuration option for MDS
called `mds_join_fs` which enforces this affinity.
When failing over MDS daemons, a cluster’s monitors will prefer standby daemons
with `mds_join_fs` equal to the file system name with the failed rank. If no
standby exists with `mds_join_fs` equal to the file system name, it will choose
an unqualified standby as a replacement.
To properly drive the migration to the right nodes, set the MDS affinity that
manages the MDS failover:

[source,bash]
. To migrate MDS to the right nodes, set the MDS affinity that manages the MDS failover:
//It is possible to elect as "active" a dedicated MDS for a particular file system. To configure this preference, `CephFS` provides a configuration option for MDS called `mds_join_fs` which enforces this affinity.
//When failing over MDS daemons, a cluster’s monitors will prefer standby daemons with `mds_join_fs` equal to the file system name with the failed rank. If no standby exists with `mds_join_fs` equal to the file system name, it will choose an unqualified standby as a replacement.
//kgilliga: I'm commenting out this text for now because it is too much text for a step (downstream). We could place this text after the code block for more context, but we might want to discuss what info is really necessary downstream.
+
----
ceph config set mds.mds.cephstorage-0.fqcshx mds_join_fs cephfs
----

Remove the labels from controller nodes and force the MDS failover to the
. Remove the labels from Controller nodes and force the MDS failover to the
target node:

[source,bash]
+
----
$ for i in 0 1 2; do ceph orch host label rm "controller-$i.redhat.local" mds; done
Removed label mds from host controller-0.redhat.local
Removed label mds from host controller-1.redhat.local
Removed label mds from host controller-2.redhat.local
----

+
The switch happens behind the scenes, and the new active MDS is the one that
has been set through the `mds_join_fs` command.
Check the result of the failover and the new deployed daemons:

you set through the `mds_join_fs` option.

[source,bash]
. Check the result of the failover and the newly deployed daemons:
+
----
$ ceph fs dump
@@ -295,8 +251,3 @@ mds.mds.cephstorage-1.jkvomp cephstorage-1.redhat.local run
mds.mds.cephstorage-2.gnfhfe cephstorage-2.redhat.local running (79m) 3m ago 79m 24.2M - 17.2.6-100.el9cp 1af7b794f353 f3cb859e2a15
----
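
The manual spec edit in the procedure above, which replaces `placement.hosts` with `placement.label`, can also be sketched as a small filter. This is an illustration under stated assumptions rather than part of the documented procedure: the function name `hosts_to_label` is hypothetical, and it assumes that the exported spec lists hosts as a YAML block sequence (a `hosts:` key followed by `- <hostname>` lines), which is the layout implied by the spec shown in the procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Reads a service spec on stdin and replaces the "hosts:" list under
# "placement:" with a single "label: <label>" entry.
# $1: label to place on (defaults to "mds").
hosts_to_label() {
    awk -v label="${1:-mds}" '
        /^[ \t]*hosts:[ \t]*$/ {
            indent = $0
            sub(/hosts:.*/, "", indent)   # keep only the leading whitespace
            print indent "label: " label
            skipping = 1
            next
        }
        skipping && /^[ \t]*-[ \t]/ { next }   # drop the host entries
        { skipping = 0; print }
    '
}

# Example (inside the cephadm shell):
#   ceph orch ls --export mds | hosts_to_label mds > mds.yaml
```

Review the generated `mds.yaml` before applying it with `ceph orch apply -i`, as in the procedure. A spec that has no `hosts:` key passes through unchanged.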


== Useful resources

* https://docs.ceph.com/en/reef/cephfs/eviction[cephfs - eviction]
* https://docs.ceph.com/en/reef/cephfs/standby/#configuring-mds-file-system-affinity[ceph mds - affinity]
