Merge pull request rook#14776 from sp98/osd-migration
migrate OSDs to enable encryption as day-2 operation
travisn authored Dec 6, 2024
2 parents 324df99 + 85c8194 commit 476827e
Showing 23 changed files with 752 additions and 579 deletions.
1 change: 1 addition & 0 deletions Documentation/CRDs/Cluster/ceph-cluster-crd.md
@@ -357,6 +357,7 @@ The following storage selection settings are specific to Ceph and do not apply t
* `encryptedDevice`**: Encrypt OSD volumes using dmcrypt ("true" or "false"). By default this option is disabled. See [encryption](http://docs.ceph.com/docs/master/ceph-volume/lvm/encryption/) for more information on encryption in Ceph. (Resizing is not supported for host-based clusters.)
* `crushRoot`: The value of the `root` CRUSH map label. The default is `default`. Generally, you should not need to change this. However, if any of your topology labels may have the value `default`, you need to change `crushRoot` to avoid conflicts, since CRUSH map values need to be unique.
* `enableCrushUpdates`: Enables Rook to update the pool CRUSH rule using the Pool Spec. Can cause data remapping if the CRUSH rule changes. Defaults to false.
* `migration`: Existing PVC-based OSDs can be migrated to enable or disable encryption. Refer to the [OSD management](../../Storage-Configuration/Advanced/ceph-osd-mgmt.md/#osd-encryption-as-day-2-operation) topic for details.

Allowed configurations are:

85 changes: 85 additions & 0 deletions Documentation/CRDs/specification.md
@@ -8049,6 +8049,65 @@ bool
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.Migration">Migration
</h3>
<p>
(<em>Appears on:</em><a href="#ceph.rook.io/v1.StorageScopeSpec">StorageScopeSpec</a>)
</p>
<div>
<p>Migration handles the OSD migration</p>
</div>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>confirmation</code><br/>
<em>
string
</em>
</td>
<td>
<em>(Optional)</em>
<p>A user confirmation to migrate the OSDs. It destroys each OSD one at a time, cleans up the backing disk,
and prepares an OSD with the same ID on that disk.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.MigrationStatus">MigrationStatus
</h3>
<p>
(<em>Appears on:</em><a href="#ceph.rook.io/v1.OSDStatus">OSDStatus</a>)
</p>
<div>
<p>MigrationStatus represents the current status of any OSD migration.</p>
</div>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>pending</code><br/>
<em>
int
</em>
</td>
<td>
</td>
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.MirrorHealthCheckSpec">MirrorHealthCheckSpec
</h3>
<p>
@@ -9471,6 +9530,18 @@ map[string]int
<p>StoreType is a mapping between the OSD backend stores and number of OSDs using these stores</p>
</td>
</tr>
<tr>
<td>
<code>migrationStatus</code><br/>
<em>
<a href="#ceph.rook.io/v1.MigrationStatus">
MigrationStatus
</a>
</em>
</td>
<td>
</td>
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.OSDStore">OSDStore
@@ -12935,6 +13006,20 @@ Selection
</tr>
<tr>
<td>
<code>migration</code><br/>
<em>
<a href="#ceph.rook.io/v1.Migration">
Migration
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>Migration handles the OSD migration</p>
</td>
</tr>
<tr>
<td>
<code>store</code><br/>
<em>
<a href="#ceph.rook.io/v1.OSDStore">
26 changes: 26 additions & 0 deletions Documentation/Storage-Configuration/Advanced/ceph-osd-mgmt.md
@@ -190,3 +190,29 @@ If you don't see a new OSD automatically created, restart the operator (by delet

!!! note
The OSD might have a different ID than the previous OSD that was replaced.


## OSD Migration

Ceph does not support changing certain settings on existing OSDs. To support changing these settings on an OSD, the OSD must be destroyed and re-created with the new settings. Rook will automate this by migrating only one OSD at a time. The operator waits for the data to rebalance (PGs to become `active+clean`) before migrating the next OSD. This ensures that there is no data loss. Refer to the [OSD migration](https://github.com/rook/rook/blob/master/design/ceph/osd-migration.md) design doc for more information.
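
In essence, the operator loops over the OSDs that need migration one at a time. The sketch below illustrates that sequencing only; it is a simplified illustration, and the function and callbacks in it are hypothetical stand-ins, not actual Rook code:

```go
package osdmigration

import "context"

// migrateOSDsSequentially sketches the one-at-a-time flow described above.
// The callbacks stand in for hypothetical helpers and are not real Rook APIs.
func migrateOSDsSequentially(
	ctx context.Context,
	pendingOSDs []int, // IDs of OSDs whose desired settings (e.g. encryption) changed
	destroyAndRecreate func(ctx context.Context, osdID int) error,
	waitForCleanPGs func(ctx context.Context) error,
) error {
	for _, id := range pendingOSDs {
		// Destroy the OSD, wipe its backing disk, and prepare a new OSD with the same ID.
		if err := destroyAndRecreate(ctx, id); err != nil {
			return err
		}
		// Wait for data to rebalance (all PGs active+clean) before migrating the next OSD,
		// so only one OSD's worth of data is at reduced redundancy at any time.
		if err := waitForCleanPGs(ctx); err != nil {
			return err
		}
	}
	return nil
}
```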

The following scenarios are supported for OSD migration:

- Enable or disable OSD encryption for existing PVC-based OSDs by changing the `encrypted` setting under the `storageClassDeviceSets`

For example:

```yaml
storage:
  migration:
    confirmation: "yes-really-migrate-osds"
  storageClassDeviceSets:
    - name: set1
      count: 3
      encrypted: true # set to true or false depending on whether encryption should be enabled or disabled
```

Details about the migration status are available in the CephCluster `status.storage.osd.migrationStatus.pending` field, which shows the total number of OSDs that are pending migration.
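
As a sketch, the same field can also be read programmatically with the Kubernetes dynamic client. The cluster name, namespace, and kubeconfig path below are assumptions and may need adjusting:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes the CephCluster CR is named "rook-ceph" in the "rook-ceph" namespace.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	gvr := schema.GroupVersionResource{Group: "ceph.rook.io", Version: "v1", Resource: "cephclusters"}
	cluster, err := client.Resource(gvr).Namespace("rook-ceph").Get(context.TODO(), "rook-ceph", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// Read status.storage.osd.migrationStatus.pending from the unstructured object.
	pending, found, err := unstructured.NestedInt64(cluster.Object, "status", "storage", "osd", "migrationStatus", "pending")
	if err != nil || !found {
		fmt.Println("no OSD migrations pending")
		return
	}
	fmt.Printf("OSDs pending migration: %d\n", pending)
}
```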

!!! note
Performance of the cluster might be impacted during data rebalancing while OSDs are being migrated.
1 change: 1 addition & 0 deletions PendingReleaseNotes.md
@@ -8,3 +8,4 @@

- Enable mirroring for CephBlockPoolRadosNamespaces (see [#14701](https://github.com/rook/rook/pull/14701)).
- Enable periodic monitoring for CephBlockPoolRadosNamespaces mirroring (see [#14896](https://github.com/rook/rook/pull/14896)).
- Allow migration of PVC-based OSDs to enable or disable encryption (see [#14776](https://github.com/rook/rook/pull/14776)).
4 changes: 2 additions & 2 deletions cmd/rook/ceph/osd.go
@@ -261,10 +261,10 @@ func prepareOSD(cmd *cobra.Command, args []string) error {
}

// destroy the OSD using the OSD ID
var replaceOSD *oposd.OSDReplaceInfo
var replaceOSD *oposd.OSDInfo
if replaceOSDID != -1 {
logger.Infof("destroying osd.%d and cleaning its backing device", replaceOSDID)
replaceOSD, err = osddaemon.DestroyOSD(context, &clusterInfo, replaceOSDID, cfg.pvcBacked, cfg.storeConfig.EncryptedDevice)
replaceOSD, err = osddaemon.DestroyOSD(context, &clusterInfo, replaceOSDID, cfg.pvcBacked)
if err != nil {
rook.TerminateFatal(errors.Wrapf(err, "failed to destroy OSD %d.", replaceOSDID))
}
16 changes: 16 additions & 0 deletions deploy/charts/rook-ceph/templates/resources.yaml
@@ -3445,6 +3445,16 @@ spec:
minimum: 0
nullable: true
type: number
migration:
description: Migration handles the OSD migration
properties:
confirmation:
description: |-
A user confirmation to migrate the OSDs. It destroys each OSD one at a time, cleans up the backing disk,
and prepares an OSD with the same ID on that disk.
pattern: ^$|^yes-really-migrate-osds$
type: string
type: object
nearFullRatio:
description: NearFullRatio is the ratio at which the cluster is considered nearly full and will raise a ceph health warning. Default is 0.85.
maximum: 1
@@ -5538,6 +5548,12 @@ spec:
osd:
description: OSDStatus represents OSD status of the ceph Cluster
properties:
migrationStatus:
description: MigrationStatus represents the current status of any OSD migration.
properties:
pending:
type: integer
type: object
storeType:
additionalProperties:
type: integer
16 changes: 16 additions & 0 deletions deploy/examples/crds.yaml
@@ -3443,6 +3443,16 @@ spec:
minimum: 0
nullable: true
type: number
migration:
description: Migration handles the OSD migration
properties:
confirmation:
description: |-
A user confirmation to migrate the OSDs. It destroys each OSD one at a time, cleans up the backing disk,
and prepares an OSD with the same ID on that disk.
pattern: ^$|^yes-really-migrate-osds$
type: string
type: object
nearFullRatio:
description: NearFullRatio is the ratio at which the cluster is considered nearly full and will raise a ceph health warning. Default is 0.85.
maximum: 1
@@ -5536,6 +5546,12 @@ spec:
osd:
description: OSDStatus represents OSD status of the ceph Cluster
properties:
migrationStatus:
description: MigrationStatus represents the current status of any OSD migration.
properties:
pending:
type: integer
type: object
storeType:
additionalProperties:
type: integer
1 change: 1 addition & 0 deletions design/ceph/osd-migration.md
@@ -25,6 +25,7 @@ considered in the future:
and application data on slower media.
- Setups with multiple OSDs per drive, though with recent Ceph releases the
motivation for deploying this way is mostly obviated.
- OSDs where Persistent Volumes are using partitioned disks, due to a [Ceph issue](https://tracker.ceph.com/issues/68977).

## Proposal
- Since migration requires destroying of the OSD and cleaning data from the disk,
20 changes: 19 additions & 1 deletion pkg/apis/ceph.rook.io/v1/types.go
@@ -500,7 +500,13 @@ type DeviceClasses struct {
// OSDStatus represents OSD status of the ceph Cluster
type OSDStatus struct {
// StoreType is a mapping between the OSD backend stores and number of OSDs using these stores
StoreType map[string]int `json:"storeType,omitempty"`
StoreType map[string]int `json:"storeType,omitempty"`
MigrationStatus MigrationStatus `json:"migrationStatus,omitempty"`
}

// MigrationStatus represents the current status of any OSD migration.
type MigrationStatus struct {
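	// Pending is the number of OSDs that are pending migration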
Pending int `json:"pending,omitempty"`
}

// ClusterVersion represents the version of a Ceph Cluster
@@ -3051,6 +3057,9 @@ type StorageScopeSpec struct {
// +nullable
// +optional
StorageClassDeviceSets []StorageClassDeviceSet `json:"storageClassDeviceSets,omitempty"`
// Migration handles the OSD migration
// +optional
Migration Migration `json:"migration,omitempty"`
// +optional
Store OSDStore `json:"store,omitempty"`
// +optional
@@ -3089,6 +3098,15 @@ type StorageScopeSpec struct {
AllowOsdCrushWeightUpdate bool `json:"allowOsdCrushWeightUpdate,omitempty"`
}

// Migration handles the OSD migration
type Migration struct {
// A user confirmation to migrate the OSDs. It destroys each OSD one at a time, cleans up the backing disk,
// and prepares an OSD with the same ID on that disk.
// +optional
// +kubebuilder:validation:Pattern=`^$|^yes-really-migrate-osds$`
Confirmation string `json:"confirmation,omitempty"`
}

// OSDStore is the backend storage type used for creating the OSDs
type OSDStore struct {
// Type of backend storage to be used while creating OSDs. If empty, then bluestore will be used
34 changes: 34 additions & 0 deletions pkg/apis/ceph.rook.io/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions pkg/daemon/ceph/osd/agent.go
@@ -38,13 +38,13 @@ type OsdAgent struct {
storeConfig config.StoreConfig
kv *k8sutil.ConfigMapKVStore
pvcBacked bool
replaceOSD *oposd.OSDReplaceInfo
replaceOSD *oposd.OSDInfo
}

// NewAgent is the instantiation of the OSD agent
func NewAgent(context *clusterd.Context, devices []DesiredDevice, metadataDevice string, forceFormat bool,
storeConfig config.StoreConfig, clusterInfo *cephclient.ClusterInfo, nodeName string, kv *k8sutil.ConfigMapKVStore,
replaceOSD *oposd.OSDReplaceInfo, pvcBacked bool) *OsdAgent {
replaceOSD *oposd.OSDInfo, pvcBacked bool) *OsdAgent {

return &OsdAgent{
devices: devices,
@@ -71,7 +71,7 @@ func getDeviceLVPath(context *clusterd.Context, deviceName string) string {

// GetReplaceOSDId returns the OSD ID based on the device name
func (a *OsdAgent) GetReplaceOSDId(device string) int {
if device == a.replaceOSD.Path {
if device == a.replaceOSD.BlockPath {
return a.replaceOSD.ID
}

Expand Down
