
Software RAID install on previously used mdadm disks #107

Open
olivierlambert opened this issue Dec 5, 2018 · 20 comments
@olivierlambert (Member)
IIRC, we are already using mdadm --zero-superblock /dev/sdX to clean all the selected disks from previous mdadm superblocks.

However, if mdadm was used at the partition level (e.g. sda2), our command won't clean it, and the install will fail.

Ideally, we should loop over every partition and remove the superblock. Maybe there is a better way (superblock detection?) to find and remove superblocks only where one was actually stored.
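A rough sketch of that loop (hypothetical: `mdadm` must be present, the disk list would come from the installer's selection, and `list_disk_devs`/`wipe_md_superblocks` are illustrative helper names, not installer code):

```shell
# list_disk_devs DISK: read /proc/partitions-style input on stdin and print
# the whole disk plus each of its partitions (e.g. sda sda1 sda2).
list_disk_devs() {
    grep -o "${1}[0-9]*" | sort -u
}

# wipe_md_superblocks DISK...: zero the md superblock on each given disk
# and on every one of its partitions, but only where one is detected.
wipe_md_superblocks() {
    for disk in "$@"; do
        for dev in $(list_disk_devs "$disk" < /proc/partitions); do
            # mdadm -E exits non-zero when no md superblock is present,
            # so only devices that really carry one get touched
            if mdadm --examine "/dev/$dev" >/dev/null 2>&1; then
                echo "Zeroing md superblock on /dev/$dev"
                mdadm --zero-superblock "/dev/$dev"
            fi
        done
    done
}

# e.g. (destructive): wipe_md_superblocks sda sdb
```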

@randadinata

If we already have user consent for destroying data, can't we just nuke the first and last 2 MiB with dd, followed by partprobe? 🤣 Everything in between doesn't matter anymore.
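A sketch of that idea, hedged: `wipe_ends` is an illustrative helper, not something from the installer; 2 MiB is 4096 sectors of 512 bytes, and on a real disk the sector count would come from `blockdev --getsz`, with `partprobe` run afterwards.

```shell
# wipe_ends FILE SECTORS: zero the first and last 2 MiB (4096 x 512-byte
# sectors) of FILE. conv=notrunc keeps a regular file's size unchanged
# (irrelevant for block devices, but lets the helper be tested on a file).
wipe_ends() {
    dev=$1
    sectors=$2
    dd if=/dev/zero of="$dev" bs=512 count=4096 conv=notrunc 2>/dev/null
    dd if=/dev/zero of="$dev" bs=512 seek=$((sectors - 4096)) count=4096 conv=notrunc 2>/dev/null
}

# On a real disk it would be used as (destructive!):
#   wipe_ends /dev/sdX "$(blockdev --getsz /dev/sdX)"
#   partprobe /dev/sdX
```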

@olivierlambert (Member, Author)

That's an option (but same idea: it has to run on each partition), so it doesn't simplify the equation much (we still need to loop over each partition).

@oallart commented Dec 11, 2018

We have a similar approach, a script that nukes the md's and other bits.
It can be passed as <script stage="installation-start" type="url"> from a remote server on an unattended install

The script runs

mdadm --zero-superblock
wipefs --all
sgdisk -Z

on all drives and/or partitions
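For reference, the answer-file hook looks like this fragment (assuming the stage/type attributes quoted above; the URL is a placeholder):

```
<script stage="installation-start" type="url">http://pxe-server.example/wipe.sh</script>
```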

@olivierlambert (Member, Author)

Can you describe each step in more detail (and in which order)? Then maybe we can do that instead of just zeroing the whole disk (and missing partitions).

@oallart commented Dec 11, 2018

Yes, I'm working on refining that right now; it's not quite world-ready yet. It works well via PXE on a rescue boot, but not quite as an integrated step with the XS answerfile method.

Basically it does

  1. identify, activate and destroy LVM
  2. identify, activate and destroy md's
  3. wipefs (erase filesystem, raid or partition-table signatures)
  4. zap the GPT and MBR data structures
  5. dd zero some parts just in case

Some of these steps are probably redundant, but they work well. We use the script to zero drives for reinstall much faster than DBAN or a full dd zero can.
I'll get something a bit cleaner and will share.
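Step 1 of that list isn't covered by the mdadm/wipefs/sgdisk commands alone. A hedged sketch of the LVM part (`lvm_teardown` is an illustrative name, not part of the script above; the `-ff`/`-y` flags force removal without prompting):

```shell
# Deactivate and destroy all LVM state before wiping the disks.
lvm_teardown() {
    vgchange -an                          # deactivate every volume group
    for vg in $(vgs --noheadings -o vg_name); do
        vgremove -ff "$vg"                # drop the VG and all its LVs
    done
    for pv in $(pvs --noheadings -o pv_name); do
        pvremove -ff -y "$pv"             # clear the LVM2 label on the PV
    done
}
```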

@oallart commented Dec 12, 2018

OK, so here's something I tested a bit; it does work when supplied from an answer file as <script stage="installation-start" type="url">

Still a bit crude, but it works well.
Output is redirected to /tmp/prescript.log.
It also includes a bit that prevents the package installation delay caused by the md resync.

#!/bin/bash
# O. Allart - 2018/12
# To be executed at the very first stage of a fresh XenServer install:
# - dban-lite style wipe
# - disable md resync
{
# identify partitions and md devices,
# map partitions to md devices
echo "md devices found:"
if ! grep '^md' /proc/mdstat; then
	echo "No software RAID md device found in /proc/mdstat, no MD to destroy"
else
	for DEVICE in $(sed -n 's/\(md[0-9]\+\).*\(sd[a-f][1-9]\?\).*\(sd[a-f][1-9]\?\).*/\1:\2:\3/p' /proc/mdstat); do
		# extract the md device and its two member devices
		MD=$(echo "$DEVICE" | cut -d: -f1)
		DEV1=$(echo "$DEVICE" | cut -d: -f2)
		DEV2=$(echo "$DEVICE" | cut -d: -f3)

		# test these are valid (PIPESTATUS[0] checks mdadm, not head)
		mdadm --detail "/dev/$MD" | head -5
		if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
			echo "Reported device /dev/$MD invalid"
			exit 6
		fi

		mdadm -E "/dev/$DEV1" | head -10
		if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
			echo "Reported partition /dev/$DEV1 invalid"
			exit 7
		fi
		mdadm -E "/dev/$DEV2" | head -10
		if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
			echo "Reported partition /dev/$DEV2 invalid"
			exit 7
		fi

		echo "Stopping device"
		if ! mdadm --stop "/dev/$MD"; then echo "Device $MD could not be stopped"; exit 8; fi

		echo "Zeroing superblock on /dev/$DEV1"
		if ! mdadm --zero-superblock "/dev/$DEV1"; then
			# retry once before giving up
			if ! mdadm --zero-superblock "/dev/$DEV1"; then
				echo "CRITICAL: Partition /dev/$DEV1 could not be zeroed - drive is NOT ready for reuse"
				exit 9
			fi
		fi

		echo "Zeroing superblock on /dev/$DEV2"
		if ! mdadm --zero-superblock "/dev/$DEV2"; then
			if ! mdadm --zero-superblock "/dev/$DEV2"; then
				echo "CRITICAL: Partition /dev/$DEV2 could not be zeroed - drive is NOT ready for reuse"
				exit 9
			fi
		fi
		echo "-------------------------------------------------------------"

	done
fi

# Finishing touch: wipe FS signatures, zap partition tables.
for DRIVE in $(grep -o "sd[a-z]$" /proc/partitions); do
	echo "Finishing $DRIVE"
	wipefs --all "/dev/$DRIVE"
	sgdisk -Z "/dev/$DRIVE"
done

# delay resync to speed up install in RAID1 md configs
echo 0 > /proc/sys/dev/raid/speed_limit_max
echo 0 > /proc/sys/dev/raid/speed_limit_min
} > /tmp/prescript.log 2>&1

@olivierlambert (Member, Author)

Forwarding this info to @nraynaud, who did the software RAID work, for potential inclusion directly in the installer 👍

@gdelafond commented Dec 12, 2018

@olivierlambert @nraynaud if you include it in the installer, beware that disk names will not always match sd[a-z]$.
As far as I know, the Linux disk naming scheme is the following:

  • SATA/SAS: sd[a-z]+$
  • inside a VM (to make tests), usually: xvd[a-z]+$
  • NVMe: nvme[0-9]+n[0-9]+$ (the bare nvme[0-9]+ node is the controller, not the disk)

Some rules are defined in /lib/udev/rules.d/60-persistent-storage.rules

Maybe the information should be taken not from /proc/partitions but from something like: lsblk | awk '$6 == "disk" {print $1}'?
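A slightly more robust variant of that idea (a sketch, not installer code: `-d` skips partitions, `-n` drops the header, and `-o NAME,TYPE` pins the columns so the awk filter doesn't depend on lsblk's default layout):

```shell
# Enumerate whole disks by their TYPE field instead of guessing at name
# patterns, so sdX, xvdX and nvmeXnY are all covered.
list_disks() {
    lsblk -d -n -o NAME,TYPE | awk '$2 == "disk" {print $1}'
}
```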

@gdelafond

Instead of erasing all available drives, maybe the installer should ask which disks have to be erased, or only erase the disks that have been chosen for the XCP installation.

@olivierlambert (Member, Author)

Yes, this is already what we do (only the selected disks get their magic blocks zeroed). What we're missing is doing that on all partitions too.

@oallart commented Dec 12, 2018

Good points.
As said earlier, it is a bit crude and more specific to our use. But glad to see the ball rolling and hoping for the feature to be included someday. It's nice that xcp-ng has the tools available to perform the various tasks (sgdisk, wipefs etc.).
We work a lot with answer files (see my posts on upgrading too) so we can build the logic around drives in there. Until the feature is built in, there is an avenue for people to use the feature externally. Those script stage entries are incredibly useful.

@olivierlambert (Member, Author)

@oallart feel free to create a dedicated entry in the Wiki with a "how to", this could be useful for all XCP-ng users 👍

@oallart commented Dec 14, 2018

@olivierlambert yep I have already started and taken over some sections 😄

@gdelafond

5. dd zero some parts just in case

You can wipe all fs information with:

DISK=sda
# /sys/block/*/size is always in 512-byte sectors, whatever the physical sector size
LBAS=$(cat /sys/block/$DISK/size)
# zero the first 1024 sectors (MBR/GPT headers, superblocks at the start)
dd if=/dev/zero of=/dev/$DISK bs=512 count=1024
# zero the last 1024 sectors (backup GPT header, md 0.90/1.0 superblocks at the end)
dd if=/dev/zero of=/dev/$DISK bs=512 seek=$(($LBAS-1024)) count=1024

@nraynaud (Member) commented Dec 20, 2018

Hi all, I am working on the issue. The UI side of things is a bit complicated.

I worked with the installer yesterday.

  1. Here is what I have:
  • some RAID array devices (/dev/md127) can be hidden in the UI because they expose less than 46GB, but their underlying members could represent more than that and be recycled in a new configuration for XCP-ng.
  • if a RAID array exists but is hidden, modifying it will simply not happen; there is a guard in the code, but there is no user feedback.
  2. I am thinking of various UI solutions:
  • add a screen between "EULA" and "Select Primary disk" that would show everything (disks, partitions, RAIDs, and maybe LVM) and allow some destructive actions on those (delete RAIDs, partitions, boot bits, FS markers, RAID member markers). The workflow would then continue to the "Select Primary disk" screen.
  • or somehow show what has been filtered out of the "Select Primary disk" screen and allow interaction with it (I am still unclear on this).
  3. As for the partitions (e.g. /dev/sda2), should we keep them as they exist, or destroy them and use full disks all the time?

@olivierlambert (Member, Author)

IMHO, when the user selects their disks, the installer should destroy everything on them, with no other option. XCP-ng is a kind of "Xen appliance", not a "normal" Linux distro: partitioning is done by XCP-ng, not by the user.

@stormi (Member) commented Jan 9, 2019

For those willing to test the pull request or even help developing the feature, here's a guide that explains how to build a modified ISO image with a modified installer:
https://github.com/xcp-ng/xcp/wiki/Modifying-the-installer

@klou commented Jan 21, 2019

I'm not in a position to try this, but a few months ago we upgraded from XS 7.0 (RAID 1 on individual partitions) to XCP 7.5 (RAID 1 on whole disks), and I'm trying to figure out why my I/O is so bad.

Anyway, the dmesg output below may help as an additional example of what gets left behind by a 7.5 conversion.

[    3.011900] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.011903] GPT:1465148799 != 1465149167
[    3.011905] GPT:Alternate GPT header not at the end of the disk.
[    3.011906] GPT:1465148799 != 1465149167
[    3.011907] GPT: Use GNU Parted to correct GPT errors.
[    3.011921]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6
[    3.012711] sd 2:0:0:0: [sdb] Attached SCSI disk
[    3.014863] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.014865] GPT:1465148799 != 1465149167
[    3.014867] GPT:Alternate GPT header not at the end of the disk.
[    3.014868] GPT:1465148799 != 1465149167
[    3.014870] GPT: Use GNU Parted to correct GPT errors.
[    3.014882]  sda: sda1 sda2 sda3 sda4 sda5 sda6
[    3.015562] sd 0:0:0:0: [sda] Attached SCSI disk
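Those dmesg warnings mean the backup GPT header is stranded before the actual end of the disk. If the disks are to be kept rather than wiped, sgdisk can relocate the backup structures; a sketch (`gpt_fix` is an illustrative helper name, and this changes on-disk metadata, so verify with `-v` first):

```shell
# gpt_fix DEVICE: report GPT problems, then move the backup GPT data
# structures to the actual end of the disk.
gpt_fix() {
    sgdisk -v "$1"    # verify: report mismatched/misplaced GPT structures
    sgdisk -e "$1"    # relocate backup data structures to the end of the disk
}

# e.g. gpt_fix /dev/sdb
```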

@ydirson (Contributor) commented Dec 8, 2022

Let's describe the problem differently: we're installing an appliance, not a general-purpose OS, so we should not care at all about whatever RAID/LVM setup was on the disks we're going to overwrite anyway. The problem is that when booting the ISO, some udev rules react to the presence of software-RAID signatures on some disks/partitions and assemble them, which is exactly what we don't want. In fact, that udev rules file from CentOS (/lib/udev/rules.d/65-md-incremental.rules) already has a special case to neutralize it when the Anaconda installer is running.

So we're left with a few actions to take:

  • inform udev that an installer is running, so it won't auto-assemble RAID arrays
  • clear the partition table on the disks selected for assembling a new RAID, for good measure
  • add special support to detect a previous installation of XCP-ng on RAID, since this is the one case where we may want to activate a preexisting RAID ("may", because we still don't want to activate it if we're going to overwrite the disks with a new install)
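For reference, the Anaconda special case mentioned above works by bailing out of the incremental-assembly rules early; the relevant line in /lib/udev/rules.d/65-md-incremental.rules looks roughly like this (paraphrased from memory; check the file shipped on a CentOS system):

```
# Skip md incremental assembly while the Anaconda installer is running
ENV{ANACONDA}=="?*", GOTO="md_end"
```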

ydirson assigned ydirson and unassigned nraynaud on Dec 13, 2022
@ydirson (Contributor) commented Dec 13, 2022

A test image is now available here. Please let us know if it works for you!
It is based on the 8.3-alpha2 install image, with installer changes detailed here.

8 participants