[RAID] Investigating Software RAID Failure Through MDADM
System configuration.
1x NVMe 1TB M.2 SSD
1x NVMe 3.84TB M.2 SSD
6x SATA 20TB SSD (RAID5)
Examine drive/partition information.
The RAID5 array is currently not working. First, list the storage devices the system recognizes to identify those used by the RAID.
rdlab@exxact:~$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0           7:0    0  63.5M  1 loop /snap/core20/1974
loop1           7:1    0  63.5M  1 loop /snap/core20/2015
loop2           7:2    0  40.9M  1 loop /snap/snapd/20290
loop3           7:3    0  40.9M  1 loop /snap/snapd/20092
loop4           7:4    0  67.8M  1 loop /snap/lxd/22753
loop5           7:5    0  91.9M  1 loop /snap/lxd/24061
sda             8:0    0  18.2T  0 disk
└─sda1          8:1    0  18.2T  0 part
sdb             8:16   0  18.2T  0 disk
└─sdb1          8:17   0  18.2T  0 part
sdc             8:32   0  18.2T  0 disk
└─sdc1          8:33   0  18.2T  0 part
sdd             8:48   0  18.2T  0 disk
└─sdd1          8:49   0  18.2T  0 part
sde             8:64   0  18.2T  0 disk
└─sde1          8:65   0  18.2T  0 part
sdf             8:80   0  18.2T  0 disk
└─sdf1          8:81   0  18.2T  0 part
nvme1n1       259:0    0   3.5T  0 disk
└─nvme1n1p1   259:1    0   3.5T  0 part /scratch
nvme0n1       259:2    0 953.9G  0 disk
├─nvme0n1p1   259:3    0   1.1G  0 part /boot/efi
├─nvme0n1p2   259:4    0     1G  0 part /boot
├─nvme0n1p3   259:5    0    10G  0 part [SWAP]
└─nvme0n1p4   259:6    0 941.8G  0 part
The RAID5 array is configured on the six 18.2T drives: sda, sdb, sdc, sdd, sde, and sdf.
Display the partitions configured in the OS (/etc/fstab).
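Sizes alone do not prove membership. A more direct check is to filter lsblk by file system type, because libblkid reports a device carrying an md superblock as “linux_raid_member”. The following is a minimal sketch; the helper name raid_members is hypothetical. It reads lsblk output on standard input, so it works on saved output as well as on the live system.

```shell
# raid_members (hypothetical helper): print every device whose FSTYPE is
# "linux_raid_member", i.e. a disk or partition holding an md superblock.
# Expects raw, headerless output from: lsblk -rno NAME,FSTYPE
raid_members() {
  awk '$2 == "linux_raid_member" { print "/dev/" $1 }'
}
```

On the affected host: lsblk -rno NAME,FSTYPE | raid_members. An expected member that is missing from the list would point to a damaged or wiped superblock on that device.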
rdlab@exxact:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/disk/by-uuid/7639e5c3-3fcb-4b5b-953e-58461c5b9fb9 none swap sw 0 0
# / was on /dev/nvme0n1p4 during curtin installation
/dev/disk/by-uuid/7f5dfbfd-e027-4073-81a6-758484ecc019 / ext4 defaults 0 1
# /scratch was on /dev/nvme1n1p1 during curtin installation
/dev/disk/by-uuid/7ca82a8b-e49f-4af2-9e45-ca84ffcf8d52 /scratch ext4 defaults 0 1
# /data was on /dev/md0p1 during curtin installation
# /dev/disk/by-id/md-uuid-b690237f:da587456:50a2b64e:52c2abc6-part1 /data ext4 defaults 0 1
/dev/disk/by-id/md-uuid-8e1d9efa:a6028ec7:39cd0e4f:c76c036f /data xfs defaults 0 1
# /boot was on /dev/nvme0n1p2 during curtin installation
/dev/disk/by-uuid/6070f6d4-8edf-4032-b0d3-e9709be0326e /boot ext4 defaults 0 1
# /boot/efi was on /dev/nvme0n1p1 during curtin installation
/dev/disk/by-uuid/C232-F548 /boot/efi vfat defaults 0 1
#/swap.img none swap sw 0 0
The RAID5 entries in /etc/fstab are:
# /data was on /dev/md0p1 during curtin installation
# /dev/disk/by-id/md-uuid-b690237f:da587456:50a2b64e:52c2abc6-part1 /data ext4 defaults 0 1
/dev/disk/by-id/md-uuid-8e1d9efa:a6028ec7:39cd0e4f:c76c036f /data xfs defaults 0 1
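The array is mounted at “/data” with an “xfs” file system. The UUID is the key reference for identifying the array. The active entry mounts md-uuid 8e1d9efa:a6028ec7:39cd0e4f:c76c036f. The commented-out entry left by the installer points to partition 1 of a different array (b690237f:da587456:50a2b64e:52c2abc6) formatted as ext4.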
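The md UUID used for /data can be extracted from /etc/fstab for the comparisons in the following steps. This is a minimal sketch. The function name fstab_md_uuid is hypothetical, and it only recognizes active entries in the /dev/disk/by-id/md-uuid-<UUID> form shown above.

```shell
# fstab_md_uuid (hypothetical helper): print the md array UUID that an active
# (uncommented) fstab line uses for the given mount point. Only entries of the
# form /dev/disk/by-id/md-uuid-<UUID> are recognized.
# Usage: fstab_md_uuid MOUNTPOINT [FSTAB]   (FSTAB defaults to /etc/fstab)
fstab_md_uuid() {
  mp_arg=$1
  fstab_file=${2:-/etc/fstab}
  awk -v mp="$mp_arg" '
    $1 !~ /^#/ && $2 == mp && $1 ~ /^\/dev\/disk\/by-id\/md-uuid-/ {
      sub(/^\/dev\/disk\/by-id\/md-uuid-/, "", $1)   # keep only the UUID
      print $1
    }
  ' "$fstab_file"
}
```

With the fstab shown above, fstab_md_uuid /data prints 8e1d9efa:a6028ec7:39cd0e4f:c76c036f. Commented lines, such as the old md0p1 entry, are ignored.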
View the “md127” RAID status.
The status shows that the RAID is “inactive”.
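The array state is also visible in /proc/mdstat. There, an inactive array's line reads “mdN : inactive” followed by its component devices. The sketch below lists inactive arrays. The function name is hypothetical; the file path is a parameter only so the function can be tested against a saved copy.

```shell
# inactive_arrays (hypothetical helper): print the name of every md array that
# /proc/mdstat lists as inactive. Array lines look like
#   md127 : inactive sda1[0](S) sdb1[1](S) ...
# so the state is the third whitespace-separated field.
# Usage: inactive_arrays [MDSTAT_FILE]   (defaults to /proc/mdstat)
inactive_arrays() {
  awk '$1 ~ /^md[0-9]+$/ && $2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}
```

Run inactive_arrays with no argument to read the live /proc/mdstat. Given the status above, md127 would be expected in its output.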
View additional details of “md127”.
Why is the RAID level reported as RAID0 when it should be RAID5?
One possible explanation is that the RAID failed and a recovery was attempted. When the recovery failed, another attempt was made to rebuild the array, and that rebuild was most likely configured incorrectly.
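The relevant fields can be filtered out of the mdadm --detail output. This sketch assumes mdadm prints each key right-aligned before “ : ”, which is the layout current versions use. md_detail_summary is a hypothetical helper name.

```shell
# md_detail_summary (hypothetical helper): reduce `mdadm --detail` output to
# the reported RAID level, the array state and the array UUID.
# Assumes keys are right-aligned and separated from values by " : ".
md_detail_summary() {
  sed -n \
    -e 's/^ *Raid Level : /level=/p' \
    -e 's/^ *State : /state=/p' \
    -e 's/^ *UUID : /uuid=/p'
}
```

Usage on the affected host (needs root): sudo mdadm --detail /dev/md127 | md_detail_summary. The uuid= value can then be compared with the md-uuid used for /data in /etc/fstab.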
Check the UUID information to verify that the OS identifies the array correctly.
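One way to check identification is to confirm two things: that udev created the /dev/disk/by-id link the /data entry in /etc/fstab depends on, and which md device that link resolves to. This is a minimal sketch. md_uuid_link is a hypothetical name, and readlink -f assumes GNU coreutils.

```shell
# md_uuid_link (hypothetical helper): report whether the by-id link for an md
# array UUID exists and which device it resolves to. Returns 1 if missing.
# Usage: md_uuid_link UUID [DIR]   (DIR defaults to /dev/disk/by-id)
md_uuid_link() {
  uuid_arg=$1
  link_dir=${2:-/dev/disk/by-id}
  link="$link_dir/md-uuid-$uuid_arg"
  if [ -e "$link" ]; then
    # readlink -f (GNU coreutils) follows the symlink to the md device.
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
  else
    printf 'missing: %s\n' "$link" >&2
    return 1
  fi
}
```

For example: md_uuid_link 8e1d9efa:a6028ec7:39cd0e4f:c76c036f. A “missing:” result means the /data line in /etc/fstab has no device to mount.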
Check “mdadm.conf” to confirm that a RAID array was configured. The array was originally created when the OS was installed. The file still references the old array name, “md0”, with a different UUID. The installer entered this information as a comment line.
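A small awk filter can list the ARRAY definitions in mdadm.conf, including commented-out ones such as the installer's md0 entry. The sketch assumes the Debian/Ubuntu location /etc/mdadm/mdadm.conf (other distributions commonly use /etc/mdadm.conf). It also assumes each entry carries a UUID= field. conf_arrays is a hypothetical name.

```shell
# conf_arrays (hypothetical helper): print "<active|commented> <device> <UUID>"
# for every ARRAY line in mdadm.conf, including commented-out ones.
# Usage: conf_arrays [CONF]   (defaults to the Debian/Ubuntu path)
conf_arrays() {
  conf_file=${1:-/etc/mdadm/mdadm.conf}
  awk '
    {
      line = $0
      status = "active"
      if (line ~ /^[ \t]*#/) {             # commented-out line
        status = "commented"
        sub(/^[ \t]*#[ \t]*/, "", line)
      }
      n = split(line, f, /[ \t]+/)
      if (f[1] != "ARRAY") next            # skip DEVICE, MAILADDR, prose
      uuid = "-"
      for (i = 3; i <= n; i++)
        if (f[i] ~ /^UUID=/) uuid = substr(f[i], 6)
      print status, f[2], uuid
    }
  ' "$conf_file"
}
```

Each output line shows whether the entry is active, the device name, and the UUID. This makes it easy to compare mdadm.conf against /etc/fstab and mdadm --detail.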
Display additional RAID details to identify the cause of the failure.
The six RAID drives, sda, sdb, sdc, sdd, sde, and sdf, show the wrong UUID.
Examine each drive with mdadm. Each drive still reports its RAID level as RAID5.
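The RAID level recorded in each member's superblock can be listed per device. This sketch parses mdadm --examine output and assumes each device section starts with a “/dev/<name>:” header line. member_levels is a hypothetical name.

```shell
# member_levels (hypothetical helper): print "<device> <raid level>" for each
# member section in `mdadm --examine` output, i.e. the level stored in the
# member superblock rather than the level of the assembled array.
member_levels() {
  awk '
    /^\/dev\/[^ ]+:$/ { dev = substr($1, 1, length($1) - 1) }  # "/dev/sda1:"
    $1 == "Raid" && $2 == "Level" { print dev, $NF }
  '
}
```

Usage: sudo mdadm --examine /dev/sd[a-f]1 | member_levels. This assumes the members are the single partitions sda1 through sdf1 shown by lsblk.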
Display the UUID values for comparison.
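To make the comparison explicit, check each member's “Array UUID” against a reference value. The reference could be the md-uuid used for /data in /etc/fstab or the UUID reported by mdadm --detail. The sketch assumes v1.x superblocks, which print “Array UUID”, and the same assumed member partitions. check_member_uuids is a hypothetical name; it returns non-zero if any member disagrees.

```shell
# check_member_uuids (hypothetical helper): compare the "Array UUID" of every
# member in `mdadm --examine` output with a reference UUID. Prints
# "<device> <uuid> match|MISMATCH" and exits non-zero on any mismatch.
# Assumes v1.x superblocks (older 0.90 metadata prints "UUID" instead).
check_member_uuids() {
  ref_uuid=$1
  awk -v ref="$ref_uuid" '
    /^\/dev\/[^ ]+:$/ { dev = substr($1, 1, length($1) - 1) }
    $1 == "Array" && $2 == "UUID" {
      status = ($NF == ref) ? "match" : "MISMATCH"
      if (status != "match") bad = 1
      print dev, $NF, status
    }
    END { exit bad }
  '
}
```

Usage: sudo mdadm --examine /dev/sd[a-f]1 | check_member_uuids 8e1d9efa:a6028ec7:39cd0e4f:c76c036f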
Conclusion
The drives are still identified as RAID5, but the assembled array is configured as RAID0. At this point in the investigation, the decision was made to wipe the existing /dev/md127 (RAID0) and its member devices and re-create the RAID5 configuration.
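The wipe-and-re-create step can be sketched as a script. This is destructive. It erases the md metadata on every member, and creating the new array and file system destroys any data left on them.

The sketch is a dry run by default and only prints the commands. It rests on several assumptions:
- the members are /dev/sda1 to /dev/sdf1;
- the new array is named /dev/md0;
- /data is re-created as XFS, as in the current fstab;
- mdadm.conf is at the Debian/Ubuntu path;
- the script runs as root.

run and recreate_raid5 are hypothetical helper names. Verify every device name on the real host before setting APPLY=1.

```shell
# DESTRUCTIVE SKETCH (hypothetical helpers run / recreate_raid5): stop
# /dev/md127, wipe its members and re-create a six-drive RAID5.
# Dry run by default (commands are only printed); set APPLY=1 to execute.
# Assumptions: members /dev/sda1../dev/sdf1, new array /dev/md0, XFS for
# /data, Debian/Ubuntu mdadm.conf path, executed as root.

run() {
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

recreate_raid5() {
  members="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1"

  run umount /data || true                 # /data is probably not mounted
  run mdadm --stop /dev/md127 || return 1  # release the inactive RAID0 array

  for m in $members; do
    # Erase the stale md superblock on each member.
    run mdadm --zero-superblock "$m" || return 1
  done

  # $members is intentionally unquoted so it splits into six arguments.
  run mdadm --create /dev/md0 --level=5 --raid-devices=6 $members || return 1
  run mkfs.xfs /dev/md0 || return 1

  # Record the new array and rebuild the initramfs so it assembles at boot.
  run sh -c 'mdadm --detail --scan >> /etc/mdadm/mdadm.conf'
  run update-initramfs -u
}
```

A new RAID5 array starts an initial parity build that can take a long time on 20TB drives; progress is shown in /proc/mdstat. The new array also gets a new UUID, so the md-uuid in the /data line of /etc/fstab must be updated. Any stale ARRAY lines in mdadm.conf should be removed before the new one is appended.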
REF: ZD-4301 / ZD-6131 / ZD-6713 / ZD-7113