[RAID] Investigating Software RAID Failure Through MDADM
System configuration.
1x NVMe 1TB M.2 SSD
1x NVMe 3.84TB M.2 SSD
6x SATA 20TB SSD (RAID5)
Examine drive/partition information.
The current RAID5 configuration is not working. First, list the block devices the system recognizes to identify which drives are used for the RAID.
rdlab@exxact:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 63.5M 1 loop /snap/core20/1974
loop1 7:1 0 63.5M 1 loop /snap/core20/2015
loop2 7:2 0 40.9M 1 loop /snap/snapd/20290
loop3 7:3 0 40.9M 1 loop /snap/snapd/20092
loop4 7:4 0 67.8M 1 loop /snap/lxd/22753
loop5 7:5 0 91.9M 1 loop /snap/lxd/24061
sda 8:0 0 18.2T 0 disk
└─sda1 8:1 0 18.2T 0 part
sdb 8:16 0 18.2T 0 disk
└─sdb1 8:17 0 18.2T 0 part
sdc 8:32 0 18.2T 0 disk
└─sdc1 8:33 0 18.2T 0 part
sdd 8:48 0 18.2T 0 disk
└─sdd1 8:49 0 18.2T 0 part
sde 8:64 0 18.2T 0 disk
└─sde1 8:65 0 18.2T 0 part
sdf 8:80 0 18.2T 0 disk
└─sdf1 8:81 0 18.2T 0 part
nvme1n1 259:0 0 3.5T 0 disk
└─nvme1n1p1 259:1 0 3.5T 0 part /scratch
nvme0n1 259:2 0 953.9G 0 disk
├─nvme0n1p1 259:3 0 1.1G 0 part /boot/efi
├─nvme0n1p2 259:4 0 1G 0 part /boot
├─nvme0n1p3 259:5 0 10G 0 part [SWAP]
└─nvme0n1p4 259:6 0 941.8G 0 part
This RAID5 is configured on sda, sdb, sdc, sdd, sde, sdf.
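Membership in an md array can also be confirmed from the file-system signature on each partition. A minimal check, with a column list chosen here rather than taken from the original session:

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sd[a-f]

Partitions belonging to an md array report their FSTYPE as linux_raid_member.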
Display the OS-configured partitions in /etc/fstab.
rdlab@exxact:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/disk/by-uuid/7639e5c3-3fcb-4b5b-953e-58461c5b9fb9 none swap sw 0 0
# / was on /dev/nvme0n1p4 during curtin installation
/dev/disk/by-uuid/7f5dfbfd-e027-4073-81a6-758484ecc019 / ext4 defaults 0 1
# /scratch was on /dev/nvme1n1p1 during curtin installation
/dev/disk/by-uuid/7ca82a8b-e49f-4af2-9e45-ca84ffcf8d52 /scratch ext4 defaults 0 1
# /data was on /dev/md0p1 during curtin installation
# /dev/disk/by-id/md-uuid-b690237f:da587456:50a2b64e:52c2abc6-part1 /data ext4 defaults 0 1
/dev/disk/by-id/md-uuid-8e1d9efa:a6028ec7:39cd0e4f:c76c036f /data xfs defaults 0 1
# /boot was on /dev/nvme0n1p2 during curtin installation
/dev/disk/by-uuid/6070f6d4-8edf-4032-b0d3-e9709be0326e /boot ext4 defaults 0 1
# /boot/efi was on /dev/nvme0n1p1 during curtin installation
/dev/disk/by-uuid/C232-F548 /boot/efi vfat defaults 0 1
#/swap.img none swap sw 0 0
The RAID5 array's entry in /etc/fstab has the following details.
# /data was on /dev/md0p1 during curtin installation
# /dev/disk/by-id/md-uuid-b690237f:da587456:50a2b64e:52c2abc6-part1 /data ext4 defaults 0 1
/dev/disk/by-id/md-uuid-8e1d9efa:a6028ec7:39cd0e4f:c76c036f /data xfs defaults 0 1
The array is mounted at /data with an xfs file system. The md UUID in this entry is an important reference for identifying the array.
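Because /etc/fstab references the array through a /dev/disk/by-id/md-uuid-* symlink, a quick way to see whether the OS can currently resolve that reference is to list the udev-generated links (a minimal check; when the array is down, no matching link exists):

ls -l /dev/disk/by-id/md-uuid-*

If no symlink matching the UUID in fstab is present, the /data mount cannot succeed at boot.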
View the “md127” RAID status.
rdlab@exxact:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdc1[2](S) sde1[4](S) sdf1[6](S) sdd1[3](S) sdb1[1](S) sda1[0](S)
117190146048 blocks super 1.2
unused devices: <none>
Status shows the RAID is “inactive” and every member is flagged as a spare (S).
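When an array is inactive and every member is flagged as a spare, a common first attempt is simply to ask the kernel to start it. This is only a sketch of that step; with all six members showing as spares it typically fails, which is why the investigation continues below:

mdadm --run /dev/md127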
View additional details of “md127”.
rdlab@exxact:/home/exx# mdadm --query /dev/md127
/dev/md127: (null) 0 devices, 6 spares. Use mdadm --detail for more detail.
root@jupiter:/home/exx# mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Raid Level : raid0
Total Devices : 6
Persistence : Superblock is persistent
State : inactive
Working Devices : 6
Name : jupiter:0 (local to host jupiter)
UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Events : 93758
Number Major Minor RaidDevice
- 8 1 - /dev/sda1
- 8 81 - /dev/sdf1
- 8 65 - /dev/sde1
- 8 49 - /dev/sdd1
- 8 33 - /dev/sdc1
- 8 17 - /dev/sdb1
Why is the RAID Level: RAID0 when it should be RAID5?
It is possible that the RAID failed and a recovery was attempted. When that recovery failed, another attempt was made to rebuild the array, and the rebuild attempt was most likely configured incorrectly.
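Kernel logs from earlier boots can sometimes confirm whether a member dropped out or a rebuild was started. A hedged example of where to look, assuming the journal retains previous boots:

# kernel messages from the previous boot, filtered for md/raid events
journalctl -k -b -1 | grep -iE 'md127|md0|raid'
# or, for the current boot only
dmesg | grep -iE 'md:|raid'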
The array UUID can be cross-checked against the mdadm configuration to verify that the OS identifies the array correctly.
rdlab@exxact:/home/exx# mdadm --detail --scan /dev/md127
INACTIVE-ARRAY /dev/md127 metadata=1.2 name=jupiter:0 UUID=03f505ac:d5a96bac:5d4da761:810ad4a6
root@jupiter:/home# cat /etc/mdadm/mdadm.conf
# ARRAY /dev/md0 metadata=1.2 spares=1 name=ubuntu-server:0 UUID=b690237f:da587456:50a2b64e:52c2abc6
MAILADDR root
ARRAY /dev/md/data metadata=1.2 UUID=8e1d9efa:a6028ec7:39cd0e4f:c76c036f name=sn4622111485:data
The “mdadm.conf” file is important because it confirms that a RAID array was configured. The array was originally created when the OS was installed; a past reference to it, “md0” with a different UUID, remains as a comment line entered by the installer. Note that the active ARRAY line records UUID 8e1d9efa:…, which is not the UUID the member drives report (03f505ac:…).
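Once the correct array is assembled again, mdadm.conf can be regenerated so the recorded UUID matches the on-disk superblocks. A minimal sketch, assuming the Debian/Ubuntu default paths and a root shell:

# append the detected array definition and rebuild the initramfs
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

Stale ARRAY lines carrying old UUIDs should be removed or commented out first; otherwise assembly by scan keeps looking for an array that no longer exists.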
Stop the inactive array and attempt to reassemble it with --assemble --scan to gather additional detail on the failure.
rdlab@exxact:/home# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
root@jupiter:/home# mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md/data
mdadm: /dev/sde1 has wrong uuid.
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde
mdadm: /dev/sdf1 has wrong uuid.
mdadm: No super block found on /dev/sdf (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdc1 has wrong uuid.
mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdb1 has wrong uuid.
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: /dev/sdd1 has wrong uuid.
mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd
mdadm: /dev/sda1 has wrong uuid.
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/nvme0n1p4 (Expected magic a92b4efc, got 00000477)
mdadm: no RAID superblock on /dev/nvme0n1p4
mdadm: No super block found on /dev/nvme0n1p3 (Expected magic a92b4efc, got 0000003f)
mdadm: no RAID superblock on /dev/nvme0n1p3
mdadm: No super block found on /dev/nvme0n1p2 (Expected magic a92b4efc, got 00000081)
mdadm: no RAID superblock on /dev/nvme0n1p2
mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/nvme0n1p1
mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/nvme0n1
mdadm: No super block found on /dev/nvme1n1p1 (Expected magic a92b4efc, got 000005c1)
mdadm: no RAID superblock on /dev/nvme1n1p1
mdadm: No super block found on /dev/nvme1n1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/nvme1n1
mdadm: No super block found on /dev/loop5 (Expected magic a92b4efc, got a6eff301)
mdadm: no RAID superblock on /dev/loop5
mdadm: No super block found on /dev/loop4 (Expected magic a92b4efc, got a6eff301)
mdadm: no RAID superblock on /dev/loop4
mdadm: No super block found on /dev/loop3 (Expected magic a92b4efc, got fa2c5214)
mdadm: no RAID superblock on /dev/loop3
mdadm: No super block found on /dev/loop2 (Expected magic a92b4efc, got 9dbc89cd)
mdadm: no RAID superblock on /dev/loop2
mdadm: No super block found on /dev/loop1 (Expected magic a92b4efc, got 0000000a)
mdadm: no RAID superblock on /dev/loop1
mdadm: No super block found on /dev/loop0 (Expected magic a92b4efc, got 32138a62)
mdadm: no RAID superblock on /dev/loop0
mdadm reports the wrong UUID on all six RAID member partitions (sda1 through sdf1): the UUID in their superblocks (03f505ac:d5a96bac:5d4da761:810ad4a6) does not match the UUID recorded in mdadm.conf (8e1d9efa:a6028ec7:39cd0e4f:c76c036f), so --assemble --scan cannot find devices for /dev/md/data.
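One way around this mismatch, not performed here, is to assemble by the UUID actually found on the drives, naming the member partitions explicitly. A hedged sketch; if event counts differ between members (see the --examine output below), mdadm may refuse to start the array without --force, which should only be used once the failure is understood:

# assemble using the UUID reported in the member superblocks (the /dev/md0 target name is arbitrary)
mdadm --assemble --verbose --uuid=03f505ac:d5a96bac:5d4da761:810ad4a6 /dev/md0 /dev/sd[a-f]1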
Examine each member drive through mdadm. Each drive's superblock still reports the RAID level as raid5.
rdlab@exxact:/home# mdadm --examine /dev/sd[abcdef]1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : active
Device UUID : 1ce2160d:9a4d9d85:acfb481a:360faccc
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:52:42 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : 22583091 - correct
Events : 93758
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : active
Device UUID : 936b1233:4370fef4:a5b00add:ad78d8bf
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:52:42 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : 6da2fb23 - correct
Events : 93758
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : clean
Device UUID : cadf27f8:3814e183:fd4c0a6e:01a1908b
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:54:46 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : 1e50d87a - correct
Events : 93761
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : ..AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : clean
Device UUID : 043ea22c:0b072966:70a48109:b02cfdb2
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:54:46 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : f7f70ca8 - correct
Events : 93761
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : ..AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : clean
Device UUID : 22b3d070:6d3a7111:ceeb75e4:4f70cc8d
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:54:46 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : 9d314027 - correct
Events : 93761
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : ..AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 03f505ac:d5a96bac:5d4da761:810ad4a6
Name : jupiter:0 (local to host jupiter)
Creation Time : Tue Aug 15 12:04:05 2023
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 39063382016 (18626.87 GiB 20000.45 GB)
Array Size : 97658455040 (93134.36 GiB 100002.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264088 sectors, after=0 sectors
State : clean
Device UUID : 304c168f:5f981d99:1777b5dc:67771ac8
Internal Bitmap : 8 sectors from superblock
Update Time : Sun Sep 24 15:54:46 2023
Bad Block Log : 512 entries available at offset 88 sectors
Checksum : 75b0c98b - correct
Events : 93761
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : ..AAAA ('A' == active, '.' == missing, 'R' == replacing)
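The important fields in this long output are easier to compare side by side; a short summary command (the grep pattern is an assumption, not part of the original session):

mdadm --examine /dev/sd[abcdef]1 | grep -E '^/dev/|Events|Update Time|Array State|Device Role'

It shows that sda1 and sdb1 stopped at event count 93758 with the array state AAAAAA, while sdc1 through sdf1 reached 93761 and last recorded the array as ..AAAA, i.e. with the first two members missing.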
Display UUID values for comparison.
rdlab@exxact:/home# blkid
/dev/nvme1n1p1: UUID="7ca82a8b-e49f-4af2-9e45-ca84ffcf8d52" TYPE="ext4" PARTUUID="1b1fd338-33d6-4409-8f73-c28fe5b64c9d"
/dev/nvme0n1p1: UUID="C232-F548" TYPE="vfat" PARTUUID="bb710908-76b9-4722-bcbc-3326cc0eb476"
/dev/nvme0n1p2: UUID="6070f6d4-8edf-4032-b0d3-e9709be0326e" TYPE="ext4" PARTUUID="8801aa80-aef8-460b-86fd-46e3eeefa2df"
/dev/nvme0n1p3: UUID="7639e5c3-3fcb-4b5b-953e-58461c5b9fb9" TYPE="swap" PARTUUID="0791fca0-00fa-47e6-ab49-fab51c3c4137"
/dev/nvme0n1p4: UUID="7f5dfbfd-e027-4073-81a6-758484ecc019" TYPE="ext4" PARTUUID="eaa3624b-6ab6-4005-b645-8bf4dccdf6ba"
/dev/sda1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="1ce2160d-9a4d-9d85-acfb-481a360faccc" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="4058a593-2465-4406-bd4b-ea7e5600be4f"
/dev/sdd1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="043ea22c-0b07-2966-70a4-8109b02cfdb2" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="948c488a-ce92-4bbb-8fb5-590cf8f37703"
/dev/sdb1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="936b1233-4370-fef4-a5b0-0addad78d8bf" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="af27a600-c4e0-4de7-bd8c-322f32bb056c"
/dev/sdc1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="cadf27f8-3814-e183-fd4c-0a6e01a1908b" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="e45e7f5f-577f-46bf-92f2-f72103f0da5a"
/dev/sdf1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="304c168f-5f98-1d99-1777-b5dc67771ac8" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="69ae0247-dc71-47fa-bcef-855bf7672fdb"
/dev/sde1: UUID="03f505ac-d5a9-6bac-5d4d-a761810ad4a6" UUID_SUB="22b3d070-6d3a-7111-ceeb-75e44f70cc8d" LABEL="jupiter:0" TYPE="linux_raid_member" PARTUUID="5eaebad0-b8a8-4ae1-a6a9-4ffa7988e95b"
/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop3: TYPE="squashfs"
/dev/loop4: TYPE="squashfs"
/dev/loop5: TYPE="squashfs"
Conclusion
The member drives still identify themselves as RAID5, but the assembled array is inactive and reported as RAID0. At this point in the investigation, it was decided to wipe the existing /dev/md127 array and the md metadata on its associated devices, and to re-create the RAID5 configuration.
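For reference, a re-creation of this kind generally follows the sequence below. This is only a sketch, written under the assumption that the data on the array has been given up as lost: --zero-superblock and --create are destructive and make the old contents unrecoverable. The /dev/md0 name is arbitrary.

# stop the stale array and erase the md metadata on every member
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sd[a-f]1

# re-create the six-drive RAID5 and put a fresh xfs file system on it
mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[a-f]1
mkfs.xfs /dev/md0

# record the new array and rebuild the initramfs
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

The /data entry in /etc/fstab must also be updated to the new array UUID before rebooting.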
REF: ZD-4301 / ZD-6131 / ZD-6713 / ZD-7113