
A record of fixing a ZFS fault caused by device names changing

I found that a ZFS pool on an Ubuntu 23.10 machine had become degraded:

# zpool status stor
  pool: stor
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 19:13:23 with 0 errors on Sun Jun  9 19:37:31 2024
config:

        NAME                        STATE     READ WRITE CKSUM
        stor                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            scsi-35000cca260cd1f18  ONLINE       0     0     0
            scsi-35000cca260cc3084  ONLINE       0     0     0
            scsi-35000cca260cc6be0  ONLINE       0     0     0
            13184766210832087855    FAULTED      0     0     0  was /dev/sdf1
            8984617841033776882     FAULTED      0     0     0  was /dev/sdg1
            wwn-0x5000cca2604ac3e0  ONLINE       0     0     0

errors: No known data errors

# zpool status -L stor
  pool: stor
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 19:13:23 with 0 errors on Sun Jun  9 19:37:31 2024
config:

        NAME                      STATE     READ WRITE CKSUM
        stor                      DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sdc                   ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sde                   ONLINE       0     0     0
            13184766210832087855  FAULTED      0     0     0  was /dev/sdf1
            8984617841033776882   FAULTED      0     0     0  was /dev/sdg1
            sdg                   ONLINE       0     0     0

errors: No known data errors

Good grief, the raidz2 has lost two disks!

On inspection, the disks themselves were fine; the device names had simply shifted after a reboot (you can tell because sdg appears in the list as ONLINE, yet the pool claims the former sdg1 is the one that failed). And that alone is enough to break ZFS?! I have never run into this on FreeBSD.
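To confirm that only the letters moved, it helps to compare the stable names under /dev/disk/by-id against the kernel's sdX names. A minimal sketch; map_by_id is my own helper name, not a ZFS tool, and the directory argument only exists to make the logic easy to exercise:

```shell
# map_by_id: print each stable id alongside the kernel device it currently
# resolves to. Defaults to /dev/disk/by-id; pass another directory to test.
map_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(basename "$(readlink -f "$link")")"
    done
}

# e.g. "map_by_id | grep -w sdf" shows which WWN/serial owns sdf right now
```

If the WWNs the pool expects now resolve to different sdX letters than before, the hardware is fine and only the naming moved.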

I tried zpool replace, which failed:

# zpool replace stor 13184766210832087855 /dev/sdf
invalid vdev specification
use '-f' to override the following errors:
/dev/sdf1 is part of active pool 'stor'

# zpool replace -f stor 13184766210832087855 /dev/sdf
invalid vdev specification
the following errors must be manually repaired:
/dev/sdf1 is part of active pool 'stor'

I tried zpool labelclear, which also failed. Then I wiped the disk with wipefs and tried replace again; still an error:

# zpool labelclear /dev/sdf                       
failed to clear label for /dev/sdf

# wipefs -a /dev/sdf
/dev/sdf: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 8 bytes were erased at offset 0x74702555e00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdf: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdf: calling ioctl to re-read partition table: Success

# zpool replace stor 13184766210832087855 /dev/sdf
cannot replace 13184766210832087855 with /dev/sdf: /dev/sdf is busy, or device removal is in progress
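In hindsight, labelclear probably failed because ZFS had written its label to the partition (sdf1) rather than the whole disk, so /dev/sdf had nothing recognizable to clear. A hedged guess at what might have worked before the wipefs step; I did not test this:

```shell
# Untested assumption: point labelclear at the partition that actually holds
# the ZFS label, and force past the "part of active pool" safety check.
zpool labelclear -f /dev/sdf1
```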

Words can no longer describe my mood at this point.

Finally, after looking up similar cases online, the fix was to export the pool and re-import it with -d. Using only -d /dev/disk/by-id/ was not enough; passing -d multiple times, once per device, did the trick.

Take the problem devices offline, export the pool, import it with -d /dev/disk/by-id/ plus additional -d entries naming each device node, then bring the problem devices back online.

# zpool offline stor 13184766210832087855
# zpool offline stor 8984617841033776882 
# zpool export stor
# zpool import -d /dev/disk/by-id/ -d /dev/sdc -d /dev/sdd -d /dev/sde -d /dev/sdg -d /dev/sdf -d /dev/sdi stor
# zpool online stor 13184766210832087855
# zpool online stor 8984617841033776882 

At last the pool is back online, and it is resilvering onto the two problem devices.

# zpool status stor
  pool: stor
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jul  7 01:17:01 2024
        44.7G / 32.3T scanned at 1.09G/s, 0B / 32.3T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                        STATE     READ WRITE CKSUM
        stor                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            scsi-35000cca260cd1f18  ONLINE       0     0     0
            scsi-35000cca260cc3084  ONLINE       0     0     0
            scsi-35000cca260cc6be0  ONLINE       0     0     0
            wwn-0x5000cca260cc504c  ONLINE       0     0     0
            wwn-0x5000cca260cc32d0  ONLINE       0     0     0  (awaiting resilver)
            wwn-0x5000cca2604ac3e0  ONLINE       0     0     0

errors: No known data errors

The actual resilver turned out to be fast in its later stages; it was the initial scan phase that took a long time.
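One possible way to avoid a repeat after future reboots (my assumption, not something the original fix required) is to tell the boot-time import to scan stable paths first. On Ubuntu this can be set in /etc/default/zfs:

```shell
# /etc/default/zfs -- assumption: make pool imports prefer stable by-id
# names, so shifting sdX letters no longer confuse the pool.
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
```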

blog/20240707_zfs_replace_story.txt · Last modified: 2024/07/07 05:20 by Hshh