ddrescue saved a zpool whose resilver would never finish

The pool is a two-disk mirror, the ZFS equivalent of RAID1:

# camcontrol devlist
<TOSHIBA MD05ACA800 GX0B>          at scbus0 target 0 lun 0 (pass0,ada0)
<ST8000DM004-2CX188 0001>          at scbus1 target 0 lun 0 (pass1,ada1)

# zpool status
  pool: zpool1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Sep 21 13:23:42 2019
        447G scanned at 0/s, 41.9M issued at 0/s, 6.94T total
        26.4M resilvered, 0.00% done, no estimated completion time
config:

        NAME                            STATE     READ WRITE CKSUM
        zpool1                          ONLINE       0     0     0
          mirror-0                      ONLINE       0     0     0
            diskid/DISK-A               ONLINE       0     0     0
            diskid/DISK-B               ONLINE       0     0    11

The mirror had lost one disk, so I bought a new HDD to replace DISK-B, swapped it in, and ran zpool online zpool1 diskid/DISK-B.

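Before starting the resilver I had run smartctl -a /dev/ada0 (DISK-A), and it showed nothing wrong.
During the resilver, however, the machine rebooted again and again.
When I ran smartctl once more, DISK-A had errors logged as well...
I had not been running regular scrubs.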
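
A scrub reads every allocated block in the pool, so running one regularly can surface read errors on a disk before a resilver has to depend on it. On FreeBSD, periodic(8) ships a daily ZFS scrub script that is off by default. A minimal sketch of /etc/periodic.conf, assuming that stock script and the pool name from this post (35 days is the default threshold, not a recommendation):

```shell
# /etc/periodic.conf (FreeBSD) fragment: let periodic(8) scrub ZFS pools.
# Turn on the daily ZFS scrub check.
daily_scrub_zfs_enable="YES"
# Pools to scrub, space-separated; an empty string means all imported pools.
daily_scrub_zfs_pools="zpool1"
# Start a new scrub once the previous one is more than this many days old.
daily_scrub_zfs_default_threshold="35"
```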

At this rate the pool would never recover, so I decided to try ddrescue.
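
The run below copies without a mapfile. Since this machine kept rebooting, it may be worth knowing that GNU ddrescue takes an optional third argument, a mapfile that records which areas are done, so an interrupted copy resumes where it stopped. A sketch following the two-pass disk-to-disk pattern from the ddrescue manual; `rescue_disk` and the paths are hypothetical, the mapfile should live on a disk other than the source and destination, and the flags are worth checking against `ddrescue --help` for your version:

```shell
#!/bin/sh
# Sketch of a resumable two-pass disk-to-disk copy with GNU ddrescue.
# rescue_disk is a hypothetical helper; all arguments are placeholders.
# WARNING: everything on the destination device is overwritten.
rescue_disk() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads easily; -n skips the slow scraping phase.
    # Progress is written to the mapfile, so rerunning the same command resumes.
    ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: revisit the problem areas with direct disc access (-d)
    # and make up to 3 retry passes over bad sectors (-r3).
    ddrescue -d -f -r3 "$src" "$dst" "$map"
}

# Example (export the pool first; keep the mapfile on a third disk):
#   rescue_disk /dev/ada0 /dev/ada1 /root/ada0.map
```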

$ zpool export zpool1
$ time ddrescue --force /dev/ada0 /dev/ada1
GNU ddrescue 1.24
Press Ctrl-C to interrupt
    ipos:    5453 GB, non-trimmed:    65536 B,  current rate:    141 MB/s
    opos:    5453 GB, non-scraped:        0 B,  average rate:    169 MB/s
non-tried:    2547 GB,  bad-sector:        0 B,    error rate:       0 B/s
  rescued:    5453 GB,   bad areas:        0,        run time:  8h 57m 49s
pct rescued:   68.15%, read errors:        1,  remaining time:      5h 15m
                              time since last successful read:          0s
Copying non-tried blocks... Pass 1 (forwards)
    ipos:    5284 GB, non-trimmed:        0 B,  current rate:       0 B/s
    opos:    5284 GB, non-scraped:        0 B,  average rate:    145 MB/s
non-tried:        0 B,  bad-sector:     4096 B,    error rate:      32 B/s
  rescued:    8001 GB,   bad areas:        1,        run time: 15h 18m 32s
pct rescued:   99.99%, read errors:        9,  remaining time:         n/a
                              time since last successful read:      1m 37s
Finished
ddrescue --force /dev/ada0 /dev/ada1  140.37s user 811.88s system 1% cpu 15:18:33.03 total
$ reboot # pull /dev/ada0 out, just to be safe
$ zpool import
  pool: zpool1
    id: 111111111111111111
  state: DEGRADED
status: One or more devices were being resilvered.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
config:

        zpool1                      DEGRADED
          mirror-0                  DEGRADED
            diskid/DISK-B           ONLINE
            12312312312312312312    OFFLINE
$ zpool import zpool1

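ddrescue hit a few read errors (9 in total, leaving 4096 B of bad sectors), but I managed to move everything onto the new HDD.
The pool is still running on a single disk, though.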
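
To get back to a two-way mirror, the missing member (listed above by its GUID 12312312312312312312, OFFLINE) has to be replaced by another disk. zpool accepts a vdev GUID where a device name is expected, which is handy when the old device no longer exists. A minimal sketch, assuming a new disk is already connected; `remirror` and `/dev/ada2` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: put a new disk in place of a missing mirror member.
# The missing member is named by the numeric GUID that zpool import/status prints.
remirror() {
    pool=$1 missing=$2 new_dev=$3
    # Replace the absent device with new_dev; this starts a resilver onto it.
    zpool replace "$pool" "$missing" "$new_dev" || return 1
    # Show the pool state, including resilver progress.
    zpool status "$pool"
}

# Example (/dev/ada2 stands in for whatever the new disk is called):
#   remirror zpool1 12312312312312312312 /dev/ada2
```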

Every time this happens, I'm reminded how great sequential reads are.
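
The resilver reported 0/s with no estimated completion time, while ddrescue reads the disk front to back at a fairly steady rate, so its duration is predictable: capacity divided by average rate. A quick check against the log above (8001 GB rescued in 15h 18m 32s, i.e. 55112 s; ddrescue prints SI units, so 1 GB = 10^9 bytes). The helper names are made up for illustration:

```shell
#!/bin/sh
# Back-of-the-envelope numbers for one full sequential ddrescue pass.
# Hypothetical helpers; byte counts are SI, as ddrescue prints them.

# Hours needed to copy $1 bytes at an average of $2 bytes/s.
pass_hours() {
    awk -v b="$1" -v r="$2" 'BEGIN { printf "%.1f\n", b / r / 3600 }'
}

# Average rate in MB/s for $1 bytes copied in $2 seconds.
avg_rate_mb() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1e6 }'
}

pass_hours 8001000000000 145000000  # → 15.3
avg_rate_mb 8001000000000 55112     # → 145
```

Both agree with what ddrescue printed: an average rate of 145 MB/s and a run time of just over 15.3 hours.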

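P.S. Of six MD05ACA800 drives bought at the same time two years ago, three failed around the same time...
Two of them are the drives above; the third was in a completely different environment...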

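Update 2019-10-08T16:41:39+09
With the ST8000DM004, I failed many times to bring a heavily loaded single-disk pool back to a mirror (RAID1 equivalent).
It does not seem able to withstand several days of continuous load... I confirmed this on more than one unit.
WD Red and Gold drives worked without problems.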
