thelastpsion, @thelastpsion@bitbang.social

It's RAID time. I've started the server and I'm just having a look around before I start poking around with mdadm.

The RAID10 has 4x 3TB drives. The first two (sda, sdb) are WD Red CMR; the second two (sdc, sdd) are ancient WD Greens that won't re-add to the array. The plan was to replace the Greens with more Reds.

SMART tests on all drives: No reallocated sectors, no sectors pending reallocation. The Greens aren't dead. This surprised me.

However... sdb has 113 raw read errors. Hmm...
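For anyone playing along at home, those figures come straight out of smartctl; something like this (device names as in this box, and attribute names vary a bit between vendors) pulls out the bits I care about:

smartctl -H -A /dev/sda

# Or just the relevant counters across all four drives:
for d in /dev/sd{a,b,c,d}; do
  echo "== $d =="
  smartctl -A "$d" | grep -E 'Raw_Read_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector'
done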

thelastpsion, @thelastpsion@bitbang.social

Sidenote: When interviewing people for IT jobs, I have a #RAID question.

In a 4-drive array, which is the more reliable solution: RAID6 or RAID10?

The answer is RAID6, as you can lose any two drives. With RAID10, you can only lose two drives that aren't in the same mirror. (Yes, I know there are risks with vibration. That's why no RAID5.)

Once you get to 6 drives, the odds change, and RAID10 is statistically better.

Of course, for my own 4-drive array, I didn't follow my own advice.
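To put rough numbers on the 4-drive case: there are 6 possible two-drive failure combinations, and in RAID10 (two 2-way mirrors) 2 of those 6 take out both halves of the same mirror. So RAID10 survives 4 out of 6 double failures, while RAID6 survives all 6.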

thelastpsion, @thelastpsion@bitbang.social

Having said that, this isn't necessarily looking like a hardware fault, at least not in the way I was expecting. The WD Greens, in spite of how old they are (and their reputation for unreliability), could still be usable as spares.

It does mean that I might have to buy another drive, though. I don't like those read errors, even though SMART doesn't seem to think they're an issue yet.

thelastpsion, @thelastpsion@bitbang.social

So, running mdadm --assemble with all the drives, this is what I get.

When I first thought there was an issue last week, I tried re-adding sdc to the array. When it failed, I shut the server down. sdc now shows up as a spare, which is worrying.

sdd won't re-add because it's too old (in dmesg: "kicking non-fresh sdd1 from array!").

Bear in mind that I don't need any recent writes, so I don't really care about corruption on any data from the past, say, 3 months.

Any ideas, people?
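For context, the comparison I care about is the event count and update time on each member; roughly this (treat /dev/md0 as a placeholder for the real array device):

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine /dev/sd[abcd]1 | grep -E 'Update Time|Events|Device Role|Array State'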

#mdadm

thelastpsion, @thelastpsion@bitbang.social

Just done an mdadm --examine on the drives.

sda1, sdb1 and sdc1 all have Last Update times of Thu Mar 28 15:59. sdd1's Last Update time is Wed Mar 27 05:32. The only writes during that 34-hour gap would have been from Syncthing, so I have all that elsewhere.

However, sdc1 has a "bad blocks present" error.

This looks like a classic case of a drive dropping off an array and exposing corruption elsewhere.

Can I re-add sdd1 to the array read-only, without completely breaking it?
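My guess at the shape of the command, which I have not run yet (so treat it as a sketch; the md name is a placeholder, and I'm leaving sdc1 out since it now claims to be a spare):

mdadm --stop /dev/md0
mdadm --assemble --force --readonly /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1

...but I'd rather hear from someone who's done this before I point anything at the real disks.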

#mdadm

thelastpsion, @thelastpsion@bitbang.social

PRAISE INSERT DEITY! IT LIVES!

Overlays have saved my bacon and resurrected the mdraid array! At least I think so... I haven't tried copying files off yet.

Now that I know that all is not lost, I can back up anything I'm missing to multiple locations. I've got a stash of hard drives upstairs in various sizes, so I'll do health checks on them all and put them to use.

This is the article that worked for me, in case anyone ever needs it.

https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Overlay_manipulation_functions
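For anyone who doesn't want to read the whole page: the core trick is a device-mapper snapshot on top of each member, so everything the forced assemble writes lands in throwaway overlay files instead of on the real disks. Boiled right down (and simplified compared to the wiki's actual functions), it's roughly:

# One sparse copy-on-write file + dm snapshot per array member
for d in /dev/sd{a,b,c,d}1; do
  name=$(basename "$d")
  truncate -s 4G "/tmp/overlay-$name"              # needs to be big enough to hold any writes
  loop=$(losetup -f --show "/tmp/overlay-$name")   # attach the file as a loop device
  size=$(blockdev --getsz "$d")                    # member size in 512-byte sectors
  echo "0 $size snapshot $d $loop P 8" | dmsetup create "overlay-$name"
done

# Assemble from the overlays, not the raw partitions
mdadm --assemble --force /dev/md99 /dev/mapper/overlay-sd[abcd]1

If it all goes wrong, you tear the overlays down and the original drives are untouched.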
