RAID5 failure: 2 bad HDD's at the same time

  • CapLeaker
    Leaking Member
    • Dec 2014
    • 8065
    • Canada

    #1

    RAID5 failure: 2 bad HDD's at the same time

    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
    I am running a WD PR4100 16TB NAS. All was good until I occasionally noticed slow file transfers. I ran an HDD test and drive 2 failed. OK, no problem: order a new drive, replace it and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed halfway through rebuilding the RAID5 array onto drive 2!

    Question is: How do I recover all the files or recover the RAID?
  • Curious.George
    Badcaps Legend
    • Nov 2011
    • 2305
    • Unknown

    #2
    Re: RAID5 failure: 2 bad HDD's at the same time

    Originally posted by CapLeaker
    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
    I am running a WD PR4100 16TB NAS. All was good until I occasionally noticed slow file transfers. I ran an HDD test and drive 2 failed. OK, no problem: order a new drive, replace it and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed halfway through rebuilding the RAID5 array onto drive 2!
    This is a known problem with RAID -- especially with larger drives! The time it takes to rebuild the array represents a sizeable window in which a second failure can eat your lunch...

    Of course, the "cost" (window of vulnerability) of rebuilding the failed drive will vary with the RAID level (e.g., RAID5 is more expensive than RAID1, because a RAID5 rebuild has to read every surviving drive while a RAID1 rebuild only reads the mirror).
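    To make that cost concrete: RAID5 keeps, for each stripe, a parity block that is the XOR of the stripe's data blocks, so recomputing one missing member means reading the same stripe from every surviving drive. The sketch below is a toy model under that assumption (byte strings stand in for disk blocks, the function names are hypothetical, and this is not WD's on-disk layout). It also shows why a second missing member is fatal: one XOR equation cannot recover two unknowns.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (toy RAID5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_member(stripe, missing):
    """Recompute a missing member by XOR-ing ALL surviving members.

    `missing` is the set of unreadable member indices. One missing member
    can be restored; two or more cannot, since parity gives one equation.
    """
    if len(missing) != 1:
        raise ValueError("RAID5 can rebuild exactly one missing member")
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])  # 3 data blocks + parity
print(rebuild_member(stripe, {1}))  # b'EFGH'
```

    Note that every rebuilt block depends on a successful read from every other drive, which is exactly where a failing second drive bites.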
    Last edited by Per Hansson; 07-10-2019, 09:14 AM. Reason: fixed quote

    • eccerr0r
      Solder Sloth
      • Nov 2012
      • 8689
      • USA

      #3
      Re: RAID5 failure: 2 bad HDD's at the same time

      As always, RAID is not a backup.
      The question is how bad the drives are. If you attach each one to a PC on its own (DON'T WRITE TO THEM!), can you at least read a few bytes? What does the SMART information say?

      If you have two drives completely dead, you're probably SOL. If just one is dead and the other has a few bad sectors, then depending on your NAS firmware you may be able to recover something. Unfortunately, I don't have any experience with WD's RAID, just Linux mdraid...

      • ChaosLegionnaire
        HC Overclocker
        • Jul 2012
        • 3264
        • Singapore

        #4
        Re: RAID5 failure: 2 bad HDD's at the same time

        That's why I don't buy NAS boxes off the shelf. They typically ship with a homogeneous set of drives, which means the drives tend to all fail at the same time! Talk about convenient planned obsolescence! I'm sure the companies that make these NAS boxes couldn't care less, either, if it means more sales and more money!

        Therefore, I prefer to DIY my own NAS and pick drives with different platter-density technologies, different numbers of heads, etc., so that they fail at different times instead.

        Do what eccerr0r said. Personally, I fire up Linux, pull the SMART data, check how many pending, uncorrectable and reallocated sectors there are, and run GNU ddrescue to pull as much data off the bad drive as possible, provided it's still accessible and not bricked. A bricked drive is totally inaccessible and detectable by neither the BIOS nor the OS.
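        The SMART step can be scripted. A minimal sketch, assuming the attribute table printed by smartmontools' `smartctl -A /dev/sdX` for an ATA drive (numeric ID in the first column, raw value in the tenth); the function name is hypothetical, and the IDs are the standard ones for reallocated (5), current pending (197) and offline uncorrectable (198) sectors:

```python
import re

# Standard ATA SMART attribute IDs for the three counts mentioned above.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def bad_sector_counts(smartctl_output: str) -> dict:
    """Extract raw values of attributes 5, 197 and 198 from `smartctl -A` text.

    Assumes the usual table layout: ID# ATTRIBUTE_NAME FLAG VALUE WORST
    THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE, so the raw value starts at
    the 10th whitespace-separated field.
    """
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank or non-table line
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            # Some raw values carry trailing text; keep the leading integer.
            match = re.match(r"\d+", fields[9])
            if match:
                counts[WATCHED[attr_id]] = int(match.group())
    return counts
```

        Feed it the captured stdout of `smartctl -A` for each suspect drive; the names in the result come from the sketch's own table, not from the drive, so they stay consistent across vendors.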

        If the drive is bricked and the data is critical, send it to a data recovery company. The fee can run to thousands of dollars.

        • eccerr0r
          Solder Sloth
          • Nov 2012
          • 8689
          • USA

          #5
          Re: RAID5 failure: 2 bad HDD's at the same time

          I've been able to successfully reassemble my Linux md-RAID5 arrays that were destroyed by two disk failures, but there's no guarantee that the data I pull off is accurate. However, I was able to get a good portion of the data off after the failure.

          Which reminds me, I need to backup my array again soon...

          • CapLeaker
            Leaking Member
            • Dec 2014
            • 8065
            • Canada

            #6
            Re: RAID5 failure: 2 bad HDD's at the same time

            Well, drive 2 is FUBAR. It won't read, period. Drive 3 has some bad sectors (not sure why), but I was able to get the important stuff off the RAID5, so that's good. However, I'm not able to recover the whole RAID array. But that's OK; I kept too much junk anyway.

            • Curious.George
              Badcaps Legend
              • Nov 2011
              • 2305
              • Unknown

              #7
              Re: RAID5 failure: 2 bad HDD's at the same time

              Originally posted by ChaosLegionnaire
              Therefore, I prefer to DIY my own NAS and pick drives with different platter-density technologies, different numbers of heads, etc., so that they fail at different times instead.
              They still have many things in common: the hardware/software that's implementing the array, power supply, thermal experience, software that is accessing the array, etc.

              I prefer to trade convenience for robustness -- I only spin up a drive when I'm accessing its contents. If that content is munged, then I have to consider how much of the other content may be at risk, or whether the box I'm using to access that drive may instead be the culprit.

              [Software/firmware/clients/apps/PEBKAC have been known to be buggy]

              As I don't expect to encounter problems, when/if I do, it gives me a moment to think about what's happening before I propagate a failure (to other copies of the data).

              • Uranium-235
                Comrade Glimmer
                • Aug 2007
                • 5042
                • US

                #8
                Re: RAID5 failure: 2 bad HDD's at the same time

                This is why, for large arrays, RAID 6 is a better idea.
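                For a sense of the trade-off, here is a rough sketch of usable capacity versus guaranteed failure tolerance for the classic equal-size layouts. It assumes the PR4100's 16TB means four 4TB drives (an assumption, not stated in the thread); the function name is hypothetical, and hybrid schemes such as Synology SHR/SHR-2 are not modeled:

```python
def raid_summary(level: str, drives: int, drive_tb: float) -> tuple:
    """Return (usable TB, drive failures always survived) for equal-size drives.

    Hybrid layouts such as Synology SHR/SHR-2 are deliberately not modeled.
    """
    minimum = {"raid1": 2, "raid5": 3, "raid6": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")
    if level == "raid1":
        # n-way mirror: one drive's worth of space, survives n - 1 failures
        return drive_tb, drives - 1
    parity = 1 if level == "raid5" else 2
    # one (RAID5) or two (RAID6) drives' worth of space goes to parity
    return (drives - parity) * drive_tb, parity


# Assumed PR4100 layout: four 4TB drives
print(raid_summary("raid5", 4, 4.0))  # (12.0, 1)
print(raid_summary("raid6", 4, 4.0))  # (8.0, 2)
```

                On four bays, RAID 6 gives up a second drive's worth of space in exchange for surviving a second failure (or a URE) during a rebuild.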
                Cap Datasheet Depot: http://www.paullinebarger.net/DS/
                ^If you have datasheets not listed PM me

                • diif
                  Badcaps Legend
                  • Feb 2014
                  • 6978
                  • England

                  #9
                  Re: RAID5 failure: 2 bad HDD's at the same time

                  This is why RAID IS NOT BACKUP.

                  • Stefan Payne
                    Badcaps Legend
                    • Dec 2009
                    • 1267
                    • Germany

                    #10
                    Re: RAID5 failure: 2 bad HDD's at the same time

                    Originally posted by CapLeaker
                    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
                    I am running a WD PR4100 16TB NAS. All was good until I occasionally noticed slow file transfers. I ran an HDD test and drive 2 failed. OK, no problem: order a new drive, replace it and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed halfway through rebuilding the RAID5 array onto drive 2!
                    You _NEVER EVER EVER_ do that!
                    If a drive in a RAID array fails, you build a new array and copy the content from the old one to the new one while the old one still works. Start with the most important things.

                    Also, RAID is NOT a replacement for a BACKUP!

                    So all you can do right now is clone the drives and hope you have everything you need, then rebuild the RAID with the new drives...

                    • CapLeaker
                      Leaking Member
                      • Dec 2014
                      • 8065
                      • Canada

                      #11
                      Re: RAID5 failure: 2 bad HDD's at the same time

                      Originally posted by Stefan Payne
                      You _NEVER EVER EVER_ do that!
                      If a drive in a RAID array fails, you build a new array and copy the content from the old one to the new one while the old one still works. Start with the most important things.

                      Also, RAID is NOT a replacement for a BACKUP!

                      So all you can do right now is clone the drives and hope you have everything you need, then rebuild the RAID with the new drives...
                      Interesting... So you are saying to clone the bad HDDs in the RAID5 array to a new drive with Clonezilla and put it back into the array? I thought the array knew each HDD by its serial number or something, so it would detect the clone as a "new" drive?

                      No, I've lost nothing important, and that is a good thing. I do have a few offline HDDs. Some of the stuff on the RAID array was so old that this gives me a chance to clean up my file storage. Rather than copying everything and then deleting what I no longer want, I reversed it by copying only the stuff I want. This gives me more space.

                      • CapLeaker
                        Leaking Member
                        • Dec 2014
                        • 8065
                        • Canada

                        #12
                        Re: RAID5 failure: 2 bad HDD's at the same time

                        Originally posted by Uranium-235
                        this is why for large arrays, raid 6 is a better idea
                        That is what I am aiming for: something where two drives can fail. Has anyone tried Synology's SHR-2?

                        • Curious.George
                          Badcaps Legend
                          • Nov 2011
                          • 2305
                          • Unknown

                          #13
                          Re: RAID5 failure: 2 bad HDD's at the same time

                          Originally posted by CapLeaker
                          That is what I am aiming for: something where two drives can fail. Has anyone tried Synology's SHR-2?
                          Note that you don't need a second "disk failure" -- a single URE (unrecoverable read error) during the rebuild will effectively render a RAID5 with a failed disk "broken". Make sure your NAS is doing patrol reads of the entire array, lest you discover that URE when you can least afford it!
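                          On a plain Linux md array (eccerr0r's setup), the equivalent of a patrol read is a "check" scrub started through sysfs: writing `check` to `/sys/block/mdX/md/sync_action` makes md read every block of every member, and `mismatch_cnt` reports parity mismatches afterwards. Whether the PR4100 firmware exposes this interface is an assumption I can't confirm. A minimal sketch (needs root on a real system; `md0`, the helper names and the `sysfs_root` parameter, which exists only so the code can be exercised without a real array, are placeholders):

```python
from pathlib import Path


def md_sysfs_dir(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Directory holding the md control files, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / md_name / "md"


def start_check(md_name: str, sysfs_root: str = "/sys/block") -> None:
    """Start a "check" scrub: md reads every block of every member.

    Reading everything surfaces latent UREs while the array is still
    redundant, instead of during a rebuild. Requires root on a real system.
    """
    (md_sysfs_dir(md_name, sysfs_root) / "sync_action").write_text("check\n")


def scrub_status(md_name: str, sysfs_root: str = "/sys/block") -> tuple:
    """Return (current sync_action, mismatch_cnt) for the array."""
    d = md_sysfs_dir(md_name, sysfs_root)
    action = (d / "sync_action").read_text().strip()
    mismatches = int((d / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

                          Typical use would be `start_check("md0")`, then polling `scrub_status("md0")` until the action reads `idle` again.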

                          • Stefan Payne
                            Badcaps Legend
                            • Dec 2009
                            • 1267
                            • Germany

                            #14
                            Re: RAID5 failure: 2 bad HDD's at the same time

                            Originally posted by CapLeaker
                            Interesting... So you are saying to clone the bad HDDs in the RAID5 array to a new drive with Clonezilla and put it back into the array?
                            It's worth a try.
                            You might want to clone the other HDDs as well or move them immediately over to a new RAID Array.

                            Originally posted by CapLeaker
                            I thought the array knew each HDD by its serial number or something, so it would detect the clone as a "new" drive?
                            No, that should be written in the MBR or wherever it does that.



                            Anyway, rule of thumb:
                            If one drive in a RAID array dies, do not rebuild it; back up your data and move it over to another array!

                            Because when all the drives are the same make/model, further drive failures are highly likely.

                            • CapLeaker
                              Leaking Member
                              • Dec 2014
                              • 8065
                              • Canada

                              #15
                              Re: RAID5 failure: 2 bad HDD's at the same time

                              Cloning the HDD with Clonezilla didn't work for me.

                              • Curious.George
                                Badcaps Legend
                                • Nov 2011
                                • 2305
                                • Unknown

                                #16
                                Re: RAID5 failure: 2 bad HDD's at the same time

                                Originally posted by CapLeaker
                                Cloning the HDD with Clonezilla didn't work for me.
                                Without knowing how (and WHERE!) the particular NAS stores the array configuration data on the drive, there's no way of knowing if CZ will even SEE it as "data". CZ cheats by only copying the portions of the drive that it KNOWS to contain data (i.e., by understanding file systems and other common disk structures). This lets it skip over the parts of the medium that it thinks are "empty" -- otherwise CZ would take as long as a bytewise copy operation.

                                (Watch CZ in action and you will see how the throughput changes over the course of the operation.)

                                You may have to resort to a bytewise copy to be sure you are preserving all of the "stuff that matters" -- to your NAS!

                                And you're still stuck with the highly likely URE interfering with that operation -- the U in URE -- without the benefit of the redundant drives to compensate for it.

                                16TB = 16 x 10^12 bytes = 128,000,000,000,000 bits = 1.28 x 10^14 bits. Assume a URE rate of 1 in 10^14 bits read...
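                                Working that arithmetic through under the simplest model, where every bit read fails independently with probability 10^-14: the expected number of UREs over 1.28 x 10^14 bits is 1.28, so the chance of at least one is about 1 - e^-1.28, roughly 72%. A rebuild of a four-drive RAID5 actually reads the three surviving drives; assuming 4TB members, that is 12TB and roughly 62%. Spec-sheet URE figures are upper bounds and real errors are not independent, so treat this as an illustration rather than a prediction. A sketch (the function name is hypothetical):

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_rate: float = 1e-14) -> float:
    """Probability of at least one URE when reading `bytes_read` bytes.

    Simplified model: every bit fails independently with probability
    `ure_rate` (the spec-sheet "1 in 10^14 bits" figure).
    """
    bits = bytes_read * 8
    # P(no error) = (1 - rate)^bits, computed via log1p to avoid rounding loss
    return 1.0 - math.exp(bits * math.log1p(-ure_rate))


print(round(p_at_least_one_ure(16e12), 3))  # 0.722  (whole 16TB)
print(round(p_at_least_one_ure(12e12), 3))  # 0.617  (three assumed 4TB survivors)
```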

                                • CapLeaker
                                  Leaking Member
                                  • Dec 2014
                                  • 8065
                                  • Canada

                                  #17
                                  Re: RAID5 failure: 2 bad HDD's at the same time

                                  That's why I thought it wasn't possible. I have to wait for some drives. Prime Day is coming and I need a shitload of HDDs and a new NAS.

                                  • Curious.George
                                    Badcaps Legend
                                    • Nov 2011
                                    • 2305
                                    • Unknown

                                    #18
                                    Re: RAID5 failure: 2 bad HDD's at the same time

                                    Originally posted by CapLeaker
                                    That's why I thought it wasn't possible. I have to wait for some drives. Prime Day is coming and I need a shitload of HDDs and a new NAS.
                                    dd(1) should clone the drive completely (there may be some issues with portions of the MBR under some OSes).

                                    Of course, now you're faced with the time it takes to read the entire medium.

                                    And there's the real possibility that dd(1) will encounter a URE somewhere along the way (in which case you'll have to sort out what "value" should be substituted for the "unknown" one).
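                                    For reference, GNU dd's `conv=noerror,sync` options keep going past read errors and pad the failed block with zeros, which is one blunt answer to the "what value to substitute" question. The Python sketch below imitates that behaviour on a whole-device, bytewise copy; the function name and paths are placeholders, and for a genuinely failing drive GNU ddrescue (mentioned earlier in the thread) is the better tool, since it retries and narrows bad areas down instead of zeroing whole blocks:

```python
def clone_bytewise(src_path: str, dst_path: str, block_size: int = 1 << 20) -> list:
    """Copy every block of src to dst, imitating dd conv=noerror,sync.

    Nothing is skipped as "empty", so NAS metadata in unallocated areas is
    copied too. Unreadable blocks are written as zeros and their byte
    offsets returned, so the damaged regions are known afterwards.
    """
    bad_offsets = []
    offset = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            try:
                src.seek(offset)
                chunk = src.read(block_size)
            except OSError:
                # Read error: substitute zeros and step past the bad block.
                chunk = b"\0" * block_size
                bad_offsets.append(offset)
            if not chunk:
                break  # end of the source device or file
            dst.write(chunk)
            offset += len(chunk)
    return bad_offsets
```

                                    As with dd's `sync`, a failure in the very last block can make the copy slightly longer than the source, because the zero padding is a full block.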

                                    ISTR CZ has an option to just fall into dd(1) mode (instead of trying to understand the filesystem's structure)...?

                                    • CapLeaker
                                      Leaking Member
                                      • Dec 2014
                                      • 8065
                                      • Canada

                                      #19
                                      Re: RAID5 failure: 2 bad HDD's at the same time

                                      I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.

                                      • Curious.George
                                        Badcaps Legend
                                        • Nov 2011
                                        • 2305
                                        • Unknown

                                        #20
                                        Re: RAID5 failure: 2 bad HDD's at the same time

                                        Originally posted by CapLeaker
                                        I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.
                                        If it is truly cloning the entire media surface, then the NAS must have some NVRAM in which it stores data from drive inquiry commands. E.g., I track drives in my "disk sanitizer" by storing the serial number, model number, etc. from the drive inquiry in a large database. So, when I next encounter the drive (e.g., when I install an OS image), I know its history.

                                        Usually, the drive is used to store this stuff (in a special partition or in the "unused" area right after the MBR).
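                                        A toy model of the two possibilities described above. Linux mdadm recognises array members by the array UUID in the on-disk superblock, which a bytewise clone copies; a firmware that also remembers member serial numbers (the NVRAM hypothesis) would still treat the clone as a stranger, because the serial number is reported by the drive itself and is not copied with the data. Everything here, including the class and function names, is hypothetical; how the PR4100 actually tracks its members is unknown:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    serial: str      # reported by the drive itself; a clone has its own
    array_uuid: str  # stored on the media, so a bytewise clone copies it


def accepted_as_member(disk: Disk, array_uuid: str, known_serials: set,
                       match_by_serial: bool) -> bool:
    """Would a (hypothetical) NAS take `disk` back as an existing member?

    match_by_serial=False: md-style, trust the on-disk superblock UUID only.
    match_by_serial=True: also require a serial remembered in NVRAM.
    """
    if disk.array_uuid != array_uuid:
        return False
    return disk.serial in known_serials if match_by_serial else True


original = Disk(serial="WD-AAAA1111", array_uuid="a1b2")
clone = Disk(serial="WD-BBBB2222", array_uuid="a1b2")  # bytewise copy of original
print(accepted_as_member(clone, "a1b2", {"WD-AAAA1111"}, match_by_serial=False))  # True
print(accepted_as_member(clone, "a1b2", {"WD-AAAA1111"}, match_by_serial=True))   # False
```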

                                        Regardless, this is one of the ways RAID f*cks you; had that been a "regular" disk, you could have thrown it in another machine and accessed its contents like normal (losing whatever part of the disk that may be afflicted with UREs).

                                        If you've already written off the data (as lost), you could try to recover the contents using one of the Windows/Linux tools that claim to be able to do so. At the very least, it will be a learning experience (and COULD yield positive results).

                                        Google "raid recovery" (and, please, report on any results!)
                                        Last edited by Curious.George; 07-14-2019, 10:45 AM.
