#1 |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Hey guys,
I have an Asus M5A99FX Pro R2.0 motherboard with AMD RAIDXpert. I set up a 1000GB RAID1 array on it about 8 years ago. Every once in a while, a power outage or something would knock one drive offline and degrade the array. After a reboot the drive would be back online and it would rebuild just fine.
Recently, a drive dropped offline and the array became degraded. I figured it was no big deal, rebooted, and set it to rebuild overnight. The next morning I took a look and both drives had dropped offline at 70% rebuild. Uh oh. I rebooted and tried rebuilding a few more times, but one of the drives kept going offline. I figured it must be bad, so I swapped another in and started rebuilding to it. The rebuild was almost complete when the other drive went offline.
I am currently cloning the last drive to a good drive, as it does not seem likely that it will complete a rebuild. And yes, I know RAID is not a backup. I have several backups of the drive. I am just cloning it to save me the trouble of reinstalling some programs that I didn't have at the time of the last backup.
While that one was cloning, I tossed the first failed disk in another PC to take a look at the SMART data. 7120 bad sectors. Ouch. No wonder it wouldn't rebuild.
This leads to my question. Why on earth would the RAID controller never alert me to such an unhealthy disk? This disk has obviously been going out for a long time, and it was only after the second disk started going bad that I found the problem. Trying to view the SMART data on the PC while the drive is in RAID just gives "not supported".
I have another, way older PC with a RAID controller that I just use as a server for some video games. I am using a hard drive with bad sectors in it since I really don't care if it dies abruptly. However, every time I boot that PC, it beeps and complains that one of my RAID disks is failing.
Is there something that I am doing wrong with AMD RAIDXpert, such that I can't see the SMART data from the OS and the RAID controller never complains even when a disk has both feet in the grave? It seems to defeat the purpose of the RAID array... Having one disk go bad and then just swapping a new one in isn't an option when the controller waits until both disks are bad to alert me to the problem... Let me know your thoughts.
__________________
canadaboy25 - Sometimes the light at the end of a tunnel is an on-coming train
Last edited by canadaboy25; 09-24-2021 at 02:23 PM.
#2 |
Badcaps Veteran
Join Date: Feb 2014
City & State: Midlands
My Country: England
I'm a: Professional Tech
Posts: 6,006
|
EZRAID is just software RAID; it doesn't monitor SMART.
#3 |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Sorry, I don't know where I got EZRAID from. I meant to put RAIDXpert. Edited the original post.
Every time I boot, I get the RAID ROM page showing that everything is healthy, even though it wasn't. I suppose RAIDXpert is just the name of the interface software; the actual controller is part of the AMD SB950 chipset. And whether it is software or hardware RAID, why would either of them block the SMART data from third-party programs on the OS?
Last edited by canadaboy25; 09-24-2021 at 02:27 PM.
#4 |
Badcaps Veteran
Join Date: Feb 2014
City & State: Midlands
My Country: England
I'm a: Professional Tech
Posts: 6,006
|
It's still not real RAID. Have you checked in the BIOS to see if SMART is enabled?
What program are you trying to read the SMART data with? This video suggests RAIDXpert does record bad sectors: https://www.youtube.com/watch?v=ezS5svpP2XU
Last edited by diif; 09-24-2021 at 04:52 PM. Reason: Added video.
#5 |
Solder Sloth
Join Date: Nov 2012
City & State: CO
My Country: USA
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 7,084
|
You have to watch your RAID.
Make sure your software alerts you whenever a drive gets kicked. You need to do due diligence to find out why it got kicked, and make sure the rebuild succeeds each time.
I don't get random power-outage-induced drive kicks from my software RAID5 and software RAID1. Heck, I can even poll SMART data from the disks as needed. This is Linux mdraid, of course...
Incidentally, I'm suffering disk puke failures on my RAID and am carefully making sure I have backups. What's annoying is that I use a RAID1 as the backup of my RAID5, and that RAID1 had a failure... not to mention one of my RAID5 disks is starting to fail. Sigh. Probably in the market for a 2TB disk RSN.
#6 | ||
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Quote:
In my RAIDXpert software the SMART status is "Not Available" where his is "Healthy". I have tried several different programs. The one I use the most is HDSentinel. For the disks that are part of the RAID array, it just gives a generic value of 75% health and no details. I can see the SMART info for all my other drives that are not part of the RAID array.
Quote:
I never thought that the drives were going bad, as I was never getting any drive health warnings. Like I said, every other PC I've had with onboard RAID has given me warnings about bad disks at the BIOS ROM boot screen.
Last edited by canadaboy25; 09-24-2021 at 05:55 PM.
#7 |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
I took a look through the BIOS and SMART Reporting was enabled. However, the RAID mode was set to "Legacy ROM" and the other option was "UEFI Driver". I figured this had to be the problem. I switched the setting over and then Windows wouldn't boot, because my system drive was MBR instead of GPT.
It should have been an easy fix, but somehow my System Reserved boot partition was on the failing 1000GB RAID array while my Windows installation was on my SSD. So after days of struggling to move the System Reserved partition and make the OS bootable again, I finally got my SSD switched over to GPT and the UEFI Driver setting enabled in the BIOS. During this process I found a 3rd bad disk, this one in my 2000GB RAID array. Luckily the other disk in that array did not have any bad sectors yet.
To my disappointment, switching to the UEFI driver didn't do anything. HDSentinel still just gives "?" for the SMART status of the drives connected to the RAID controller, even for disks that are not part of an array, such as my SSD. CrystalDiskInfo doesn't detect the disks at all. The drives now show a "Healthy" status in the RAIDXpert utility, but this obviously means absolutely nothing.
I plugged one of the bad drives into another PC to get the SMART status. I have attached a screenshot of the info. 256 bad sectors with 2576 pending bad sectors shows up as "Healthy" in the RAID utility. The other drive in the array had 16 bad sectors and 16 pending bad sectors. Not nearly as bad, but it still caused the rebuild to fail at 70%. How the RAID utility considers this disk "Healthy", I have no idea.
If I boot the computer from a Linux USB stick, I can read the SMART data from all the drives, even the ones that are part of an array. The files in the array are also available in a separate volume. So obviously the RAID controller is allowing the SMART data to be read. However, I have not found any way to do so from Windows. Any ideas?
#8 |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Never mind, I just rebooted the system and the SMART status in RAIDXpert is back to "Not Support". So there must be something wrong with the SMART implementation in their Windows driver...
#9 |
Computer Geek
Join Date: Jan 2015
City & State: Nowhereland, Texas
My Country: USA
Line Voltage: 120/2/[email protected]
I'm a: Hardcore Geek
Posts: 1,990
|
Protip: A drive regularly dropping out of the array is NOT NORMAL UNDER ANY CIRCUMSTANCES. I suggest booting up Linux and checking the SMART stats with:
Code:
smartctl -d sat -a /dev/sdX
EDIT: Oops, just read that you managed to pull SMART. Junk that Seagate drive ASAP before it takes the array down for good.
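For anyone who wants to script that check from a Linux live USB, here is a rough Python sketch, not a finished tool. It runs the same command and decodes smartctl's exit status, which is a bitmask described in the smartctl man page. The helper names `decode_exit_status` and `check_drive` are made up for this example, and it assumes smartmontools is installed and the script runs with enough privileges to query the drive.

```python
import subprocess
from typing import List

# Bit meanings of smartctl's exit status, paraphrased from the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_exit_status(status: int) -> List[str]:
    """Return a description for every bit set in smartctl's exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


def check_drive(device: str) -> List[str]:
    """Run `smartctl -d sat -a <device>` and decode its exit status."""
    result = subprocess.run(
        ["smartctl", "-d", "sat", "-a", device],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is expected when the drive has problems
    )
    return decode_exit_status(result.returncode)
```

Calling `check_drive("/dev/sdX")` with the real device name returns an empty list when smartctl reports nothing. Treat an empty list with caution: the threshold bits depend on vendor-defined thresholds, so a drive with a growing bad-sector count may still exit cleanly, and the raw attribute values are worth reading as well.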
__________________
Don't buy those $10 PSU "specials". They fail, and they have taken whole computers with them.
My computer doubles as a space heater.
Windows 10? Only if you like forced, buggy updates and 24/7 telemetry.
Samsung = Seagate = Seatrash = Trashgate
Don't buy Seagate drives. Don't use Seagate drives. If you have any in service right now, make plans to replace them ASAP.
SMR = Slow Magnetic Recording
Avoid SMR, buy CMR drives instead. SMR is easily a 15+ year step BACKWARDS in HDD speed.
Permanently Retired Systems:
RIP Advantech UNO-3072LA (2008-2021) - Decommissioned and taken out of service permanently due to lack of software support for it. Not very likely to ever be recommissioned again.
#10 | |
master hoarder
Join Date: May 2008
City & State: VA (NoVA)
My Country: U.S.A.
Line Voltage: 120 VAC, 60 Hz
I'm a: Hobbyist Tech
Posts: 10,862
|
I've had good luck with HDDScan for reading SMART info from drives where other software failed, including CrystalDisk. So perhaps give that a try and see if you can still read your drives' SMART through Windows. If not, then it is likely some kind of limitation between Windows and the hardware you're running (like from the drivers?)
Quote:
If I remember correctly, those Seagate STxxxxDM003 drives are 7200.12 models or similar... but ones that weren't affected by the BSY bug (or at least Seagate never provided a firmware update for them). Just saw a bunch of broken ones for sale on eBay the other day too. LOL.
FWIW, I don't trust anything from Seagate past the 7200.7 and 7200.9 drives. Well, maybe I can make an exception for the SCSI 10k server Cheetahs... but that's about it. With all of their newer models, this is the only thing they are good for, IMO: https://www.badcaps.net/forum/attach...1&d=1604205421
#11 |
Super Moderator
Join Date: Jan 2016
City & State: Valbonne, 06
My Country: France
I'm a: Knowledge Seeker
Posts: 3,570
|
Never trust these meaningless "health percentage" figures or whatever. Pending/uncorrectable sectors >= 1 → dead HDD, and you have already lost data.
__________________
OpenBoardView — https://github.com/OpenBoardView/OpenBoardView |
#12 |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Yes, obviously all 3 drives with bad sectors have been removed and replaced with good drives. The spares I had on hand have 40000+ hrs on them but have no bad or pending sectors, so they will be fine until I find some replacements.
And yes, I was not looking at the silly "Health" value in the HDSentinel tool, but rather at the ~2500 pending sectors and 256 bad sectors.
One of the RAID arrays did not survive. Since both drives had bad sectors, the array would never rebuild. I cloned the better of the two drives to a good temporary drive, deleted the array, replaced the disks, created a new array, and then cloned the partition from the temporary drive to the new array.
I gave HDDScan a try, but it gave the same results. It could not see the SMART data of any drives connected through the RAID SATA ports on the motherboard. I guess I will just have to boot from the Linux USB once every couple of weeks to check on the SMART status. It sucks a lot, but I guess it is better to go through the inconvenience and find a failing drive while the array can still be rebuilt. Very frustrating that all the SMART data can be read in Linux, but nothing I do will let me access it from Windows. Must be a driver limitation, as was mentioned.
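If the check is only going to happen every couple of weeks, it may help to record the counts each time so that growth stands out. The following is a minimal Python sketch under stated assumptions: the function name `compare_with_last`, the JSON snapshot file, and the attribute labels are all hypothetical, and the counts are assumed to have been read off the SMART output by hand or by some other script.

```python
import json
from pathlib import Path
from typing import Dict


def compare_with_last(snapshot_path: str, serial: str,
                      counts: Dict[str, int]) -> Dict[str, int]:
    """Report how much each count grew since the last check, then save it.

    `counts` maps a label of your choosing (for example "pending") to the
    raw SMART value read during this check. On the first check for a drive,
    every non-zero count is reported as growth.
    """
    path = Path(snapshot_path)
    # Load earlier snapshots if the file exists, otherwise start empty.
    history = json.loads(path.read_text()) if path.exists() else {}
    previous = history.get(serial, {})

    growth = {
        name: value - previous.get(name, 0)
        for name, value in counts.items()
        if value > previous.get(name, 0)
    }

    # Remember this check for next time.
    history[serial] = counts
    path.write_text(json.dumps(history, indent=2))
    return growth
```

For example, a drive that went from zero counts to the 16 bad and 16 pending sectors mentioned earlier in the thread would come back as `{"pending": 16, "reallocated": 16}`, and an unchanged drive returns an empty dict. Keeping the snapshot file on the same USB stick as the live system is one way to have it available at every check.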
#13 |
Computer Geek
Join Date: Jan 2015
City & State: Nowhereland, Texas
My Country: USA
Line Voltage: 120/2/[email protected]
I'm a: Hardcore Geek
Posts: 1,990
|
Also, PassMark DiskCheckup is a good Windows utility for reading SMART data (and tracking it over time).
#14 | |
Badcaps Veteran
Join Date: Dec 2009
City & State: Prague, 50°4'52.22"N, 14°23'30.45"E
My Country: CZ
Line Voltage: 230 V/50 Hz
I'm a: Knowledge Seeker
Posts: 4,692
|
Quote:
I actually have one of those WD drives with some hardcore error in about 2/3 of the drive, which I've partitioned around, and I have been using it as a data drive for a couple of years now. (I found some nice software in the meantime which may be able to manually reallocate a given sector; I will try that the next time I run into such a drive.)
To OP: have you tried smartctl for Windows anyway? Sometimes it is able to pull S.M.A.R.T. through SW fake-RAIDs. That is one of the reasons I abandoned these funny things and, if anything, use SW RAID from Windows directly. Be aware that, as a non-server system, Windows 7 does NOT have any working mechanism to warn about a broken RAID either! Go figure…
__________________
Less jewellery, more gold into electrotech industry!
Exclusive caps, meters and more!
Hardware Insights - power supply reviews and more!
Last edited by Behemot; 01-01-2022 at 07:56 PM.
#15 |
Solder Sloth
Join Date: Nov 2012
City & State: CO
My Country: USA
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 7,084
|
Code:
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       374
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       335
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       15
Death is near!
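Output like the above is easy to check mechanically. This is a small Python sketch, assuming the usual ten-column layout of the smartctl attribute table (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The function names and the choice of watched attribute IDs are assumptions for illustration; the sample text is the four rows posted above.

```python
from typing import Dict, Tuple

# The four attribute rows from the post above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 153 153 140 Pre-fail Always  - 374
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age  Always  - 335
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 0
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 15
"""

# Attribute IDs treated as bad-sector indicators in this sketch.
WATCHED_IDS = {5, 196, 197, 198}


def parse_attributes(text: str) -> Dict[int, Tuple[str, int]]:
    """Map attribute ID to (name, raw value) for each table row."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a ten-column attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Skip rows whose raw value is not a plain integer.
        if not fields[9].isdigit():
            continue
        attributes[int(fields[0])] = (fields[1], int(fields[9]))
    return attributes


def nonzero_sector_counts(attributes: Dict[int, Tuple[str, int]]) -> Dict[str, int]:
    """Return the watched attributes whose raw value is above zero."""
    return {
        name: raw
        for attr_id, (name, raw) in attributes.items()
        if attr_id in WATCHED_IDS and raw > 0
    }
```

Running `nonzero_sector_counts(parse_attributes(SAMPLE))` flags the reallocated sector count (374), the reallocation event count (335), and the offline uncorrectable count (15), while the pending count of 0 is left out. Whether any non-zero value should already count as a failing drive is a judgment call; the stricter rule was argued earlier in the thread.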
#16 | |
I hate waiting!
Join Date: May 2013
City & State: Saskatchewan
My Country: Canada
Line Voltage: 120VAC 60Hz
I'm a: Hobbyist Tech
Posts: 440
|
Well, round 2 on failing drives. I left the computer running while I was away for Christmas so I could grab files off of it remotely.
Got back home a week or so ago and have been using the computer like normal. Just now I opened up the little tray on the bottom right of the taskbar and there was the yellow "!" indicating an array went critical. Booted into Linux and sure enough, one drive had 200-some bad sectors and a few unrecoverable. I'm running out of disks to feed this thing. Pulled out an old Hitachi from 2010 and stuck it in. The array is currently rebuilding.
Looking at the event log in RAIDXpert, there were a bunch of timeouts and read errors, then the drive dropped off on Christmas Eve! So I've been using the bloody thing with a degraded array for the past half-month. The stupid RAID icon disappears from the taskbar when there is nothing wrong, so I can't even pin it to the taskbar and have it visible. I would rather it spam me with warnings if something is wrong than just hide in the tray until I happen to see it...
Quote:
It's shocking how poorly thought out some of these pieces of software are...
#17 |
Badcaps Veteran
Join Date: Dec 2009
City & State: Prague, 50°4'52.22"N, 14°23'30.45"E
My Country: CZ
Line Voltage: 230 V/50 Hz
I'm a: Knowledge Seeker
Posts: 4,692
|
I suggest getting some of the enterprise 2TB drives (Seagate Constellation, WD RE4, Hitachi Ultrastar). Those have insane reliability; they date from before all the fancy modern "features" made their way into enterprise drives. I upgraded to 8 of them in RAID6 recently. The array is not running 24/7 for now, as I still have not moved to my reconstructed place with cheap electricity (if such a thing ever happens again), but the drives had already spun for ages before I bought them second-hand, and all are 100% healthy. I have recommended them to other people too and have never seen any of them fail, or even get a single reallocated sector, EVER.
Consumer drives, and even all those colored drives (for NAS etc.), are just rebadged variants of the same junk. Just look at the numbers, it's all the same. Then, somewhere in the sky above all this, you have enterprise drives with 2,000,000 hours MTBF, and especially these up-to-2TB drives DO hold up for ages. Newer ones tend to have too high a capacity per inch and the surface is no longer reliable, IMO. Plus you can get them used, pulled from arrays, for the equivalent of about 30 bucks. They will happily spin for another decade in your home array.