Badcaps Forums

Uranium-235 07-25-2018 08:51 PM

well I lost everything on my server
 
A few weeks ago, the fans on my PE 1800 started spinning at full speed. No matter how many times I reset the CMOS or rebooted, it kept doing it. One thing it reported was that the baseboard was unable to connect.

After some research, I found I needed to update the baseboard firmware to fix it. I have Ubuntu 16.04 and tried to update it with Dell's Linux updater, a simple binary. It was built for Red Hat, but I figured it would work anyway (I know Ubuntu is Debian-based).

I ran the executable with sudo and a bunch of text scrolled past, too fast for me to read all of it. I saw it changing permissions on /pci/xx/xx/xx devices and thought, oh, it's enumerating the onboard baseboard store and getting ready to update it.

Turns out, it was going through my RAID 1 array f'ing up the permissions on, well, everything.

After that, it would boot but not let me log in (it couldn't access my /home). A bunch of services also failed to start.

So I made a Win7 PE and tried to run the Windows updater: memory address block errors, instruction issues.

I got pissed and hooked up a laptop hard drive and a DVD drive (there are only two SATA ports). I installed Windows XP on the laptop drive for the sole purpose of running this executable. It ran and updated the baseboard from 1.5 to 1.8. Right before it finished, my fans spun down to normal.

So, Ubuntu is messed up. I'll do what I always do: take one of the RAID 1 drives, wipe it, and install with RAID 1 in degraded mode, without the other drive attached. Then attach the other drive, copy all my stuff to the new install (18.04 this time), and rebuild the array.

But the second drive was badly corrupted. I tried to get it to mount and tried to get mdadm to access it for mounting. It just didn't work. I think I remember seeing fsck scanning it, and I think that might have caused the problems.

All my tools, all my games, all my videos, and the backups for customers (which, luckily, I haven't needed to access): gone. Next time I'll try hooking the drive up to a Windows ext-access utility before fsck ever has a chance to do anything.

I am partially to blame; I didn't realize that running an executable meant for Red Hat would recursively go through my array and fuck up all my permissions :crying:

Topcat 07-25-2018 09:01 PM

Re: well I lost everything on my server
 
I can't seem to preach this enough: ALWAYS KEEP OFF-SERVER BACKUPS!!

http://www.paullinebarger.net/DS/
Yes, things seem to be broken. I hope you get it sorted out.

Uranium-235 07-25-2018 09:12 PM

Re: well I lost everything on my server
 
Hmm, I wonder if I can back it up over SSH to my web host; I have "unlimited storage". Is there a package for that?

RJARRRPCGP 07-25-2018 09:39 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by Uranium-235 (Post 841184)
I am partially to blame; I didn't realize that running an executable meant for Red Hat would recursively go through my array and fuck up all my permissions :crying:

Sounds like it could be new malware... :barf:

brethin 07-25-2018 10:16 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by Topcat (Post 841187)
I can't seem to preach this enough: ALWAYS KEEP OFF-SERVER BACKUPS!!

I keep 2 in 2 different locations just because I hate losing things.

goontron 07-25-2018 10:38 PM

Re: well I lost everything on my server
 
Yeah, you need to watch that. Running it from the wrong path, or with unset variables, can cause chaos. Take the Steam client, for example: https://www.theregister.co.uk/2015/0...ans_linux_pcs/ I got bitten by that bug. It took out the drives being backed up and the drives holding the backup... Both my "production" data and my backup data, gone just like that.
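The widely reported 2015 Steam-for-Linux bug came from a cleanup line of the form `rm -rf "$STEAMROOT/"*`; with `STEAMROOT` empty, that expands to `/*`. A minimal Python sketch of a guard against that class of mistake, assuming a hypothetical `CLEANUP_DIR` environment variable names the directory to be emptied (the variable name and function are illustrative only):

```python
import os
import tempfile
from pathlib import Path


def resolve_cleanup_dir(env_var: str) -> Path:
    """Return the directory named by env_var, refusing dangerous values.

    Rejects an unset/blank variable and anything that resolves to a
    filesystem root (e.g. "/" on Linux), which is what an empty prefix
    in "$VAR/"* silently turns into.
    """
    raw = os.environ.get(env_var, "")
    if not raw.strip():
        raise RuntimeError(f"{env_var} is unset or empty; refusing to clean up")
    path = Path(raw).resolve()
    if path == Path(path.anchor):  # "/" on POSIX, "C:\\" on Windows
        raise RuntimeError(f"{env_var} resolves to a filesystem root: {path}")
    if not path.is_dir():
        raise RuntimeError(f"{env_var} is not an existing directory: {path}")
    return path


if __name__ == "__main__":
    # An empty value is refused instead of quietly becoming "/".
    os.environ["CLEANUP_DIR"] = ""
    try:
        resolve_cleanup_dir("CLEANUP_DIR")
    except RuntimeError as err:
        print("refused:", err)
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["CLEANUP_DIR"] = tmp
        print("ok:", resolve_cleanup_dir("CLEANUP_DIR"))
```

The point is to fail loudly before anything destructive runs; only after this check passes would a script go on to delete the directory's contents.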

Uranium-235 07-26-2018 01:05 AM

Re: well I lost everything on my server
 
Quote:

Originally Posted by goontron (Post 841207)
Yeah, you need to watch that. Running it from the wrong path, or with unset variables, can cause chaos. Take the Steam client, for example: https://www.theregister.co.uk/2015/0...ans_linux_pcs/ I got bitten by that bug. It took out the drives being backed up and the drives holding the backup... Both my "production" data and my backup data, gone just like that.

Seems like something you could sue Valve for.

Curious.George 07-26-2018 02:22 AM

Re: well I lost everything on my server
 
Quote:

Originally Posted by brethin (Post 841201)
I keep 2 in 2 different locations just because I hate losing things.

Yup, though even that may not be sufficient!

I keep "cold" backups of everything -- two on rust and at least one on another medium (tape, MO, CD/DVD, etc.).

Some years ago, I had a "disk crash" (or so I thought!). So, I shrugged, pulled the (first!) cold backup out, and mounted it in an external case (SCSI-based system). This disk proved unreadable (WTF?). Maybe bad luck, or a problem with the enclosure/cabling?

Set it aside, pulled the (SECOND!) backup out, and repeated the exercise. And I should NOT have been surprised to see the exact same results!

Now we KNOW something is seriously hosed!!

So, I pulled out the MO backup, cabled an MO drive to the system and restored the drives from the MO (largely read-only) backup.

Turns out the OS I had previously upgraded to had a bug in the quirks table for the drives I happened to be using as my cold backups. Mounting any of them (without manually installing the "read-only" jumper on the drive itself) would trash the superblock.

So, roll back to an earlier OS release, copy the MO image onto both cold backups (which, not surprisingly, actually DO work!) and only lose a day of my time (and a few years off my life from the stress).

Now, I keep multiple copies of "stuff" and in varied places. I have a database that tells me which files (and their MD5's) are located on which media so if I lose any copy of a file, I can quickly locate a backup copy of it, regardless of where it may be stored.
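A minimal sketch of the kind of catalog described above, assuming SQLite (Python's built-in `sqlite3`) and MD5 via `hashlib`. The table layout and the function names are hypothetical illustrations, not his actual system:

```python
import hashlib
import sqlite3


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS copies (
               volume TEXT NOT NULL,   -- label of the disk/disc/tape
               path   TEXT NOT NULL,   -- path of the file on that volume
               md5    TEXT NOT NULL,
               PRIMARY KEY (volume, path))"""
    )
    return conn


def record_copy(conn: sqlite3.Connection, volume: str, path: str, md5: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO copies (volume, path, md5) VALUES (?, ?, ?)",
        (volume, path, md5),
    )


def other_copies(conn: sqlite3.Connection, volume: str, path: str) -> list:
    """List every other (volume, path) holding the same contents."""
    return conn.execute(
        """SELECT c2.volume, c2.path FROM copies c1
           JOIN copies c2 ON c2.md5 = c1.md5
           WHERE c1.volume = ? AND c1.path = ?
             AND NOT (c2.volume = c1.volume AND c2.path = c1.path)
           ORDER BY c2.volume, c2.path""",
        (volume, path),
    ).fetchall()


if __name__ == "__main__":
    conn = open_catalog()
    digest = hashlib.md5(b"same bytes on both media").hexdigest()
    record_copy(conn, "USB-1", "docs/manual.pdf", digest)
    record_copy(conn, "DVD-07", "manuals/manual.pdf", digest)
    print(other_copies(conn, "USB-1", "docs/manual.pdf"))
    # [('DVD-07', 'manuals/manual.pdf')]
```

Keeping the database on a different machine from the media, as described, means losing a volume never takes its own index down with it.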

diif 07-26-2018 04:38 AM

Re: well I lost everything on my server
 
A backup isn't a backup until it's been verified.
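One way to act on that, sketched in Python under the assumption that both the source and the backup are mounted as ordinary directory trees: hash every file on both sides and report anything missing, different, or unreadable. SHA-256 from `hashlib` is used here, and `verify_backup` is a hypothetical name:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> dict:
    """Compare every regular file under source with its copy under backup.

    Returns relative paths grouped as 'missing' (absent from the backup),
    'different' (contents differ) and 'unreadable' (OSError while reading).
    """
    report = {"missing": [], "different": [], "unreadable": []}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(str(rel))
            continue
        try:
            if file_digest(src) != file_digest(dst):
                report["different"].append(str(rel))
        except OSError:
            report["unreadable"].append(str(rel))
    return report
```

This only checks the backup against the source at one moment; it says nothing about whether the backup will still be readable later, which is the point raised further down the thread.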

ratdude747 07-26-2018 04:58 AM

Re: well I lost everything on my server
 
This is why I set up my server to back up its RAID 10 array to a separate drive every week and then clear out old backups so the drive doesn't overflow. It's still in the server, but it's on a separate drive controller, so I consider that good enough.
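A sketch of the "clear out old backups" half of such a routine, assuming each weekly backup lands in its own directory named by date (`YYYY-MM-DD`), so sorting names sorts by age. The naming scheme, the `keep` count, and `prune_backups` are assumptions for illustration:

```python
import re
import shutil
from pathlib import Path

DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed YYYY-MM-DD naming


def prune_backups(backup_root: Path, keep: int = 4, dry_run: bool = True) -> list:
    """Delete all but the newest `keep` dated backup directories.

    Only directories whose names match YYYY-MM-DD are considered, so an
    unrelated folder under backup_root is never touched. Returns the names
    that were removed (or, with dry_run=True, would be removed).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dated = sorted(
        p for p in backup_root.iterdir() if p.is_dir() and DATE_DIR.match(p.name)
    )
    doomed = dated[:-keep]
    for old in doomed:
        if not dry_run:
            shutil.rmtree(old)
    return [p.name for p in doomed]
```

Defaulting to `dry_run=True` means the first run only reports what it would delete, which is a cheap safeguard given how this thread started.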

stj 07-26-2018 05:48 AM

Re: well I lost everything on my server
 
This is a good reason to try ZFS for backups.
Everything gets checksummed.

Uranium-235 07-26-2018 08:44 AM

Re: well I lost everything on my server
 
Well, this Linux RAID is being a pain. I created new partitions on the disk I want to add, and even though it's the same number of blocks, it says it's too small. Ugh. I have a PERC 5 card I managed to get out of an old customer's workstation, because he had a crashed RAID 5 array and I convinced him to use Intel onboard RAID 1 instead (that, and the card started having an odd PCIe conflict with another device).

Not sure IPMI will enumerate it for control on a server this old, though.

petehall347 07-26-2018 09:02 AM

Re: well I lost everything on my server
 
Not sure if running Nautilus will help here, but I thought I would mention it anyway...

Curious.George 07-26-2018 09:39 AM

Re: well I lost everything on my server
 
Quote:

Originally Posted by diif (Post 841246)
A backup isn't a backup until it's been verified.

A backup can be successfully verified and still found to be defective when it is eventually needed! E.g., both of my "cold backups" were actually intact -- it was the OS that had been compromised which rendered them inaccessible at a much LATER date.

My "system" tracks the last time it "examined" each volume in the database. When a volume is encountered, it determines which files on that volume have not been re-verified in a particular number of days and starts a task that reads each such file in its entirety and verifies that the computed checksum matches the checksum stored in the database for that file, in that folder, on that volume. If so, it records the timestamp of that "verification" and moves on to the next such file.
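A rough sketch of that re-verification pass, assuming an in-memory list of catalog records (a real system would keep them in a database, as described), a configurable interval, and MD5 as in the post. `Record` and `reverify_volume` are hypothetical names; the status values mirror the failure kinds discussed later in the thread (missing, unreadable, mismatch):

```python
import hashlib
import time
from dataclasses import dataclass
from pathlib import Path

DAY = 86400.0


@dataclass
class Record:
    volume: str
    path: str             # path relative to the volume's mount point
    md5: str              # checksum recorded when the file was cataloged
    last_verified: float  # Unix timestamp of the last successful check


def md5_of(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reverify_volume(records, volume, mount_point: Path, interval_days=90, now=None):
    """Re-read files on a mounted volume that are due for verification.

    Returns {path: status} with status 'ok', 'missing', 'unreadable' or
    'mismatch'. Only 'ok' refreshes last_verified.
    """
    now = time.time() if now is None else now
    results = {}
    for rec in records:
        if rec.volume != volume or now - rec.last_verified < interval_days * DAY:
            continue  # other volume, or checked recently enough
        try:
            actual = md5_of(mount_point / rec.path)
        except FileNotFoundError:
            results[rec.path] = "missing"
            continue
        except OSError:
            results[rec.path] = "unreadable"
            continue
        if actual == rec.md5:
            rec.last_verified = now
            results[rec.path] = "ok"
        else:
            results[rec.path] = "mismatch"
    return results
```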

If the database indicates files that haven't been checked in some "verification interval", it emails me to mount those volumes so they can be examined.
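The reminder step can be sketched as a pass over the catalog that groups overdue files by volume, so one message lists everything that needs mounting. This assumes records are `(volume, path, last_verified)` tuples, allows a per-volume interval (mentioned in the next paragraph), and leaves actual e-mail sending out; `overdue_volumes` and `reminder_text` are hypothetical names:

```python
import time
from collections import defaultdict

DAY = 86400.0


def overdue_volumes(records, intervals=None, default_days=90, now=None):
    """Return {volume: overdue_file_count}, volumes with the most overdue first."""
    now = time.time() if now is None else now
    intervals = intervals or {}
    counts = defaultdict(int)
    for volume, _path, last_verified in records:
        days = intervals.get(volume, default_days)  # per-volume interval
        if now - last_verified >= days * DAY:
            counts[volume] += 1
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def reminder_text(overdue):
    """Body of the "please mount these" message."""
    if not overdue:
        return "All volumes verified within their interval."
    lines = ["Please mount these volumes for verification:"]
    lines += [f"  {vol}: {n} file(s) overdue" for vol, n in overdue.items()]
    return "\n".join(lines)
```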

This happens regardless of where (which network node) the devices are mounted and regardless of the media involved. E.g., my CDs, DVDs, MOs, thumb drives, drives in sleds, external USB drives, etc. are all managed with the same mechanism (though I can schedule different "verification intervals" for each volume to manage the amount of manual intervention required of me). Do you know whether the pile of optical media you've written over the years is still intact? If you don't care, then why hold onto them??

Because of this, any time I happen to mount a volume for "some other reason", the contents of the volume can be checked "for free".

When a discrepancy is encountered (file can't be read, file not found, checksum mismatch), the database tells me where I can find a "copy" of that particular file so that the defective copy can be repaired.

Does your RAID array tell you if ALL the files you are NOT accessing, now, are intact? Do you have to verify its entire contents in order to reassure yourself that it is intact? Are ALL of the files on that medium equally important to you? Do you really want to verify the ISO images of the installation CDs -- which you happen to have squirreled away in a desk drawer -- just because they happen to reside on THAT array? Or, would you be equally confident verifying them every month or three -- KNOWING that the masters also exist on non-rust?

Topcat 07-26-2018 09:45 AM

Re: well I lost everything on my server
 
Quote:

Originally Posted by brethin (Post 841201)
I keep 2 in 2 different locations just because I hate losing things.

As do I. I keep 2 external drives in my safe on the property, and I've got another in a safe deposit box. RAIDs are nice, but there's a common misconception that so many people share... RAID provides redundancy, not backup!

Curious.George 07-26-2018 10:00 AM

Re: well I lost everything on my server
 
Quote:

Originally Posted by stj (Post 841260)
This is a good reason to try ZFS for backups.
Everything gets checksummed.

But you then need ZFS to access the medium. What do you do if the box(es) that support it are down? How do you implement it on already written WORM media? etc.

The same problem applies to the various RAID technologies. When your RAID hardware (or system) dies, how do you access (or recover) the contents of those volumes?

I perform the checksums in-band and deliberately store them ON ANOTHER MACHINE (which can be replicated). There's nothing magical about the volumes that I'm checking -- no reliance on particular hardware (can you pull a drive from a Synology RAID array and install it in a "software RAID" box and expect to access its contents?) There's nothing magical about the filesystems being used -- I can check FAT12 floppies just as easily as ext2 or NTFS or...

And, I can mount a volume on any machine (with compatible hardware -- SCSI drives obviously need a SCSI HBA for access) and still gain access to the data.

The cost of keeping a spare HBA or SCSI enclosure is trivial compared to keeping a spare RAID *box* (that claims to be compatible with the other boxes you might have).

[You learn these lessons when you discover that the hardware needed to access various types of media you've used over the decades is suddenly unobtainable. Or, the support for them (OS drivers) has disappeared. Do you scurry to move all of that data forward onto new media? (How do you know it is intact when you do so?) Or, do you try to maintain legacy hardware to make it accessible in its original form? (What will you do when you can't buy CD/DVD drives anymore?)]

And, because I have the checksums (MD5s) for all of these files available, I can find likely duplicates just by querying the database: two files (which might have different names and reside in different folders -- on different volumes OR ON THE SAME VOLUME) that share a checksum value are likely the same -- or, I can be prompted to make both available to the system so it can make that determination (and record it!).
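Finding likely duplicates from stored checksums amounts to grouping on the checksum column. A sketch with SQLite, assuming a hypothetical `copies(volume, path, md5)` table; as the post says, a shared checksum only makes two files *likely* identical, so `confirm_identical` does a byte-for-byte comparison with `filecmp.cmp(..., shallow=False)` once both are mounted:

```python
import filecmp
import hashlib
import sqlite3


def likely_duplicates(conn: sqlite3.Connection) -> dict:
    """Group catalog entries that share an MD5: {md5: [(volume, path), ...]}."""
    rows = conn.execute(
        """SELECT md5, volume, path FROM copies
           WHERE md5 IN (SELECT md5 FROM copies GROUP BY md5 HAVING COUNT(*) > 1)
           ORDER BY md5, volume, path"""
    ).fetchall()
    groups = {}
    for md5, volume, path in rows:
        groups.setdefault(md5, []).append((volume, path))
    return groups


def confirm_identical(path_a: str, path_b: str) -> bool:
    """Byte-for-byte check once both candidates are mounted somewhere."""
    return filecmp.cmp(path_a, path_b, shallow=False)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE copies (volume TEXT, path TEXT, md5 TEXT)")
    same = hashlib.md5(b"identical PDF bytes").hexdigest()  # stand-in content
    conn.executemany(
        "INSERT INTO copies VALUES (?, ?, ?)",
        [
            ("USB-1", "sun/805-1709-12.pdf", same),
            ("DVD-03", "manuals/Sun Ultra 60 Service Manual.pdf", same),
            ("DVD-03", "manuals/other.pdf", hashlib.md5(b"other").hexdigest()),
        ],
    )
    for digest, places in likely_duplicates(conn).items():
        print(digest, places)
```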

This has already been helpful in identifying duplicate copies of files that I did not care to maintain (e.g., "805-1709-12.pdf" and "Sun Ultra 60 Service Manual.pdf" are identical documents differing only in the name that I assigned to them and the folders I stuffed them into!)

RJARRRPCGP 07-26-2018 03:01 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by Curious.George (Post 841322)
And, because I have the checksums (MD5s) for all of these files available, I can find likely duplicates just by querying the database: two files (which might have different names and reside in different folders -- on different volumes OR ON THE SAME VOLUME) that share a checksum value are likely the same -- or, I can be prompted to make both available to the system so it can make that determination (and record it!).

MD5 is obsolete, FFS! For example, for ISOs, SHA is regularly used now.

Curious.George 07-26-2018 03:26 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by RJARRRPCGP (Post 841372)
MD5 is obsolete, FFS! For example, for ISOs, SHA is regularly used now.

MD5 is obsolete due to its vulnerability to HACKING/cracking. It is still robust enough to produce a unique signature of any nontrivial file contents without concern for collisions. I.e., ask yourself what type of data corruption would cause a file's contents to change in such a way that there would be an undetectable collision in the MD5 wrt the "correct" contents.

All I use the signature for is to verify that the contents of the file appear to be unaltered -- WITHOUT having to do a bytewise compare to another copy of the file (which may not be "online" at the moment).

MD5 is, on average, faster to compute than any of the SHA variants when the host platform -- as well as file size -- is variable. My goal, of course, is to process as many files as quickly as I can so the user doesn't have to "wait" while the system runs around checking things.
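Whether MD5 is actually faster on a given machine is easy to measure rather than argue. A small `hashlib` timing sketch; results depend on the CPU, the Python/OpenSSL build, and whether the CPU has SHA instructions (which can change the ranking), so no particular outcome is claimed here:

```python
import hashlib
import os
import time


def throughput_mb_s(algorithm: str, data: bytes, repeats: int = 3) -> float:
    """Best-of-N hashing speed in MB/s for one in-memory buffer."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).hexdigest()
        best = min(best, time.perf_counter() - start)
    return len(data) / 1e6 / max(best, 1e-9)


if __name__ == "__main__":
    buf = os.urandom(32 * 1024 * 1024)  # 32 MiB of random data
    for name in ("md5", "sha1", "sha256", "sha512"):
        print(f"{name:>7}: {throughput_mb_s(name, buf):8.1f} MB/s")
```

Hashing from memory isolates the CPU cost; when files come off a USB drive or optical disc, read speed is often the real bottleneck.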

(You want to be able to mount a volume to access something of interest to YOU, not to cater to the system's need to check files. The system, OTOH, wants to exploit every opportunity it has to access the files on that volume so it can vouch for their integrity, NOW.)

RJARRRPCGP 07-26-2018 05:44 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by Curious.George (Post 841376)
MD5 is obsolete due to its vulnerability to HACKING/cracking. It is still robust enough to produce a unique signature of any nontrivial file contents without concern for collisions. I.e., ask yourself what type of data corruption would cause a file's contents to change in such a way that there would be an undetectable collision in the MD5 wrt the "correct" contents.

I fear that a future collision could result in a failure to detect corruption. :eek:

Even so, it's much better than CRC for files. IIRC, I saw MD5 do a good job with optical drive misreads.

Curious.George 07-26-2018 07:02 PM

Re: well I lost everything on my server
 
Quote:

Originally Posted by RJARRRPCGP (Post 841398)
I fear that a future collision could result in a failure to detect corruption. :eek:

Even so, it's much better than CRC for files. IIRC, I saw MD5 do a good job with optical drive misreads.

Realistically, the sorts of errors that will manifest will be:
  • file not found
  • unrecoverable read errors
  • drive failure

Keep in mind that, unlike ZFS, RAID, etc. my scheme tolerates the volumes being accessed "unsupervised". E.g., I can take an external USB drive and manually change "something" -- or someTHINGS -- without the system ever seeing me make those changes. So, I can delete a file -- or rename it, or move it, etc. -- and the system will not see me doing those things (so that it can update its notion of the file's new name, location, etc.).

I can likewise make changes to the file's content while "out of sight" and it won't know to update the checksum (signature) stored in the database to reflect those changes.

Removable media (esp. CD/DVD) can fail while offline and throw UREs. Again, something that doesn't happen with RAID/ZFS/etc. (volumes are never really "offline" while the rest of the system is running).

And, of course, a drive can always have a catastrophic failure (fail to spin up).

Note that most archive formats (ZIP, ARC, RAR, etc.) rely on simple checksums to vouch for the integrity of their contents. How often have you encountered one that fails to self-verify after it had previously done so?
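For ZIP specifically, the per-member integrity check is a CRC-32, and Python's `zipfile` can re-run it: `ZipFile.testzip()` reads every member and returns the name of the first one that fails, or `None`. A sketch that builds an archive in memory, flips one bit of the stored data to simulate a media error, and shows the check catching it (`build_zip` and `first_bad_member` are illustrative helpers):

```python
import io
import zipfile


def build_zip(members: dict) -> bytes:
    """Create an uncompressed (ZIP_STORED) archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def first_bad_member(archive: bytes):
    """Return the first member whose CRC-32 check fails, or None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


if __name__ == "__main__":
    payload = b"Sun Ultra 60 Service Manual, chapter 1\n" * 50
    good = build_zip({"manual.txt": payload})
    print(first_bad_member(good))  # None: stored CRC matches

    # Simulate a media error: flip one bit inside the stored file data.
    bad = bytearray(good)
    bad[good.find(payload) + 100] ^= 0x01
    print(first_bad_member(bytes(bad)))  # manual.txt
```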

Compute the MD5 of this message. Then, alter it in such a way that its length and MD5 remain unchanged. Then, try to convince me that your alterations are representative of a likely hardware/media failure! :whistle:
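The challenge can be turned around into a quick experiment: flip every single bit of a message in turn (which keeps the length unchanged) and count how often MD5 fails to notice. A `hashlib` sketch; it speaks only to random single-bit errors of the kind hardware produces, not to deliberately constructed collisions, which is where MD5 is broken:

```python
import hashlib


def single_bit_flip_misses(message: bytes) -> int:
    """Count single-bit corruptions of `message` whose MD5 is unchanged."""
    original = hashlib.md5(message).digest()
    misses = 0
    for i in range(len(message)):
        for bit in range(8):
            corrupted = bytearray(message)
            corrupted[i] ^= 1 << bit  # same length, one bit different
            if hashlib.md5(corrupted).digest() == original:
                misses += 1
    return misses


if __name__ == "__main__":
    text = b"Compute the MD5 of this message. Then, alter it..."
    tried = len(text) * 8
    print(tried, "flips tried,", single_bit_flip_misses(text), "undetected")
```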


All times are GMT -6.