You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 11:10 am
by Dbug
Sorry for the outage.

Yesterday when I came back (late) from my Norwegian lessons, I found out the server was dead. It was actually cold, so I suppose it had been dead for a few hours at least.

Anyway, as far as I can see, we suffered from a number of things:
- The PSU died
- That somewhat corrupted the filesystem on the two primary drives (the SSD with the OS and the HDD with all the files)
- After I finally found a compatible power supply (the joy of Mini-ITX cases), the BIOS told me it was corrupted, but thankfully it was a "Dual BIOS" motherboard, so it was able to restore itself (phew).
- After rebooting again, I found out that the defence-force.org and osdk.org sites worked, but the phpbb claimed I was using some ancient version of PHP, which was surprising since it had been updated to the latest just a couple months before.
- At that point I realized that GRUB, failing to boot from the main boot drive, had decided to boot from the old Intel X25 SSD, which had the original Ubuntu 10.04 install. Yes, it was old and everything... but at least that gave me a working shell... Unfortunately, for some reason, the fsck tools were signaling errors.
- I tried booting the main OS drive, and got a litany of filesystem errors, missing inodes, ...
- I decided to ask for help on #ubuntu (on freenode) where using a Live CD was suggested.
- Downloaded the ISO, burnt it... booted it... did not boot... yeahhhh, first failed DVD burn in a decade.
- Tried another DVD with the original Windows ISO burner, and this time that worked fine.
- A bit of checking, mounting, backing up (just in case), and after a clean reboot... it decided to stop on a blinking cursor.
- Found out that the BIOS had decided to put the non-bootable HDD as the primary drive.
- After fixing that, it booted just fine, and as far as I can see is still running.
- I installed the latest updates, rebooted again, and here we are.

So now the question is: Did anything actually get corrupted, are there some broken pages, is SVN broken, is phpBB somehow corrupted, etc...?

If you find anything, please tell me. In the meantime, I'll have to order a new PSU and investigate whether I can somehow find a way to have fewer hardware problems leading to downtime, without having to use a shitty hosted server that makes my life miserable and the latency intolerable.

Thanks for your patience!

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 11:19 am
by ibisum
Just wanted to say thank you for the effort required to keep this system up and running .. and I for one appreciate the fact that you've got it running on your own personal hardware instead of putting it all out in the cloud. It adds a human element to one of my favourite sites on the internet ... ;)

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 12:02 pm
by iss
Big thanks from me too!
I've made a new full checkout of the SVN repository and compared the files and their contents with the last checkout from before the failure, and everything is the same and OK!
Interesting, now the forum seems to work faster :).
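
For anyone who wants to repeat this kind of check, here is a minimal Python sketch of one way to compare two checkouts file by file. It is not necessarily the method used above: the function names are made up for illustration, and it assumes both checkouts sit in local directories and that `.svn` metadata folders should be ignored.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Assumption: Subversion metadata may differ between checkouts, so skip it.
        if ".svn" in relative.parts:
            continue
        if path.is_file():
            digests[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(old_root, new_root):
    """Return (missing, added, changed) lists of relative paths, new tree versus old."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    missing = sorted(set(old) - set(new))   # present before, absent now
    added = sorted(set(new) - set(old))     # absent before, present now
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, added, changed
```

Calling `compare_trees("checkout_before", "checkout_after")` (hypothetical directory names) returns three lists; if all three are empty, the two trees hold the same files with the same contents.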

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 1:38 pm
by Chema
Thank you very much, Dbug, for your dedication to this community.

It is amazing how everything failed... In thousands of power failures, I only once (many, many years ago) had a situation in which issuing an ls command dumped core :) I think I also lost the occasional unsaved changes or experienced corruption of the file I was working on, but I cannot remember a filesystem corruption, not to mention two, plus the BIOS and GRUB!

Everything seems to be working nicely.

Will report if I notice something, of course.

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 2:37 pm
by Dbug
Yeah, it's called Murphy's law :)

"When something bad happens, it will happen at the worst possible moment"!

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 5:02 pm
by Steve M
What a hassle.
Well done getting it working. I'll keep an eye out for any glitches.
As others said - many thanks for your efforts.

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 6:44 pm
by Vyper68
Well done getting it back up. Do you have a server fund I can send some euros to ?

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Thu May 30, 2019 8:20 pm
by Dbug
Vyper68 wrote:
Thu May 30, 2019 6:44 pm
Well done getting it back up. Do you have a server fund I can send some euros to ?
I don't have a "server fund": rather than money, I prefer to receive help from the community when I need it, like when I try to locate some software or hardware, or need help repairing my Orics :)

But thanks for asking!

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Fri May 31, 2019 2:25 pm
by Vyper68
Well if you ever want something repaired or made I am more than willing to help out, it's the least I can do for you all. :D

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sat Jun 01, 2019 6:12 pm
by kenneth
Thanks for the work.
We must celebrate with Akevitt! :mrgreen:

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sat Jun 01, 2019 8:13 pm
by Silicebit.
Mike, your work is invaluable. Thanks for supporting the server for us. If you need repairs for some of your Orics, here I am! :-)

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sun Jun 02, 2019 7:59 am
by Voyageur
Thank you, Dbug.
If I can make some Oric repairs (advice included, postage far too expensive :oops: , sorry...).

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sun Jun 02, 2019 11:46 am
by Symoon
Phew!
Thanks for the time spent running this place. I've been away a few days and didn't notice anything!

Stupid question, but would it be complicated to back up the whole disk(s) running the forum, let's say, once or twice a week? You know, with some sort of Ghost or Macrium tool. It would require some downtime (which I think nobody here would mind), and might save yours in case of a major crash.

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sun Jun 02, 2019 4:19 pm
by Dbug
Symoon wrote:
Sun Jun 02, 2019 11:46 am
Stupid question, but would it be complicated to back up the whole disk(s) running the forum, let's say, once or twice a week? You know, with some sort of Ghost or Macrium tool. It would require some downtime (which I think nobody here would mind), and might save yours in case of a major crash.
There is an automatic daily backup of the data (using "Back In Time"), but that would not really help with the OS being corrupted or the hardware failing.

The only real solution would be to have a secondary machine (virtual or not) that can take over when the primary fails, but in the case of phpbb, that means that we would not have all the latest messages, which is kind of a problem.

I'm not sure how to do decent replication on a budget :)

Re: You Suffered A Catastrophic Failure, Please Surrender Your Server

Posted: Sat Jun 08, 2019 8:06 am
by Dbug
I had to switch off the system earlier today. We are now running with the brand new PSU (if you are interested, a semi-modular Seasonic Focus 650W 80+ Gold), with all the filters cleaned and all the dust vacuumed.

The PSU is dead, long live the PSU :)