(no subject)

Date: 2014-11-18 08:49 pm (UTC)
Of the machines that failed, 4 were powered off the entire time, one was a dumb-as-hell 100M workgroup switch, and one was remotely powered on by an overeager admin[1] when power and internet came back up 1.5 hours into the scheduled 3-hour outage. He didn't think to question *WHY* power was on so early or whether the work was actually complete, which meant that machine was *on* when the power went out again. The UPS should have helped, but still, 2 hours later everything came back up and that server had lost a power supply.

(So that machine wasn't actually "dead", since only 50% of its power failed and it can run on 50%. On the other hand, a cleanly shut down Xen hypervisor is MUCH easier to bring back up than one that had the power yanked again after booting.)

Essentially, what should have been a 45-minute Saturday turned into *hours* due to human error, and on top of that there were a really wacky number of unexpected hardware failures in the more elderly kit.

[1]: Who was not me, for the record.
