Dec. 21st, 2011

theweaselking: (Default)
Anyone else finding they can't make Comments, *at all*, with the new LJ comment code? The "post a new comment" button stays grayed out forever for me.

(Also: It's animated! And SLOW! And the controls hide themselves until you hover! These are all TERRIBLE interface decisions!)

EDIT, since I can't make comments: The button doesn't switch away from grayed-out, for me. Which means it's almost certainly depending on a script that *isn't* hosted on livejournal or ljtoys. Which is INSANELY STUPID.

I'm tempted to go through one at a time enabling scripts to find which one it is. I'm also tempted to just say "fuckit" and take another crack at making a non-unusably-fugly Dreamwidth. I've tried that before, and failed.

EDIT2: It was a bad interaction between the Greasemonkey script I use to eliminate Facebook "like" buttons and the livejournal comment code. Disable that script, problem solved.
theweaselking: (Default)
Livejournal's stupid-ass pointless ugly comment change broke my "expand all the comments all the time" setting.

How do I make that work, again?
theweaselking: (Work now)
So, there's this server.

It was, when new, a beast. It's still pretty nice - quad-core Xeon processor, 8GB RAM, 5-drive hot-swappable RAID-5 (4 live, one hot spare), redundant power. It's a perfectly acceptable server for a small business to use for their MySQL databases and their internal website and their subversion server, etc.

Except, there aren't *5* drives in this thing.

There's 6.

Tucked away inside the case somewhere is a 300GB 5400RPM SATA drive. With the OS installed on it. And the applications. And the data. In fact, that lovely RAID is being used for one and only one thing: A backup copy of the subversion repo, last changed in 2008.

(The person responsible for this no longer works with the company. I respect his work, in theory - he did a lot of pretty cool things with limited resources! - but I'm constantly running into WHAT THE FUCK WERE YOU THINKING moments, like this one.)

So, this morning. We've got scheduled downtime, to fix several other problems, so I've got approval to fix this at the same time.

The plan is very simple and involves as few changes as possible:
#1: rsync /var to RAID/var
#2: move /var to /var-old
#3: symlink /var to RAID/var

That's it, we're just moving all the data files for the SQL databases, all the web content, and also all the subversion stuff onto the RAID. We did a full sync in the weeks before today, and then this morning we shut down MySQL and Apache and told it to do one final sync.
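(For the record, the whole plan amounts to roughly the following - the RAID mount point and the service names are stand-ins, since I'm not typing the real ones out here, and the init commands assume a RHEL-ish box:)

    # stop anything writing to /var before the final pass
    service mysqld stop
    service httpd stop

    # 1: copy /var onto the RAID (-a keeps perms/owners/times, -H keeps hard links)
    rsync -aH --delete /var/ /raid/var/

    # 2: shove the original out of the way as a fallback
    mv /var /var-old

    # 3: point /var at the copy on the RAID
    ln -s /raid/var /var

    service httpd start
    service mysqld start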

Now, we're using InnoDB. And this is a well-used server that was configured by someone who obviously didn't know what he was doing sometimes, so ibdata1 is a little more than 13GB.
That's fine! It doesn't hurt the system. It just means that the DATABASE portion of the sync is going to take as long this time as it did the first time, and that ain't fast. That's why the *first* thing I did was kill the services and start that sync, before we even moved on to the rest of the process.
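(Side note on why the database chunk is the slow part every single time: when both ends of an rsync are local paths it defaults to whole-file copies rather than delta transfers, so a 13GB ibdata1 that's been touched at all gets re-copied end to end. And if you're wondering how ibdata1 gets that big in the first place, it's usually innodb_file_per_table being off, which is easy enough to check - paths here assume the stock datadir:)

    # how big has the shared tablespace grown?
    ls -lh /var/lib/mysql/ibdata1

    # OFF means every InnoDB table piles its data into ibdata1,
    # which never shrinks without a dump-and-reload
    mysql -e "SHOW VARIABLES LIKE 'innodb_file_per_table';"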

But!

The sync would have gone faster if
1: the guy changing the battery in the UPS had realised that the UPS batteries are hot-swappable and don't require the UPS be turned off
2: the guy changing the battery in the UPS had realised that he should be touching the UPS with the red flashy "change battery" light and not the UPS above it whose happy little lights are all happy green.
3: the guy changing the motherboard in the SAN hadn't slipped and yanked the power from the SQL server, the DNS servers, and the primary domain controller.

TWO unexpected hard poweroffs later, the sync is running again. And since that fucking 13GB file hadn't finished[1], the whole database part of the sync needed to restart from scratch, again. Twice.
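(If I were doing this again I'd hand rsync some resume-friendly flags, because by default it throws away a partially-transferred file when it dies, which is exactly why the 13GB restarted from zero. Something like the sketch below - with the big caveat that it only helps if the source hasn't changed underneath you in the meantime, which, see the footnote, it had:)

    # --partial keeps a half-copied file instead of throwing it away,
    # --append-verify picks a resumed file up where it left off
    # (only a win if the source hasn't changed underneath you)
    rsync -aH --partial --append-verify /var/ /raid/var/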

So, yeah. That took way longer than expected.

Bonus awesome: the company's public website gets its database access for some applications from this very database server.
Which means that some, not all, parts of the public website were producing SQL errors the whole time MySQL was down.
There's a company who does uptime monitoring and notifications for us. They're also the primary DNS hosts. They *knew* about the maintenance window. But when it went long, did they call us to ask why? No. Did they assume it was just maintenance and ignore it? No. They switched primary DNS for the website to our alternate.

The alternate is *not up* currently. It's a POS shared-hosting account that we stick a placeholder page on twice a year or so when we're expecting total downtime for longer than a few hours. The placeholder wasn't even there!
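(Sidebar for anyone playing along at home: the quickest way to see which answer the world is actually getting while someone else is flipping your DNS around is to ask the nameservers directly - the hostname and nameserver below are stand-ins, obviously:)

    # what our authoritative nameserver is currently handing out
    dig +short www.example.com @ns1.example-dns-host.com

    # what a big public resolver has cached for everyone else
    dig +short www.example.com @8.8.8.8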

So I had to fix THAT, too.

Then I had MEEEEEEETINGS.

I have had a day.

[1]: AND since the machine started MySQL automatically on the reboot, because I didn't think about what that would mean. Even if the file HAD gone across, it would have been changed and needed to be resync'd. I mean, it might not really have NEEDED to be resync'd, but it would have wanted to be.
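(Lesson from that footnote, for next time: take the service out of the boot sequence for the duration of the maintenance window, so a surprise reboot can't helpfully restart it mid-copy. On a box of this vintage that's one of these, depending on distro - the service name is whatever yours happens to be called:)

    # RHEL/CentOS-style init scripts
    chkconfig mysqld off      # and "chkconfig mysqld on" when you're done

    # Debian/Ubuntu equivalent (the service is usually just "mysql" there)
    update-rc.d mysql disable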
