I hate computers.
Apr. 6th, 2011 10:59 am
Here's a stumper for you.
Once upon a time, long ago, there was a DNS server. This DNS server no longer really exists - it's still there, still turned on, but nothing uses it for DNS. This machine also used to be a mail server, but, again, nothing uses it regularly, these days. It's only still taking up server room space and electricity because there's a couple of IMAP mailboxes there that people still access once in a while.
There is an SQL server (which also serves internal web pages) and a Web server (for external pages, calling the SQL server and accessing the same databases as the internal pages). The web server lives in a DMZ, with pinholes from it to the CURRENT DNS server (not the old one!), and to the SQL server. The web server cannot, under any circumstances, see the old DNS server.
If the old DNS server is turned off, page serving from the Web server is EXTREMELY slow. Like, 10-15 second delays before pages load. This ONLY happens to pages that make database connections - serving local files with no DB connection is fast.
The SQL server reports no errors, and serves its local, internal pages pulling from the exact same database at normal speed. Logs show that the DB requests are being served as soon as they arrive; they're just not really arriving on time.
I logged into the old DNS server and started killing services. With NOTHING running on it (nearly literally - it was sitting there with an IP and that's all, no programs or services running) the web pages are perfectly fast. As soon as it doesn't have a network connection, the web server gets these delays again.
I set up a new SQL server and pointed a copy of the website, running on the same web server, at it. I rebooted the old DNS server while watching the pages. The "real" page, going to the real SQL server, slows to a crawl. The fake page to the new SQL server is still its normal blazingly fast self.
So, the problem has to be the SQL server.... except that it serves all requests EXCEPT the Web Server ones perfectly fast, and it serves the Web Server requests as soon as it gets them. It just isn't getting them.
The Web server *cannot* speak to the old DNS server. It simply can't reach it, and has never been programmed to reach it. The web server postdates the DNS server's decommissioning. It CAN reach the current DNS, but the current DNS doesn't speak to the old DNS. Also, the problem doesn't happen when Bind on the old server is stopped - it only happens when the old server is turned off or unplugged from the network.
Judicious use of grep on the Web server, SQL server, and other DNS servers on the network has shown that there are NO references to the old DNS server anywhere in their configuration, by name, alternate name, or IP.
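For anyone repeating that search, here is a minimal Python sketch of the same idea: walk one or more directories and report every line that mentions any of the old server's names or addresses. The function name `find_references`, the hostname `olddns` and the address `192.0.2.53` are placeholders of mine, not anything from the real network, and the match is a plain case-insensitive substring test rather than a proper pattern.

```python
import os


def find_references(roots, needles):
    """Yield (path, line_number, line) for each line mentioning any needle.

    roots   -- directories to walk recursively (e.g. ["/etc"])
    needles -- hostnames, aliases or IPs to look for (case-insensitive)
    """
    lowered = [n.lower() for n in needles]
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                # Skip sockets, FIFOs and dangling symlinks.
                if not os.path.isfile(path):
                    continue
                try:
                    with open(path, "r", errors="replace") as handle:
                        for lineno, line in enumerate(handle, 1):
                            text = line.lower()
                            if any(n in text for n in lowered):
                                yield path, lineno, line.rstrip("\n")
                except OSError:
                    # Unreadable file (permissions and the like): move on.
                    continue
```

Something like `for hit in find_references(["/etc"], ["olddns", "192.0.2.53"]): print(hit)` would list the matches. A search like this only covers files on disk, so anything cached in memory or held by a running service would not show up.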
The test SQL server I set up is an exact mirror of the normal SQL server's configuration, with only the hostname and IP address changed - and yet, calls to it don't crawl the way calls to the normal SQL server do.
I'm at the point of stealing the IP of the old DNS server with my laptop and running Wireshark just to see what the hell is calling. I'm *that* stumped. Any ideas, other than that?
(no subject)
Date: 2011-04-06 04:50 pm (UTC)
(I've got nothing. Additionally, I don't have any of the know-how to actually have anything.)
(no subject)
Date: 2011-04-06 05:03 pm (UTC)
Grabbing the IP address from the old DNS box and moving it to the web server box would be my next move.
I'm at least 20% confident here, so good luck with that.
(no subject)
Date: 2011-04-06 05:18 pm (UTC)
(no subject)
Date: 2011-04-06 05:12 pm (UTC)
I'd just grep in /etc/*/* and other relevant dirs for the IP address/name and see if it is in confs somewhere. ;)
For starters! Then I'd swear a lot and ask and just as I post my question I figure out where the problem lies and I'd be forced to post "NM, I R STUPID". ;)
(no subject)
Date: 2011-04-06 05:19 pm (UTC)
I've already done the grep bit and found nothing. And, unfortunately, posting it here hasn't led me to a revelation, which was kind of what I was hoping for.
(no subject)
Date: 2011-04-06 05:25 pm (UTC)
Well, I'm just glad that I'm not the only one who does the "ask and receive instant enlightenment" thing.
I think hijacking the IP might be a good idea and checking who calls what and where and why. Clearly, something is calling home.
If you find out, do let us know, 'cuz this has made me curious. :)
(no subject)
Date: 2011-04-06 05:30 pm (UTC)
(no subject)
Date: 2011-04-06 06:04 pm (UTC)
This is quite a puzzle you've got here.
(no subject)
Date: 2011-04-06 06:11 pm (UTC)
I wonder what the SQL server's routing table looks like.
(no subject)
Date: 2011-04-06 06:34 pm (UTC)
Hmmm... Any chance the old SQL server has logging to an external machine set up, and is looking to the old DNS server for information about it?
(no subject)
Date: 2011-04-06 06:41 pm (UTC)
(no subject)
Date: 2011-04-06 07:00 pm (UTC)
#2: Because the Unstable Legacy Machine is absolutely not allowed to have radical and unclean changes made to it that may result in the unavailability of the ancient-ass IMAP boxes. Also, if I break it we don't have a fix for the website being slow.
(no subject)
Date: 2011-04-06 07:01 pm (UTC)
(no subject)
Date: 2011-04-06 07:02 pm (UTC)
#3: Because the machine is so old it wants Ethereal, not Wireshark... and requires X to go with it.
On the other hand, writing network traffic to stdout should be doable.
(no subject)
Date: 2011-04-06 10:51 pm (UTC)
(no subject)
Date: 2011-04-07 05:21 am (UTC)
tcpdump -i ethx -vvv -w output.cap host olddns and \( sqlserver or webserver \)
Then transfer the cap file to a box with wireshark and open it up.
(no subject)
Date: 2011-04-07 05:05 pm (UTC)
Yeah, did that. It's fucking RDNS lookups, and I don't know why.
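If reverse lookups are the suspect, one rough way to see what they cost is to time one from the machine believed to be making them. The sketch below is only that: it assumes the delay comes through the system resolver on the box where it runs, and `timed_reverse_lookup` is a name made up for illustration. It asks the resolver for the PTR name of an address and reports how long the answer (or the failure) took.

```python
import socket
import time


def timed_reverse_lookup(ip):
    """Return (hostname or None, seconds elapsed) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # a failed or timed-out lookup lands here.
        name = None
    return name, time.monotonic() - start
```

Calling it with the web server's address while the old box is up, and again while it is unplugged, would show whether the lookup time moves by anything like the 10-15 seconds seen on the pages. It does not explain why the lookups end up at the old server, only whether they account for the delay.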
(no subject)
Date: 2011-04-06 07:37 pm (UTC)
(no subject)
Date: 2011-04-06 07:57 pm (UTC)
(no subject)
Date: 2011-04-07 01:37 am (UTC)
Also, the tool you need is iptraf.
(no subject)
Date: 2011-04-07 03:29 am (UTC)
Though this is all more akin to a genetic disorder of flowers than anything an admin should be dealing with, so lifting off and nuking from orbit would be my advice.
(no subject)
Date: 2011-04-06 08:01 pm (UTC)
(no subject)
Date: 2011-04-06 08:06 pm (UTC)
(no subject)
Date: 2011-04-07 12:12 am (UTC)
(no subject)
Date: 2011-04-07 01:29 am (UTC)
(no subject)
Date: 2011-04-07 03:33 am (UTC)
(no subject)
Date: 2011-04-07 12:51 pm (UTC)
Second: Not exactly, because we know precisely where all the machines involved are on the network and in the office, at all times. We just know they're talking behind our backs.
(no subject)
Date: 2011-04-07 01:38 pm (UTC)
But yes, there are photos, and I know a couple of the eye-witnesses.
(no subject)
Date: 2011-04-07 01:49 pm (UTC)
(I also got the cupholder and the "no, I can't see behind my computer because the power is out" guy. No, really. There Are No Apocryphal Computer Stories.)
(no subject)
Date: 2011-04-07 07:34 pm (UTC)
(no subject)
Date: 2011-04-07 01:41 pm (UTC)
1) Check the firewall
2) Check the router/switch
Also, if you turn off the box and bring up the IP on another box, does it still "fix" the problem? An IP alias is a cheap fix if it keeps things running while you chase down the real culprit.