Oct. 16th, 2007
Geek pop quiz.
Oct. 16th, 2007 04:48 pm
So, I have a server. It runs Debian and sits, serving a CVS repository and running LAMP and SSH and all that standard server stuff.
It's recently begun intermittently refusing to respond to *any* external requests for anything except ping, for a few minutes at a time, before coming back up again. Stopping and restarting the networking services brings it up again briefly, but it doesn't stay up. Nothing shows in any of the logs - they *do not see* any problem. In fact, getting something to go *out* from the server is the best way to bring it back up again, to the point where it's now sending me an email every minute just because that seems to work.
I also sat pinging the machine for a few hours, to see what happened to the pings. And I found something very, very, very weird.
Here's a snippet of my ping log from when the server went down:
Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
.... and from when it came back up again.
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
No lost packets, no nothing. However, the TTL counter jumped to *150* while it wasn't answering on any interface, and dropped back down to 64 again as soon as it started responding.
It does this every time. Every time the connection drops, TTL jumps to 150. Every time it comes back up, it goes back to the proper value of 64.
That makes this software-related, guaranteed. The question is, what the fuck?
There were no software changes that I'm aware of. Machine has multiple NICs, but the behaviour happens regardless of which one or ones are plugged in, enabled, configured, etc. It happens to *all* services at once, and they all come back up again at once.
(And yes, I tried a different cable, port, and switch, just in case. No change.)
Thoughts?
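For anyone who wants to look for the same pattern in their own ping logs, here is a minimal Python sketch that scans Windows-style ping output and reports every point where the TTL changes. The function name `ttl_transitions` and the `SAMPLE` data are made up for illustration; the regular expression assumes reply lines shaped exactly like the ones quoted above and will skip anything else.

```python
import re

# Assumed line shape, copied from the log above:
#   Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
REPLY_RE = re.compile(
    r"Reply from (?P<host>[\d.]+): bytes=\d+ time[<=]\d+ms TTL=(?P<ttl>\d+)"
)


def ttl_transitions(lines):
    """Return (reply_index, old_ttl, new_ttl) for each TTL change.

    reply_index counts only lines that look like ping replies, starting at 0.
    """
    transitions = []
    previous = None
    index = 0
    for line in lines:
        match = REPLY_RE.search(line)
        if match is None:
            continue  # skip separators, blank lines, timeouts, etc.
        ttl = int(match.group("ttl"))
        if previous is not None and ttl != previous:
            transitions.append((index, previous, ttl))
        previous = ttl
        index += 1
    return transitions


# Hypothetical sample, shortened from the log in the post.
SAMPLE = [
    "Reply from 192.168.1.2: bytes=32 time<1ms TTL=64",
    "Reply from 192.168.1.2: bytes=32 time=1ms TTL=64",
    "Reply from 192.168.1.2: bytes=32 time=1ms TTL=150",
    "Reply from 192.168.1.2: bytes=32 time=1ms TTL=150",
    ".... and from when it came back up again.",
    "Reply from 192.168.1.2: bytes=32 time=1ms TTL=150",
    "Reply from 192.168.1.2: bytes=32 time=1ms TTL=64",
]

if __name__ == "__main__":
    for index, old, new in ttl_transitions(SAMPLE):
        print(f"reply #{index}: TTL changed from {old} to {new}")
```

Lining up the timestamps of those transitions against the moments the services stop answering would be one way to confirm that the two always coincide, rather than eyeballing it.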
A Slice Of Life.
Oct. 16th, 2007 08:06 pm
![[livejournal.com profile]](https://www.dreamwidth.org/img/external/lj-userinfo.gif)
Me: .... why?
![[livejournal.com profile]](https://www.dreamwidth.org/img/external/lj-userinfo.gif)
Me: Seeing that movie for Eccleston is like going to see Transformers because you love Hugo Weaving, or watching Blink because you need your fix of Martha Jones.
![[livejournal.com profile]](https://www.dreamwidth.org/img/external/lj-userinfo.gif)
Me: Oh, come on, like YOU'D get "Anthrax AND Leprosy" on the first try