Oct. 16th, 2007

theweaselking: (Default)


King George, a rare King cheetah, licks his lips in anticipation of birthday cake at the Miami Metrozoo. The cake consisted of chunk meat, dry kitten chow, lean turkey, bacon candles and mashed potato frosting.
theweaselking: (Default)
So, I have a server. It runs Debian and sits, serving a CVS repository and running LAMP and SSH and all that standard server stuff.

It's recently begun intermittently refusing to respond to *any* external requests for anything except ping, for a few minutes at a time, before coming back up again. Stopping and restarting the networking services brings it up again briefly, but it doesn't stay up. Nothing shows in any of the logs - they *do not see* any problem. In fact, getting something to go *out* from the server is the best way to bring it back up again, to the point where it's now sending me an email every minute just because that seems to work.

I also sat pinging the machine for a few hours, to see what happened to the pings. And I found something very, very, very weird.

Here's a snippet of my ping log from when the server went down:

Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time<1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150

... and from when it came back up again:
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=150
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64
Reply from 192.168.1.2: bytes=32 time=1ms TTL=64

No lost packets, no nothing. However, the TTL counter jumped to *150* while it wasn't answering on any interface, and dropped back down to 64 again as soon as it started responding.

It does this every time. Every time the connection drops, TTL jumps to 150. Every time it comes back up, it goes back to the proper value of 64.
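For anyone who'd rather not eyeball hours of ping output: a minimal sketch, assuming the log is saved as plain text in the same Windows ping format as the snippets above, of how the TTL flips could be picked out automatically. The function name `ttl_transitions` and the sample lines are made up for illustration; this is not something that was actually run against the server.

```python
import re

# Matches Windows-style ping reply lines like the ones in the log above,
# e.g. "Reply from 192.168.1.2: bytes=32 time<1ms TTL=64".
REPLY_RE = re.compile(r"Reply from (\S+): bytes=\d+ time[<=]\d+ms TTL=(\d+)")


def ttl_transitions(lines):
    """Return (reply_index, old_ttl, new_ttl) for every change in TTL.

    reply_index counts only lines that parse as replies, starting at 0.
    """
    transitions = []
    previous = None
    index = 0
    for line in lines:
        match = REPLY_RE.search(line)
        if not match:
            continue  # skip blank lines, timeouts, anything that is not a reply
        ttl = int(match.group(2))
        if previous is not None and ttl != previous:
            transitions.append((index, previous, ttl))
        previous = ttl
        index += 1
    return transitions


if __name__ == "__main__":
    # Hypothetical sample in the same shape as the log above.
    sample = [
        "Reply from 192.168.1.2: bytes=32 time<1ms TTL=64",
        "Reply from 192.168.1.2: bytes=32 time=1ms TTL=64",
        "Reply from 192.168.1.2: bytes=32 time=1ms TTL=150",
        "Reply from 192.168.1.2: bytes=32 time=1ms TTL=150",
    ]
    for index, old, new in ttl_transitions(sample):
        print(f"reply #{index}: TTL changed from {old} to {new}")
```

Lining the reported reply numbers up against the times the services dropped would show whether the flip and the outage really do coincide every single time.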

That makes this software-related, guaranteed. The question is, what the fuck?

There were no software changes that I'm aware of. Machine has multiple NICs, but the behaviour happens regardless of which one or ones are plugged in, enabled, configured, etc. It happens to *all* services at once, and they all come back up again at once.

(And yes, I tried a different cable and port and switch, just in case, too. No change.)

Thoughts?
theweaselking: (Default)
torrain: I want to go see The Dark Is Rising.

Me: .... why?

torrain: Ian McShane and Christopher Eccleston!

Me: Seeing that movie for Eccleston is like going to see Transformers because you love Hugo Weaving, or watching Blink because you need your fix of Martha Jones.

torrain: Alternately, I could stay home with you and watch Hugh Laurie play Whack-A-Mole with diagnostics. Wheee, thud, I guess that wasn't it! Look, another one! Thud!

Me: Oh, come on, like YOU'D get "Anthrax AND Leprosy" on the first try
