02-05-2010, 10:16 PM
Originally Posted by Captain_Caligula
No kidding.

I'm a NOC guy as well. I do edge security and forensics. If one of my servers goes down, at least two of my people are awakened from sleep and the on-duty staff goes nuts.

For god's sake, there's actually a klaxon on site which goes off. Sounds like an air raid.

But the real fun starts during the RCA meeting (that's "Root Cause Analysis"). That's when heads roll. That's when the guy who decided to bounce the mail server gets fired.

Oh... they know. At least 30 of them. And prolly 7 of them are running for their vehicles with sleep in their eyes and no coffee.
God, the 4am calls because nobody knows how to disable a certain system without causing a cascade of outages, all because a cooling fan vibrated a cable loose and started intermittent power loss to a rack, and the monitoring system is telling you it's failing when it's not, so the on-duty is desperately trying to route around it when all it needs is a zip-tie and a new fan, but you don't know that for three days because it's not reproducible even though it keeps happening occasionally, and nobody noticed the loose cable until AFTER you ordered a new $20,000 rack.

And that's a hardware problem... those are the EASY ones to fix.

NOC guys at Cryptic? Good luck to you guys, and ignore the idiots who have no idea what you're going through right now.