
And you are moving this into production, because…?

And I said “Get out of here, you Loch Ness Monster!”

Let us say you are world-famous fashion designer Karl Lagerfeld. You are getting ready to present your fall line to the public. The lights go down, the models are ready to go, and the show begins.

The first model hits the runway, and about halfway down, Anna Wintour, editor of Vogue magazine, suddenly jumps up, says “Darling, that makeup is all wrong,” stops the model, and proceeds to redo her makeup in front of the world’s fashion press.

What would you do? Well, if you were Karl Lagerfeld, chances are you’d be in a Paris jail for beating Anna Wintour to death with a stiletto in front of Kanye West. But basically this is what we in operations have to deal with on a daily basis. Someone on the business side sees “something wrong”, runs down to IT, grabs a developer and yells “FIX IT NOW”. Now, there are a few things blatantly wrong with this. To start:

  1. Is it really wrong? The same rules should apply to everything, so if one data point is off for one, shouldn’t it be off for everyone?
  2. How is it wrong? How far off is the calculation? Exactly what should it be?
  3. Who are you and why are you in here yelling at a developer? I’m Operations. I should be yelling at the developer, not you. Shouldn’t you be talking to the project manager and going over what the rules are?

And of course, the one question that everyone misses: what time is it? You see, making changes in production is dependent upon timing. And timing in operations is like timing in comedy – it is everything. There are risks involved with dropping changes into production in the middle of the day like a hot mic. I need a reason why you are hell-bent on shoving this in at lunchtime. Not because it is my lunchtime (which really never matters), but because I have to answer for its possible failure. There are reasons why changes are done during low-traffic hours: it doesn’t affect as many people if and when it blows up. There is also the question of verification. When this change goes in, how long do we need to wait to make sure that everything is OK? If it is so important that your boss is two steps from an aneurysm, then why does it take you three days to verify the procedure was done correctly? Just saying.

Oh yeah, do the paperwork. It’s not a problem if there’s not a ticket. And straighten your tie. Anna hates that.

There are Incidents, and then there are Incidents

What IT Operations does on a regular basis. You’re Welcome.

OK, we all know that every single one of us is a delicate hothouse flower, full of potential, so that when we are unable to get to that spreadsheet or log into the report server, those folks in IT must drop everything they are doing right now and help us continue with our work.

Yeah, you and the other 25 delicate hothouse flowers that just called in, each with their own special set of problems that is keeping them from reaching their potential.

In incident management, events are ranked according to priority. Put it this way: your inability to access your spreadsheet will probably take a back seat to a fire in the data center. Why? Because chances are, your problem affects only you. The data center fire, on the other hand, probably will affect you, your department, and maybe every customer you have. That is how priorities are set – chances are that a problem affecting you is further down the chart than one that affects lots of people. I apologize, but sometimes the cold gust of reality can make a flower stronger. Sometimes it kills it, but that’s not my problem.

But, you may say, my inability to access that spreadsheet is going to have major ramifications, because it is needed in Federal court tomorrow at 8:00 AM. OK, that changes the topology a little, as no one wants to jerk around a lawyer or a judge. But even that has to fall on the same scale as everyone else calling in. How do you handle it?

In ITIL, priorities are based on a combination of two things: impact and urgency.

The needs of the many outweigh the needs of the few. Or the one. That would be you.

Impact counts the number of people impacted: the more people impacted, the higher you go. If there are only a handful of people affected, it’s a low impact, regardless of social status. Urgency is basically when you need it. Yeah, I know. Everyone needs it RIGHT NOW. Relax, princess, you do not need it right now. You know it. I know it. Stop acting as if it’s life or death. It isn’t. If something is broken, the urgency is going to be higher than a request for access. Why? BECAUSE IT’S BROKEN.
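For the spreadsheet crowd who like to see the math, here is a minimal sketch of how impact and urgency might be combined into a priority. Everything in it is an assumption for illustration: the three levels, the headcount thresholds in `impact_from_headcount`, and the "add the ranks and subtract one" rule are one commonly used way to lay out a priority matrix, not anything your service desk is guaranteed to use.

```python
from enum import IntEnum


class Level(IntEnum):
    """Rank for impact or urgency; a lower number means more serious."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def impact_from_headcount(people_affected: int) -> Level:
    """Map the number of people affected to an impact level.

    The thresholds (100 and 10) are hypothetical; real shops pick their own.
    """
    if people_affected >= 100:
        return Level.HIGH
    if people_affected >= 10:
        return Level.MEDIUM
    return Level.LOW


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 (drop everything) to 5 (get in line).

    Assumed rule: priority = impact rank + urgency rank - 1.
    """
    return int(impact) + int(urgency) - 1


# One flower who cannot open a spreadsheet, no particular deadline.
spreadsheet = priority(impact_from_headcount(1), Level.MEDIUM)

# Same flower, but the spreadsheet is due in court at 8:00 AM.
court = priority(impact_from_headcount(1), Level.HIGH)

# The data center is on fire and everyone is affected.
fire = priority(impact_from_headcount(5000), Level.HIGH)

print(f"spreadsheet={spreadsheet} court={court} fire={fire}")
```

Note what the court date does under these assumptions: it raises the urgency, so the ticket moves up the chart, but it still lands below the fire because the impact is still one person.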

Now, most of the time major incidents and minor stuff can co-exist peacefully, as there is usually enough staff to take care of everything. So yes, you’re going to be able to get that spreadsheet that eventually will send your boss to jail. So not to worry. Just remember that there are things out there that are more urgent and go from there.