
Always Know Where the Emergency Exits Are

Let us say that something horrible has happened to every application you have. And let us say that something horrible happened to all of your backups at the same time.

Armageddon? Yeah, maybe. But what do you do? Well, you have to rebuild from the ground up. But how?

There is one thing that, if you have it, can mean the difference between rebuilding and sitting in the ashes of your company, cursing whatever higher being happens to be available at the moment. And that is a runbook.

A runbook is very simple, and yet you'd be amazed at the number of places that A) don't have one or B) never keep it up. A runbook is basically instructions for the morons in Operations (like me) to rebuild your application from the ground up. Simple as that. You make a change to your application, the record of the change goes in the runbook. You move to a new server, the notation goes in the runbook. You do anything to your application, it needs to be noted in the runbook. That way, if disaster strikes, you have an ace in the hole.

So who creates and updates the runbook? The development team. And who houses the runbook? Operations. Why? Operations houses it because we are going to be the ones who need it should something happen. But Operations has multiple applications to watch, so we do not know the intricacies of your application like the people who created it do. Do not ask Operations if what is in the runbook is everything we need. Short answer: we don't know. It's your application. You hand us the runbook with the understanding that you have put YOUR knowledge into it. When a change goes into the runbook, it is Operations' responsibility to note that the change went in and when. But if you decide to move to a new server or make major structural changes, again, it is the development team's responsibility to update it.
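To make the division of labor concrete, here is a minimal sketch of a runbook as structured records. All names here (the `Runbook` class, the "payroll-web" application, the entry fields) are illustrative assumptions, not a real tool; the point is simply that every change gets a dated, attributed entry with rebuild instructions written by the development team.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RunbookEntry:
    """One dated, attributed record of a change to the application."""
    timestamp: datetime
    author: str          # who made the change (the development team)
    change: str          # what changed: code, config, server move, etc.
    rebuild_steps: str   # how Operations would redo this from scratch

class Runbook:
    def __init__(self, application: str):
        self.application = application
        self.entries: list[RunbookEntry] = []

    def record(self, author: str, change: str, rebuild_steps: str) -> None:
        # Operations notes when the change went in; the content comes from dev.
        self.entries.append(
            RunbookEntry(datetime.now(), author, change, rebuild_steps))

# Hypothetical example entry for a server move.
rb = Runbook("payroll-web")
rb.record("dev-team", "Moved to server APP-07",
          "Provision APP-07 from the base image, deploy build 4.2.1, restore DB.")
print(len(rb.entries))  # 1
```

The exact fields matter less than the habit: if it is not in the runbook, Operations will not know it when the building is on fire.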

And that is where the problems with runbooks lie. Everyone is pressed for time, no one ever has a technical writer when needed and all of us have a tendency to say “We’ll work on that tomorrow”.

And sometimes, tomorrow is not exactly what you expect.



Your business unit has decided that everything must go.  There’s a new logo that must be on everything.  They hate the interface; it looks dated.  Animation?  Ugh.  Nothing works right on their phones.  They need apps, for God’s sake.  Reports are not giving them the information they need.  And who the hell approved that lime green color everywhere?

To paraphrase Bowie, “I’m quite aware of what you’re going through.”

But before you start changing everything, you first need to plan this out. You know that. I know that. That is what a change ticket is all about. Once you have your business needs arranged and know what has to be where and why, you need to fill out a change form and let change management have a look at it. You need to do this before you drop the first keystroke of coding. Why? No change happens in a vacuum. What you change may well affect another user or business group. As change manager, I need to know what you are planning to do. I may be aware of something else going on that conflicts with your plans. If you are upgrading, I need to know in advance so I can start to update information. You may have to make some alterations depending on what is going on, and it is better to have that information up front rather than the day you want to deploy. And if I know in advance, I can make sure that everything is ready for you. Change Management is your partner, not your adversary.
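The conflict check a change manager does can be sketched in a few lines: two changes collide when they touch the same system in overlapping windows. The ticket IDs, system names and dates below are all made up for illustration.

```python
from datetime import datetime

# Each planned change: (ticket id, affected systems, window start, window end).
planned = [
    ("CHG-101", {"web-frontend", "reporting-db"},
     datetime(2024, 3, 1, 22), datetime(2024, 3, 2, 2)),
    ("CHG-102", {"reporting-db"},
     datetime(2024, 3, 2, 0), datetime(2024, 3, 2, 3)),
]

def conflicts(changes):
    """Flag pairs of changes touching the same system in overlapping windows."""
    found = []
    for i, (id_a, sys_a, start_a, end_a) in enumerate(changes):
        for id_b, sys_b, start_b, end_b in changes[i + 1:]:
            # Windows overlap when each starts before the other ends.
            if sys_a & sys_b and start_a < end_b and start_b < end_a:
                found.append((id_a, id_b, sorted(sys_a & sys_b)))
    return found

print(conflicts(planned))  # both changes touch reporting-db in overlapping windows
```

This is why the change form goes in before the first keystroke of coding: the overlap is visible on paper long before it is visible in production.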

Cue the sax solo…

How Do You Solve a Problem Like…

ITIL is not an abbey, but was Maria an incident, or a problem?

At what point does an incident become a problem?  That is a question that unfortunately may have multiple answers.  After all, in ITIL, an incident is something that happens.  If it continues to happen, or if multiple people have it happen to them, it can become problematic.  But a problem?  As much as I detest ‘squishy’ answers, the answer to this conundrum is, it depends upon your organization.

Some organizations take a lot of pain before assigning a problem.  Some almost immediately assign incidents a problem depending on the length of time the incident is open, or the number of people calling in who do not want to follow instructions in the knowledge base.  It really does depend. There’s the squish.

However, there is a good rule of thumb. First, you need to know the severity of the incident. Is this affecting almost all of your customers or just a handful? If this is something that is happening to just some people, then is there a knowledge base article (KBA) about it? If not, there needs to be. If you do have a KBA out there, are people using it, and what are their reactions to it? Is it perceived as an easy fix, or do they need to perform a voodoo ritual in order for it to work? Trust me, your customers will let you know.

Then you need to assess the incidents, your customers and the fix itself. As this is still an incident, it has passed through all of the levels of support, so you will have some general idea as to what needs to happen to fix it. And by now, you know about how many people this incident affects. The next question is, how much pain is your organization capable of enduring? No one likes to hear the same complaint over and over. But sometimes, the fix is worse than the disease. Sometimes, however, a fix can easily be found, and there you are. Problem it is, and soon to be problem solved.
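The rule of thumb above can be sketched as a small decision function. The 25 percent threshold and the inputs are illustrative assumptions; as the post says, every organization tunes its own squish.

```python
def should_open_problem(affected_users: int, total_users: int,
                        kba_exists: bool, kba_works: bool) -> bool:
    """Rule-of-thumb escalation from incident to problem.

    Thresholds are illustrative; your organization's pain tolerance varies.
    """
    widespread = affected_users / total_users > 0.25  # hurting a big slice
    if widespread:
        return True
    if not kba_exists:
        return False       # write the KBA first, then reassess
    return not kba_works   # a fix exists but it's a voodoo ritual: problem

# A handful of users with a working KBA: keep it an incident.
print(should_open_problem(10, 1000, kba_exists=True, kba_works=True))    # False
# Nearly half your customers: problem, KBA or not.
print(should_open_problem(400, 1000, kba_exists=False, kba_works=False))  # True
```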

But sometimes, problems cannot be solved by marrying them off to well-known Austrian war heroes. More about that in the next post.

Show Me On the Ticket Where the Bad Thing Happened


Not to sound dramatic, but an Incident is kind of like a crime scene. In order to piece everything together, the person on the help desk needs to know exactly what happened. Everything. Yes, I know: what you need doesn't work. But I need to reassemble the crime scene in order to see what happened.

That is why putting everything on your ticket is helpful. This is not a blame game, but an investigation. Sometimes (actually, more often than you think) one software program is at odds with another, and yes, this can happen when the software packages are created by the same company (I'm looking at you, Microsoft). Sometimes it is user error. Hey, we all make mistakes. And sometimes what is happening can actually be a symptom of something far worse. But unless the incident team knows what is going on, the incident is nothing more than "Damned if I know."

So, what needs to be on the ticket?  First, besides your name and how to contact you (which should be automatically recorded on the ticket), I need to know the following:

  1. The time it happened.  This helps us if we need to check server logs.  Things happen all the time, so knowing when it happened helps us dig through the chaff.
  2. Exactly what happened.  The more detailed you can be, the better.  Just saying “It doesn’t work” means we have to call in Miss Cleo and her tarot cards to divine what happened.  And while I love her fake Jamaican accent, she’s never right.
  3. What else was running in the background.  Excel is giving me an error message and I have Word and Visio also running.  This may have something to do with it, it may not.  But we know that there are other avenues we might be able to check if our first assumption is wrong.
  4. Screenshots.  Just like a crime scene investigation, pictures record a lot more than people think.  If you have an error message, get a screen shot to add to the ticket.
  5. If your ticketing system does not capture the information concerning the computer you are on (Some do, some are stupid), then please add that to the ticket as well.  There are times when the hardware does not play nice with the software.
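The checklist above is easy to automate as a completeness check on the ticket itself. The field names and the sample ticket below are my own illustration, not any real ticketing system's schema.

```python
# Crime-scene details every ticket should carry (illustrative field names).
REQUIRED_FIELDS = ["reporter", "contact", "time_of_incident",
                   "description", "background_apps", "screenshots", "machine"]

def missing_fields(ticket: dict) -> list:
    """Return which required details the ticket still lacks."""
    return [f for f in REQUIRED_FIELDS if not ticket.get(f)]

# A hypothetical half-filled ticket.
ticket = {
    "reporter": "J. Smith",
    "contact": "x4021",
    "time_of_incident": "2024-03-01 10:42",
    "description": "It doesn't work",      # present, if not exactly detailed
    "background_apps": ["Word", "Visio"],
    # no screenshot, no machine info
}
print(missing_fields(ticket))  # ['screenshots', 'machine']
```

A ticketing system that bounces the ticket back until this list is empty saves a round of phone tag with Miss Cleo.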

In other words, nothing gets ruled out at the start. Once we can verify the alibis for various parts, then we can find the perp, solve the problem and wrap up the case a lot more neatly than Law and Order sometimes does. If we find that what you are experiencing is part of a larger problem, well, we have a larger case to solve. We'll keep you updated. Olivia Benson never gives up. Neither do we. *Chu-CHUNK!*

You Are Not The Most Interesting Developer in the World

Where I work, we deal with developers who were apparently dropped on their heads at a very young age.  Repeatedly.  Onto concrete.  Repeatedly.  I also work with project managers who apparently had the same thing happen to them.

The reason I point this out is because some people truly believe there is a cost savings in combining development, test and production into one happy little world. You know, because every developer does everything right the first time. Or at least combining development and test, because, really, those are one and the same. I actually had a PM tell me this in the last week. As my father used to say, "That boy is about as sharp as a sack of wet mice." I'll simply add, "Bless his heart." Because thinking it can all be in one is the coding equivalent of "Hold my beer."

For the uninitiated, the development area is the place where the developers can make multiple horrendous mistakes, or develop, take your choice of phrasing. The test area is a place where you actually make sure that whatever you have developed installs as it should and that everything works as it is designed to. Production is the area that everyone actually uses, and they don't take kindly to things changing or not working as designed in the middle of an actual transaction. If you think they do, then I'll give them your number and you can be a customer service rep as well. Have fun. Some groups also include a fourth level: user acceptance testing, or UAT. This is an environment that your test group of actual business users beats on to make sure everything works to their standards after QA, but before production. I am personally fond of the four-tier layout, because it brings the business in on the final product, and anything found afterwards is on their heads, not the development team's.

Now in this multilayer fiesta there are some points that, as Operations, I demand. First, all environments should resemble production in that the server structure is the same and you have the same basic software running in the background, because there are indeed differences between, say, .NET 3.5 and .NET 4.5. Really, there are. That's why they have different numbers. Someone at Microsoft didn't just decide to change it because they were bored. If your production server is running .NET 3.5, then why are you developing on something else? Do not talk to me about upgrading on the fly because of backward compatibility. Sorry, it is something that may work 90 percent of the time, but that remaining 10 percent always bites you in the ass. Replication applies to your databases as well. Table structures need to be the same. Change one thing, and it needs to go through the whole set and it needs to be tested. Yes, it is a pain in the neck. Yes, it takes time. Why all these restrictions? Because you are not a cowboy. You are in IT. You want freedom, move to Montana and raise dental floss. But do not ask me to hold your beer.
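Environment parity is easy to check mechanically once the settings are written down. Here is a minimal drift report; the setting names and values (the .NET versions, schema tags, server OS) are made-up examples standing in for whatever your stack actually tracks.

```python
# Production is the source of truth (illustrative settings).
PROD = {"dotnet": "3.5", "os": "Windows Server 2019", "db_schema": "v42"}
ENVIRONMENTS = {
    "dev":  {"dotnet": "4.5", "os": "Windows Server 2019", "db_schema": "v42"},
    "test": {"dotnet": "3.5", "os": "Windows Server 2019", "db_schema": "v41"},
}

def parity_report(prod: dict, envs: dict) -> dict:
    """For each environment, list settings that drift from production
    as (environment value, production value) pairs."""
    report = {}
    for name, cfg in envs.items():
        drift = {k: (cfg.get(k), v) for k, v in prod.items() if cfg.get(k) != v}
        if drift:
            report[name] = drift
    return report

print(parity_report(PROD, ENVIRONMENTS))
# dev is on the wrong .NET; test's table structures lag a schema version
```

Run something like this before every release and "it worked in dev" stops being a punchline.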

And you are moving this into production, because…?

Let us say you are world-famous fashion designer Karl Lagerfeld. You are getting ready to present your fall line to the public. The lights go down, the models are ready to go and the show begins.

The first model hits the runway, and about halfway down, Anna Wintour, editor of Vogue Magazine, suddenly jumps up, says "Darling, that makeup is all wrong," stops the model and proceeds to redo the model's makeup in front of the world's fashion press.

What would you do? Well, if you were Karl Lagerfeld, chances are you'd be in a Paris jail for beating Anna Wintour to death with a stiletto in front of Kanye West. But basically this is what we in operations have to deal with on a daily basis. Someone on the business side sees "something wrong," runs down to IT, grabs a developer and yells "FIX IT NOW." Now, there are a few things blatantly wrong with this. To start:

  1. Is it really wrong?  The same rules should be applying to everything, so if one data point is off for one, shouldn’t it be off for everyone?
  2. How is it wrong?  How off is the calculation? Exactly what should it be?
  3. Who are you and why are you in here yelling at a developer? I'm Operations. I should be yelling at the developer, not you. Shouldn't you be talking to the project manager and going over what the rules are?

And of course, the one question that everyone misses: What time is it? You see, making changes in production is something that is dependent upon timing. And timing in operations is like timing in comedy – it is everything. There are risks involved with dropping changes into production in the middle of the day like a hot mic. I need a reason why you are hell-bent on shoving this in at lunchtime. Not because it is my lunchtime (which really never matters), but because I have to answer for its possible failure. There are reasons why changes are done during low-traffic hours. It doesn't affect as many people if and when it blows up. There is also the question of verification. When this change goes in, how long do we need to wait to make sure that everything is OK? If it is so important that your boss is two steps from an aneurysm, then why does it take you three days to verify the procedure was done correctly? Just saying.
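A change-window check is about the simplest gate you can put in front of a lunchtime deploy. The overnight window below is an illustrative assumption (every shop defines its own); the only wrinkle worth showing is that the window wraps past midnight.

```python
from datetime import time

# Illustrative low-traffic window: 22:00 through 05:00 the next morning.
WINDOW_START, WINDOW_END = time(22, 0), time(5, 0)

def in_change_window(deploy_time: time) -> bool:
    """True if the proposed time falls in the overnight low-traffic window."""
    # The window wraps midnight, so valid times are after the start
    # OR before the end, not between the two.
    return deploy_time >= WINDOW_START or deploy_time <= WINDOW_END

print(in_change_window(time(12, 30)))  # lunchtime deploy: False
print(in_change_window(time(23, 15)))  # True
print(in_change_window(time(3, 0)))    # True
```

Anything that fails this check goes back through change management with a written justification, not through a yelled "FIX IT NOW."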

Oh yeah, do the paperwork. It's not a problem if there's not a ticket. And straighten your tie. Anna hates that.

Welcome to IT!


Yeah, you're going to have a lot of fun here. From the inability to sign on to the network, to people deleting three weeks of work in a single keystroke, to Ms. Whitcombe not understanding why her computer keeps freezing up (even though she has been warned about that coupon site five times), it is best to understand the one rule of the jungle:

It is all your fault. Even though you have absolutely no control over the stupidity of your fellow co-workers, it will always be your fault.

OK, so what are you as an IT organization going to do about it? You could take your lumps, or become cynical because there are some people who should not be around a computer under any circumstances. Or you could find a way to collect data on every problem that plagues your company and find out how to prevent it in the future. Most IT departments are looking for two things: excellent customer service, leading to happy customers, and great productivity. Of course, achieving these lofty goals for little or no money is also on our minds. But these are the basics that drive us: productivity and customer service.

Allow me to introduce you to ITIL.

Yes, another ingredient in the IT alphabet soup. Groan all you want, but unlike cleverly named languages, ITIL is a common sense process, created to make sure that the things that drive your co-workers crazy are reported and looked at, problems are given solutions and changes are made with knowledge and forethought. Because while everything may be your fault, it is still your responsibility to fix it. ITIL gives you a roadmap. Get into the car. Time to drive.

The Handy Dandy Mission Statement

This is a reboot of sorts.  Before, this blog was more of a snarky review of everything technology related.  Now, it is more of a review of technology, methods and madness that the IT community faces on a daily basis.  OK, there will be some snark, but the bulk of this will be attempted with a semi-straight face.

Mainly, I hope to show the difference between the hype and the reality of technology. There is always that magic moment when people realize that brand new technology X will not solve all their problems. It doesn't mean that the technology is bad or poor or evil, it simply means that X does not do what people originally thought it would do. Every company's needs are slightly different. The phrase "Your Mileage May Vary" is key.

So, here we go again.