Double-Edged Work Sword

Change of Plans
I started off writing this to talk about a triage situation I had to deal with at work today, but then I went off on how I deal with my team and talked a bit about teaching. My triage post will have to be another one.

One of my more double-edged responsibilities at work fits me perfectly. My department is built like a pyramid, with the NOC I, NOC II, and NOC III support staff (the frontline doods) at the bottom. The next, smaller level of the pyramid is the amazingly talented operations engineering staff. At the top of the pyramid is me, the sole senior engineer. Issues filter up the pyramid until they either reach me or the ops engineers call Microsoft for support. Whether a situation filters to me or to Microsoft PSS depends on the situation, the time of day, and a number of other things. Either way, the situation will be handled. The engineering team at work is insanely talented; I cannot get over how intelligent my team is, considering some of them are in their first IT jobs. There is not a single member of the team who has not taught me a number of things since I started. In case you cannot tell, I am very proud of the team I am a part of.

Now back to my purpose for typing this post: my double-edged sword. That double-edged sword is that the buck stops with me. I have the power to make the calls and the responsibility of dealing with the fallout. When I make a call, I am not asking for permission; either I am being asked for permission or I am out there doing it. If I screw up the call, it is my arse on the line. The second I get involved in an issue and tell someone to do something, I take full ownership of the blame for anything that might happen. I don’t care who was wrong; once I have touched it, I take ownership of the entire issue. I will NEVER let one of my guys take the fall for something I have touched, even if it was their fault. That is who I am, and that is my commitment to my team.

Today I walked into the NOC and found one of the engineers, Jeremy, working a Severity 0 incident for one of our larger customers. Severity 0 (Sev0) means a server is down and email is not flowing; we drop everything and escalate to get it fixed as quickly as possible. This is the stuff I love: crisis management and fun problems with no obvious solutions. I stayed in the NOC with Jeremy for the next two hours. He worked the issue and interacted with the customer while I helped steer him in the right direction and made suggestions. Steering rather than taking over is another part of who I am that is perfect for this job; I love to teach. When I am helping someone with something, I always try to help through their hands. In my mind, that is the best thing I can do to help them learn the what, where, and why of the steps we are taking to make things work. Jeremy and I tried all kinds of things to fix this one issue. As with most issues, one thing led to the next until we ultimately arrived at a fix for the problem.

One of the things we provide customers for Sev0 incidents is what we call a Root Cause Analysis (RCA). In an RCA we detail what we did to solve the problem, what caused the problem, and what we suggest doing to avoid the problem in the future. Sometimes an RCA is simple, sometimes an RCA is not possible, and sometimes you have to choose between determining the root cause and fixing the problem. Today was one of those days: there was no way to work out the root cause without prolonging the downtime.

Tune in next time, when I talk about situation triage and tie it back into this post.



About Kevinm