Increase Availability with Ticket Excellence

Everybody has troubles, especially IT professionals. But for obvious reasons, they don’t have the luxury of ignoring them, running from them, putting them off, or passing them on to somebody else. Doing so would undermine availability, whether of a server, a desktop, or even a workforce. No, in the world of information technology, we document our troubles. We track them. And we wrestle them to the ground to restore precious availability. We write down what we think is wrong, and we write down what we did to fix it. If we can’t find a permanent solution, we put a workaround in place. And we have to document everything, because if it’s not documented, it didn’t happen. It’s all part of trouble ticket excellence, and it’s a lot of work.

Trouble Tickets and Those Who Use Them

Everything is digital now, but in the early days of IT troubleshooting, engineers were given a card or paper slip that described the problem, its location, and other details. Until recently, and perhaps even now, repairmen of all sorts worked from paper tickets torn from service books. Small automotive repair shops, for instance, still use this method. You would be hard-pressed to find any IT department still using the paper method for assigning and tracking trouble tickets. We are talking about IT professionals, after all.

A blog post by the customer support company Kayako tells the history of customer service, including the advent of trouble ticketing. Blogger Varun Shoor writes that Ron Muns founded the Help Desk Institute (HDI) in 1989. The aim was to give employees a central place where they could get technical support for their IT infrastructure. HDI continues to provide education, certification, and other membership benefits.

So who might use a trouble ticket? The answer is simple: everyone. When a system user calls customer support or the help desk to report a problem, the first-level (or Tier 1) support person will open a trouble ticket. The greatest skill required of Tier 1 personnel is the management of issues. They may ask the user many questions and may do basic technical troubleshooting. As they work, they are continually typing in the ticket. If they cannot solve the problem, they may forward the ticket to a Tier 2 engineer. Technical escalation may mean the ticket goes on to a Tier 3 engineer, or even to an equipment vendor, who is sometimes referred to as Tier 4.

Trouble tickets are used by departments called help desks, service operation centers (SOCs), or network operations centers (NOCs). Service personnel may be divided into groups based on level of support or the type of platform. But it’s not just technical people who use trouble tickets. Managers get involved when management escalation is required. And tickets are used to crunch numbers for reports. Company executives may want to see such key performance indicators (KPIs) as:

Mean time to repair (MTTR)
Mean time to close (MTTC)
Number of tickets open
Number of tickets closed
User satisfaction rating
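
As a rough illustration of how such figures might be pulled from ticket data, here is a minimal Python sketch. The `KpiTicket` record, its field names, and the choice to measure both MTTR and MTTC from the moment the ticket is opened are assumptions made for this example, not the behavior of any particular ticket system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class KpiTicket:
    """Hypothetical ticket record holding only the timestamps the KPIs need."""
    opened: datetime
    repaired: Optional[datetime] = None  # when service was restored
    closed: Optional[datetime] = None    # when the ticket was closed


def mean_hours(durations):
    """Average a list of timedeltas in hours; None if the list is empty."""
    if not durations:
        return None
    total = sum(durations, timedelta())
    return total.total_seconds() / 3600 / len(durations)


def kpi_report(tickets):
    """Compute MTTR, MTTC, and open/closed counts for a list of tickets."""
    repair_times = [t.repaired - t.opened for t in tickets if t.repaired]
    close_times = [t.closed - t.opened for t in tickets if t.closed]
    return {
        "mttr_hours": mean_hours(repair_times),
        "mttc_hours": mean_hours(close_times),
        "open": sum(1 for t in tickets if t.closed is None),
        "closed": sum(1 for t in tickets if t.closed is not None),
    }


# Example data: two finished tickets and one still open.
t0 = datetime(2024, 1, 1, 8, 0)
sample = [
    KpiTicket(t0, t0 + timedelta(hours=2), t0 + timedelta(hours=3)),
    KpiTicket(t0, t0 + timedelta(hours=4), t0 + timedelta(hours=5)),
    KpiTicket(t0),
]
print(kpi_report(sample))
```

User satisfaction ratings are left out of the sketch because they usually come from surveys rather than from ticket timestamps.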

Increase Availability with Maintenance, Change Control, and KBs

Just as technical maintenance can be viewed as either proactive or reactive, trouble ticket systems take two approaches to issues. A reactive ticket is one that is opened because an issue already exists. If the issue affects just one user, the ticket is handled as a problem unique to that customer. But if the first-line tech or customer service rep knows of a broader system outage, they may attach the ticket to an “event” ticket. The individual ticket then has a subordinate relationship to the event ticket, which may be described as master-slave, parent-child, or with some other terminology. When the master ticket is updated or closed, the same happens to the subordinate ticket.
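
The parent-child relationship can be sketched in a few lines of Python. The `LinkedTicket` class and its method names are hypothetical, chosen only to show updates and closure flowing from an event ticket down to the tickets attached to it.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedTicket:
    """Hypothetical ticket that can act as a parent (event) or a child."""
    ticket_id: str
    status: str = "open"
    notes: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def attach(self, child):
        """Make another ticket subordinate to this one."""
        self.children.append(child)

    def update(self, note):
        """Add a note here and copy it to every attached child."""
        self.notes.append(note)
        for child in self.children:
            child.update(note)

    def close(self, resolution):
        """Close this ticket and every attached child with the same resolution."""
        self.notes.append(resolution)
        self.status = "closed"
        for child in self.children:
            child.close(resolution)


# One outage (event) ticket with two user tickets attached to it.
event = LinkedTicket("EVT-1")
user_a = LinkedTicket("TT-101")
user_b = LinkedTicket("TT-102")
event.attach(user_a)
event.attach(user_b)

event.update("Vendor dispatched to site")
event.close("Power restored, service verified")
print(user_a.status, user_a.notes)
```

A real system would also need to decide whether a child can be detached and worked on its own. That is left out here.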

Automation is changing so much of the trouble management process. Proactive tickets can be opened by techs who recognize a problem, but the more common scenario these days is that a ticket is automatically opened and sent to the appropriate personnel when an issue occurs. Ticket generation could be triggered by a specific event, such as a connection going down, or by a parameter reaching a predetermined threshold. For instance, the system could open a ticket when a set number of errors occurs, when traffic on a line becomes congested, or when a link is no longer available. When the numbers cross the threshold, automation generates the ticket.
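
Here is a minimal Python sketch of both triggers, a discrete event and a threshold. The `AutoTicketer` class, the threshold of 5 errors, and the rule of one open ticket per link are all assumptions made for illustration.

```python
from collections import Counter

ERROR_THRESHOLD = 5  # hypothetical value; a real threshold depends on the system


class AutoTicketer:
    """Hypothetical monitor that opens tickets on events or threshold breaches."""

    def __init__(self, threshold=ERROR_THRESHOLD):
        self.threshold = threshold
        self.error_counts = Counter()
        self.tickets = []
        self._ticketed_links = set()  # links that already have a ticket

    def _open_ticket(self, link, reason):
        # Assumption: one ticket per link, so repeated alarms do not flood the queue.
        if link in self._ticketed_links:
            return
        self._ticketed_links.add(link)
        self.tickets.append({"link": link, "reason": reason})

    def link_down(self, link):
        """Event trigger: a connection going down opens a ticket at once."""
        self._open_ticket(link, "link down")

    def record_error(self, link):
        """Threshold trigger: open a ticket once the error count reaches the limit."""
        self.error_counts[link] += 1
        if self.error_counts[link] >= self.threshold:
            self._open_ticket(link, f"error count reached {self.threshold}")


monitor = AutoTicketer()
for _ in range(6):
    monitor.record_error("wan-1")  # the fifth error opens the ticket
monitor.link_down("wan-2")
print(monitor.tickets)
```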

Incident management is not the only role of ticket systems. They can also be used for administrative tasks, such as customer notification or tracking routine work, and they can be integrated with a company’s change control system. Change control, including methods of procedure, is a way to document moves and changes affecting hardware and software in a live system.

The potential for database integration is great. Many companies cleverly link trouble ticketing systems with their knowledge base. This way a technician or engineer can find clear procedures and solutions for dealing with the problems before them.

Increase Availability with Ticket Excellence

Trouble tickets are meant to address troubles. Those could be just about anything, but they often affect the availability of some system or another. The methods that IT professionals use to deal with these troubles have developed over time. It’s only natural that things break down, become obsolete, or need special attention. It can be overwhelming.

So how do you keep up? That’s the subject of an article in Techopedia called “IT Infrastructure: How to Keep Up”. The author deals with infrastructure management from the early days to the present. But whether we are talking about old-school replacement of vacuum tubes or futuristic analytics, we still need documentation.

What do we mean by “ticket excellence”? The term comes from a standard of problem documentation that was promulgated in the Sprint organization in the 1990s. Sprint had a document that gave clear instruction on how to write ticket notes, how to handle tickets, and how to do it all well. The specifics of the internal procedures are not important here. The idea is that any IT professional or organization that tracks problems in a ticket system should do so in an excellent way and according to the highest standards.

Here are some principles that perhaps should be included in any company’s “ticket excellence” standards:

Write clearly and succinctly
Avoid abbreviations that are not universally known
Remain objective and leave out personal remarks
Never use profane or insulting language
Document logical steps at every opportunity
Distinguish between what has been done and what should be done next (next action)
Do not hold onto an important ticket if you can’t resolve it
Be aware of customers with VIP status
Follow SLAs as well as technical and management escalation procedures

Increase Availability with SLAs and Escalation Management

We wrote recently in this space about service level agreements (SLAs). The primary focus of the article was the matter of downtime and uptime and what is promised to the customer. But SLAs could cover so much more. One area that SLAs cover is the amount of time required to fix particular issues.

To manage tickets properly, you need to prioritize them. Ticket systems do this by assigning a priority code to each ticket. There are different ways to do that. A system may include a word for each priority:

Critical
High
Medium
Low

Or a ticket system might assign a priority number that will dictate how urgently tickets are worked:

Priority 1 (P1)
Priority 2 (P2)
Priority 3 (P3)
Priority 4 (P4)
Priority 5 (P5)

These priorities should be clearly defined in the SLA. A P1 may be a critical issue that deals with a major availability event or is related to a VIP customer. A P5 may be an administrative task that could be done when the desk is quiet. Each of these may have time limits. A P1 may need to be resolved in 4 hours, a P2 in 8 hours, and so on. Any ticket that is not resolved within the SLA time requirement is considered “outside of SLA”. That’s when people can turn up the heat.
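
A check for “outside of SLA” can be sketched in Python as follows. The 4-hour and 8-hour limits come from the example above. The limits for P3 through P5, and the function names, are made up for illustration and would be set by the actual SLA.

```python
from datetime import datetime, timedelta

# Resolution limits in hours. P1 and P2 follow the example in the text;
# P3 to P5 are hypothetical placeholders.
SLA_HOURS = {"P1": 4, "P2": 8, "P3": 24, "P4": 72, "P5": 120}


def sla_deadline(priority, opened):
    """Return the time by which a ticket of this priority should be resolved."""
    return opened + timedelta(hours=SLA_HOURS[priority])


def is_outside_sla(priority, opened, now, resolved=None):
    """True if the ticket was resolved late, or is still unresolved past its deadline."""
    end = resolved if resolved is not None else now
    return end > sla_deadline(priority, opened)


opened = datetime(2024, 1, 1, 9, 0)
print(sla_deadline("P1", opened))                                # 2024-01-01 13:00:00
print(is_outside_sla("P1", opened, datetime(2024, 1, 1, 13, 1)))  # True
```

This version counts clock hours. Many SLAs count business hours only, which would need a calendar-aware calculation.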

The way people turn up the heat is by using SLA-defined escalation. This means that an issue goes further up the chain, either technically or in management. In technical escalation, a Tier 1 tech who can’t handle a problem sends the ticket to a Tier 2 department or tech, with an appropriate explanatory handoff. If Tier 2 can’t fix it, the ticket goes to Tier 3. Management escalation means that supervisors, directors, and even vice presidents or CEOs get involved in a ticket. You may think that technical issues are about machines, but they are also about people.
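
The technical side of escalation can be sketched as a short Python function. The tier names follow the ones used in this article. The dictionary layout and the rule that a handoff note is mandatory are assumptions for the example.

```python
# Technical escalation path; the vendor is sometimes referred to as Tier 4.
TIERS = ["Tier 1", "Tier 2", "Tier 3", "Tier 4 (vendor)"]


def escalate(ticket, handoff_note):
    """Move a ticket one tier up, recording an explanatory handoff note."""
    if not handoff_note.strip():
        raise ValueError("an explanatory handoff note is required")
    current = TIERS.index(ticket["tier"])
    if current == len(TIERS) - 1:
        raise ValueError("ticket is already at the highest tier")
    ticket["tier"] = TIERS[current + 1]
    ticket["notes"].append(f"Escalated from {TIERS[current]}: {handoff_note}")
    return ticket


ticket = {"id": "TT-200", "tier": "Tier 1", "notes": []}
escalate(ticket, "Rebooted router, link still flapping; needs config review")
print(ticket["tier"], ticket["notes"])
```

Management escalation would follow the same pattern with a different chain, such as supervisor, director, and vice president.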

Increase Availability with Automated Systems and Analytics

That doesn’t prevent people from turning as much over to the machines as possible. We don’t use paper tickets, and IT companies around the world continue to look for ways to make the detection and resolution of IT problems as automatic as possible. With redundant systems and self-healing networks, many problems don’t even show up on a surveillance technician’s radar.

In a white paper by the company Wipro called “Infrastructure Automation and Analytics”, the author discusses the history of dealing with IT infrastructure:

Phase 1: Chaotic
Phase 2: Reactive
Phase 3: Proactive
Phase 4: Managed
Phase 5: Utility

Just as infrastructure management and maintenance required a lot of attention in the early days of IT, we can see the parallels in trouble documentation. Issues that might have been complicated and confusing long ago are handled much better now that we have so much experience with them. And many problems that required a lot of written analysis might be left to computers nowadays. The work of IT problem management continues to advance.

Conclusion

In an IT organization, ensuring high levels of availability for users and devices across the organization starts with trouble ticket excellence. If you can’t figure something out, it often helps to write it all out and analyze it. Trouble ticket excellence is a way that this analysis becomes a collaborative effort. It can be frustrating, and exhilarating, and sometimes it’s just plain fun! Anyone who wants to go into IT should know how to communicate. It’s extremely important. Documenting an issue with ticket excellence can make or break an IT business.
