The Incident Management process: a step-by-step guide

10 February 2023

Things go wrong. To borrow a well-known phrase, ‘stuff’ happens! Small things going wrong can lead to larger issues if they are not dealt with quickly and effectively. In the world of IT, many little things can go wrong, and their impact can be disproportionate. Fortunately, there is a process called Incident Management for dealing with these incidents before they become larger problems. Learn how to partner with SCT for reliable incident management services and support.

What is an Incident Management Process?

Incident Management is the process of handling IT incidents and disruptions, resolving the issue and returning to normal service, as outlined in the service level agreement (SLA).

Incidents are often single spontaneous events that can be quickly fixed.  Examples include a localised network failure, a user downloading a virus on their computer, an e-mail server not replicating, or a web server running slowly.  There are many other examples!  They differ from ‘problems’ in that they are isolated, affecting only a small number of users, and can be fixed quickly.  Larger problems, which may have multiple causes and significant business impact, have their own problem management process.

Why is Incident Management important?

A robust incident management process ensures that IT teams can quickly address vulnerabilities and issues as they arise, and helps prevent them from recurring. A faster response helps reduce the overall impact of incidents, mitigate the damage done, and ensure that systems and services continue to operate at their agreed service levels.

A good process will also save a company money! The research company Gartner estimates that a single minute of IT downtime costs around $5,600, or over $300k per hour. Whilst that figure won’t be the same for all businesses, it is easy to see the loss in productivity if key employees are unable to do their jobs effectively.
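As a rough illustration of that arithmetic, the sketch below scales the per-minute figure quoted above to an outage of any length. The function name and the default rate are assumptions for illustration only; substitute your own organisation's estimate.

```python
# Assumption: the per-minute figure is the estimate quoted in the text;
# real costs vary widely from business to business.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


hourly = downtime_cost(60)
print(f"One hour of downtime: ${hourly:,.0f}")  # One hour of downtime: $336,000
```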

The steps of an incident response plan

Whilst there is no single plan to fit all organisations, a good incident management process will have the following steps:

Incident identification

Incidents are typically reported to a central help desk by a user, or identified by monitoring software. Initially, it is likely to be the symptoms or resultant issue that is reported, rather than a distinct cause. It is important to capture as much detail as possible (time, location, activity and so on) to aid resolution later.

Incident logging

Once reported, the incident should be logged. This includes assigning a unique reference number and an incident manager, and recording all the information reported by the user. Logging is important not only for resolving the incident, but also for avoiding future recurrence.
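A logged incident might be modelled along the following lines. This is a minimal sketch, not a description of any particular help desk tool: the field names, the `INC-` reference format, and the in-memory counter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

# Hypothetical in-memory counter; a real system would use a database sequence.
_reference_numbers = count(1)


@dataclass
class Incident:
    """One logged incident, holding the details captured at identification."""
    reported_by: str
    description: str
    location: str
    incident_manager: str
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Each new incident receives the next unique reference number.
    reference: str = field(default_factory=lambda: f"INC-{next(_reference_numbers):06d}")


incident = Incident(
    reported_by="j.smith",
    description="Web server responding slowly",
    location="London office",
    incident_manager="a.jones",
)
print(incident.reference, incident.description)
```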

Incident categorisation

The first job of the incident manager is to categorise the event.  This makes it easier to identify the potential solution, as well as prioritise the incident.  There is no hard and fast rule as to how many incident categories there are, as it will depend on the nature of the IT infrastructure.  There may be a tree-like mechanism to identify sub-categories.
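The tree-like mechanism could be sketched as a nested dictionary, with a small helper to check that a chosen category path exists. The category names below are hypothetical, loosely based on the example incidents mentioned earlier; a real tree would reflect your own infrastructure.

```python
# Hypothetical category tree: each key is a category, each value its sub-categories.
CATEGORY_TREE = {
    "Network": {"LAN": {}, "Wi-Fi": {}, "VPN": {}},
    "Email": {"Replication": {}, "Delivery": {}},
    "Security": {"Malware": {}},
    "Web": {"Performance": {}},
}


def is_valid_category(path: list[str], tree: dict = CATEGORY_TREE) -> bool:
    """Return True if `path` (e.g. ["Email", "Replication"]) exists in the tree."""
    node = tree
    for name in path:
        if name not in node:
            return False
        node = node[name]  # descend one level into the sub-categories
    return True


print(is_valid_category(["Email", "Replication"]))  # True
print(is_valid_category(["Email", "Malware"]))      # False
```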

Incident prioritisation

The priority is based on the impact and urgency of the incident. The impact focuses on the degree of damage likely to be caused to the user or business. The urgency indicates the timescale within which the incident needs to be resolved. Typically, incidents are ranked as critical, urgent, high, medium, or low priority.

These priority levels are not absolute, and will depend on the resources available at the time. A low priority issue may get resolved quickly if there is free resource, while an urgent issue may have to wait if there is a larger problem or existing critical issues.
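One common way to combine impact and urgency is a simple scoring matrix. The sketch below is an assumption rather than a prescribed method: it rates impact and urgency from 1 (highest) to 3 (lowest) and adds them together to pick one of the five priority levels named above. The scales and the mapping are illustrative and should be tuned to your own SLA.

```python
# Assumed mapping: impact + urgency (each 1 = highest, 3 = lowest) gives a score of 2..6.
PRIORITY_BY_SCORE = {2: "critical", 3: "urgent", 4: "high", 5: "medium", 6: "low"}


def prioritise(impact: int, urgency: int) -> str:
    """Return a priority label from impact and urgency ratings (1 to 3 each)."""
    for name, value in (("impact", impact), ("urgency", urgency)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value!r}")
    return PRIORITY_BY_SCORE[impact + urgency]


print(prioritise(impact=1, urgency=1))  # critical
print(prioritise(impact=1, urgency=3))  # high
print(prioritise(impact=3, urgency=3))  # low
```

The label is only a starting point: as noted above, the order in which incidents are actually worked will still depend on the resources available at the time.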

Incident response

This is the stage where the magic happens! The root cause of the issue is identified, and a solution is agreed and implemented. It is important to check that the solution has resolved the issue fully. This may require multiple iterations of the process.

Where it has not been possible to resolve the issue, it may be necessary to identify a work-around, to allow the user to continue, whilst a deeper look into the issue and cause is carried out.

Incident closure

Once the incident is resolved to everyone’s satisfaction, it can be logged as complete. It is important that the incident manager logs all the relevant information about the cause and resolution, to help improve future incident management.

Post-incident review

It is good practice to review all the incidents that have occurred, and the responses to them, on a regular basis. This can help identify repeating issues where a deeper dive might be required. These reviews better prepare IT support teams for future incidents, and help create a more efficient incident management process.
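As one possible aid to such a review, the sketch below counts closed incidents by category and flags any category that recurs. The record layout, the sample data, and the threshold of three are all assumptions for illustration.

```python
from collections import Counter


def recurring_categories(closed_incidents: list[dict], threshold: int = 3) -> dict[str, int]:
    """Return categories that appear at least `threshold` times, with their counts."""
    counts = Counter(incident["category"] for incident in closed_incidents)
    return {category: n for category, n in counts.items() if n >= threshold}


# Hypothetical sample of closed incidents from one review period.
closed = [
    {"reference": "INC-000001", "category": "Email"},
    {"reference": "INC-000002", "category": "Network"},
    {"reference": "INC-000003", "category": "Email"},
    {"reference": "INC-000004", "category": "Email"},
]
print(recurring_categories(closed))  # {'Email': 3}
```

A category that keeps appearing is a candidate for the deeper dive described above.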

How can SCT help?

A great support process is vital to the health of any business, whether for incidents or the management of critical outages. For over two decades, SCT has specialised in the provision of IT support and maintenance services for channel partners, including value-added resellers, system integrators, managed service providers, IT vendors and distributors. With a dedicated 24/7 help desk, highly skilled field engineers, and high priority onsite spares, SCT can help you and your customers with incident management services that keep their IT systems running and their users happy.

To find out more about how SCT works exclusively with the channel, and how we can help support you and your customers, please get in touch.  We would love to hear from you.