Essential Guide to ITIL Incident Management
What is Information Technology Infrastructure Library (ITIL)?
ITIL is a world-renowned best practice framework, adopted by individuals and organizations in both the public and private sectors to align IT services with the needs of the business. Its most current version, ITIL 2011, consists of five core publications: Service Strategy, Service Design, Service Transition, Service Operation and Continual Service Improvement. This guide provides a comprehensive explanation of Incident Management, a critical process within the Service Operation book.
Service Operation is an essential stage of the service lifecycle, delivering service and value to the business, customers and users. It ensures that agreed-upon service levels and quality are achieved or surpassed, and the publication provides both an introduction and guidelines to the activities that contribute to IT operational excellence.
ITIL Service Operation processes include:
- Incident Management
- Problem Management
- Request Fulfillment
- Event Management
- Access Management
ITIL emerged as a concept when the British government determined the quality of IT service provided to them was inadequate. The Central Computer and Telecommunications Agency, which merged with the Office of Government Commerce in 2000, launched the first version of ITIL, called “Government Information Technology Infrastructure Management,” in the early 1980s. The framework spread across Europe in the 1990s.
Version 2 of ITIL was released in 2001, and it quickly became the most popular IT Service Management best practice framework throughout the world. The next major version change came in 2007 with ITIL V3, which emphasizes IT and business alignment.
The most current update to ITIL occurred in 2011 with what is called ITIL 2011 – a tune-up to ITIL V3.
What is Incident Management?
Incident Management is an IT service management process intended to restore “normal” service operation as quickly as possible, minimizing any adverse impact on business operations or the user. Success is achieved by promptly and effectively dealing with all Incidents reported by users, discovered by technical staff or automatically detected by a monitoring solution. The IT Infrastructure Library (ITIL) defines an Incident as “an unplanned interruption that causes, may cause or reduces the quality of an IT Service.”
Common Incident Examples
Although users contact the Service Desk for assistance for many reasons, certain Incidents are common across most organizations:
- Active Directory password reset
- Delete Active Directory account
- Error message when trying to launch or access an application
- Printer not printing
- Hardware – printer, fax, scanner, tablet not working
- Monitor flickering
The Purpose and Importance of Incident Management
Each stage of the entire ITIL Service Lifecycle provides value to the business in one way or another. Service Operation delivers both long term incremental and short term ongoing improvements. The primary goal of the Incident Management process is to restore normal service operation as quickly as possible. When successfully implemented, Incident Management offers the following types of benefits:
- Reducing unplanned IT service staff costs by reducing the number of Incident tickets
- Decreasing business and user downtime with faster Incident detection and resolution
- Increasing productivity across the organization by restoring normal operation quickly
- Identifying training opportunities and potential service improvements
- Improving user satisfaction
- Demonstrating IT’s value to the business by aligning IT activities to business priorities
- Reducing the impact on the business and user with improved monitoring
- Reducing lost Incidents
Inter-Related ITIL Processes
ITIL processes interface with one another throughout the service lifecycle. As mentioned earlier, an Incident is an unplanned disruption or reduction in quality of an IT service. Closely related are Problems, which are the unknown cause of one or more Incidents. Problem Management is designed to prevent or minimize the impact of Incidents by performing root cause analysis.
Occasionally, the two terms are used interchangeably. A third term, Issue, may also be substituted, adding to the confusion surrounding ITIL terminology.
Information to remember:
- An Incident can raise a Problem – If an Incident is reported and is likely to happen again, a Problem may be raised to identify and resolve the underlying root cause using the Change Management process.
- A Problem can cause an Incident – If a problem arises and is not resolved, an Incident, or multiple related Incidents, may be reported as a result.
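The two-way relationship between Incidents and Problems can be sketched as a pair of linked records. This is a minimal illustration only: the class names, field names and ID formats below are hypothetical and are not taken from ITIL or from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    """Hypothetical record for the unknown cause of one or more Incidents."""
    problem_id: str
    summary: str
    incident_ids: List[str] = field(default_factory=list)


@dataclass
class Incident:
    """Hypothetical record for an unplanned interruption or quality reduction."""
    incident_id: str
    summary: str
    problem_id: Optional[str] = None  # set once a Problem is raised or matched


def link_incident(incident: Incident, problem: Problem) -> None:
    """Associate an Incident with the Problem believed to be its cause."""
    incident.problem_id = problem.problem_id
    if incident.incident_id not in problem.incident_ids:
        problem.incident_ids.append(incident.incident_id)


# Two Incidents with the same symptom are tied to a single Problem.
problem = Problem("PRB-1", "Print spooler crashes after driver update")
inc_a = Incident("INC-101", "Printer not printing")
inc_b = Incident("INC-102", "Printer not printing")
for inc in (inc_a, inc_b):
    link_incident(inc, problem)
```

Keeping the link on both records lets the Service Desk see which Problem an Incident belongs to, and lets Problem Management see every Incident that a single root cause has produced.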
The Role of Knowledge Management
Although the Knowledge Management process is associated with the Service Transition lifecycle stage, it is one that is executed across the entire lifecycle, especially during Service Operation. Knowledge Management can have a very strong impact during the Incident Management process. The Knowledge Management function is typically a feature within a larger IT service management technology solution. Its goal is to collect and share knowledge across the organization. This is especially important when service desk staff seek to quickly solve reported Incidents. Solutions within the knowledge base leverage existing knowledge to save time and lower the cost of service delivery.
Other Key ITIL Process Relationships
- Configuration Management
- Change Management
- Service Level Management
- Availability Management
- Capacity Management
- Event Management
Incident Management Roles and Responsibilities
Well-defined roles and responsibilities are critical to the effective execution of the Incident Management process. The Incident Management team comprises the following roles:
Incident Manager
The Incident Manager has primary responsibility for driving and continually improving the Incident Management process. In small to mid-size organizations, this role is commonly assigned to the Service Desk Manager; in larger organizations, it may be a separately defined role. Key responsibilities include team leadership, reporting key performance indicators (KPIs) back to management, direct management of first and second line support, managing the Incident Management system and enforcing the Incident Management process workflow.
First Line Support
First Line Service Desk Technicians are the single point of contact for end users seeking information and reporting service disruptions. They are primarily responsible for the initial support and classification of Incidents and the immediate attempt to restore a failed service as quickly as possible. If they are unable to resolve the Incident, the First Line Service Desk Technician will route the Incident to appropriate support personnel, monitor activity and keep users up to date on the status of their Incident.
Second Line Support
Second Line Support Technicians typically have more advanced knowledge than First Line Service Desk Technicians. They may become responsible for Incidents that First Line Support is unable to resolve. These technicians may interact with third party experts from software or hardware vendors to help restore normal service as quickly as possible.
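The hand-off between support lines described above can be sketched as a simple escalation path. The tier names, the `Ticket` class and the `escalate` function are assumptions made for illustration; a real ITSM tool would define its own routing rules.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed escalation order: first line, second line, then an external vendor.
SUPPORT_TIERS = ["first_line", "second_line", "vendor"]


@dataclass
class Ticket:
    ticket_id: str
    tier: str = SUPPORT_TIERS[0]
    history: List[str] = field(default_factory=list)


def escalate(ticket: Ticket, reason: str) -> bool:
    """Move the ticket to the next tier; return False if already at the last tier."""
    position = SUPPORT_TIERS.index(ticket.tier)
    if position + 1 >= len(SUPPORT_TIERS):
        return False
    next_tier = SUPPORT_TIERS[position + 1]
    # Record the hand-off so the user can be kept up to date on status.
    ticket.history.append(f"{ticket.tier} -> {next_tier}: {reason}")
    ticket.tier = next_tier
    return True
```

Recording each hand-off in `history` reflects the First Line responsibility to monitor activity and keep users informed after routing an Incident onward.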
Incident Management Key Performance Indicators (KPIs)
Measurements are important across all stages of the ITIL lifecycle. Each process has metrics that should be monitored and reported to effectively evaluate overall performance. Continual Service Improvement requires that the performance of each process be measured to identify areas needing improvement.
Typical Incident Management metrics include:
- Total Incidents reported (per category, priority, person, organizational unit, etc.)
- Status of Incidents
- Time between Incident creation and resolution
- Incidents and SLA (reached, breached)
- Average cost per Incident
- Reopen rate
- Incidents handled without escalation
- First call resolution
- Configuration Items experiencing recurring Incidents
- Incidents by time of day
KPIs should be related to Critical Success Factors (CSFs), and CSFs should be related to objectives. This relationship supports decisions about maintaining the current state and moving to the desired state. Although each organization is different, relevant reports for users, staff and management will help support important decisions that can be used to improve both the processes and the business as a whole.
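Several of the metrics listed above can be derived from basic Incident records. The following is a minimal sketch, assuming each record carries creation and resolution timestamps, escalation and reopen flags, and an agreed resolution target; the `IncidentRecord` fields and the `summarize` function are hypothetical names, not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentRecord:
    created: datetime
    resolved: Optional[datetime]  # None while the Incident is still open
    escalated: bool               # True if passed beyond First Line Support
    reopened: bool
    sla_target: timedelta         # agreed resolution time (assumed field)


def summarize(records: List[IncidentRecord]) -> dict:
    """Compute a few Incident Management KPIs over resolved Incidents."""
    resolved = [r for r in records if r.resolved is not None]
    if not resolved:
        return {"resolved": 0}
    durations = [r.resolved - r.created for r in resolved]
    n = len(resolved)
    return {
        "resolved": n,
        "mean_time_to_resolve": sum(durations, timedelta()) / n,
        "sla_met_rate": sum(d <= r.sla_target for d, r in zip(durations, resolved)) / n,
        "reopen_rate": sum(r.reopened for r in resolved) / n,
        "no_escalation_rate": sum(not r.escalated for r in resolved) / n,
    }


start = datetime(2024, 1, 1, 9, 0)
target = timedelta(hours=4)
sample = [
    IncidentRecord(start, start + timedelta(hours=1), False, False, target),
    IncidentRecord(start, start + timedelta(hours=5), True, True, target),
    IncidentRecord(start, None, False, False, target),  # still open, excluded
]
report = summarize(sample)
```

Open Incidents are excluded from the rates here so that unresolved tickets do not skew resolution-based figures; an organization may reasonably choose a different convention.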
Best Practices for Implementing Incident Management
Adopting the ITIL framework within a business can be a daunting task. As with any ITIL process, Incident Management implementation requires support from the business. Of particular importance is gaining buy-in from executives and upper management. Before beginning the adoption process, it’s important to have at least one person dedicated to the overall project management and orchestration of adherence to best practices for Incident Management. It is also extremely helpful to have an IT service management (ITSM) tool in place that will support your current state processes and desired future state processes, as well as a Service Desk acting as the primary interface with the IT department.
1) Understand the current Incident Management process
Occasionally an organization does not have a consistent process for handling incidents, or they have a less sophisticated one in place. Either way, it is important to map the existing process as well as possible in an effort to understand what the existing Service Desk process offers.
2) Identify long-term Incident Management process vision
It is also important to understand what the organization expects from the Incident Management process. The expectation may be based on generic Incident Management templates included with the ITSM tool or a more custom process based on the organization’s specific needs.
3) Conduct a gap analysis
Next, identify what must be adjusted between the organization’s current Incident Management process and its long-term vision for Incident Management. This will arm you with valuable information about the effort, time, money and resources necessary to achieve your Incident Management objectives and your overall service goals.
4) Create an implementation road map
Adopting any ITIL process takes time, and you will need a road map to help set expectations for management. Use that road map to describe the activities, time frame and effort necessary to deliver. The road map should include quick wins, tool implementation, process changes, people and organization enablement, communication plans and overall governance changes.
5) Begin project implementation
It’s time for implementation to begin. Create a project plan that defines the actions or tasks, responsibilities and timeline for completion of all tasks. Communicate successes along the way as you achieve each milestone, demonstrating your progress toward your ultimate implementation goal.
Feature Checklist for Incident Management Software
For IT organizations evaluating Incident Management software and/or IT service management suites that offer Incident Management capabilities, it is important to understand the types of features required to support key processes. At a minimum, Incident Management software should provide the following capabilities:
- Create, modify, resolve, and close incident records
- Generate unique record numbers associated with each incident record
- Link incidents to problem records, knowledge articles, known workarounds, and requests for change
- Link configuration management data to incident record
- Notify incident owners when associated problem is resolved
- Automatically record historical data in an audit log
- Configurable incident categorization
- Incident search and reporting capabilities
- Route incidents based on resource availability, time-zones, sites, etc.
- Prioritize, assign, and escalate incidents based on categorization; escalate based on priority or other categorization
- Integrate with event monitoring solutions with the ability to automatically create, update, and close incidents
- Flexible field configurations, including free text, drop down, date/time, attachments, screen captures
- Link incidents to customer data
- Utilize knowledge base solutions/scripts for diagnosis and resolution
- Assign incidents or associated tasks to external service providers
- Assign incidents to multiple assignees
- Create a problem or request for change from an incident record
- Automated incident alerts (to IT staff and/or end-user) based on deadlines, SLAs, closure, and other activity
- Link incident records to SLAs
- Collect feedback from end-users via a customer satisfaction survey
- Initiate an incident on behalf of someone else
- Stop the SLA clock to put an incident on hold
- Differentiate between an incident and a service request
- Reactivate resolved incident
- Automatically determine priority based on impact and urgency
- Integrate with Telephony/ACD system to pre-populate customer information based on caller ID
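One capability in the checklist above, determining priority automatically from impact and urgency, is commonly implemented as a lookup matrix. The sketch below is illustrative: the three-level scale and the specific priority values are an assumed example, and each organization should define its own matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Assumed example matrix: (impact, urgency) -> priority, where 1 is most urgent.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): 1,
    (Level.HIGH, Level.MEDIUM): 2,
    (Level.HIGH, Level.LOW): 3,
    (Level.MEDIUM, Level.HIGH): 2,
    (Level.MEDIUM, Level.MEDIUM): 3,
    (Level.MEDIUM, Level.LOW): 4,
    (Level.LOW, Level.HIGH): 3,
    (Level.LOW, Level.MEDIUM): 4,
    (Level.LOW, Level.LOW): 5,
}


def priority(impact: Level, urgency: Level) -> int:
    """Look up the priority for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]
```

An explicit table is used rather than a formula so that individual cells can be adjusted to match business priorities without changing code logic.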