What Is a “Problem”?
ITIL® 4 defines a problem as “a cause, or potential cause, of one or more incidents.” An incident is “an unplanned interruption to a service or reduction in the quality of a service.”
But just following these definitions does not mean that you are ready to roll out a problem management program. Just because you have a few incidents that seem similar does not mean that you have a problem…or does it?
Really—What Is a Problem?
The first step toward an effective problem management approach is to define the criteria of a problem. Is a problem just x number of repeating incidents over y period of time? Perhaps.
Or is a problem an unknown issue that causes, or may cause, significant negative financial impact to the organization? Or is it something unknown that causes, or may cause, negative impact to your organization’s brand or reputation? Maybe a problem is a threat to your organization’s data that, if not addressed in a timely and effective manner, could result in a data breach and huge fines from various governmental and regulatory agencies due to the exposure of personal information.
While repeating patterns of incidents may be a good indicator of a problem, it should not be the only indicator. In fact, the approach to problem management cannot always be reactive. The first step toward an effective approach to problem management is to define the criteria of a problem for your organization and gain agreement on this definition from both IT and business leadership.
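One way to make such criteria concrete is to write them down as explicit, testable rules. The following is a minimal sketch, not an ITIL-defined method: the `Incident` fields, the function name, and every threshold (five incidents, a 30-day window, a cost of 10,000) are hypothetical placeholders that your IT and business leadership would need to agree on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical fields; map these to whatever your ITSM tool records.
    category: str
    opened_at: datetime
    estimated_cost: float = 0.0
    affects_reputation: bool = False
    threatens_data: bool = False


def problem_reasons(incidents, now, min_repeats=5,
                    window=timedelta(days=30), cost_threshold=10_000.0):
    """Return the agreed criteria that a group of related incidents meets.

    An empty list means the group does not yet qualify as a problem
    under these (assumed) criteria.
    """
    # Only consider incidents opened inside the review window.
    recent = [i for i in incidents if now - window <= i.opened_at <= now]
    reasons = []
    if len(recent) >= min_repeats:
        reasons.append("repeating incidents")
    if sum(i.estimated_cost for i in recent) >= cost_threshold:
        reasons.append("financial impact")
    if any(i.affects_reputation for i in recent):
        reasons.append("reputation risk")
    if any(i.threatens_data for i in recent):
        reasons.append("data risk")
    return reasons
```

Note that a single incident can qualify on its own here (for example, one that threatens data), which reflects the point that repetition should not be the only indicator.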
Problem Management Basics
According to ITIL 4, problem management consists of three phases:
Problem Identification. In this phase, problems are identified and logged. Problems may be identified from several sources, which are discussed later in this article.
Problem Control. Problems are analyzed, and known errors and workarounds are documented in the problem control phase. In some cases, a workaround becomes the ongoing way of dealing with a problem, because the fix is impractical or too costly.
Error Control. In error control, potential solutions to known errors are developed. These potential solutions may result in a request for change if the solution can be justified in terms of costs, risks, and benefits.
Problem Management Approaches
Problem management is a formally defined approach for identifying and eliminating the causes of incidents or reducing the impact of incidents from causes that cannot be eliminated or prevented. There are two approaches to problem management: reactive problem management and proactive problem management. While both approaches utilize the same tools and techniques for cause analysis, reactive problem management looks at data and information from the past. Proactive problem management applies root cause analysis techniques to identify errors before they impact the organization.
Reactive problem management is often a good place to start a formal problem management program within an organization. Reactive problem management provides opportunities to try out and learn various analysis techniques while honing skills and providing noticeable value and impact.
Where to Look for Problems
Now that you know what a problem is and how to approach problems, where should you look for problems?
Perhaps the most obvious place to find problems is in incident records. Conducting a trend analysis of incident records is often useful for identifying a problem. But what kind of trend? It could be an upward trend in the number of high-priority incidents. It could be an upward trend in the number of incidents of a particular type, typically based on category.
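A simple trend check of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: incident records are assumed to be available as `(month, category)` pairs with months written as `YYYY-MM`, and "upward trend" is assumed to mean a strictly increasing count over the most recent periods. Both the function names and that definition are hypothetical; a real analysis may call for something more robust.

```python
from collections import defaultdict


def monthly_counts(records):
    """Count incidents per category per month.

    records: list of (month, category) pairs, month formatted 'YYYY-MM'.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for month, category in records:
        counts[category][month] += 1
    return counts


def rising_categories(records, periods=3):
    """Return categories whose counts rose in each of the last `periods` months."""
    counts = monthly_counts(records)
    # 'YYYY-MM' strings sort chronologically, so a plain sort is enough.
    months = sorted({month for month, _ in records})[-periods:]
    rising = []
    for category, by_month in counts.items():
        series = [by_month.get(m, 0) for m in months]
        # Strictly increasing from one month to the next (assumed definition).
        if len(series) >= 2 and all(a < b for a, b in zip(series, series[1:])):
            rising.append(category)
    return sorted(rising)
```

The same check works for priority instead of category: pass `(month, priority)` pairs and look for a rising count of high-priority incidents.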
Major incident post-mortems are another great place to look for problems. By definition, a major incident results in significant impact to your organization. Finding the cause of the major incident and taking actions to address the cause helps prevent a recurrence of that incident.
Do you have ITSM processes that frequently fail? For example, are changes or deployments failing? Is there a high number of re-opened service requests because the consumer was dissatisfied? Yes, processes can and do occasionally fail, but process failures should be a rare event. Finding and addressing the cause of process failure eliminates needless rework, frustration, and waste.
How about the number of times that a knowledge article is used to resolve an incident? A high number of knowledge article citations or links may indicate the need for cause analysis and development of a permanent solution.
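Counting article usage is straightforward if your tooling can export which article was linked to which incident. The sketch below assumes such an export exists as `(incident_id, article_id)` pairs; the function name and the threshold of ten uses are hypothetical and would need tuning for your environment.

```python
from collections import Counter


def heavily_used_articles(incident_links, min_uses=10):
    """Return (article_id, uses) pairs for articles linked to many incidents.

    incident_links: iterable of (incident_id, article_id) pairs.
    Results are ordered from most used to least used.
    """
    uses = Counter(article_id for _, article_id in incident_links)
    return [(article, n) for article, n in uses.most_common() if n >= min_uses]
```

An article that appears near the top of this list is documenting a workaround that is applied again and again, which makes the underlying cause a candidate for analysis.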
Software release notes are another potential place for finding problems. As part of the release of new or updated software versions, vendors will provide a “bug list” or list of known errors. This information provides input to problem management techniques to determine if these “bugs” may impact your organization.
Problems are not limited to IT; they can also be found elsewhere in the organization. Business processes that are not repeatable or have a high failure rate are other places to apply good problem management principles and techniques.
Valuable Outputs from Problem Management
In addition to finding the cause of errors, a good problem management approach produces several valuable outputs:
Proposed fix/solution. Finding the cause of a problem is just the first part of cause analysis. Just as importantly, problem management develops a proposed fix or solution to the problem.
Known errors. A known error is a problem that has undergone some cause analysis but for which a solution has not yet been determined. Awareness of known errors is often useful when investigating and resolving incidents.
Workarounds. Workarounds are alternative ways for accomplishing a task or goal when the primary method is not available. A workaround is used to reduce the impact of an incident.
Knowledge articles. A hallmark of an effective problem management program is the production of knowledge articles. Problem management is a knowledge discovery practice; knowledge articles are a natural outcome of this discovery.
The Value of Good Problem Management
The value of a good problem management practice is more than just identifying and resolving causes of incidents. Good problem management also delivers value in these ways:
Organizational learning. Problem management pushes an improved understanding of how the organization works; how IT works; and the relationship and impact of technology, process, products, and services within the organization.
Improvement. Problem management drives improvement—improved processes, improved outcomes, improved efficiency and effectiveness, and improved product and service management.
Growth. Problem management encourages the organization to grow its competencies and capabilities in response to the ever-changing landscape of business and technology. Good problem management means that the organization is investing in training and developing skills needed to be successful in a business world of ever-increasing complexity.
Confidence. A good problem management practice gives the organization confidence in its decision-making and ability to overcome any challenge.
In my next article, I will explore why problem management is so important and present some selected problem analysis techniques.
ITIL is a registered trademark of AXELOS Limited.
Doug Tedder is the principal of Tedder Consulting, a service management and IT governance consultancy. Doug is a recognized thought leader whose passion is helping and inspiring good IT organizations to become great. Doug is an author, blogger, and frequent speaker and contributor at local industry user group meetings, webinars, and national conventions. Doug holds numerous industry certifications in disciplines including ITIL®, COBIT®, Lean IT, DevOps, KCS™, VeriSM™, and Organizational Change Management. He was recognized as an IT Industry Legend by Cherwell Software in 2016, and is one of HDI’s Top 25 Thought Leaders in Technical Support and Service Management. He is a member and former president of itSMF USA, a member of HDI, a contributing author to VeriSM™, and co-author of the VeriSM™ Pocket Guide. Follow Doug on Twitter @dougtedder or visit his website.