6 Keys to Optimize Problem Management

by Paul Dooley
Date Published July 18, 2019 - Last Updated December 17, 2019

Editor’s note: This article was adapted from a discussion on HDIConnect and has been updated and edited for clarity.

I am looking at driving a problem management improvement effort and want to do so by focusing on two to three small goals that will have the biggest impact. We have a well-defined problem management process in place; adoption seems to be the biggest challenge. In particular, we need to do a better job of creating problems when the incident demand requires us to do so.

The goals I am looking to focus on should be measurable over time. One thing that comes to mind is the ratio of incidents to problems. Not to get too wordy, but I think this one has some pitfalls that scare me a bit.

Any other ideas? Any thoughts on how to get tech/process owners to open a problem rather than focusing only on the incident?

—Michael M.

Here are a few thoughts and suggestions for your challenges relating to improved performance and adoption of problem management within your organization.

#1: Define and Handle Problems Separately from Incidents

First, make sure that there are documented policies and procedures in place with respect to what is a problem vs. an incident, noting that they should be handled differently. Users, support staff, suppliers, and other support groups should be clear on what is a problem (the cause of one or more incidents) vs. an incident (a symptom of a problem). You can refer to ITIL best-practice guidance for the basics around problem management. It is considered a "core" process in ITIL, both for ITIL V3 and for ITIL 4.

It should also be clear in your process documentation that incident management and problem management have very different but complementary goals. The purpose of incident management is to restore service quickly, through any means necessary (often a workaround). The purpose of problem management is to get to the root cause of incidents and prevent them from occurring in the first place, by effecting a solution to the problem causing the incidents. For any incident that cannot yet be prevented, problem management’s job is to support incident management by supplying the best workarounds, along with information about causal factors and forthcoming solutions.

#2: Implement Incident and Problem Management Processes Together

These two processes not only work closely, but they rely on each other for success. For example, incident management provides the trigger for problems to be reported and investigated, as well as trend reporting for problem management to proactively analyze trends and identify and deal with any as yet unreported problems. Problem management facilitates incident management by removing the causes of incidents, thereby reducing repeat incidents and providing higher productivity for users and IT. Problem management also provides the best workarounds to incident management so that restoration can happen more quickly, as well as a known error database that will inform support staff about workarounds and progress on reported problems and help to manage user expectations. If possible, plan and implement the two processes together.

It’s also important to clearly document guidelines in the incident management policy and procedure as to when a problem report is triggered to problem management. For example, generate a problem report to problem management when:

The incident cannot be matched to any known problem or known error in the known error knowledge base
There is a series of incidents that exhibit a common pattern of symptoms
The incident is classified as Major, the impact to the organization is significant, and an investigation as to its cause is thus justified
An investigation as to the cause of the incident is otherwise justified

The criteria should be documented and communicated to all support groups, so that it is clear when a problem report is to be triggered to problem management. Your automated support tools should also be configured with these criteria, so that they support the policies and procedures.
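As a rough illustration of how a support tool might encode the trigger criteria above, here is a minimal Python sketch. The Incident fields, the problem_report_reasons function, and the PATTERN_THRESHOLD of three similar incidents are all assumptions made for this example. They are not drawn from ITIL or from any particular tool, and your own policy should define what counts as a pattern.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: how many incidents with the same symptom
# count as "a common pattern". Set this from your own policy.
PATTERN_THRESHOLD = 3


@dataclass
class Incident:
    incident_id: str
    symptom: str                                # normalized symptom label
    is_major: bool = False
    matched_known_error: Optional[str] = None   # ID of a matching problem/known error
    investigation_requested: bool = False       # "otherwise justified" flag


def problem_report_reasons(incident: Incident, recent: List[Incident]) -> List[str]:
    """Return the criteria this incident meets; an empty list means no trigger.

    `recent` holds earlier incidents and must not include `incident` itself.
    """
    reasons = []
    if incident.matched_known_error is None:
        reasons.append("no matching known problem or known error")
    similar = sum(1 for other in recent if other.symptom == incident.symptom)
    if similar + 1 >= PATTERN_THRESHOLD:
        reasons.append("series of incidents with a common symptom pattern")
    if incident.is_major:
        reasons.append("major incident")
    if incident.investigation_requested:
        reasons.append("investigation otherwise justified")
    return reasons


# Example: a third VPN timeout that already matches a known error.
history = [
    Incident("INC-1", "vpn-timeout", matched_known_error="KE-7"),
    Incident("INC-2", "vpn-timeout", matched_known_error="KE-7"),
]
new_incident = Incident("INC-3", "vpn-timeout", matched_known_error="KE-7")
reasons = problem_report_reasons(new_incident, history)
```

Returning the list of reasons, rather than a simple yes or no, lets the problem report record why it was raised, which makes the criteria easier to audit and refine later.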

Problem management must also have its own record-keeping system, so that problems and known errors can be tracked and managed through their lifecycle. Problem records must be linkable to the corresponding incident records, so that the knowledge base can inform support staff and users of the progress on resolving problems linked to incidents. Why? If the people handling incidents cannot see the progress being made on problems, then from their perspective the value remains in incident management: a fast response to an incident and at least some type of solution (albeit a workaround).
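One way to picture the link between problem records and incident records is the small Python sketch below. The ProblemRecord and ProblemRegister names, the status labels, and the rule that an incident links to at most one problem are assumptions made for illustration; a real ITSM tool will have its own data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ProblemRecord:
    problem_id: str
    status: str                      # illustrative labels, e.g. "investigating", "known error"
    workaround: str = ""
    linked_incidents: Set[str] = field(default_factory=set)


class ProblemRegister:
    """Tracks problems and lets incident handlers look up their progress."""

    def __init__(self) -> None:
        self._problems: Dict[str, ProblemRecord] = {}
        self._by_incident: Dict[str, str] = {}   # incident ID -> problem ID

    def add(self, problem: ProblemRecord) -> None:
        self._problems[problem.problem_id] = problem
        for incident_id in problem.linked_incidents:
            self._by_incident[incident_id] = problem.problem_id

    def link(self, incident_id: str, problem_id: str) -> None:
        # Simplifying assumption: each incident links to at most one problem.
        self._problems[problem_id].linked_incidents.add(incident_id)
        self._by_incident[incident_id] = problem_id

    def progress_for(self, incident_id: str) -> Optional[ProblemRecord]:
        """Return the linked problem record, or None if the incident is unlinked."""
        problem_id = self._by_incident.get(incident_id)
        return self._problems.get(problem_id) if problem_id else None


register = ProblemRegister()
register.add(ProblemRecord("PRB-1", "known error", workaround="Restart the VPN client"))
register.link("INC-10", "PRB-1")
linked = register.progress_for("INC-10")
```

With a lookup like progress_for, someone working an incident can see the current status and workaround of the underlying problem without leaving the incident record.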

#3: Appoint a Problem Manager to Drive and Manage the Process

Be sure to appoint a problem management process owner who is accountable for the quality and success of the process, along with a problem manager to drive and coordinate the daily activities of the process. The problem manager role requires someone with in-depth technical and analytical skills, communication skills, and team leadership skills. (Keep in mind that the role could also be filled by two or three people working together.) To avoid a conflict of interest with the service desk, this person should not reside in the service desk area, but ideally in a back-line technical support group. The service desk should remain focused on handling incidents and requests and communicating with users.

#4: Emphasize the Value with Balanced Metrics and Reporting

To raise awareness concerning the importance of problem management, it’s key to show the "value" of this process through a good set of metrics and regular dashboard reports to all stakeholders: support staff, IT management, customers, and suppliers. People have to see the value and how well problem management is doing its job. This means you must have the right set of metrics in place to measure and report on the performance and contribution of problem management (showing the value).

The metrics that are key to achieving Critical Success Factors (CSFs) for problem management should be highlighted as your KPIs for this process. To ensure balanced performance, lay out your problem management monthly scorecard in a balanced scorecard format, with metrics outlined in four key areas (see below for an example). Be sure to define SMART targets for each of these metrics, so that you can report performance against objectives for each of your metrics or KPIs.

Problem Management Balanced Scorecard Example

Performance: Positive effect of the process on the business and IT
The number of repeat incidents per service (declining by x percent per month/quarter)
The percentage of incidents closed by the service desk, without escalation to other tier 2 or 3 support groups (increasing by x percent per month/quarter)
Average incident resolution time for those incidents linked to problem records (declining over time, reflecting faster average incident resolution)
Impacted User Minutes (IUM) (business metric; faster resolution should result in additional user minutes available to do work)

Customer Satisfaction: Positive effect on customer and user satisfaction
Customer satisfaction level with speed of resolution, and fewer incidents on periodic surveys (increasing)
User satisfaction level with speed of incident resolution, and fewer incidents as recorded from ongoing surveys (increasing)

Financial: Speaks to the impact on costs
Average cost per problem (declining, due to growing process maturity)
Average cost per incident (declining, due to faster average turnaround time)

Learning & Growth: Demonstrates compliance and process adoption
Number and percentage of incidents linked to a problem/known error record (increasing, showing that people are querying the KEDB, and identifying problems that are the likely cause of their incident)
Total number of problems (as a control measure, initially increasing, reflective of problems linked to reported incidents, then gradually decreasing, as the process becomes more effective and quality starts to improve)
Backlog of outstanding problems (growing initially, then declining over time as the process becomes more effective and efficient)
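
Several of the scorecard metrics above can be computed directly from incident records. The Python sketch below shows one possible approach. The record fields (service, symptom, escalated, resolution_minutes, problem_id), the function name, and the definition of a repeat incident as any incident after the first with the same service and symptom are assumptions for this example, not standard definitions, and the sample data is invented.

```python
from collections import Counter
from statistics import mean


def scorecard_metrics(incidents):
    """Compute a few scorecard metrics from one period's incident records."""
    if not incidents:
        return {}
    total = len(incidents)
    linked = [i for i in incidents if i["problem_id"] is not None]
    closed_at_desk = [i for i in incidents if not i["escalated"]]

    # Assumed definition: every incident after the first with the same
    # (service, symptom) pair counts as a repeat for that service.
    occurrences = Counter((i["service"], i["symptom"]) for i in incidents)
    repeats = Counter()
    for (service, _symptom), count in occurrences.items():
        repeats[service] += count - 1

    return {
        "pct_linked_to_problem": 100 * len(linked) / total,
        "pct_closed_at_service_desk": 100 * len(closed_at_desk) / total,
        "avg_resolution_minutes_linked": (
            mean(i["resolution_minutes"] for i in linked) if linked else None
        ),
        "repeat_incidents_per_service": dict(repeats),
    }


# Invented sample data for one reporting period.
sample = [
    {"service": "email", "symptom": "sync-fail", "escalated": False,
     "resolution_minutes": 20, "problem_id": "PRB-1"},
    {"service": "email", "symptom": "sync-fail", "escalated": False,
     "resolution_minutes": 10, "problem_id": "PRB-1"},
    {"service": "email", "symptom": "login", "escalated": True,
     "resolution_minutes": 90, "problem_id": None},
    {"service": "vpn", "symptom": "timeout", "escalated": True,
     "resolution_minutes": 60, "problem_id": None},
]
metrics = scorecard_metrics(sample)
```

Running the same function on each month's records gives the trend lines (increasing or declining) that the scorecard targets refer to.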

#5: Translate IT to Business Metrics in Your Reporting

Too often, IT focuses too heavily on "IT metrics" (technical metrics) in its process/technology scorecards and ignores the need to include "business metrics," which are more understandable and resonate more strongly with a customer, user, or business executive. Be sure to flag in your balanced scorecard which metrics are KPIs that say something about performance on critical success factors. It’s also a good idea to include, where possible, business metrics that communicate the value in business terms to the business, customers, and users.

For example, I mentioned that one of the KPIs should be a "reduction in the average incident resolution time," due to ready access to good workarounds and solutions from problem management. However, a 3% improvement in average resolution time might not be readily comprehensible from a business perspective. The IT staff might see the value, but a customer or business executive might wonder, "How has that helped the business, my department, or my users?"

How can you adapt this IT metric to a business metric to show the value to the business? Faster resolution normally gives users more time to do work at the workstation. Suppose, for example, that you also include a metric entitled "impacted user minutes" (a relatively common business metric). If you can show that the 3% improvement in resolution time led to an increase of 120 user minutes at the workstation that month, which in turn enabled users to process approximately 1,500 more orders (realizing a higher level of productivity, revenue, and profit), you have just converted your IT metric to a readily comprehensible business metric!
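
The arithmetic behind that conversion can be sketched in a few lines of Python. Everything here is illustrative: the function names and record fields are hypothetical, and the incident data is invented so that a 3% faster resolution yields the 120 recovered user minutes used in the example above. The rate of 12.5 orders per user minute is simply what 1,500 orders over 120 minutes implies; in practice, that rate would have to come from the business.

```python
def impacted_user_minutes(incidents):
    """Sum, over all incidents, of minutes to resolve times users affected."""
    return sum(i["resolution_minutes"] * i["users_affected"] for i in incidents)


def business_impact(previous, current, orders_per_user_minute):
    """Translate a drop in impacted user minutes into estimated extra orders."""
    recovered = impacted_user_minutes(previous) - impacted_user_minutes(current)
    return {
        "recovered_user_minutes": recovered,
        "estimated_extra_orders": recovered * orders_per_user_minute,
    }


# Invented data: one incident affecting 40 users, resolved 3% faster this month.
last_month = [{"resolution_minutes": 100, "users_affected": 40}]
this_month = [{"resolution_minutes": 97, "users_affected": 40}]

# Assumed rate: 1,500 orders / 120 user minutes = 12.5 orders per user minute.
impact = business_impact(last_month, this_month, orders_per_user_minute=12.5)
```

A comparison like this is only meaningful when the two periods have a similar volume and mix of incidents; otherwise the difference reflects demand as much as resolution speed.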

When constructing your measurement framework and reporting system, keep in mind that a variety of stakeholders have a vested interest in your process. Your metrics and reporting must convey "value realized" to each of these stakeholder groups: the service desk, other technical support teams, and suppliers, as well as the business (customers, users, and management staff).

#6: Employ Visible Reporting to Effectively Communicate the Value

Once you have put in place the right balanced scorecard for problem management (and by the way, it is also important this be in sync with the scorecard for incident management), you then need to focus on reporting and communicating the value to all stakeholders so that it is clear to them that problem management is delivering on its promise to the organization, to customers, and to users.

The reporting should be tailored to different audiences, so that not everyone gets the same report and each audience gets the reported value that relates to its domain. Top management and IT management are interested in bottom-line results: lower costs, higher IT productivity, and higher customer and user satisfaction. Other support groups are interested in how they are contributing to reducing the number and frequency of problems in their specialized area. The service desk staff and management are interested in how problem management is reducing the volume and frequency of incidents, providing faster access to quality workarounds, and helping them achieve faster restoration times.

Once the reporting format is sorted out, make this reporting visible through automation and regular updates. What is visible gets acted on; what is hidden does not. For example, for the service desk, place visual displays on the walls in the support center so that people can be reminded of the performance and value of problem management. Ensure the management of other IT support groups gets at least monthly tailored reporting on how problem management is contributing to their area. Share reporting with IT and the organization's management about the value being delivered by problem management.

Problem Management Done Right

Problem management is an extremely valuable practice if done right, which means documented clearly, with roles and responsibilities assigned, good tracking and management, and the value recorded and reported to all stakeholders. Repetitive incidents will decline, availability of enabling IT services will increase, and both user and IT productivity will benefit.

Paul is the president and principal consultant of Optimal Connections LLC. With more than 30 years of experience in planning and managing technology services, Paul has held numerous positions in both support and management for companies such as Motorola, FileNet, and QAD. He is also experienced in service desk infrastructure development, support center consolidation, deployment of web portals and knowledge management systems, as well as service marketing strategy and activities. Currently Paul delivers a variety of services to IT organizations, including Support Center Analyst and Manager training, ITIL Foundation and Intermediate level training, Best-Practice Assessments, Support Center Audits, and general IT consulting. His degrees include a BA and an MBA. Paul is certified in most ITIL Intermediate levels and is a certified ITIL Expert. He is also on the HDI Faculty and trains for ITpreneurs, Global Knowledge, Phoenix TS, and other training organizations. For more about Paul, please visit www.optimalconnections.com.