Effective problem management requires a clearly defined process and documentation

by Buff Scott, III
Date Published January 19, 2017 - Last Updated December 6, 2017

I have had the opportunity to assist several IT organizations in designing and implementing a best-practice problem management process. Inevitably, the same question came up: “When should a problem record be opened?”

Problem management is one of the two IT service management processes we refer to as “service resolution and restoration” processes; the other is incident management. When performed well, problem management is a sign of a more mature IT service provider. Whereas incident management focuses on restoring normal service operation as quickly as possible, problem management focuses on determining the root cause of one or more incidents, identifying temporary workarounds, and applying permanent fixes so that those incidents do not recur. This might suggest that an incident must occur before a problem record can be opened. That is not necessarily so.

Reactive vs. Proactive Triggers

The scope of problem management includes two aspects: reactive problem management and proactive problem management. Reactive problem management seeks to identify the underlying cause of one or more incidents as they are reported and recorded, while proactive problem management seeks to identify and prevent potential incidents before the customer is impacted, in essence “resolving” the incident that never occurred. Proactive problem management looks for trends and patterns in order to foresee and correct errors in the infrastructure before they manifest as incidents. Thus, an incident record may not exist at the time a problem record is opened.

When IT organizations first stand up a formal problem management process, what I have seen happen most often is that technical staff are hesitant and unsure about when to open a problem record, so few get opened. To address this challenge, develop a clear list of criteria for when a problem record must be opened. The following is a suggested list:

  • Any incident that is assigned Priority 1, or for which a major incident has been declared
  • Multiple incidents showing the same symptoms
  • Validated alarms from monitoring devices that are deemed high impact
  • Notification from a supplier that a problem or known error exists in their product or service
  • Any incident that is opened as a result of a security event. A security event can include, but is not limited to, the following:
    • Viruses
    • Trojans
    • Worms
    • Malware
    • Repeated unauthorized access attempts
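To remove ambiguity for technical staff, the incident-based criteria above can be written down as an explicit rule, for example in a tool's business rules or in a periodic review script. The following Python sketch is illustrative only: the `Incident` fields, the `problem_record_required` function, and the threshold of three incidents sharing a symptom are hypothetical assumptions, not features of any particular ITSM product. Supplier notifications and validated monitoring alarms are not incidents, so they would be handled as separate inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # Hypothetical, simplified incident record; real ITSM tools use their own schema.
    incident_id: str
    priority: int                  # 1 = highest priority
    major_incident: bool           # True if a major incident has been declared
    security_event: Optional[str]  # e.g. "virus", "malware"; None if not security-related
    symptom: str                   # normalized symptom code used to match incidents


def problem_record_required(incident: Incident,
                            open_incidents: list[Incident],
                            same_symptom_threshold: int = 3) -> list[str]:
    """Return the reasons (possibly none) why this incident must trigger a problem record."""
    reasons = []
    if incident.priority == 1 or incident.major_incident:
        reasons.append("priority 1 or major incident")
    if incident.security_event is not None:
        reasons.append(f"security event: {incident.security_event}")
    # Count this incident plus any other open incidents with the same symptom.
    matching = 1 + sum(1 for other in open_incidents
                       if other.symptom == incident.symptom
                       and other.incident_id != incident.incident_id)
    if matching >= same_symptom_threshold:
        reasons.append(f"{matching} incidents with symptom {incident.symptom}")
    return reasons


# Example: a third VPN timeout meets the hypothetical "same symptoms" threshold.
backlog = [
    Incident("INC1", 3, False, None, "vpn-timeout"),
    Incident("INC2", 3, False, None, "vpn-timeout"),
    Incident("INC3", 2, False, "malware", "slow-pc"),
]
new_incident = Incident("INC4", 3, False, None, "vpn-timeout")
print(problem_record_required(new_incident, backlog))
```

Returning a list of reasons rather than a bare yes/no gives the agent (and later reports) a record of which criterion applied, which is useful when training staff on the policy.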

Other triggers for opening a problem record reactively may include:

  • An incident for which the root cause is not known
  • The service desk suspects an incident may recur after initial resolution
  • Analysis of an incident by a technical support group reveals a potential underlying problem
  • Fault detection by system monitoring and alerting tools, which may automatically create a problem record
  • An emergency change submitted for an unreported incident
  • The result of a post-implementation change review

Triggers for opening a problem record proactively may include:

  • Analysis of incidents over differing time periods reveals a recurring trend, indicating an underlying problem might exist
  • Analysis of the IT infrastructure (by category, CI type, facility, etc.) by technical support groups identifies a potential problem
  • Results from data analysis and mining of the knowledge base or configuration management database may indicate an underlying, yet unidentified problem
  • Announcements of known errors from applications development or release and deployment teams
  • Reports generated from application or system software
  • Service review meetings with customers
  • Supplier review meetings
  • Notification from a supplier that a problem or known error exists in their service or product
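The first proactive trigger, analysis of incidents over differing time periods, can be approximated by counting incidents per configuration item (CI) and symptom across several windows and flagging combinations that recur. This is a minimal sketch assuming incident history is available as hypothetical `(opened_at, ci, symptom)` tuples; the `recurring_trends` name, the window lengths, and the `min_count` threshold are illustrative choices to be tuned to your environment. A flagged combination is only a candidate for a proactive problem record; a technical support group still needs to confirm it.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_trends(incidents, now, window_days=30, min_count=5):
    """Return ((ci, symptom), count) pairs seen at least min_count times in the window.

    incidents: iterable of (opened_at, ci, symptom) tuples, a hypothetical
    simplified export of incident history.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter((ci, symptom)
                     for opened_at, ci, symptom in incidents
                     if opened_at >= cutoff)
    # most_common() sorts candidates from most to least frequent.
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Example history: a recurring mail-queue issue and a one-off printer jam.
now = datetime(2017, 6, 30)
history = [(now - timedelta(days=d), "mail-server-01", "queue backlog")
           for d in (1, 3, 8, 15, 22, 40)]
history.append((now - timedelta(days=2), "printer-3f", "paper jam"))

for days in (7, 30, 90):
    print(days, recurring_trends(history, now, window_days=days))
```

Running the same check over short and long windows helps separate a sudden spike from a slow-burning issue that only becomes visible over a month or a quarter.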

Problem Recording

Another question that is often asked is, “Who should open a problem record?” Problems may be identified through a number of sources:

  • Service desk personnel
  • Technical support groups
  • Other ITSM processes (incident, event, capacity, availability management, etc.)
  • Suppliers
  • Service partners

A problem record can be opened by the service desk or by any technical support group member; it should be opened by whoever first discovers or suspects that a problem may exist. In reactive problem management, an incident record must be opened first and then linked to the problem record. There is much industry debate about whether the service desk should be allowed to open problem records. I believe it should, provided the following conditions are met:

  • A clear definition of what constitutes a problem has been established within the organization
  • Detailed training on your designed problem management process has been provided to the service desk agents
  • Initially, the ability to open problem records may (and arguably should) be limited to experienced or lead service desk agents

With these criteria in place, there is no reason the service desk should not be able to open a problem record and assign it to the appropriate technical support group.

Problem Reporting

I suggest the following reports be created to ensure that problem records are being opened appropriately (per your defined policy and organization-specific criteria):

  1. The total number of problems recorded during the reporting period. This number should increase initially as an organization implements problem management, but it may plateau or later decrease as process and problem resolution capabilities mature and the infrastructure becomes more stable. Segment the report by technical support group; this will help determine which groups are complying with the process and which might need additional training on it.
  2. The total number of known error records (or knowledge articles) created during the reporting period. There should be an increase initially as an organization implements problem management. This demonstrates that root causes and workarounds are being identified and documented. Again, report this by technical support group.
  3. A list of incident records that should have triggered the opening of a problem record but for which no related problem record can be found. This report will help identify which IT functional teams need additional training on the process (or processes, possibly both incident and problem management) to ensure problem records are opened when necessary.
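All three reports can be derived from the links between incident and problem records. The following Python sketch assumes hypothetical, simplified `Problem` and `Incident` records that have already been filtered to the reporting period, plus an assumed `should_trigger_problem` flag set by your organization's own criteria; it also approximates report 2 by counting problem records flagged as having a known error. None of this reflects the export format of any specific ITSM tool.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Problem:
    # Hypothetical problem record, assumed created within the reporting period.
    problem_id: str
    group: str                    # technical support group that owns the record
    known_error: bool = False     # True if a known error record/article was created
    incident_ids: list[str] = field(default_factory=list)  # linked incident records


@dataclass
class Incident:
    # Hypothetical incident record, assumed opened within the reporting period.
    incident_id: str
    group: str                    # functional team that handled the incident
    should_trigger_problem: bool  # set by your organization's own criteria


def problem_reports(problems, incidents):
    """Return (problems per group, known errors per group, unlinked trigger incidents)."""
    problems_by_group = Counter(p.group for p in problems)                       # report 1
    known_errors_by_group = Counter(p.group for p in problems if p.known_error)  # report 2
    linked = {inc_id for p in problems for inc_id in p.incident_ids}
    # Report 3: incidents that met the criteria but have no linked problem record.
    missing = [(i.incident_id, i.group) for i in incidents
               if i.should_trigger_problem and i.incident_id not in linked]
    return problems_by_group, known_errors_by_group, missing


problems = [
    Problem("PRB1", "Network", known_error=True, incident_ids=["INC1", "INC2"]),
    Problem("PRB2", "Database", incident_ids=["INC5"]),
]
incidents = [
    Incident("INC1", "Network", True),
    Incident("INC3", "Desktop", True),
    Incident("INC4", "Desktop", False),
]
by_group, known_errors, missing = problem_reports(problems, incidents)
print(dict(by_group), dict(known_errors), missing)
```

In this example, INC3 met the criteria but was never linked to a problem record, so the Desktop team would appear on report 3 as a candidate for additional training.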

Provide Clarity

To have an effective problem management process, you must define when a problem record should be opened and who can open one, and you must ensure that appropriate reports are produced from the process. As I close this article, I will leave you with a few helpful hints:

  • Determine who (which functional teams) can/should open a problem record
  • Establish clear reactive and proactive criteria for when a problem record should be opened and train (ideally, with examples) the functional teams on when to open them
  • Create and distribute KPI and management reports to identify where compliance issues exist and where additional training may be needed
  • Make sure employees understand the benefits to the organization and to themselves
  • Provide recognition and motivation, and communicate your successes

This article is based on material presented in Problem Management: A Practical Guide, by Jim Bolton and Buff Scott III.

Buff Scott III has more than 30 years of experience in the IT industry. He’s a versatile leader with extensive management experience, and he’s an accredited ITIL v3 Expert, ITIL Trainer, and HDI Faculty member. Among his many other skills and accomplishments, Buff’s been designing and implementing ITIL processes since 2001, and he specializes in business and IT process reengineering.
