Recently, a question came in on our HDIConnect community site asking about the best way to classify and handle performance issues. For the purpose of this discussion, performance means the availability, reliability, or responsiveness of a service.
The short answer is that, in most cases, issues that are reported as performance problems should be classified as incidents, not service requests. However, there are exceptions. To understand why, let’s look first at the industry definitions for incidents versus service requests, why it’s important to define what is meant by “normal performance,” the impact of capacity planning and management on performance, what typically triggers a performance issue, and why it’s important to define quality and set standards.
Ensure Staff and Customers Understand the Definitions
The current version of ITIL defines an incident as an unplanned interruption to an IT service or a reduction in the quality of an IT service. Failure of a configuration item that has not yet affected service is also an incident—for example, failure of one disk from a mirror set. The ITIL incident management process ensures that normal service operation is restored as quickly as possible, so that business impact is minimized.
A service request is a user request for information or advice, or for a standard change (a pre-approved change that is low risk, relatively common, and follows a procedure) or for access to an IT service. A good example of a service request is a user request for a password reset. Other examples might include a request for an answer to a common question or for a standard pre-approved update to a workstation configuration.
Provide Guidance for Logging
As I mentioned above, performance is typically construed as the availability, reliability, or responsiveness of a service. If the performance of the service from the customer’s perspective is not “normal,” and there is evidence of this, then the issue should be logged and processed as an incident (since the service has been interrupted or degraded in quality). The incident management process should be engaged, along with first-level support at the service desk, to handle and resolve the performance incident. If the cause of the incident is not known, and the action is warranted, a problem report should also be opened and forwarded to problem management to investigate the root cause of the performance problem.
If, however, the service is performing normally and the customer is requesting a higher level of performance (for example, a higher level of availability, more capacity, or improved responsiveness), then the issue should be handled as a service request. In this case, there is no malfunction in the performance of the service; the user is requesting optimization of the service, or a higher level of capability for the service, which may mean additional or more powerful components. This could then trigger consulting services or work orders for added components to improve the service and its performance.
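The decision rule in the two paragraphs above can be sketched as a small function. This is a hypothetical illustration, not part of ITIL or of any service desk tool: the field names are assumptions, a call about a normally performing service is assumed to be a request for more than the agreed norm, and the third branch (degradation reported without evidence) is not covered by the article, so the sketch simply returns a "gather evidence" outcome there.

```python
from dataclasses import dataclass


@dataclass
class PerformanceReport:
    # Hypothetical fields an analyst might fill in during call handling.
    performing_normally: bool      # judged against the agreed norm, e.g. SLA targets
    evidence_of_degradation: bool  # e.g. monitoring data, timestamps, user reports
    cause_known: bool              # is the cause of the degradation already known?
    problem_warranted: bool        # analyst judgment: is a root-cause investigation worthwhile?


@dataclass
class Classification:
    record_type: str    # "incident", "service request", or "gather evidence"
    open_problem: bool  # also raise a problem record for problem management?


def classify(report: PerformanceReport) -> Classification:
    if report.performing_normally:
        # No malfunction: assumed to be a request for more than the agreed norm.
        return Classification("service request", open_problem=False)
    if report.evidence_of_degradation:
        # Service interrupted or degraded in quality: log an incident, and raise
        # a problem only when the cause is unknown and investigation is warranted.
        needs_problem = (not report.cause_known) and report.problem_warranted
        return Classification("incident", open_problem=needs_problem)
    # Assumption (not in the article): collect evidence before classifying.
    return Classification("gather evidence", open_problem=False)


slow_mornings = PerformanceReport(
    performing_normally=False,
    evidence_of_degradation=True,
    cause_known=False,
    problem_warranted=True,
)
print(classify(slow_mornings))
```

Keeping the problem-record decision as a separate flag mirrors the article's point that a problem report is opened in addition to, not instead of, the incident.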
Set the Stage and Define the Context
For a customer or user to experience a performance issue with their service or application, they must have an idea of what to expect as “the norm.” There are two ways in which you can establish expectations for service performance with a customer or user:
- By default, with no action on the part of the service provider. In this case, no expectation is set with the customer as to what they should expect from their service in terms of availability, responsiveness, or throughput. In the absence of an explicit definition of what to expect, the customer will often expect the unreasonable: that the service is always up, available, and performing at the highest level they can imagine. This is not a preferred option for the service provider, as it will struggle to meet such high expectations for service performance.
- Through the provision of a Service Level Agreement (SLA) for the service. An SLA explicitly defines what the customer should expect as “the norm” for the service in terms of its performance characteristics: typically levels of availability (is the service available when I want to access it), responsiveness to users, and throughput capability. Providing an SLA to a customer for a service is the preferred method of defining the norm, as there is then no question as to what is deemed acceptable performance for the service (since the customer and the service provider have agreed to this in advance).
An SLA for a service contains a number of components:
- The name of the service and a short description
- Clearance information (with location and date)
- Contact information of both the customer and the service provider
- Roles and responsibilities of the customer and the service provider
- Desired customer outcomes from the service (how it supports and enables their business process and objectives)
- Service and support coverage hours
- Agreement terms
- Agreed levels of support (in the event there are multiple support levels available, such as “basic” or “premium” support)
- Service level requirements or targets
- Service review schedule and reporting
- Other information, such as change history, references, and glossary
The critical section in the SLA for setting performance expectations between the customer and the service provider is the section on service level requirements and targets. Examples of service levels for a given service might include the following:
- Level of availability for the service: 99.8% availability during agreed service hours
- Level of capacity: the service will support up to 3,000 concurrent users with acceptable response time
- Level of response time: average response time to users of the service will be between 3 and 5 seconds for a query to the database
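To make “normal performance” testable, the three example service levels above can be encoded as targets and compared with measured values. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, 5 seconds (the upper end of the 3–5 second range) is assumed to be the response-time limit, and the response-time target is assumed to apply only while load is within the agreed 3,000 users.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceLevelTargets:
    # Defaults mirror the example service levels; names are hypothetical.
    min_availability_pct: float = 99.8  # during agreed service hours
    max_concurrent_users: int = 3000    # load the SLA promises to support
    max_avg_response_s: float = 5.0     # assumed: upper end of the 3-5 s range


@dataclass(frozen=True)
class Measurement:
    availability_pct: float
    concurrent_users: int
    avg_response_s: float


def breached_targets(m: Measurement,
                     t: ServiceLevelTargets = ServiceLevelTargets()) -> List[str]:
    """Return the names of the targets a measurement falls short of."""
    breaches = []
    if m.availability_pct < t.min_availability_pct:
        breaches.append("availability")
    # Acceptable response time is only promised up to the agreed user count.
    if m.concurrent_users <= t.max_concurrent_users and m.avg_response_s > t.max_avg_response_s:
        breaches.append("response time")
    return breaches


def allowed_downtime_minutes(service_hours: float, availability_pct: float) -> float:
    """Downtime budget implied by an availability target over a period of service hours."""
    return service_hours * 60 * (1 - availability_pct / 100)


print(breached_targets(Measurement(availability_pct=99.5, concurrent_users=2500, avg_response_s=6.0)))
print(allowed_downtime_minutes(720, 99.8))
```

As a worked figure, if the agreed service hours were 24x7 over a 30-day month (720 hours, an assumption for illustration), a 99.8% availability target leaves about 720 × 60 × 0.002 ≈ 86 minutes of downtime. Note that the sketch reports no breach above 3,000 users; since the article still treats the resulting slowdown as a performance incident, such cases are better read as a signal for capacity management than as a reason to close the call.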
By explicitly stating the levels and targets for performance, and agreeing to these, the customer and the service provider arrive at a common understanding of what constitutes acceptable/expected performance for a given service. The customer can then reasonably expect to see these levels of performance exhibited during normal usage of the service, and the service provider can use these levels as targets to aim for in providing the service to production customers.
Thus, the context must first be set, either by providing no SLA for the service, in which case the expectations for performance are typically extremely high, or by agreeing to and providing an SLA for the service, in which case the performance targets are very clear (and much more achievable) for both the customer and the service provider.
Minimize Performance Issues Through Capacity Planning
The performance of a service is often the result of how well you carry out the ongoing process of capacity planning and management. A given IT service, say a web-based email service, is really a combination of components working together to deliver the value of that service to the user. Email, for instance, is composed of a user client email application, a supporting network, a server, a server application, storage systems, a database, and other components. These may be on premises, cloud based, or a combination of both types of architecture.
For a service to perform well it must be designed for the environment in which it is intended to operate. And that means that the components need to be sufficient and resilient enough to meet the patterns of demand for the service in the user environment, sometimes called “patterns of business activity.” This simply means that the demand for most services is not continuous and stable but varies with the business process. In the case of email, that may mean that in the mornings, just after lunch, and at the end of the day there are peaks in the number of users logged on and in the volume of emails sent and received.
Now, if the service and its component parts have been designed with these patterns in mind (enough bandwidth on the network, enough storage, enough compute power), the service will perform optimally through the cycles of these patterns. If these patterns of demand have been ignored, the service is likely to exhibit the symptoms of a performance issue when demand peaks and the service components cannot respond adequately to meet expectations.
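One rough way to check a design against patterns of business activity is to compare demand in each hour with the capacity of the service, keeping some headroom in reserve. The function name, the hourly demand figures (peaks in the morning, after lunch, and at the end of the day, as in the email example), the 3,000-user capacity, and the 20% headroom are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def peak_hours_at_risk(hourly_demand: Dict[int, int], capacity: int,
                       headroom: float = 0.2) -> List[int]:
    """Return the hours whose demand exceeds capacity minus a safety headroom."""
    threshold = capacity * (1 - headroom)
    return sorted(hour for hour, demand in hourly_demand.items() if demand > threshold)


# Hypothetical concurrent users per hour of the day for an email service.
demand = {8: 1200, 9: 2900, 10: 1800, 13: 2500, 14: 1600, 17: 2700}
print(peak_hours_at_risk(demand, capacity=3000))
```

With these figures the 9:00, 13:00, and 17:00 peaks exceed the 2,400-user threshold, so they are the hours where users would be most likely to report slowness. The headroom parameter is a design choice: it flags peaks before they actually exhaust capacity.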
What Typically Triggers a Performance Issue?
This lack of responsiveness will then be noticed by the user(s) as a performance problem and reported to the support center. Comments like “It’s very slow in the morning” or “It’s slowing down and likely to crash” will be reported as the symptom. Be certain that you gather detailed descriptive information on the symptoms of the performance issue: time and date, what preceded the issue, and as much supporting evidence as possible. Performance issues can be intermittent and difficult to diagnose and resolve, so the more descriptive and detailed your incident information is, the more likely you are to pinpoint the cause and resolve the issue through problem management.
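The details listed above (time and date, what preceded the issue, and supporting evidence) can be captured as a structured record with a check for what is still missing before the incident is passed on. The class and field names are hypothetical and not taken from any particular service desk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PerformanceIncident:
    # Hypothetical record for a reported performance symptom.
    symptom: str                           # the user's own description
    occurred_at: Optional[datetime] = None
    preceding_events: str = ""             # what happened just before the issue
    evidence: List[str] = field(default_factory=list)  # e.g. screenshots, monitoring graphs

    def missing_details(self) -> List[str]:
        """List the details an analyst should still collect."""
        missing = []
        if self.occurred_at is None:
            missing.append("time and date")
        if not self.preceding_events.strip():
            missing.append("what preceded the issue")
        if not self.evidence:
            missing.append("supporting evidence")
        return missing


report = PerformanceIncident(symptom="It's very slow in the morning")
print(report.missing_details())
```

Prompting for the missing fields during the call is cheaper than reconstructing them later, which matters most for the intermittent issues described above.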
Performance issues reflect the customer’s perception that there has been a reduction in the quality of a service. Hence, in most cases they should be classified as incidents.
Performance issues, in my experience as a practitioner and auditor, are normally related to capacity planning and management issues. The causal factor is typically one of the following:
- Insufficient service and component capacity to handle the demand. For example, insufficient bandwidth on the network to handle the traffic, insufficient capacity on storage devices (disk drives, solid-state media, or cloud storage), or inadequate compute power (CPU capacity insufficient to handle the demand).
- Unforeseen demand placed on the service and its supporting components. For example, the service was designed, planned, and deployed on the assumption that a certain number of users would be logged on, and suddenly a much larger number of users try to log on to the system.
In either or both of these situations, users see the service degrade: its normal performance is no longer being delivered, and they experience what is commonly known as a performance incident (or a problem, if the cause is unknown).
There are exceptions to this scenario. If the user is calling to ask that performance be increased or optimized (i.e., they would like faster-than-normal response or throughput for their transactions, or more capacity), then there is no performance problem being reported; what is normal is being delivered. What they are asking for is enhanced performance. In such cases, it is proper to log these calls as service requests rather than incidents and handle them as such, potentially triggering consulting services to optimize or raise the performance of the service and its components.
Define Quality, and Set Expectations
Note the use of the word "quality" in the definition of an incident. It is important that your support center define what is agreed to as service quality for IT staff, users, and customers. For example, quality might be defined as the user or customer receiving the value promised for the service (its features, functionality, and performance) at a price they can afford. This definition of quality should appear at a high level in your service catalog and at a supporting level in your SLAs with customers and users. This way, customers know what to expect in terms of quality from services, and your support center knows what it is expected to deliver to customers and users.
Of course, you want to be sure to include these definitions of incidents and service requests in your support center's set of guiding policies and standard operating procedures (SOPs), so that it is clear to the support staff, and also to other support groups, what constitutes quality, as well as what should be considered an incident versus a service request.
Your procedures for handling incidents and service requests should include not only the definitions, but also examples of incidents vs. requests, and questions support analysts can ask during call handling to be sure they classify, log, and handle the issue properly, either as an incident or as a service request. As a result of putting such improved policies and procedures in place for handling performance-related issues, you should experience:
- Improved handling of issues, either as an incident (in most cases) or as a service request when that is called for
- More consistent classification, and accurate escalations if the need arises
- Improved capacity planning and management, which will have a favorable impact on service design and architecture and in turn service performance
- More effective handling of performance incidents and resultant problems
- Recognized opportunities for performance optimization, should those opportunities be reported as requests for service
- Higher customer and user satisfaction as a result of optimized service performance
I hope this explanation helps your support center deal more effectively with performance issues, and that, as a result, you experience fewer such issues, higher service performance, and happier customers, users, and support staff!
Paul is the president and principal consultant of Optimal Connections LLC. With more than 30 years of experience in planning and managing technology services, Paul has held numerous positions in both support and management for companies such as Motorola, FileNet, and QAD. He is also experienced in service desk infrastructure development, support center consolidation, deployment of web portals and knowledge management systems, as well as service marketing strategy and activities. Currently Paul delivers a variety of services to IT organizations, including Support Center Analyst and Manager training, ITIL Foundation and Intermediate level training, Best-Practice Assessments, Support Center Audits, and general IT consulting. His degrees include a BA and an MBA. Paul is certified in most ITIL Intermediate levels and is a certified ITIL Expert. He is also on the HDI Faculty and trains for ITpreneurs, Global Knowledge, Phoenix TS, and other training organizations. For more about Paul, please visit www.optimalconnections.com.