Business Continuity and Disaster Recovery


A Straightforward Approach

by Wes Gorham
May 23, 2012

 

“Don’t put all your eggs in one basket.” This old chestnut is particularly appropriate when it comes to managing risk. Risk represents different things depending on your audience. We often hear of the importance of managing the risks associated with making personal investments. Until we understand investing well enough to weigh equity against the level of tolerable risk, we risk making uninformed decisions that could have devastating results.

The same principle holds true for organizations seeking to manage their own risk while striving to achieve the goals and priorities set by their management. The business continuity and disaster recovery (BC/DR) plan is a key tool in any organization’s risk management toolbox. This article is intended to serve as a primer for creating a BC/DR plan, providing you with the essential knowledge, expertise, and practical decision-making skills you need to be successful.

An Enterprise-Centric, Holistic Framework

Business requirements drive the criteria for quality. These requirements and their corresponding processes are the “customer-driven” nervous system of the organization. The continuity requirements of these processes drive the level of risk mitigation investment the organization includes in its business strategy. These business processes, in turn, enable service delivery. The result is a BC/DR program that provides for the needs of the business and reflects the business’s commitments to its customers.

IT provides specific technical services to various lines of business throughout an organization. These business operations have an inside-out view of the organization, while the executive level looks from the outside in. The diagram on the following page illustrates the holistic nature of a typical BC/DR program with this bidirectional view. Lower-level operational activities are the key to linking specific customer-driven requirements for risk tolerance with criteria for quality. The high-level activities set the framework for strategy, policy, and constraints on corporate requirements.

Let’s take a closer look at the continuum model section by section to better understand the key activities and deliverables, and the role the service desk plays in the overall delivery.

Senior Executive Direction and Commitment

Leadership is an essential element in setting the strategic vision. In today’s fast-paced business world, most executives are concerned with developing a strategy that helps the organization achieve and sustain strength and profitability. With this in mind, it is still possible to build a sustainable BC/DR program and secure solid commitment at the executive level. The priorities established during strategic planning become the goals, objectives, values, and guiding principles that support the risk management program and protect the organization’s investment.

From a risk management perspective, senior leadership typically takes the following into consideration: 

  • Regulatory, financial, and legal issues; 
  • Customer obligations; 
  • Insurance coverage (requirements and protection); 
  • Risk mitigation requirements (as a means of protecting some business functions); and 
  • Image and reputation.

BC/DR Strategy and Policy

Before continuing, it is important to understand the differences between BC/DR programs and BC/DR plans. The BC/DR program is the business continuity management lifecycle that supports the risk management process and protects the organization’s investment and assets. The BC/DR plan falls under the control of the BC/DR program and serves as the instruction manual for the continuity and recovery of business operations and technology services. Holistic in nature, it is a road map for a recovery that is seamless and sufficient to sustain an acceptable level of service delivery.

The strategy and policy phase sets the policies and standards required to achieve the desired results. The key components of this phase are: 

  • Policies and standards (for ensuring the continuity of business operations); 
  • Maximum allowable downtime (MADT) and recovery time objectives (RTOs); 
  • Resilience; 
  • Methodology and tools; 
  • Processes; and 
  • Deliverables and results.

Typically, a business continuity (BC) manager is responsible for driving the program by providing the oversight and direction that opens the doors of communication and cooperation from upper management down through to the lower levels of the organization. The BC manager is ultimately responsible for the program’s success; however, he or she will require the assistance of key individuals. A disaster recovery (DR) manager, for example, may be chosen to lead the DR portion of the program.

Each business unit within the organization also plays a key role. The service desk, for example, is typically a mission-critical business unit. The services it provides before, during, and after a service disruption are vital. Therefore, the service desk should be proactive and develop and maintain its own viable BC/DR plan. Such a plan would include things like workaround procedures to mitigate the adverse impacts of a service disruption and keep key business processes operating at an acceptable level. It would also include provisions for handling the increased call volumes the service desk is likely to experience during a service disruption.

The planning stage is critical, and for best results, you should follow your tactical delivery road map. The essential goal is to design, develop, and implement the BC/DR program in a controlled manner, using the iterative process illustrated in the diagram above. Remember, to ensure seamless implementation, a well-established delivery model and execution plans are critical.

BC/DR Business Processes

The planning, solution development, and delivery activities that take place in this phase are continuously improved over time, as business requirements evolve. For example, the service desk contributes to the BC/DR program by actively participating in planning activities, and by building, training, testing, and continuously improving its own BC/DR plan. It also collaborates with other service units, when necessary, to ensure that the BC/DR plan is consistent and provides seamless, end-to-end continuity.

The service desk is also responsible for provisioning the back-up facility, to be used when the primary facility is rendered unusable. Any tools and systems lost during the disaster, such as the incident logging tool, must be recovered. Following its BC/DR plan, the service desk would simply mobilize its staff and any other required resources to the alternate facility and resume operations in an effort to maintain service continuity. It may not be feasible to deploy the whole solution all at once, but pieces of the solution can be compartmentalized and deployed selectively over time.

These are the typical elements of an established BC/DR program and plan: 

  • Business requirements; 
  • Methodology, policies, and guidelines; 
  • Processes and procedures; 
  • Tools and templates; 
  • Risk assessment and business impact analysis (BIA); 
  • Delivery and technical capability; 
  • BC/DR strategy and solution;
  • Emergency response (includes disaster declaration, escalation and notification, call trees, etc.); 
  • Technical recovery plan (i.e., recover the IT infrastructure); 
  • Resumption plan; 
  • Repatriation plan (i.e., return process activity and/or IT back to steady-state); and 
  • Ongoing testing, maintenance, audit governance, and continuous improvement.

Building, Maintaining, and Testing Plans

Once the BC/DR program and plan are in place, they must be maintained. The organization will have invested large amounts of time, energy, and money in deploying them, so they must be continuously improved, particularly as business requirements change and evolve. This helps ensure that their accuracy and integrity remain in line with the business’s needs.

The BC manager will work with key representatives from across the organization, including the service desk, to provide oversight and direction, and to conduct testing exercises. Customers and third-party service providers should also be included in these exercises. Testing provides a mechanism for identifying and correcting any deficiencies and nonconformities, and for keeping management and customers happy. The depth and breadth of testing depend on the program’s key requirements. In general, the program should be tested annually, though specific testing can be conducted in isolated settings when and where it is necessary.

As noted above, the service desk must participate in the organization’s test exercises. These exercises must test employee competency and the functionality of processes, facilities, and technology to verify that the BC/DR plan is sufficient. In addition, a simulated test at the alternate facility is essential.

A typical test plan should consist of the following: 

  • An executive summary; 
  • Scope and scenario; 
  • Dates, locations, participants, and timescales; 
  • Assumptions and limitations; 
  • Objectives; 
  • Results of the objectives; 
  • Key strengths; 
  • Areas requiring improvement, lessons learned, and noteworthy items; 
  • Risk and change control; 
  • Test preparations; 
  • Notification, procedural systems, and participants check; 
  • Postrecovery check; 
  • Test cases; 
  • Technical infrastructure tier testing; and 
  • An activity log, issues log, and action items log.

The Emergency Response Plan: BC/DR Integration

To work effectively, the technology design and delivery model must support the business’s requirements seamlessly. These requirements are typically quantified by three metrics: maximum allowable downtime (MADT), recovery time objective (RTO), and recovery point objective (RPO). The RPO specifies the point in time to which data must be restored for the business to resume normal operating activities; in effect, it sets how much recent data the business can afford to lose. The BC manager will provide the necessary oversight to ensure that the BC/DR plan integrates the business and its technology, and will make adjustments where necessary.
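As an illustration of how these three metrics might be checked after a test exercise, here is a minimal Python sketch. The class and function names, and the sample targets for the incident logging tool, are hypothetical and are not drawn from any particular BC/DR tool. It assumes the RPO is expressed as a tolerable window of data loss.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets agreed with the business for one process (illustrative only)."""
    madt: timedelta  # maximum allowable downtime
    rto: timedelta   # recovery time objective
    rpo: timedelta   # recovery point objective, as a tolerable data-loss window


def evaluate_test(objectives, downtime, data_loss):
    """Return the names of the objectives missed during a recovery test."""
    missed = []
    if downtime > objectives.rto:
        missed.append("RTO")
    if downtime > objectives.madt:
        missed.append("MADT")
    if data_loss > objectives.rpo:
        missed.append("RPO")
    return missed


# Hypothetical targets for the service desk's incident logging tool.
targets = RecoveryObjectives(
    madt=timedelta(hours=8),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=30),
)

# A test in which recovery took five hours and ten minutes of data were lost.
print(evaluate_test(targets, timedelta(hours=5), timedelta(minutes=10)))
# prints ['RTO']
```

In this example the recovery stayed within the MADT and the RPO but missed the RTO, which is the kind of deficiency a test exercise is meant to surface.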

When a disaster is declared, the service desk may provide extended and even additional services to assist with the recovery. However, the service desk will also be busy recovering assets and resuming operations specific to its own services.

Application Analysis and Asset, Incident, and Change Management

To ensure that the required assets are included in the BC/DR plan, a reliable method for maintaining critical assets is crucial. All assets must be tracked and managed when changes are being made. At this stage, a configuration management database (CMDB) or other compatible tool for tracking assets will prove beneficial, as will integration with the change management process, which will ensure that the BC/DR plan stays in sync as changes are implemented.
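One way to picture keeping the plan in sync with asset records is a simple comparison of two lists. The sketch below is only an assumption about how such a check could look: it does not call any real CMDB product, and the function name and asset names are invented for illustration.

```python
def plan_drift(cmdb_critical_assets, plan_assets):
    """Compare critical assets recorded in the CMDB with those in the BC/DR plan."""
    cmdb = set(cmdb_critical_assets)
    plan = set(plan_assets)
    return {
        # Critical assets the plan does not yet cover.
        "missing_from_plan": sorted(cmdb - plan),
        # Assets the plan still lists but the CMDB no longer marks as critical.
        "stale_in_plan": sorted(plan - cmdb),
    }


# Hypothetical exports; in practice these would come from the asset tracking
# tool and from the current version of the BC/DR plan.
cmdb_export = ["incident-logging-tool", "telephony-gateway", "knowledge-base"]
plan_export = ["incident-logging-tool", "knowledge-base", "legacy-fax-server"]

print(plan_drift(cmdb_export, plan_export))
# prints {'missing_from_plan': ['telephony-gateway'], 'stale_in_plan': ['legacy-fax-server']}
```

Running a check like this as part of each approved change is one possible way to tie the plan to the change management process.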

Incident management is also a key process, because a disaster is itself an incident and must follow the path to resolution set forth in the incident management process. The service desk is responsible for logging and tracking all of the calls it receives that pertain to the service disruption, as well as logging, tracking, and managing all first-pass resolution attempts. Likewise, the service desk is responsible for maintaining asset information and making sure that information is available to assist with handling service disruptions. Again, as other business and service units invoke their notification and escalation plans, the service desk may be called upon to assist where necessary.

Business systems and applications are interdependent. Establishing a recovery capability that synchronizes application interdependencies will reduce the recovery time window, thus lowering recovery costs and mitigating any adverse effects on service.
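To make the idea of synchronizing interdependencies concrete, the following Python sketch derives a recovery order from a dependency map using the standard library's graphlib module (Python 3.9 or later). The application names and the dependency map are hypothetical; a real map would come from the application analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each application -> the applications it depends on.
dependencies = {
    "network": set(),
    "database": {"network"},
    "directory_service": {"network"},
    "incident_logging_tool": {"database", "directory_service"},
    "self_service_portal": {"incident_logging_tool"},
}


def recovery_waves(deps):
    """Group applications into waves; each wave can be recovered in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are up
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(recovery_waves(dependencies), start=1):
    print(number, wave)
# 1 ['network']
# 2 ['database', 'directory_service']
# 3 ['incident_logging_tool']
# 4 ['self_service_portal']
```

Recovering each wave in parallel, rather than one application at a time, is one way the recovery window can shrink.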

DRP/Data Storage Integration

Finally, selecting the most appropriate data storage solution architecture and recovery/restoration method, based on the DR solution, is essential to any BC/DR plan. The way data is backed up and stored off site, whether your organization uses a simple tape-based method or a sophisticated mass storage design, is crucial if adequate recovery is to be achieved. Synchronization between the primary data center and the DR hot site must be engineered to facilitate the simplest and most effective recovery possible in the shortest amount of time. Many organizations fail to realize the importance of this step and find that their data storage and recovery solutions are insufficient.
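A rough way to judge whether a backup design is sufficient is to compare its worst-case data loss with the recovery point objective. The sketch below uses a deliberately simplified model, which is an assumption rather than a standard formula: worst-case loss is taken as the interval between backups plus the time a backup needs to reach the off-site location. The function names and figures are illustrative.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, offsite_lag=timedelta(0)):
    """Simplified model: a disaster strikes just before the next copy lands off site."""
    return backup_interval + offsite_lag


def meets_rpo(backup_interval, rpo, offsite_lag=timedelta(0)):
    """True if the worst-case loss stays within the recovery point objective."""
    return worst_case_data_loss(backup_interval, offsite_lag) <= rpo


# Nightly tape backup, couriered off site within 12 hours, against a 4-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(hours=4), timedelta(hours=12)))
# prints False

# Replication to the hot site every 15 minutes with a 5-minute transfer lag.
print(meets_rpo(timedelta(minutes=15), timedelta(hours=4), timedelta(minutes=5)))
# prints True
```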

Business continuity and disaster recovery management are complex disciplines. I trust the insights I’ve provided here will help you develop your BC/DR program, produce a well-crafted BC/DR plan, and avoid putting “all your eggs in one basket.”

Author's Note: Thanks to Larry Lall, a senior BC/DR consultant at Integritas Solutions, and Derek Gillard, ITSM practice principal at Integritas Solutions, for providing guidance and insight during the preparation of this article.

 

Wes Gorham is a senior BC/DR consultant with Integritas Solutions. He has more than twenty-seven years of business and IT experience, including the delivery of complex BC/DR solutions for clients across North America. Wes is also certified in ITIL v3 Foundation.
