In a regulated environment, negligence, lack of control or insufficient due care can bring large indirect financial losses when investors or shareholders lose confidence and trust in the control and continuity arrangements of a given business unit. This can materially affect the investment capital on a company’s balance sheet. An indirect slight on a company’s reputation can hit harder than a purely monetary loss that can be written off on a balance sheet.
These statements are especially true of companies operating in the financial services sector.
Categories of Disaster
Categories of disaster need to be defined. These can range from a localized power outage at a small site to a fire at head office, or a database deletion or compromise after an unauthorised network penetration.
In worse situations, where regional disasters have been declared, major businesses may need to instantiate services at contingent regional offices thousands of kilometers away.
Each of these continuity scenarios can be invoked for key systems and operating centres alone, or together with user-base environments such as call centres, order-processing departments and the like.
Special consideration needs to be given to services such as traditional analogue voice circuit termination: how and when Primary Rate ISDN lines will be logically moved to a contingent site.
In DR scenarios, dependencies on B2B interfaces that form part of the business-critical workflow for key processes are frequently left out of recovery plans. Fourspiral considers and tests for these from the outset.
Structuring the right contingent environment to correspond to the level of risk appetite
Thresholds (or appetites) that key business stakeholders set down are fundamental to mitigating financial or reputational loss. These risk thresholds would be determined in a Business (or Enterprise) Impact Analysis, which is required to present logically the outcomes and impacts of myriad threat conditions or scenarios. The criticality (per service or business unit) and likelihood of these events together generate acceptable risk thresholds. Risk tolerance thresholds allow for informed continuity spending and planning. Once budgets and staff resources are known, BC and DR invocation actions and responsibilities can be documented and given to nominated senior technical and business staff to own, follow through and test.
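A Business Impact Analysis often summarises criticality and likelihood in a simple risk matrix. Each scenario is scored as criticality × likelihood, and that score is compared with the tolerance agreed by stakeholders. The Python sketch below illustrates that idea only. The 1 to 5 scales, the threshold of 12 and the example scenarios are hypothetical assumptions, not values from any real analysis.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One threat scenario from a (hypothetical) Business Impact Analysis."""
    name: str
    criticality: int  # assumed scale: 1 (minor) to 5 (business-stopping)
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def risk_score(self) -> int:
        # Simple multiplicative risk matrix: impact x probability.
        return self.criticality * self.likelihood


# Hypothetical tolerance threshold set down by business stakeholders.
RISK_TOLERANCE = 12


def scenarios_above_tolerance(scenarios, tolerance=RISK_TOLERANCE):
    """Return scenarios whose score exceeds the tolerance, highest first."""
    above = [s for s in scenarios if s.risk_score > tolerance]
    return sorted(above, key=lambda s: s.risk_score, reverse=True)


if __name__ == "__main__":
    # Illustrative scenarios; the scores are invented for this example.
    scenarios = [
        Scenario("Local power outage", criticality=2, likelihood=4),
        Scenario("Head office fire", criticality=5, likelihood=2),
        Scenario("Database compromise", criticality=5, likelihood=3),
        Scenario("Regional disaster", criticality=5, likelihood=1),
    ]
    for s in scenarios_above_tolerance(scenarios):
        print(f"{s.name}: score {s.risk_score}")
```

Scenarios above the threshold are the ones where continuity spending is easiest to justify. Those below it may be accepted as residual risk. A real BIA would normally weigh financial, reputational and regulatory impact separately rather than reduce them to a single score.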
Service or operational owners (departmental or operational managers) own their component continuity plans. These component plans roll up into one master Enterprise Continuity Plan (ECP), which would typically be assigned to a senior operations manager (with a secondary owner for resilience). That manager owns, understands and maintains the ECP, and bears responsibility for becoming the business’s expert in running continuity operations at a high level.
Typically, the role of a specialist DR company can be categorised into three potential areas:
Senior executive management must be persuaded of the merits of a strong, robust failover environment and must ultimately own responsibility for the business’s wellbeing in a critical business continuity situation. Potential problems in maintaining such an environment include:
a) It is not unusual in the corporate world for stakeholder involvement and mandate to be insufficient; buy-in must be given at the highest business and board level. Traditionally, the reason for this has been that capital expenditure and the monthly costs of contingent lines reduce the quarterly bottom line.
Executive management must fully understand the direct relationship between the capex outlay on failover architecture, procedures and preparation and the potential revenue and reputational loss when a disaster hits. Good business continuity preparation can be likened to monthly insurance payments: you hate paying them, but when your house burns down you thank the universe that you offset the risk of losing everything by contributing a relatively small percentage of your income.
b) Confidence in recovery and continuity procedures is weak because testing has not been aggressive enough. Simulated testing must be realistic and taken seriously. If senior executives doubt the organization’s capacity to respond to serious incidents affecting its business operations, that doubt will be magnified at the technical levels that must bear the burden in an invocation scenario. Very often companies sit back in self-satisfaction under the illusion that they are prepared; engaging an external specialist company to validate their approach can be of great benefit. A quote to bear in mind: ‘If it ain’t seriously tested it won’t work.’
c) Automated monitoring is not in place to raise alerts and fail over to the contingent instance.
d) Change management does not promote changes and deployments into the DR environment. This is especially true of subtle optimization upgrades or parameter tuning (for example on databases) that are not mirrored onto the contingent environment.
e) Connectivity to 3rd party vendors or suppliers is not integrated or tested in DR exercises or invocations.
f) Hardware dependencies: these can very often be eliminated by using clustered virtual machine environments layered on top of redundant blade enclosures, for instance. This approach can in itself offer significant cost and management advantages, and many companies are now switching to such environments for their day-to-day production operations. Fourspiral highly recommends the following products:
VMware: VMware ESX product suite
A case study is available on the VMware site.
Coupled with: enclosure and blade arrays from Dell Corporation
NB: ‘Your IT guys are going to love you if you suggest or even mention VMware.’
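Point c) above concerns automated monitoring. The Python sketch below roughly illustrates the logic involved. It polls a health check and fails over only after several consecutive failures, so that a single transient blip does not trigger an invocation. The health-check, alert and promotion callables are hypothetical placeholders. In practice these functions are usually provided by the clustering, virtualisation or monitoring product in use.

```python
from typing import Callable


class FailoverMonitor:
    """Polls a primary health check and fails over after N consecutive failures."""

    def __init__(self, check_primary: Callable[[], bool],
                 alert: Callable[[str], None],
                 promote_contingent: Callable[[], None],
                 failure_threshold: int = 3):
        self.check_primary = check_primary            # hypothetical health probe
        self.alert = alert                            # hypothetical alert channel
        self.promote_contingent = promote_contingent  # hypothetical failover action
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.failed_over = False

    def poll(self) -> None:
        if self.failed_over:
            return  # already running on the contingent instance
        if self.check_primary():
            self.consecutive_failures = 0  # healthy again: reset the counter
            return
        self.consecutive_failures += 1
        self.alert(f"Primary health check failed "
                   f"({self.consecutive_failures}/{self.failure_threshold})")
        if self.consecutive_failures >= self.failure_threshold:
            self.alert("Threshold reached: failing over to contingent instance")
            self.promote_contingent()
            self.failed_over = True


if __name__ == "__main__":
    # Simulated primary: healthy, one blip, then a sustained outage.
    results = iter([True, False, True, False, False, False, False])
    events = []
    monitor = FailoverMonitor(
        check_primary=lambda: next(results),
        alert=events.append,
        promote_contingent=lambda: events.append("PROMOTED"),
    )
    for _ in range(7):
        monitor.poll()
    print("\n".join(events))
```

Requiring consecutive failures trades a slightly slower reaction for fewer false invocations. The threshold should be set with the RTO in mind.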
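For point d) above, one simple safeguard is to compare the production and DR parameter sets after every change and flag any differences. The sketch below assumes that both configurations can be exported as flat name/value dictionaries. The parameter names and values are hypothetical.

```python
MISSING = "<missing>"


def config_drift(production: dict, contingent: dict) -> dict:
    """Report parameters whose values differ between production and DR.

    Returns {name: (production_value, contingent_value)}; a parameter
    absent on one side is shown as MISSING on that side.
    """
    drift = {}
    # Union of parameter names from both environments, in stable order.
    for name in sorted(production.keys() | contingent.keys()):
        prod_val = production.get(name, MISSING)
        dr_val = contingent.get(name, MISSING)
        if prod_val != dr_val:
            drift[name] = (prod_val, dr_val)
    return drift


if __name__ == "__main__":
    # Hypothetical database tuning applied in production but not mirrored to DR.
    production = {"buffer_pool_mb": 8192, "max_connections": 500, "parallel_workers": 8}
    contingent = {"buffer_pool_mb": 4096, "max_connections": 500}
    for name, (prod_val, dr_val) in config_drift(production, contingent).items():
        print(f"{name}: production={prod_val} DR={dr_val}")
```

One way to keep the contingent environment in step with production is to run a check like this as a step in the change management process and to hold the change open until any reported drift is resolved.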
RTO & RPO Definitions (courtesy of www.drj.com)
Recovery Point Objective (RPO): The point in time to which systems and data must be recovered after an outage as determined by the business unit [or department].
Recovery Time Objective (RTO): The period of time within which systems, applications, or functions must be recovered after an outage (e.g. one business day). RTOs are often used as the basis for the development of recovery strategies, and as a determinant as to whether or not to implement the recovery strategies during a disaster situation. SIMILAR TERMS: Maximum Allowable Downtime.
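Both objectives can be checked directly in a timed DR test. The achieved recovery time (from outage start to service restoration) is compared with the RTO. The achieved data-loss window (from the last recoverable data point to outage start) is compared with the RPO. The Python sketch below assumes that these three timestamps are recorded during the test. The times and objectives in the example are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_dr_test(outage_start: datetime, service_restored: datetime,
                     last_recoverable_point: datetime,
                     rto: timedelta, rpo: timedelta) -> dict:
    """Compare a timed DR test against its RTO and RPO."""
    # Time taken to bring the service back (measured against the RTO).
    recovery_time = service_restored - outage_start
    # Data written after the last recoverable point is lost (measured against the RPO).
    data_loss_window = outage_start - last_recoverable_point
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


if __name__ == "__main__":
    # Hypothetical test: outage at 09:00, service back at 12:30,
    # last replicated data point at 08:40; RTO 4 hours, RPO 15 minutes.
    result = evaluate_dr_test(
        outage_start=datetime(2024, 1, 15, 9, 0),
        service_restored=datetime(2024, 1, 15, 12, 30),
        last_recoverable_point=datetime(2024, 1, 15, 8, 40),
        rto=timedelta(hours=4),
        rpo=timedelta(minutes=15),
    )
    for key, value in result.items():
        print(f"{key}: {value}")
```

With these example figures the RTO is met (3 hours 30 minutes against 4 hours) but the RPO is missed (20 minutes against 15 minutes). This kind of evidence is what auditors look for after a timed test.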
Fourspiral staff have been responsible for the design, maintenance and recovery of infrastructure serving business processes that generate turnover in excess of EUR 2 billion per annum in Europe.
Within South Africa, Fourspiral staff have held responsibility for BC and DR for clients with turnover in excess of R200 million per annum.
Timed and audited DR tests ensure that the recovery objectives and timeframes required by the business and its auditors have been met.
Technical audits such as SAS 70 (IT and corporate audits):
Implementing more in-depth technical recommendations after audit assertions or recommendations have been made to a client by KPMG. I have often found a lack of clarity in the nexus between procedural recommendations and what should actually be deployed technically, not only to remain compliant but, more importantly, to have a failsafe, robust failover environment for services, systems, servers and staff.
Fourspiral can assist in sourcing staff or co-ordinating local resources for the delivery of specialised continuity projects for companies with locations in areas of Africa that pose challenges in terms of infrastructure and communications.
Continued upkeep of DR process documentation in line with changes implemented through a company’s Change Management (CM) processes. FS can take responsibility for keeping electronic documentation, plans and procedures current against ever-changing technical environments.
Technical system recovery
User-base bring-up and relocation
Interface services between business crisis management team and the user-base.
Interface with all third parties to ensure that communications, systems and business objectives are defined.
Special rapid-deployment site microwave communications are available for ultra-quick WAN, internet and inter-site connectivity.
This level of support is offered together with key partners: FS works with its partners to ensure the best service in every area, including networking, communications, hardware support and expert technical knowledge.