Tuesday, December 11, 2012

Business Impact Analysis and Disaster Recovery

Readers may be wondering how far out in the backwaters a Malaysian author must be to still write about Disaster Recovery when we’re about to hit 2013. Well, I figured that with the end-of-the-world hype, the floods and political events in Thailand, and the Japanese earthquake and tsunami, the topic may just be in vogue again. Also, I’ve been working on a Disaster Recovery improvement solution and felt that a number of customers are still not sufficiently aware of what’s best for their organization.

What I really want to expound on is the lack of emphasis on the practical side of acting on a Business Impact Analysis (BIA) result. I am assuming you are familiar with a BIA and have had a professionally trained team walk you through the exercise, in which the CIO and business owners identify the IT services, applications and infrastructure that support critical business functions. Note: the IT piece of the puzzle is NOT critical unless the business part is.

The desired outcome is the reduction and mitigation of business impact, with a contingency in place should that fail. Like any good IT outsourcer, the vendor has to advise on what’s best for the business, not on how much money the outsourcer can make by overselling hot-site synchronous data replication as the cure for all IT ailments.

The figure below illustrates “anecdotal” evidence of risk probability versus event categories. (Once readers pony up their SLA data to yours truly and I’ve received at least 120 data points, I’ll be more than happy to provide an accurate statistical sampling of the probabilities.) You will see that the largest share of business disruptions comes not from an actual disaster incident but from Type A risks: failures due to operational weaknesses.

Why boil it down to operational weaknesses?

Simply put: if you stress test and quality assure your applications, you will reduce the number of catastrophic bugs that add extra zeroes in your accounting software and expose you to customer lawsuits. The same goes for incidents where an overworked operator ignores a bunch of RAID drives with parity errors that are teetering on the brink of failure under the 24/7 banking application.

I personally went through an incident in a previous incarnation where an operational change led to a 48-hour email outage that no Disaster Recovery solution could have salvaged.

Type B risks are much easier to resolve because the fix is the elimination of single points of failure. Take, for example, a transaction server running 1,000 transactions a second, each committing income to the company, that unfortunately resides on a RAID-0 setup with only a single core switch and a single server instance.

Although rare in this day and age, it may still be possible if you look deep enough into how your IT services value chain is set up.

The figure above is highly simplified, but when you lay out the entire IT ecosystem, any one of those single points of failure can lead to a money-losing outage of a critical business function. It gets worse when there’s no parts sparing in place and the last backup tape has never been tested for restoration. Suffice to say, the organization wouldn’t have Type B risks if it didn’t have an IT outsourcer exhibiting Type A behaviour.
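To put rough numbers on the single-point-of-failure argument, here is a minimal Python sketch. It assumes the storage, core switch and server sit in series (every one of them must be up for transactions to flow) and that failures are independent. The availability figures are made-up placeholders, not measurements, and doubling up each component is a simplification of real redundancy designs.

```python
from math import prod

HOURS_PER_YEAR = 8760


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` independent, identical components in parallel."""
    return 1 - (1 - availability) ** copies


def downtime_hours(availability):
    """Expected hours of outage per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical per-component availabilities (illustrative only).
single = {"storage": 0.995, "core_switch": 0.999, "server": 0.998}

# One of everything: any single failure takes the service down.
chain_single = serial_availability(single.values())

# Two of everything: a layer fails only if both of its copies fail.
chain_redundant = serial_availability(
    redundant_availability(a, 2) for a in single.values()
)

print(f"single chain:    {downtime_hours(chain_single):.1f} h/year expected outage")
print(f"redundant chain: {downtime_hours(chain_redundant):.2f} h/year expected outage")
```

The point of the arithmetic is that a serial chain is always less available than its weakest link, while duplicating each link shrinks that layer's failure probability to the square of the original. Independence is a generous assumption here: two servers on the same power feed, or patched by the same tired operator, do not fail independently.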

Type C risks are arguable, because organizations more directly associated with activities deemed “sensitive or sensationalist” may suffer from external or even internal attacks. Strangely, Malaysian and government-linked sites tend to suffer a spike in hacking or defacement attacks during international football matches! (You will need to speak to the good folks at CyberSecurity Malaysia for the details.)

Finally, Type D risks. Is the actual DR solution to mitigate a full-blown disaster necessary? YES. But don’t do it because it is mandated by the central bank; do it because you’ve done your homework. Before you spend money on the DR solution, you may wish to allocate your hard-earned IT budget to mitigating the biggest risks to business operations first. More importantly, find an honest IT outsourcer that has your interests at heart to work on the solution together.
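One hedged way to do that homework is to rank the risk types by expected annual loss, that is, how often an incident happens multiplied by what it costs each time. The Python sketch below does just that. Every frequency and cost in it is a hypothetical placeholder I made up for illustration; substitute figures from your own BIA and SLA records.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    incidents_per_year: float  # hypothetical frequency, replace with your own
    loss_per_incident: float   # hypothetical cost per incident, any currency

    @property
    def expected_annual_loss(self) -> float:
        # Simple expected value: frequency multiplied by impact.
        return self.incidents_per_year * self.loss_per_incident


# Placeholder numbers only; they are not drawn from real SLA data.
risks = [
    Risk("Type A: operational weakness", 6.0, 50_000),
    Risk("Type B: single point of failure", 1.0, 120_000),
    Risk("Type C: external or internal attack", 0.5, 80_000),
    Risk("Type D: full-blown disaster", 0.02, 5_000_000),
]

# Biggest expected loss first: that is where the budget conversation starts.
ranked = sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)

for risk in ranked:
    print(f"{risk.name:<38} {risk.expected_annual_loss:>12,.0f}")
```

With these made-up inputs the frequent, modest operational incidents outrank the rare, enormous disaster. Your own data may well rank things differently, which is exactly why the exercise is worth doing before the money is spent.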

To summarize: mitigate Type A risks with a top-notch IT outsourcing team and adequate process adherence, Type B risks by eliminating single points of failure, Type C risks with security hardening, and lastly Type D risks with a DR solution once you’ve done what you can with all of the above.

1 comment:

  1. This is a very interesting article. My company has been looking for information on disaster recovery services since our system went down last month. Thanks so much for all the helpful and useful information.