Here's an example that a former employee of one of the largest outsourcing companies shared with me: Their very large retail customer's website crashed on Black Friday. The application was down for six hours, resulting in a loss of $50 million in revenue. The outsourcer's compensation to the retailer? Six hours' service credit, or approximately $300.
The moral of this story? Keep the whole SLA discussion in perspective. It's not going to make you whole if your application is unavailable.
The worst thing about investing too much energy into SLA haggling is that it may distract you from the far more important issue: how to ensure uptime. If you're on the Titanic, and it hits an iceberg and sinks, all of the time you spent negotiating the location and conditions of your deck chair isn't going to help your prospects one bit.
The most important question is this: how should you think about application outages, and what are your options for improving uptime?
As a starting point, keep in mind Voltaire's observation: "Le mieux est l'ennemi du bien." Loosely translated, that means "The perfect is the enemy of the good." Applied to cloud computing, this might be thought of as "Don't avoid adopting a cloud provider because it can't guarantee 99.999 percent uptime, when one's own data centers fall far short of acceptable uptime."
If adopting cloud computing improves uptime significantly, it's the right thing to do. If no actual statistics exist on the uptime of one's own computing environment, that's a telling sign that moving to a cloud provider is a step in the right direction. It may not be perfect, but it's way better than an environment that can't even track its own uptime. Believe me, there are many, many IT organizations with nothing more than earnest assurances about their uptime performance.
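To make the percentages concrete, here is a minimal Python sketch of the arithmetic behind an availability figure. The function names are hypothetical, and the 30-day reporting period is an assumption chosen for illustration; the six-hour outage is the one from the Black Friday story above.

```python
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year that an availability target permits."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


def measured_availability(outage_minutes: Sequence[float], period_days: int) -> float:
    """Availability (in percent) over a period, given logged outage durations."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - sum(outage_minutes) / total_minutes)


if __name__ == "__main__":
    # What a "five nines" target leaves room for in a year.
    print(round(allowed_downtime_minutes(99.999), 3), "minutes per year")
    # A single six-hour outage in an assumed 30-day month.
    print(round(measured_availability([6 * 60], period_days=30), 3), "percent")
```

An organization that cannot supply the outage list for the second function has no basis for comparing its own uptime with a provider's, which is the point of the paragraph above.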
Here are some steps you can take to improve your application uptime:
1. Architect your application for resource failure. Perhaps the greatest single step you can take to improve your application's uptime is to architect it so that it can continue performing in the face of individual resource failure (e.g., server failure). Redundancy of application servers ensures the application will continue working even if a server outage kills a virtual machine. Likewise, having replicated database servers means an application won't grind to a halt if one server hangs. Using an application management framework that starts new instances to replace failed ones ensures redundant topologies will be maintained in the event of an outage.
2. Architect your topology for infrastructure failure. While judicious design can protect application availability when an individual hardware element fails, it can't help you if the whole application environment fails. If the entire data center that one's application runs in goes down, redundant application design within that data center is futile. The answer in this case is to distribute the application geographically so that even if a portion of it becomes unavailable due to a provider's large-scale outage, the application can continue to operate. This makes application design more complex, of course, but it provides a larger measure of downtime protection.
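The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any provider's API: all function names and region labels are hypothetical, and the availability formula assumes that copies fail independently of one another, which holds for servers in one data center only until the data center itself fails.

```python
from typing import Callable, Optional, Sequence


def combined_availability(single_availability: float, copies: int) -> float:
    """Chance that at least one of `copies` independent copies is up.

    Assumption: failures are independent. Copies in the same data
    center share that data center's fate, so this overstates the
    benefit unless the copies are geographically distributed.
    """
    return 1 - (1 - single_availability) ** copies


def instances_to_launch(desired: int, healthy: Sequence[bool]) -> int:
    """How many replacements a management framework would need to start."""
    return max(0, desired - sum(healthy))


def pick_region(regions: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region that passes a health check, or None."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


if __name__ == "__main__":
    # Two copies of a 99 percent component, assuming independent failures.
    print(combined_availability(0.99, copies=2))
    # One of three servers has failed, so one replacement is needed.
    print(instances_to_launch(3, healthy=[True, False, True]))
    # Hypothetical health check in which the first region is down.
    print(pick_region(["region-a", "region-b"], lambda r: r != "region-a"))
```

In practice the health check would be a real probe of each location, and traffic routing would be handled by the provider's or the application's own failover mechanism; the sketch only shows the decision logic.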