Cloud-based services give new meaning to the IT holy grail of “cheaper, better, faster” in the right circumstances. You might not even have to settle for just two. But it is important not to let the Cloud fog your thinking when it comes to configuring mission-critical IT-enabled services: adequate failover capabilities, and service levels that will support the operational imperatives of the business, are as important as ever.
It is common, if not the norm, for Cloud service providers to offer only a single contractual service level – Availability – and then to define it in a way that wouldn’t pass the sniff test in a traditional IT services contract. For example, it is not unusual for a Cloud service’s Availability standard to be exceedingly low by customary data center standards – 98% or even 97% (versus 99.999% or even 99.9999%) – and then for the provider to make an already weak standard even weaker through contractual devices such as:
- Excluding downtime during the provider’s weekly maintenance window – which may span 2 days or more during the weekend, with no limit on how long the service can be taken down during that period,
- Excluding so-called “brief” outages – any outage of a few (e.g., 5-15) minutes or less in duration, and
- Providing that performance against the standard is measured over a quarter (or an entire year in some cases) instead of a month.
A 98% Availability standard measured over a quarter would permit the service to be unavailable for more than 43 hours during a 3-month span, not counting planned maintenance downtime or excluded short-duration outages. Few substantial businesses would knowingly agree to such a low service commitment, but customers of Cloud services do it routinely. The same 98% standard, if measured annually, would permit the service to be unavailable for a staggering 175 hours or so over the course of a year (again not counting planned maintenance downtime or excluded short-duration outages) without violating the service level.
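The downtime arithmetic is easy to reproduce for any Availability percentage and measurement period. The following is a minimal sketch, not a contractual formula: the function name and the period lengths (a 30-day month, a 365-day year, a quarter equal to one fourth of a year) are assumptions made for illustration, and the calculation ignores maintenance windows and excluded "brief" outages, which would add to the permitted downtime.

```python
HOURS_PER_DAY = 24

# Assumed period lengths in days; real contracts define these differently.
PERIOD_DAYS = {"month": 30, "quarter": 365 / 4, "year": 365}


def allowed_downtime_hours(availability_pct: float, period_days: float) -> float:
    """Hours of unplanned downtime permitted before the standard is violated."""
    total_hours = period_days * HOURS_PER_DAY
    return total_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a typical Cloud standard with stricter data center standards.
    for pct in (98.0, 99.9, 99.999):
        for name, days in PERIOD_DAYS.items():
            hours = allowed_downtime_hours(pct, days)
            print(f"{pct}% measured over a {name}: {hours:.2f} hours allowed")
```

Under these assumptions, a 98% standard allows about 43.8 hours per quarter and about 175 hours per year, while a 99.999% standard allows only a few minutes per year.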
Service Level Agreements (SLAs) for Cloud services often contain other customer-unfriendly terms as well, such as:
- Committing only to use commercially reasonable efforts to meet the service levels rather than making a firm commitment to either meet them or give the customer a service credit.
- Offering a very low service credit in relation to the time period over which compliance with the service level is measured – e.g., providing a credit equal to 1/10th of the customer’s monthly bill for the month in which an annual availability standard was violated. This would give the customer a service credit equal to approximately 3 days’ worth of the Cloud provider’s annual charges if the provider fails to meet the annual availability standard. To be meaningful, the service credit should represent a significant percentage of the provider’s charges for the affected service for the entire time period over which compliance with the service level is measured, whether that is a month, a quarter or an entire year.
- Providing that the service credit is the customer’s sole and exclusive remedy for any unavailability or non-performance of the Cloud service or other failure by the provider – meaning you can forget about claiming actual damages.
- Requiring the customer, in order to receive a service credit, to request it in writing and to provide documentation of each service outage or disruption contributing to the service level violation within a fairly short period (e.g., 30 days) of the last reported incident in the service level claim. Although some customers might conceivably take on the burden of documenting and requesting a service credit for one or two long-duration outages that cause a violation of a service level, it’s hard to imagine most customers doing so for a low-value service credit.
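To see how small such a credit is relative to the measurement period, the service credit example in the list above can be worked through in a few lines. This is a sketch under simplifying assumptions: the customer's monthly bills are equal, a year has 365 days, and the function names are hypothetical.

```python
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def credit_share_of_period(credit_fraction_of_month: float, period_months: int) -> float:
    """Credit as a fraction of all charges over the measurement period.

    Assumes the same bill every month (an illustrative simplification).
    """
    return credit_fraction_of_month / period_months


def credit_in_days_of_charges(credit_fraction_of_month: float) -> float:
    """Express the credit as the number of days of charges it covers."""
    days_per_month = DAYS_PER_YEAR / MONTHS_PER_YEAR
    return credit_fraction_of_month * days_per_month


if __name__ == "__main__":
    fraction = 0.10  # credit of 1/10th of one monthly bill
    share = credit_share_of_period(fraction, period_months=12)
    days = credit_in_days_of_charges(fraction)
    print(f"Share of annual charges: {share:.2%}")
    print(f"Equivalent days of charges: {days:.1f}")
```

A credit of 1/10th of one monthly bill, set against an annual standard, comes to less than 1% of the year's charges, or roughly 3 days' worth.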
The recent well-publicized disruption in Amazon’s EC2 service is certainly no reason for companies to back away from the extraordinary opportunities offered by Cloud solutions, but it should serve as a wake-up call to enterprises. It underscores the importance of configuring Cloud services in a way that eliminates single points of failure, and of demanding operationally meaningful service level commitments: a meaningful service availability standard, a commitment to respond to and resolve service problems in a timely manner, and meaningful service credits if the provider fails to meet those service levels.
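As a rough illustration of why eliminating single points of failure matters, the sketch below estimates the combined availability of several redundant deployments of the same service. It relies on a strong assumption that is labeled here because it often does not hold in practice: each deployment fails independently of the others. Correlated failures, such as a shared dependency going down, would make the real figure lower. The function name is hypothetical.

```python
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of a service that is up if at least one replica is up.

    Assumes replica failures are statistically independent, which is an
    idealization; correlated outages reduce the real-world benefit.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    probability_all_down = (1 - single_availability) ** replicas
    return 1 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s) at 98% each: {combined_availability(0.98, n):.4%}")
```

Under the independence assumption, two deployments that are each available 98% of the time would together be available about 99.96% of the time, which is why failover design can matter as much as the contractual standard itself.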