Cloud Service Level Agreement Expectations

Service Level Agreements (SLAs) are essential in the IT business. IT managers should know what to expect from their service providers. You should fully understand any SLA before signing it – especially when it comes to uptime. Let’s explore.

SLA Overview

Everyone who has been around IT support has heard of Service Level Agreements. Before we consider what to expect regarding uptime in SLAs, it helps to look briefly at what SLAs are and what purpose they serve. A Master of Project Academy blog post gives us a basic description: “A service level agreement states what the two parties want to achieve with their agreement along with an outline of the responsibilities of each party including expected outputs with performance measures.”

In business, everyone must look out for their own interests. It’s a dog-eat-dog world out there. An SLA allows each side to clarify what it wants out of the relationship, and it specifies responsibilities and even penalties when one side does not hold up its end of the bargain. As the blog post indicates, there are three types of SLA: service-based, customer-based, and multi-level.

One of the key components of any SLA is the service requirement. This could be fashioned in different ways, with a multi-tiered service offering consisting of different levels of support. Each tier may have its own service level, response time requirement, and set of responsibilities. Usually key performance indicators (KPIs) help to measure performance. But what should be measured?

SLA Uptime Metrics

First and foremost, clients of IT service providers want to be sure that the service remains up. The figure most often held up as the gold standard is five 9’s, or 99.999% availability. But not every service provider offers that. In fact, when viewed over an entire year, what many providers offer can leave customers down for much longer than they think.

Consider a service provider who offers 99% uptime in their SLA. Sounds good, right? Think again. A service level of 99% uptime can leave your service down for many hours at a time – all within contract limits.

Take a look at the accompanying chart. At 99%, your IT service could be down for almost 88 hours in one year. This shows how important the actual number is.

Uptime Percentage	Average Annual Downtime
99%	87 hours, 40 minutes
99.9%	8 hours, 46 minutes
99.99%	52 minutes, 36 seconds
99.999%	5 minutes, 16 seconds
99.9999%	31.6 seconds
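
The figures in the chart follow from simple arithmetic: the permitted downtime is the unavailable fraction multiplied by the length of the period. Here is a minimal Python sketch, assuming a 365.25-day year (the assumption that reproduces the figures above). The function name `annual_downtime` is purely illustrative.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365.25 * 24  # assumption: average year, leap days included


def annual_downtime(uptime_percent: float) -> timedelta:
    """Return the maximum yearly downtime permitted by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = (100 - uptime_percent) / 100
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


# Print the permitted downtime for each level shown in the chart.
for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {annual_downtime(pct)}")
```

If your contract measures uptime per month rather than per year, swap the period length accordingly; the permitted downtime shrinks in proportion.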

Shopping for Uptime

It’s clear from the figures that there are big differences in the service levels offered in the market. Each additional nine cuts the permitted downtime by a factor of ten: 99.9% allows under 9 hours a year, against nearly 88 hours at 99%. If you hadn’t seen it laid out this way, would you have realized that there were such big gaps? Does the service level that your provider currently offers match your expectations?

Let’s take a look at a few SLAs on the market. Amazon Web Services says that if your monthly uptime percentage drops below 99.95%, you are entitled to a service credit of 10%. If it slips below 99.0%, then you get back 30%. Microsoft Azure’s cloud services SLA provides two tiers of reimbursement: a 10% service credit for less than 99.95% uptime and a 25% credit for less than 99% uptime. Salesforce.com might not even have an SLA, and comments that it has proven reliability of 99.9+ percent.
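
Tiered credits like these amount to a simple lookup. The sketch below uses only the thresholds quoted in this article, which may not match the providers’ current terms, so treat the numbers as placeholders. The names `CREDIT_TIERS`, `service_credit_percent`, and `uptime_from_downtime` are hypothetical, and a 30-day month is assumed.

```python
# Credit tiers as quoted in this article: (uptime threshold %, credit %).
# Assumption: these mirror the text above, not any provider's current SLA.
CREDIT_TIERS = {
    "aws": [(99.0, 30), (99.95, 10)],
    "azure": [(99.0, 25), (99.95, 10)],
}


def service_credit_percent(provider: str, monthly_uptime: float) -> int:
    """Return the credit percentage owed for a month's measured uptime."""
    # Tiers are ordered from the lowest threshold (largest credit) upward,
    # so the first threshold the uptime falls below decides the credit.
    for threshold, credit in CREDIT_TIERS[provider]:
        if monthly_uptime < threshold:
            return credit
    return 0


def uptime_from_downtime(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Convert minutes of downtime in a month into an uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# One hour of downtime in a 30-day month.
uptime = uptime_from_downtime(downtime_minutes=60)
print(service_credit_percent("aws", uptime))
```

Under these assumptions, 99.95% leaves room for about 21.6 minutes of downtime in a 30-day month, so a single one-hour outage already falls into the 10% credit tier.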

Understanding Downtime

All of these calculations assume that we know what downtime is. Downtime is generally viewed as any time that a system or service is not working. Companies do everything they can to maximize uptime. But what the customer sees and what the IT service provider sees may be two different things.

A blog post from the company CloudEndure gives some interesting insights into the nature of downtime. The article is called “5 Things They Never Told You About Downtime”. Their warning is worth our attention: “Unfortunately, it’s very easy to fall into a false sense of security when it comes to service availability in the cloud.” Here’s their list:

1. SLAs are not always met
2. Downtime increases with multiple services
3. Downtime of third party services
4. Cloud is up, service is down
5. Bad quality of service

The most obvious issue is #4. There will be times that the service you are running is down even though everything on the supporting IT infrastructure is up. Are you prepared to deal with those circumstances? What if you find yourself in a disagreement with your service provider, who says, “I see up, you see down”? Who is right? And who is responsible? We’ve seen this at Total Uptime, I’m afraid. Our Load Balancer IP is up and receiving traffic, but the customer’s servers behind the scenes are all down.
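
The load balancer example also illustrates the second item in the list: what the customer sees depends on every link in the chain. Here is a rough sketch of how availability compounds, assuming the components fail independently (a simplification; real failures are often correlated). The function name `composite_availability` is illustrative.

```python
from math import prod


def composite_availability(availabilities_percent):
    """Availability of a chain in which every component must be up.

    Assumption: failures are independent, so the probabilities multiply.
    """
    return 100 * prod(a / 100 for a in availabilities_percent)


# Hypothetical chain: DNS, load balancer, and application, each at 99.9%.
print(composite_availability([99.9, 99.9, 99.9]))
```

Three components at 99.9% each combine to roughly 99.7%, which works out to about 26 hours of downtime a year rather than the under 9 hours that any single component’s SLA suggests.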

And what good is a service if the quality is so poor that it is unusable? Quality of service issues may never show up on any uptime monitoring statistics.
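
One way to make quality of service visible is to count a response that is too slow as downtime. The sketch below works on a list of monitoring samples. The `Probe` type, the `availability` function, and the 1000 ms threshold are all hypothetical choices for illustration, not part of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    ok: bool           # did the request succeed at all?
    latency_ms: float  # how long the response took


def availability(probes, max_latency_ms=None):
    """Percentage of probes counted as 'up'.

    With max_latency_ms set, a successful but slow response counts as down.
    """
    if not probes:
        raise ValueError("no probes recorded")
    up = sum(
        1
        for p in probes
        if p.ok and (max_latency_ms is None or p.latency_ms <= max_latency_ms)
    )
    return 100 * up / len(probes)


# Hypothetical samples: one outright failure and one very slow response.
samples = [Probe(True, 120), Probe(True, 4500), Probe(True, 150), Probe(False, 0)]
print(availability(samples))                       # plain up/down view
print(availability(samples, max_latency_ms=1000))  # slow responses count as down
```

With these samples the plain view reports 75% while the latency-aware view reports 50%, which is the gap between what a provider’s monitor and a frustrated user might each call “up”.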

Getting a handle on downtime is an important part of SLA management. If you can’t clarify downtime, how will you know what uptime is?

Conclusion

We’ve spoken in general terms about uptime and downtime and SLAs. But every IT infrastructure is different, and service requirements vary according to business needs. It’s true that moving to the cloud has made people start thinking about service or application availability rather than network availability. But the principles of the SLA are the same. Does the service level provided meet your expectations?

Remember, each side fights for their own interests. Users want a well-functioning service, and providers want to justify billing for the services they offer. Take a good look at your SLA. Be vigilant. Never let down your guard. It’s your business.
