Service Level Agreements (SLAs) are essential in the IT business. IT managers should know what to expect from their service providers. You should fully understand any SLA before signing it – especially when it comes to uptime. Let’s explore.
Everyone who has been around IT support has heard of Service Level Agreements. Before we consider what to expect regarding uptime in SLAs, it helps to look briefly at what SLAs are and what they are for. A Master of Project Academy blog post gives us a basic description: “A service level agreement states what the two parties want to achieve with their agreement along with an outline of the responsibilities of each party including expected outputs with performance measures.”
In business, everyone must look out for their own interests. It’s a dog-eat-dog world out there. An SLA allows each side to clarify what they want out of the relationship, and it specifies responsibilities and even penalties when one side does not hold up its end of the bargain. As the blog post indicates, there are three types of SLA: service-based, customer-based, and multi-level.
One of the key components of any SLA is the service requirement. This could be fashioned in different ways, with a multi-tiered service offering consisting of different levels of support. Each tier may have its own service level, response time requirement, and set of responsibilities. Usually key performance indicators (KPIs) help to measure performance. But what should be measured?
First and foremost, clients of IT service providers want to be sure that the service remains up. The benchmark most often cited is five nines, or 99.999% availability. But not every service provider offers that. In fact, when viewed over an entire year, what many companies offer can leave customers down for much longer than they think.
Consider a service provider who offers 99% uptime in their SLA. Sounds good, right? Think again. A service level of 99% uptime can leave your service down for many hours at a time – all within contract limits.
Take a look at the accompanying chart. At 99%, your IT service could be down for almost 88 hours in one year. This shows how important the actual number is.
Uptime Percentage | Average Annual Downtime |
---|---|
99% | 87 hours, 40 minutes |
99.9% | 8 hours, 46 minutes |
99.99% | 52 minutes, 36 seconds |
99.999% | 5 minutes, 16 seconds |
99.9999% | 31.6 seconds |
It’s clear from the figures that there are big differences in the service levels offered in the market. Each additional nine cuts the permitted downtime by a factor of ten, so a 99.9% service level allows roughly a tenth of the downtime that 99% does. If you hadn’t seen it laid out this way, would you have realized that there were such big gaps? Does the service level that your provider currently offers match your expectations?
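If you want to check the chart against a figure in your own contract, the arithmetic is simple. Here is a minimal Python sketch. It assumes a 365.25-day year, which appears to be what the chart's figures are based on, and the function names are our own, chosen for illustration.

```python
# Assumption: an "average" year of 365.25 days, which matches the chart's figures.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def annual_downtime_seconds(uptime_percent: float) -> float:
    """Return the downtime per year, in seconds, that an uptime percentage permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Format a number of seconds as hours, minutes, and whole seconds."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Print the permitted annual downtime for each level in the chart.
    for pct in (99, 99.9, 99.99, 99.999, 99.9999):
        print(f"{pct}% -> {format_duration(annual_downtime_seconds(pct))}")
```

Swap in whatever percentage your provider quotes. If the SLA is measured monthly rather than annually, replace the year constant with the length of the billing period.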
Let’s take a look at a few SLAs on the market. Amazon Web Services says that if your monthly uptime percentage drops below 99.95%, you are entitled to a service credit of 10%. If it slips below 99.0%, then you get back 30%. Microsoft Azure’s cloud services SLA provides two tiers of reimbursement: a 10% service credit for less than 99.95% uptime and a 25% credit for less than 99% uptime. Salesforce.com might not even have an SLA; instead, the company says it has proven reliability of 99.9+ percent.
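To see how tiers like these play out, here is a rough Python sketch that turns a month's downtime into an uptime percentage and looks up the matching credit. The tier tables simply restate the figures quoted above and may not match any provider's current terms. The function and variable names are hypothetical, and the 30-day month is an assumption.

```python
def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Convert total downtime in a month into an uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must fit within the month")
    return 100 * (1 - downtime_minutes / total_minutes)


def service_credit_percent(uptime_percent: float, tiers) -> int:
    """Return the largest credit among the tiers whose threshold uptime fell below.

    tiers is a list of (threshold_percent, credit_percent) pairs.
    """
    credit = 0
    for threshold, tier_credit in tiers:
        if uptime_percent < threshold:
            credit = max(credit, tier_credit)
    return credit


# Figures as quoted in this article, not taken from the providers' current SLAs.
AWS_TIERS = [(99.95, 10), (99.0, 30)]
AZURE_TIERS = [(99.95, 10), (99.0, 25)]

if __name__ == "__main__":
    # Example: four hours of downtime in a 30-day month.
    uptime = monthly_uptime_percent(downtime_minutes=240)
    print(f"Uptime: {uptime:.3f}%")
    print(f"Credit under the AWS tiers: {service_credit_percent(uptime, AWS_TIERS)}%")
    print(f"Credit under the Azure tiers: {service_credit_percent(uptime, AZURE_TIERS)}%")
```

Note that a credit is a percentage of your bill, not compensation for lost business, so it is worth running your own numbers before relying on it.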
All of these calculations assume that we know what downtime is. Downtime is generally viewed as any time that a system or service is not working. Companies do everything they can to maximize uptime. But what the customer sees and what the IT service provider sees may be two different things.
A blog post from the company CloudEndure gives some interesting insights into the nature of downtime. The article is called “5 Things They Never Told You About Downtime”. Their warning is worth our attention: “Unfortunately, it’s very easy to fall into a false sense of security when it comes to service availability in the cloud.” Here’s their list:
The most obvious issue is #4. There will be times that the service you are running is down even though everything on the supporting IT infrastructure is up. Are you prepared to deal with those circumstances? What if you find yourself in a disagreement with your service provider, who says, “I see up, you see down”? Who is right? And who is responsible? We’ve seen this at Total Uptime, I’m afraid. Our Load Balancer IP is up and receiving traffic, but the customer’s servers behind the scenes are all down.
And what good is a service if the quality is so poor that it is unusable? Quality of service issues may never show up on any uptime monitoring statistics.
Getting a handle on downtime is an important part of SLA management. If you can’t clarify downtime, how will you know what uptime is?
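One way to make the “I see up, you see down” problem concrete is to record both views side by side and compute uptime from each. The Python sketch below is only an illustration: the `Probe` record and its field names are hypothetical, and it assumes you already collect periodic checks of both the provider's infrastructure and your own end-to-end service.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One monitoring sample, taken at a regular interval (hypothetical structure)."""
    infrastructure_up: bool  # the provider's view, e.g. the load balancer answers
    service_up: bool         # the customer's view, e.g. the application responds


def uptime_percent(samples, view: str) -> float:
    """Return the share of samples in which the given view reported 'up'."""
    if not samples:
        raise ValueError("need at least one sample")
    up_count = sum(1 for sample in samples if getattr(sample, view))
    return 100 * up_count / len(samples)


if __name__ == "__main__":
    # 98 samples where both views agree, 2 where only the infrastructure is up.
    samples = [Probe(True, True)] * 98 + [Probe(True, False)] * 2
    print("Provider sees:", uptime_percent(samples, "infrastructure_up"))
    print("Customer sees:", uptime_percent(samples, "service_up"))
```

Whichever view the SLA measures is the one that counts for credits, so it pays to know which definition of downtime your contract uses.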
We’ve spoken in general terms about uptime and downtime and SLAs. But every IT infrastructure is different, and service requirements vary according to business needs. It’s true that moving to the cloud has made people start thinking about service or application availability rather than network availability. But the principles of the SLA are the same. Does the service level provided meet your expectations?
Remember, each side fights for their own interests. Users want a well-functioning service, and providers want to justify billing for the services they offer. Take a good look at your SLA. Be vigilant. Never let down your guard. It’s your business.