What Went Down in 2018
Despite whatever bad news you read in this article, we encourage you to remain positive. It’s just human nature to have a curious interest in the calamities of others. So if that’s what gives you your kicks, feel free to enjoy our 2nd annual dispatch on some of the most interesting outages of the past year. Just don’t forget to keep your spirits up.
Major CenturyLink Network Outage to Close 2018
We end the year with a major CenturyLink outage that started early Thursday morning for many and spread across the country, according to reports on Reddit and elsewhere. It is affecting Internet, 911 and other Internet-dependent services such as waves and VoIP. It is also affecting other providers that lease long-haul connectivity from CenturyLink, such as TATA Communications and GTT, to name a couple.
Network Availability – Is it important to you?
Alright, we admit that our company has a certain obsession with network availability we collectively call “uptime”. It’s even in our name. We’re totally committed to keeping services up and running for our clients. And while uptime is our best friend, we seem to spend a lot of time thinking about the enemy: downtime.
Surprising Cloud Adoption Trends
We all know that businesses are moving to the cloud. But how? Anyone who knows the basics of cloud technology is also aware that there are many approaches to the adoption of cloud technology. An enterprise can choose from public, private, or hybrid solutions. They can go with only one cloud provider or they can develop a multi-cloud approach. Knowing how businesses are using the cloud will benefit any company that offers services to them.
Leading Causes of Downtime
IT systems go down for a lot of reasons. Some downtime causes are obvious, while others take some time to understand. And still others are just plain comical. In this article we’ll have a look at different approaches to assigning blame for outages, and we’ll offer a short list of our own. The concept of downtime applies to so many different arenas in the world of IT, and trying to compare them one-for-one doesn’t always work. Let’s start by having another look at what we mean by downtime.
Is Liquid Cooling Worth The Risk?
Water and computers don’t mix, right? So why would anybody want to try to cool computer equipment with water? Lots of reasons. But the first thing you think, of course, is this: “Will it leak?” Well, probably not -- but we’ll get into that. You should know that water and computers are definitely not mutually exclusive. In fact, you might be amused to learn about a 1940s computer that was powered entirely by water. We’ll tell you more about it at the end of this article. But first let’s deal with the matter at hand.
Can Too Much Redundancy Impact Availability?
Good design is one of the keys to system success, but overengineering can bring it all down in an instant. After unabashedly extolling the virtues of redundancy in a recent article, we thought we would ask the next logical question: Is it possible to have too much redundancy?
One Summer Street Data Center Fire
According to Universal Hub and other sources, a small fire on the 8th floor, where UPS systems are housed, activated the sprinkler system. We cannot confirm whether this is entirely correct. However, a discussion on the outages.org mailing list confirmed that the impact affected multiple carriers in Massachusetts, including Windstream and CenturyLink, and MIT confirmed that it knocked their OC11 data center offline, affecting multiple services.
Selecting the Right Monitors for Your Website
Website monitoring is all about verifying and tracking the uptime, functionality, and performance of a website. There are many ways to accomplish that. It can be done with fully developed software using a graphical user interface (GUI), or with simple instructions in a command-line interface (CLI).
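As a rough illustration of the command-line end of that spectrum, here is a minimal sketch of a single HTTP uptime check using only the Python standard library. The function name `check_once`, the `opener` parameter, and the rule that any 2xx or 3xx status counts as "up" are assumptions made for this example, not part of any particular monitoring product.

```python
import time
import urllib.error
import urllib.request


def check_once(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch url once and report status, elapsed time, and any error.

    `opener` is a hypothetical hook so the check can be exercised
    without real network access; by default it is urllib's urlopen.
    """
    start = time.monotonic()
    status, error = None, None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        status = exc.code
        error = str(exc)
    except OSError as exc:
        # DNS failures, refused connections, and timeouts end up here.
        error = str(exc)
    elapsed = time.monotonic() - start
    # Assumption for this sketch: any 2xx or 3xx response counts as "up".
    up = status is not None and 200 <= status < 400
    return {"up": up, "status": status, "seconds": elapsed, "error": error}
```

Calling `check_once("https://example.com")` on a schedule and recording the results would cover uptime and response time. Verifying functionality, such as checking page content, would need more than this.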
Redundancy: When Too Much is Just Right
Redundancy is indispensable in the world of information technology. Of course, redundancy is not welcome in every aspect of life. But in information technology or aviation engineering, it can be a very good thing.
Proactive IT Maintenance to Minimize Downtime
The next catastrophe could be just around the corner, but if you prepare for it, you might be able to avoid it altogether. If you don’t have a robust proactive maintenance program for your IT environment, it may just be (in the vernacular) an “accident waiting to happen”.
Top 6 Tools for DNS Troubleshooting
Troubleshooting never ends. Problems in network computing can happen at many different levels. One technology that every internet user depends on is DNS, which stands for domain name system. A domain name is an alphanumeric designation for an IP address. DNS servers are the databases that manage the hierarchical domain name system. Sometimes these servers are not configured properly. That’s where DNS troubleshooting comes in.
What Went Down in 2017
The internet is replete with Top Ten lists and other rankings. But the criterion for distinguishing between #1 and #10 is often no more than personal whim. So with the caveat that these are not necessarily the worst or the biggest, we’ve decided to list and describe some of the most interesting outages in the world of cyberspace in 2017.
Why Ransomware is a Threat to Availability
Recent ransomware outbreaks have vividly shown how devastating an attack can be for businesses and organizations that rely on an omnipresent connection with their customers, users and partners. This is why high availability and disaster recovery solutions are so imperative today. A ransomware attack is indeed a disaster that can take an organization offline and out of commission.
The Advantages of Software Defined Storage
Companies across the globe are rapidly undergoing a digital transformation covering every aspect of their respective organizations, especially the enterprise. This macro process has pushed IT leaders to migrate applications and data to the cloud and to software-define their own on-premises infrastructures. This is being accomplished by incorporating the technology triad of software defined compute (SDC), software defined networking (SDN) and software defined storage (SDS).
Does the Cloud have a Layer 1 or Layer 2?
The focus has changed in recent years from ensuring connectivity to maintaining the availability of our applications. Since these virtual machines now reside in the higher layers of the OSI model, they can be maintained through software defined networking and other cutting-edge technologies.
Hybrid Cloud. Hybrid IT. Hybrid Availability.
There is a lot of talk about hybrid these days when it comes to IT. For the past five years or so, the Hybrid Cloud has been a hot topic as organizations are now open to hosting their digital services and data beyond the walls of the datacenter perimeter. Recently, the new approach of managing IT within the enterprise, called Hybrid IT, has come to the forefront.
Minimize risk. Maximize availability.
Businesses take risks. It comes with the territory. But that doesn’t mean that an enterprise should push blindly forward, ignoring the potential threats to availability and ultimately its success. Risk assessment is essential to understanding the territory and blazing the trail ahead. And risk mitigation is the key to controlling those factors that endanger IT uptime. It all starts with a framework.
Root Cause Analysis to Maintain Uptime
Root cause analysis is an excellent tool for keeping your IT infrastructure healthy. You may need some in-depth troubleshooting to correct an ongoing issue. Or you may be tasked to do a postmortem on a problem that is already resolved. RCA is also a very good approach for dealing with intermittent issues. Whatever the situation, root cause analysis can be your best friend if you are trying to keep an IT service up and running well.
Server Hardening for Security and Availability
Server hardening is a necessary process. And it’s a never-ending one. From the moment you pull the machine out of the box (or create it in the virtual environment), it pays to be thinking about security. But server hardening can do more than keep your machine safe. It will help with performance, and it can even play a part in keeping your machine online and available.
Decrease Downtime with Change Management
Service providers do everything they know how to avoid downtime. Generally the best practice is not to touch a live network. If it ain’t broke, don’t fix it. But change is inevitable, and eventually every network or system will need improvements. The trick is to handle these changes with little to no disruption of running services. That’s the purpose of change management.
Does Convergence Impact Uptime?
One of the biggest trends in data center infrastructure is convergence. Actually it has been happening for some time. Equipment footprint has been getting smaller for years. Functions that used to be handled by huge dedicated machines are now accomplished by modular cards. Specialized servers, switches, routers, and other network devices have been combined into multi-service boxes. Now with the advent of virtualization and the cloud, the footprint is getting even smaller. But with higher convergence comes a significant increase in complexity. And when complexity increases, availability usually suffers.
Website Down? Understanding Why
“The website is down again!” That can be pretty frustrating. In the heat of the moment, most of us don’t really care why it is down -- we just want it back online. But the curious user may want to know more. What could make a web server unreachable? Why do they go down in the first place? To understand more, we should start with some basics.
Making a Case for Cloud and its Disruptive Benefits
Once again, society is on the cusp of witnessing another disruptive influence that has the ability to change the course of the business world. Thankfully, the phenomenon known as “cloud” (which includes computing, infrastructure-as-a-service, software-as-a-service and much more) has few doubting its abilities to disrupt the way we conduct business today.
The Essence of Uptime
Uptime is a key performance indicator (KPI). Some would say it is the key performance indicator, the sine qua non, of productive computing. If you can’t keep your system operational, you have nothing. None of the many functionalities – the bells and whistles – matter one whit if your customers can’t access your site or service. The expectation in the industry is for near 100% uptime.
EPO: Emergency Power Off or Extremely Probable Outage?
Have you ever thought about the necessity of EPO buttons in data centers? If you think they are required by law, you are incorrect. Sadly, many believe they are, including data center designers, and the buttons still cause outages even today. We think EPO should be an acronym for Extremely Probable Outage.
Cloud Service Level Agreement Expectations
Service Level Agreements (SLAs) are essential in the IT business. IT managers should know what to expect from their service providers. You should fully understand any SLA before signing it – especially when it comes to uptime. Everyone who has been around IT support has heard of Service Level Agreements. As we consider what is expected regarding uptime in SLAs, it might be helpful to briefly have a look at what SLAs are and their purpose.
Downtime is no Longer Acceptable
If you went to bestbuy.com and the site was unavailable, how long would it take for you to go to amazon.com or elsewhere to find what you wanted? On average, it’s less than 30 seconds; it used to be much longer, but our society has grown impatient. If you’re not available when customers are looking for you, they will move on.
What is The Cloud? A Technical Explanation
The Cloud – we hear that phrase thrown around a lot. It is obviously a special place because nearly every company wants to go there, probably because we hear how wonderfully everything works in the cloud. Those who go there are promised a great deal of cost savings as well. No wonder everyone is talking about it.
Digital Realty / Telx Atlanta Power Outage
On July 12th a major power event occurred at the prominent carrier hotel at 56 Marietta Street, which is responsible for network interconnections for a significant portion of the southeast, including over 60 carriers and over 100 telecom providers. The incident occurred during planned power distribution upgrades and created quite a number of issues for both regional and national organizations.
The Need for Increased Availability is Now
Our predictions for the last half of 2017: ransomware will keep evolving, the rise of IoT will pave the way for increased DDoS attacks, IPv6 traffic will continue to grow exponentially, machine learning and AI will be applied to enhance security, and the need for increased availability is now.
The True Costs of Downtime for IT
Downtime is a dirty word in the IT business. Unplanned outages are unacceptable and should not be tolerated. In a universe where customers expect services to be available 99.999% of the time, any time your IT service offering is down is costly to your business. And the true cost of downtime may be more than you realize.
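To put the 99.999% figure in perspective, here is a small sketch that converts an availability percentage into the downtime it permits over a year. The function name and the 365-day year are assumptions made for this illustration. Under them, 99.999% leaves roughly 5.26 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the minutes of downtime permitted at a given availability."""
    return period_minutes * (1 - availability_percent / 100)


# Compare a few commonly quoted availability levels.
for level in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(level)
    print(f"{level}% uptime allows about {minutes:.2f} minutes of downtime per year")
```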
Explosion in Downtown Los Angeles Disrupts Data Center Operations
On Thursday night, August 20th, 2015, a blast occurred in the basement of 811 West Wilshire Blvd. in downtown Los Angeles, taking out an on-site power station and leaving 12 buildings in the area without utility power for much of Friday, as reported by ABC News and other sources. The outage significantly affected major network provider Level 3 Communications, LogMeIn, major data center providers Internap and Equinix and several other companies...
Load Balancing Droplets from Digital Ocean
We recently helped a customer who had dozens of virtual machines (called Droplets) from Digital Ocean spanning multiple countries. Digital Ocean doesn’t have a load balancer, but even if they did, he was looking for a way to load balance traffic in each of 3 geographically diverse regions all without having to use DNS. He was using round-robin DNS to distribute the load among the group of servers, but it wasn’t working anywhere near the way he wanted it to, often distributing load unevenly.
Reroute or redirect IP traffic from one data center to another for disaster recovery
Many organizations have a business continuity or disaster recovery plan and have even implemented multi-data center redundancy with servers and other critical infrastructure at a separate location to that of their primary site. But the challenge every organization faces is how to easily and seamlessly redirect traffic from one site to another when disaster strikes.
Enable IPv6 with Cloud Load Balancing
As the global pool of IPv4 space continues to diminish every day, organizations are looking to deploy IPv6 at an ever increasing pace. But sometimes it's just plain difficult, especially when it requires a complete overhaul of your local network. Some organizations can implement dual-stack just fine, but for other organizations that host websites or applications at third party providers such as AWS EC2 and the like, IPv6 may not be an option just yet.
Significant Growth Predicted for Hybrid Cloud
In a recent Computer World article, Technology Business Research shared its predictions for cloud growth in 2015: a 33% growth rate for private cloud, 25% for public cloud and a whopping 50% for hybrid cloud, all compared to 2014 data.
Significant Cloud Outages of 2014
There is no question that the cloud is imperfect. Whether your business uses public, private or a hybrid cloud, outages happen all the time and are a part of life. At Total Uptime, we know this fact quite well, and that’s the primary reason we created our Cloud Networking solutions. Here are some of the biggest cloud outages of 2014 that disrupted service for millions around the world.
Internet Outages Today?
If you heard or felt Internet outages today, it wasn’t just you. It seems many end-users were filing complaints on downdetector.com for a vast number of unrelated ISPs. From our global vantage point, we could see that the Internet of things was acting strange.
Our Cloud is Different Than Yours
Not a day goes by that we aren’t annoyed by how flippantly organizations use the word “cloud” for what appears to be marketing hype. Take Western Digital and their My Cloud product, for which I have seen many TV commercials as of late. It is simply a personal cloud storage device. What’s really “cloud” about it? Other than the fact that you can access it remotely, nothing! It gives honest, legitimate cloud providers like Total Uptime a bad rap.
Downtime costs $7900 per minute, on average
The cost of datacenter downtime has increased more than 40% for many companies over the last 3 years, according to a recent study by Ponemon Institute, sponsored by Emerson Network Power. The report analyzes 67 datacenters...
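For a sense of scale, the headline average can be turned into a back-of-the-envelope estimate. The sketch below assumes cost grows linearly with outage length, which is a simplification: the $7,900 figure is an average, and real costs vary by company and by incident. The function name is hypothetical.

```python
AVERAGE_COST_PER_MINUTE = 7900  # USD, the average quoted in the headline above


def estimated_outage_cost(minutes, cost_per_minute=AVERAGE_COST_PER_MINUTE):
    """Estimate outage cost, assuming cost scales linearly with duration."""
    return minutes * cost_per_minute


# Estimate a few outage lengths at the quoted average rate.
for minutes in (10, 60, 240):
    print(f"{minutes}-minute outage: about ${estimated_outage_cost(minutes):,}")
```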
What Should and Should Not be called Cloud?
We read a great article last month on wired.com called Top 5 Things The Cloud Is Not. It’s rare we see an article like this that helps define what should and should not be called ‘Cloud’, so when we do see something, we like to call attention to it.
Account Management – A Quick Start Video
Watch this video to understand the functionality found in the Account section of the Total Uptime cloud management portal. Here you'll learn how to modify company information, add/edit and delete users, create roles & security groupings, alert lists and much more.
What is IP Anycast, and How Does it Work in the Cloud?
One of the most important aspects of the Total Uptime Cloud Platform is the underlying IP Anycast architecture. Without it, we would not be able to deliver a 100% uptime SLA and the level of performance our customers demand. In this article, we’ll explain what IP Anycast is (just called “Anycast”) and how we use it to deliver performance, reliability and uptime to our customers.
North American companies give cloud the most business
IT market research firm Gartner says interest in data center services is high around the world, but the market structure, dynamics and maturity are quite varied. In North America, hosting (42 percent) and cloud IaaS have achieved the highest level of client adoption, while the markets in the rest of the world are dominated by data center outsourcing (80 percent).
Security… Getting to the Bottom of “Cloudphobia”
Cloud solutions have gotten a bad rap. They have incredible potential to minimize a business’s IT infrastructure, scale to meet rapid demand, support mobile workers, and cut costs, but they have also gained the reputation of being a risky investment. Many people are just not ready to trust a third party to secure their confidential data.