Notable Cloud Outages of 2020

Periodically we talk to someone who says something along the lines of: “we don’t need Total Uptime since we moved to the cloud.” It is an interesting statement, and it usually comes from someone who has been misinformed about the benefits and capabilities of the cloud.

Does the cloud provide incredible flexibility? Yes! Does it let you provision services on-demand at a moment’s notice? Yes! Does it help you save money due to its extreme affordability? Um, no! Are clouds 100% available and uber resilient? Not even in your dreams.

To prove our point on availability, we have compiled our annual list of the most notable outages of 2020. COVID made the year beyond horrible for the entire planet, and these cloud outages didn’t help, especially since more users were working remotely in 2020 than ever before.

January

January 27th – Google G Suite suffers an outage of just under an hour, affecting many users on a Monday morning.

February

February 3rd – Microsoft Teams suffers an outage caused by an expired authentication certificate.
February 25th – Google has a 16-hour outage for Nest video streams and recording that left all users of the security cameras without… security.

March

March 3rd – Microsoft Azure was struck with a 6-hour outage in the US East data center, caused by a cooling failure that affected almost all Azure services.
March 15th – Microsoft Azure experiences a power event in the West Central US region affecting virtual machines, SQL Server and many other services.
March 17th – IBM Cloud was hit with a mystery outage affecting services in the United States as a result of a connectivity issue in a Dallas data center.
March 26th – Google Cloud Platform has a 14-hour outage, caused by a lack of memory in the company’s cache servers, which affected services in multiple regions including Dataflow, BigQuery, Dialogflow, Kubernetes Engine, Cloud Firestore, App Engine and Cloud Console.
March 26th – Google services including G Suite, Gmail, Google Drive, Hangouts and Classroom went offline as a result of a significant router failure at a data center in the southeastern US.
March 30th – European cloud giant OVH suffered a 40-minute cloud outage in France affecting dedicated and virtual servers, domain names, the cloud platform, its anti-DDoS protection and support.

April

April 8th – Google Cloud Platform blames a sweeping outage affecting a vast array of services on IAM API issues.
April 16th – Cloudflare has a 4-hour outage caused by someone pulling out cables incorrectly when decommissioning hardware at a data center.

May

May 12th – Slack suffers a 48-minute outage affecting the entire platform. Not a big deal unless you’re WFH these days.
May 18th – Microsoft Azure has an outage in the Central India region impacting compute and storage resources.
May 27th – Adobe Cloud has a significant outage affecting use of their software worldwide.

June

June 9th – IBM Cloud suffers a two-hour outage of its entire global cloud, blamed on an external network provider.
June 15th – Microsoft 365 and Azure suffer an outage in Australia and New Zealand.
June 15th – T-Mobile suffers a massive nationwide voice and data outage and blames a third-party leased fiber network.
June 24th – 30 services on IBM’s Cloud suffer outages due to a power outage, lasting from 100 minutes (for Continuous Delivery) to 19 hours.
June 29th – Google Cloud suffers an outage on its Kubernetes platform and networking services for several hours in their us-east region.

July

July 13th – GitHub started the week with more than 4 hours of downtime.
July 17th – Cloudflare takes out a chunk of the web for about 20 minutes when one of their global backbone routers announces bad routes, affecting websites and their popular free DNS resolver service.

August

August 18th – Equinix suffered a major outage at their LD8 data center in London affecting numerous customers in the hosting, cloud and telecommunications sectors, including the London Internet Exchange (LINX), one of the world’s largest.
August 20th – Google cloud services including App Engine, Cloud Storage, Cloud Logging and BigQuery suffer an outage lasting a few hours.
August 24th – Video conferencing provider Zoom had a three-hour outage affecting many of their 115 million daily active users.
August 30th – CenturyLink / Lumen / Level3 or whatever they are called today knocks out web giants and 3.5% of all internet traffic.

September

September 9th – IBM Cloud suffers an outage in their Sydney data center after it loses power in “multiple racks”.
September 14th – Microsoft Azure suffers a 4+ hour outage in one of its southern UK zones caused by a cooling system failure.
September 28th – Microsoft Azure Active Directory suffers a 3-hour outage affecting Office, Outlook and Teams.

October

October 1st – Microsoft’s Exchange Online service suffers another global outage.
October 7th – Microsoft Office 365 and Azure suffer a 4+ hour outage affecting Teams, Outlook, SharePoint and OneDrive.

November

November 5th – Microsoft Exchange Online suffers a 12-hour outage for many users around the globe.
November 5th – GoDaddy-owned 123 Reg has a six-day DNS record-edit outage.
November 11th – Microsoft’s online game services are hit by an outage on Xbox debut.
November 25th – An Amazon Web Services (AWS) outage on its Kinesis data streaming service impacted major customers including Roku, Adobe, Flickr, Glassdoor, Autodesk, The Wall Street Journal, 1Password and others, including Amazon’s own home security camera company Ring.

December

December 9th – Google has an 84-minute outage in their europe-west2-a zone, leaving 60% of virtual machines within the zone unreachable from the outside world. A bad ACL caused BGP routes to be withdrawn, which isolated the zone.
December 14th – Google suffers a 50-minute outage caused by a capacity issue in their central identity management system. This resulted in outages affecting Cloud Console, Cloud Storage, Google Kubernetes Engine, Gmail and more.
December 16th – Google has a full-on Gmail outage that went beyond delayed delivery or lost access to mail: messages were being permanently bounced.
December 17th – Google has another outage affecting Nest, much like the one in February, except significantly shorter at just over two hours.

Our 2020 Conclusion

What can we really say to sum up a long list of major cloud outages that transpired in 2020? We haven’t even touched on the hundreds or perhaps thousands of smaller events that impact enterprise application availability every day on a smaller scale, insufficient to make the news but sufficient to injure brands significantly.

The bottom line is that outages happen continually, and we can assure you that they will never cease. Cloud services in general may be more reliable, on average, than on-premises services, but the impact on availability when they fail is enormous. As more and more organizations continue to put everything in the cloud (all eggs in one basket, from our perspective), we recommend a contingency plan. Perhaps that’s multicloud? Or better yet, maybe it is an ADC-as-a-Service layer that provides control and resiliency at the right time.
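
To put a few of the durations from the list in perspective, here is a minimal Python sketch that converts a single outage into a yearly availability figure. The function and variable names are our own, purely for illustration, and the sketch generously assumes each outage was the only downtime that service had in all of 2020, which is almost certainly an underestimate.

```python
# Illustrative sketch only: names are hypothetical, and each outage is
# assumed to be the ONLY downtime for that service during 2020.

HOURS_IN_2020 = 366 * 24  # 2020 was a leap year, so 8,784 hours


def availability_pct(downtime_hours: float,
                     period_hours: float = HOURS_IN_2020) -> float:
    """Return availability over the period as a percentage."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Durations taken from the outage list above, converted to hours.
outages = {
    "Azure US East cooling failure (March 3rd)": 6.0,
    "Google Cloud Platform cache issue (March 26th)": 14.0,
    "Google europe-west2-a isolation (December 9th)": 84 / 60,
}

for name, hours in outages.items():
    print(f"{name}: {availability_pct(hours):.4f}% available in 2020")
```

For comparison, a commonly quoted “three nines” target (99.9%) leaves room for just under 8.8 hours of downtime across all of 2020, so the 14-hour outage alone would miss it. We use that target only as a yardstick here, not as any provider’s actual SLA.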

