One of the most important aspects of the Total Uptime Cloud Platform is its underlying IP Anycast architecture. Without it, we would not be able to deliver a 100% uptime SLA or the level of performance our customers demand. In this article, we’ll explain what IP Anycast (usually just called “Anycast”) is and how we use it to deliver performance, reliability and uptime to our customers.
Anycast is not a protocol or a proprietary technology requiring special capabilities in servers, clients or networks. It is simply a routing methodology, typically implemented with BGP, whose concept was briefly described in RFC 1546. It has been the basis of large-scale (and mostly static) content distribution networks since at least 1995, and today large organizations are using it more widely for global redundancy in other areas as well.
Anycast is often confused with Multicast, and for good reason: from an IP standpoint, Anycast can look like Multicast until the connection stage. Multicast is one-to-many and allows a client to connect to multiple nodes simultaneously. Naturally, the protocol must support multicasting, so typical uses for multicast include streaming audio or video. A loosely analogous idea is the peer-to-peer file sharing network BitTorrent, which allows a client to download a file in chunks from multiple hosts simultaneously, although BitTorrent does this over ordinary unicast connections rather than IP multicast. While the Total Uptime cloud fully supports multicast for applications that require it, we won’t go into detail in this article.
Anycast is similar to Multicast, except that the client connects to a single node, even though multiple nodes may advertise their availability to deliver the service. Importantly, the client typically does not know that multiple nodes exist and behaves as if there is only one; this is by design.
While we’re at it, we might as well mention Unicast. It is the most common client/host configuration: a single source announces its availability to provide the service, and the client’s only option is to connect to that single host or to no host at all. That “host” could certainly be a cluster of devices, but they are all at the same location.
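To make the three delivery models concrete, here is a toy sketch (not how any real network stack is written) that shows which nodes a request addressed to the service actually reaches under Unicast, Multicast and Anycast. The node names and the DISTANCE table are hypothetical; "distance" simply stands in for whatever metric routing uses to judge which node is nearest.

```python
from __future__ import annotations

# Hypothetical topology: distance (e.g. hop count) from one client to each node.
DISTANCE = {"singapore": 9, "london": 3, "new-york": 5}


def unicast(dest: str) -> list[str]:
    """Unicast: one address maps to exactly one host (or cluster) in one location."""
    return [dest] if dest in DISTANCE else []


def multicast(group: list[str]) -> list[str]:
    """Multicast: every reachable member of the group is involved at once."""
    return [node for node in group if node in DISTANCE]


def anycast(group: list[str]) -> list[str]:
    """Anycast: many nodes share one address, but only the nearest one is reached."""
    reachable = [node for node in group if node in DISTANCE]
    if not reachable:
        return []
    return [min(reachable, key=DISTANCE.__getitem__)]


nodes = ["singapore", "london", "new-york"]
print(unicast("singapore"))  # ['singapore']
print(multicast(nodes))      # ['singapore', 'london', 'new-york']
print(anycast(nodes))        # ['london']
```

The client in the Anycast case only ever sees one answer, which mirrors the point above that it need not know other nodes exist.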
At its core, Anycast is actually quite a simple concept if you set aside the ‘behind-the-scenes’ tunnels and monitoring, which we will discuss shortly. Essentially, multiple cloud nodes or instances of a service announce and share the same publicly accessible IP address. For example, the IP address 22.214.171.124 would be advertised for the cloud node in Singapore at the same time as it is being advertised for the nodes in London, New York and elsewhere.
The routing infrastructure directs each packet to the topologically nearest instance of the service based on BGP paths; from a router’s perspective, an Anycast prefix looks no different from any other network. The router nearest the client has received the various routes advertised for that IP’s prefix and simply selects the best one, typically the one with the shortest AS path. In traditional networks all paths lead to the same destination, whereas in an Anycast topology each path may lead to a different destination. The router doesn’t care, and technically has no knowledge that different paths lead to different destinations. It simply and consistently chooses the best path every time until that path disappears, at which point another path becomes the best one.
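The selection behaviour described above can be sketched as a tiny routing table. This is a simplified model, not a BGP implementation: real BGP compares local preference, origin, MED and several other attributes before AS-path length, while this sketch uses AS-path length alone. The prefix, the `via` neighbor labels and the ASNs (taken from the documentation range) are all hypothetical; all three announcements share the same origin AS, as they would in an Anycast deployment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]  # ASNs the announcement has traversed
    via: str                  # neighbor it was learned from; here each leads to a different node


class Router:
    """Keeps every route heard for a prefix and picks the shortest AS path."""

    def __init__(self) -> None:
        self.rib: dict[str, list[Route]] = {}

    def receive(self, route: Route) -> None:
        self.rib.setdefault(route.prefix, []).append(route)

    def withdraw(self, prefix: str, via: str) -> None:
        # Drop the route learned from this neighbor; the next-best one takes over.
        self.rib[prefix] = [r for r in self.rib.get(prefix, []) if r.via != via]

    def best(self, prefix: str) -> Route | None:
        # Simplification: AS-path length only (ties go to the first route received).
        return min(self.rib.get(prefix, []), key=lambda r: len(r.as_path), default=None)


ANYCAST_PREFIX = "22.214.171.0/24"  # hypothetical covering prefix for 22.214.171.124
router = Router()
router.receive(Route(ANYCAST_PREFIX, (64500, 64510), via="london"))
router.receive(Route(ANYCAST_PREFIX, (64501, 64502, 64510), via="new-york"))
router.receive(Route(ANYCAST_PREFIX, (64503, 64504, 64505, 64510), via="singapore"))

print(router.best(ANYCAST_PREFIX).via)  # london
router.withdraw(ANYCAST_PREFIX, via="london")
print(router.best(ANYCAST_PREFIX).via)  # new-york
```

Note that the router never inspects where a path ends up; it only compares paths, which is exactly why several locations can share one address.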
There are quite a number of benefits to implementing an Anycast network.
The behind-the-scenes implementation is where IP Anycast becomes a little more complex. In a stateless configuration this matters less, but where state is essential, content synchronization becomes the principal engineering concern. Total Uptime Technologies’ Anycast network comprises not only public-facing networks but also back-end private-line tunnels designed to route traffic from node to node in the event of failure or to maintain connection state between client and server.
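One common way to preserve connection state across nodes is flow affinity: the node that saw a session’s first packet owns it, and any other node that later receives that session’s packets (because routing shifted mid-session) forwards them over a tunnel to the owner. The sketch below illustrates that general technique under stated assumptions; it is not a description of Total Uptime’s implementation. The shared dict stands in for a flow table that a real network would have to replicate between nodes, which is precisely the synchronization problem mentioned above. All names and addresses are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# (client_ip, client_port, service_ip, service_port)
FlowKey = tuple[str, int, str, int]


@dataclass
class Node:
    name: str
    # Stand-in for a flow table replicated between nodes (assumption: in this
    # sketch every node simply shares the same dict).
    flow_owner: dict[FlowKey, str]

    def handle(self, key: FlowKey, is_new_flow: bool) -> str:
        """Decide whether this packet is served here or tunnelled to the flow's owner."""
        if is_new_flow:
            # The node that sees the first packet (e.g. the TCP SYN) owns the session.
            self.flow_owner[key] = self.name
            return f"{self.name}: serve locally"
        owner = self.flow_owner.get(key)
        if owner is None:
            return f"{self.name}: unknown flow, reset"
        if owner == self.name:
            return f"{self.name}: serve locally"
        # Routing shifted mid-session: carry the packet over the private tunnel.
        return f"{self.name}: tunnel to {owner}"


shared_table: dict[FlowKey, str] = {}
london = Node("london", shared_table)
new_york = Node("new-york", shared_table)
flow = ("198.51.100.7", 51515, "22.214.171.124", 443)

print(london.handle(flow, is_new_flow=True))     # london: serve locally
# A BGP change now delivers this client's packets to New York instead:
print(new_york.handle(flow, is_new_flow=False))  # new-york: tunnel to london
```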
Typically, cloud nodes or server clusters within a node share a common virtual interface attached to their loopback devices and speak an IGP routing protocol to an adjacent BGP-speaking border router. Monitoring of the service ensures that in the event of a failure, routes can be withdrawn immediately to re-route traffic. Once a cluster architecture has been established, additional clusters can be added to gain performance, implement load distribution or failover between them either locally, regionally or globally.
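The monitoring loop described above can be sketched as a small state machine that turns health-check results into announce and withdraw decisions. This is an illustrative sketch, not a specific product’s behaviour: the command strings are hypothetical placeholders for whatever interface your routing daemon exposes, the TCP connect check is just one possible health check, and the `rise` threshold of three consecutive successes is an assumed value. The design withdraws on the first failure (matching the immediate withdrawal described above) but waits for several good checks before re-announcing, so a flapping service does not cause route flapping.

```python
from __future__ import annotations

import socket
from dataclasses import dataclass


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """One possible health check: can a TCP connection to the local service be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class AnnouncementController:
    prefix: str
    rise: int = 3            # consecutive successful checks required before (re-)announcing
    announced: bool = False
    ok_streak: int = 0

    def update(self, healthy: bool) -> str | None:
        """Feed one health-check result; return a command only when the state changes."""
        if not healthy:
            self.ok_streak = 0
            if self.announced:
                self.announced = False
                return f"withdraw {self.prefix}"   # fail fast: pull the route at once
            return None
        self.ok_streak += 1
        if not self.announced and self.ok_streak >= self.rise:
            self.announced = True
            return f"announce {self.prefix}"       # recover cautiously to avoid flapping
        return None


ctl = AnnouncementController("22.214.171.124/32")
# In production each result would come from e.g. tcp_check("127.0.0.1", 53) every few seconds.
for healthy in [True, True, True, True, False, True, True, True]:
    command = ctl.update(healthy)
    if command:
        print(command)
```

The /32 here reflects the shared loopback address that a node would advertise via its IGP to the border router; what the border router then announces to the Internet is a separate, larger prefix.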
The biggest challenge in implementing an Anycast network properly is the complexity of managing route announcements. You must ensure that announcements are evenly spread over equal-quality providers, and you should use BGP communities and other traffic engineering techniques to maintain proper traffic routing. You must also avoid static routes that could create black holes during a failure, and focus instead on more automated approaches using IGP and BGP.
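A simple audit helps catch uneven announcements before they skew traffic. In the hypothetical inventory below, New York announces only through transit-a. Clients of transit-b are then likely to be steered to Singapore or London even when New York is closer, because providers usually prefer routes learned directly from their own customers. The provider names and inventory format are assumptions for illustration only.

```python
from __future__ import annotations

# Hypothetical inventory: the upstream providers each anycast node announces through.
UPSTREAMS = {
    "singapore": {"transit-a", "transit-b"},
    "london": {"transit-a", "transit-b"},
    "new-york": {"transit-a"},
}


def missing_upstreams(upstreams: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per node, the providers used elsewhere in the network but not at that node."""
    all_providers = set().union(*upstreams.values())
    gaps = {node: all_providers - providers for node, providers in upstreams.items()}
    return {node: gap for node, gap in gaps.items() if gap}


print(missing_upstreams(UPSTREAMS))  # {'new-york': {'transit-b'}}
```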
Second, you must make certain that whenever a customer-impacting event occurs, no matter how short, the affected route announcements are withdrawn automatically. Until that withdrawal has propagated, behind-the-scenes routing must divert traffic to alternate nodes, maintaining availability and, if necessary, session state.
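During the window between a failure and the withdrawal propagating, packets keep arriving at the failed node. A minimal sketch of the diversion decision such a node might make is shown below, assuming a hypothetical per-node list of tunnel peers ordered nearest first and a health map supplied by monitoring. It serves locally when healthy and otherwise hands traffic to the first healthy peer over the tunnel.

```python
from __future__ import annotations

# Hypothetical tunnel preference order from each node to its peers (nearest first).
TUNNEL_PEERS = {
    "london": ["new-york", "singapore"],
    "new-york": ["london", "singapore"],
    "singapore": ["london", "new-york"],
}


def dispatch(node: str, healthy: dict[str, bool]) -> str | None:
    """Decide where a packet that arrived at `node` should be served."""
    if healthy.get(node, False):
        return node                      # normal case: serve locally
    for peer in TUNNEL_PEERS.get(node, []):
        if healthy.get(peer, False):
            return peer                  # route still points here: divert over the tunnel
    return None                          # nothing healthy is reachable


status = {"london": False, "new-york": True, "singapore": True}
print(dispatch("london", status))     # new-york
print(dispatch("singapore", status))  # singapore
```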
The bigger the network, the better it becomes, but the more critical automation is to its success.
Some organizations believe that a hybrid approach (Anycast and Unicast together) is the best way to deal with Anycast complexities, but we strongly disagree. The inherent problem of an IP address being tied to a physical location (Unicast) does not disappear when the two are combined. Yes, it may create some level of redundancy when all systems are online, but during an outage it has the potential to make things worse. Total Uptime Technologies’ dual-stack Anycast network provides a quadruple level of redundancy that also makes outages completely transparent to the end user.
The primary reason for avoiding Unicast altogether is that a long-lived, persistent TCP connection cannot be re-routed during an outage, because the IP address it depends on becomes unreachable. Even in simpler applications such as DNS, where transactions (in this case queries) are very short-lived and resolvers generally try additional name servers if the first one fails, a downed Unicast node causes ‘time-outs’ of up to 5 seconds while resolvers ‘rotate’ through the list. This does not stop until the Unicast node is back online or the authoritative name server is changed in, or removed from, the delegation held by the parent zone’s servers, which can take up to 48 hours to propagate across the Internet. Anycast solves this problem by ensuring that the IP address given for a name server always routes to a functioning server or cluster.
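The cost described above is easy to put into numbers. The sketch below models a resolver that tries name servers in a fixed order and waits out a 5-second timeout (the figure cited above) on each unreachable one; the 30 ms round-trip time is a hypothetical value. It is deliberately simplified: many real resolvers also remember slow or unresponsive servers and deprioritize them, but queries that do land on the dead address still pay the full timeout.

```python
from __future__ import annotations

TIMEOUT_S = 5.0   # per-server timeout, taken from the "up to 5 seconds" above
RTT_S = 0.03      # hypothetical round-trip time to a responsive name server


def lookup_latency(reachable_in_try_order: list[bool]) -> float | None:
    """Seconds until an answer when a resolver tries name servers in this order."""
    elapsed = 0.0
    for reachable in reachable_in_try_order:
        if reachable:
            return elapsed + RTT_S
        elapsed += TIMEOUT_S  # wait out the timeout, then rotate to the next server
    return None               # every listed server failed


# Unicast: ns1's only location is down, and this resolver happens to try it first.
print(f"unicast: {lookup_latency([False, True]):.2f} s")  # unicast: 5.03 s
# Anycast: ns1's address is still announced by healthy nodes, so it answers.
print(f"anycast: {lookup_latency([True, True]):.2f} s")   # anycast: 0.03 s
```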
A well-built, well-maintained Anycast network is the only way to go. Provided it is properly designed and proactively monitored, it greatly improves the performance and resiliency of a cloud network.