Failing to deliver content to an end user, even for a few minutes, has a crippling effect on eCommerce businesses and other sites that rely heavily on their content. Using a Content Delivery Network (CDN) has so far been seen as the safest way to deliver content to end users, as it boosts performance, increases availability and reduces load on the servers that produce dynamic content.
However, despite the confidence we have in the scale and size of CDN infrastructures, the reality is that CDNs can and do fail. According to the figures discussed by Cedexis at the 2014 Content Delivery Summit, hundreds of events across the globe lead to tens of hours of outages each month, affecting millions of end users.
So how can we survive these outages and ensure that we consistently deliver content to our end users?
Fall-over to Failover
We might consider removing the single point of failure using a failover CDN that is activated following a principal CDN outage. We initially used this strategy at Amplience, but when you consider how CDNs work you soon realise that this approach has lots of limitations.
To explain, when an end user opens a web page the browser requests the content from the CDN, which first checks for the content in the nearest cache to the end user. If the content is not cached it then retrieves it from your platform’s servers (the origin) and passes it to the end user, while at the same time storing the content in its caches – thereby accelerating delivery for this content in the future. Over time this process propagates a site’s content through the network of CDN caches. CDNs with cached content are considered ‘warm’ and likewise empty CDNs are ‘cold’.
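The cache-then-origin flow described above can be sketched in a few lines of Python. This is a toy model of a single edge cache, not how any particular CDN is implemented; the `EdgeCache` class, the `origin` function and the example path are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List


class EdgeCache:
    """Toy model of one CDN edge cache sitting in front of an origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}  # empty store = 'cold' cache
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._store:
            # Warm path: serve straight from the cache.
            self.hits += 1
            return self._store[path]
        # Cache miss: go back to the origin, then keep a copy for next time.
        self.misses += 1
        body = self._fetch_from_origin(path)
        self._store[path] = body
        return body


# Hypothetical origin that records every request it has to serve.
origin_requests: List[str] = []


def origin(path: str) -> bytes:
    origin_requests.append(path)
    return f"content for {path}".encode()


cache = EdgeCache(origin)
for _ in range(3):
    cache.get("/img/hero.jpg")

print(f"hits={cache.hits} misses={cache.misses} origin={len(origin_requests)}")
```

Three requests for the same path reach the origin only once; the other two are served from the cache. A cold cache sends every first request for every path back to the origin, which is where the cost of a cold standby comes from.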
Failing over when the principal CDN fails means switching to a completely cold standby CDN. Performance is reduced until the standby is warmed, which is better than failure, but if you deliver content dynamically you also need to consider the massive influx of traffic to your origin servers until the failover CDN is warmed.
Other problems with the failover approach, specifically while trying to maintain SLAs or failing over automatically, include:
At Amplience we initially used Amazon CloudFront as our failover CDN. Its architecture is completely different from that of the big infrastructure-based CDNs, making it less likely to suffer the same type of outages. It is fast and easy to set up, its simple pay-as-you-go model fits failover requirements well, and it can boast some great performance stats, particularly with origins based on AWS.
CDN Balancing Act
Load balancing is a technique that gets past the issues of failing over to a cold CDN by distributing all the requests for content across two or more CDNs, keeping all of them warm. In the event of an outage, traffic is directed away from the offending CDN and served by the remaining warm, load-balanced CDN(s). Although this basic load balancing removes many of the issues associated with failing over to cold CDN caches, it does not provide a complete solution.
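As a rough illustration of basic load balancing, the sketch below splits requests across CDNs by weight and skips any CDN that is marked as down. The CDN names, the weights and the `healthy` map are assumptions made for the example; in a real deployment this decision is usually made at the DNS layer, and the health signal comes from monitoring rather than a hand-set flag.

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_cdn(
    weights: Dict[str, float],
    healthy: Dict[str, bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one CDN at random, in proportion to its weight, among healthy CDNs."""
    rng = rng or random.Random()
    candidates = {
        name: weight
        for name, weight in weights.items()
        if healthy.get(name, False) and weight > 0
    }
    if not candidates:
        raise RuntimeError("no healthy CDN available")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


rng = random.Random(42)  # fixed seed so the example is repeatable
weights = {"cdn-a": 0.5, "cdn-b": 0.5}  # hypothetical CDNs, even split
healthy = {"cdn-a": True, "cdn-b": True}

# Normal operation: both CDNs receive traffic, so both stay warm.
normal = Counter(pick_cdn(weights, healthy, rng) for _ in range(1000))

# Outage on cdn-a: all traffic moves to the CDN that is already warm.
healthy["cdn-a"] = False
during_outage = Counter(pick_cdn(weights, healthy, rng) for _ in range(1000))

print(dict(normal))
print(dict(during_outage))
```

Note that `healthy` here is a single global flag per CDN, which is exactly the "up or down" view of a CDN that basic load balancing takes.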
So what’s missing? Basic load balancing thinks of the CDN as a single cloud service that is either up or down, but this is not the case. A CDN can be performing extremely well in one city or on one ISP while being down in another. To understand why, you need to realise that the Internet is a messy set of networks (e.g. Tier 1 carriers and ISPs) connected through various changing business arrangements (peering). This means that a request has to make several hops across various networks before reaching the CDN, and any interference in these hops can result in a CDN outage for end users on a particular ISP or in a particular location.
To combat this, the load balancing has to be much smarter and route the traffic based on the end user’s Internet connection. One of the techniques that can be used to detect localised CDN connection issues is Real User Monitoring (RUM), which records each end user’s interaction in context with a CDN. A load balancer can use this information to choose the best CDN for each user based on their actual Internet connection.
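To make the idea concrete, here is a minimal sketch of RUM-driven selection. It assumes each measurement is tagged with the user's network (a country and ISP pair), the CDN that was tested, whether the request succeeded and how long it took. The `RumTable` class, the 90% availability threshold and the sample data are all hypothetical, and this is not a description of how Cedexis or any other product works internally.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

NetworkKey = Tuple[str, str]  # (country, ISP); both are illustrative labels


class RumTable:
    """Keeps RUM samples per (network, CDN) and picks a CDN per network."""

    def __init__(self, min_availability: float = 0.9) -> None:
        self.min_availability = min_availability  # assumed threshold
        self._samples: Dict[Tuple[NetworkKey, str], List[Tuple[bool, float]]] = (
            defaultdict(list)
        )

    def record(self, network: NetworkKey, cdn: str, ok: bool, latency_ms: float) -> None:
        self._samples[(network, cdn)].append((ok, latency_ms))

    def best_cdn(self, network: NetworkKey, cdns: Iterable[str]) -> Optional[str]:
        """Return the available CDN with the lowest median latency for this
        network, or None if there is no usable data for it."""
        best_name: Optional[str] = None
        best_latency = float("inf")
        for cdn in cdns:
            samples = self._samples.get((network, cdn), [])
            if not samples:
                continue
            ok_latencies = [latency for ok, latency in samples if ok]
            availability = len(ok_latencies) / len(samples)
            if not ok_latencies or availability < self.min_availability:
                continue  # this CDN is failing for users on this network
            latency = median(ok_latencies)
            if latency < best_latency:
                best_name, best_latency = cdn, latency
        return best_name


cdns = ["cdn-a", "cdn-b"]
table = RumTable()
for _ in range(10):
    # cdn-a is unreachable for this UK ISP but fine (and faster) in France.
    table.record(("GB", "isp-x"), "cdn-a", ok=False, latency_ms=0.0)
    table.record(("GB", "isp-x"), "cdn-b", ok=True, latency_ms=80.0)
    table.record(("FR", "isp-y"), "cdn-a", ok=True, latency_ms=40.0)
    table.record(("FR", "isp-y"), "cdn-b", ok=True, latency_ms=70.0)

print(table.best_cdn(("GB", "isp-x"), cdns))
print(table.best_cdn(("FR", "isp-y"), cdns))
```

The same CDN is avoided for one network and preferred for another, which a single global up/down flag cannot express. When `best_cdn` returns `None` (no data for that network), the caller would need a default, for example a plain weighted split.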
To achieve a 99.99% SLA for Amplience, we decided to take the smart CDN load balancing approach, using Cedexis as our partner. We chose Cedexis because it brings together CDN load balancing and Real User Monitoring, allowing us to create sophisticated rules that balance content delivery based on the user’s connection. It also gives us the flexibility to use additional CDNs that specialise in content delivery for specific geographical locations such as China.
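To put a figure like 99.99% in perspective, it helps to convert an availability target into the downtime it allows. The helper below is a small sketch of that arithmetic; the function name and the 30-day month are assumptions made for the example.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Minutes of downtime allowed in the period for a given availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60


for target in (99.0, 99.9, 99.99):
    per_month = downtime_budget_minutes(target)
    per_year = downtime_budget_minutes(target, period_days=365)
    print(f"{target}%: {per_month:.1f} min per 30 days, {per_year:.1f} min per year")
```

At 99.99% the budget is roughly 4.3 minutes in a 30-day month, or about 53 minutes a year, so an outage of even a few minutes on a single CDN is enough to use up the month's allowance.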
Here are the approaches I’d recommend if you are looking for content availability up to certain thresholds:
This blog originally appeared on John’s own blog: CTO Dilemma.