Be prepared for problems if - when? - hyperscale turns into hyperfail.
A recent report co-written by Lloyd's of London and AIR Worldwide concluded that if one of the top-three cloud service providers (not named, but presumably Amazon, Google and Microsoft) suffered a complete outage, the consequential business losses could total somewhere between US$5.3 billion and US$19 billion.
When Veeam mentioned this in the context of its January acquisition of cloud-native AWS backup provider N2WS, it made us wonder what would happen if someone or something did manage to take down AWS completely.
Even if AWS customers have the technical ability to restore their workloads elsewhere, where would they find sufficient capacity, given that AWS has something like 44% of the market? It seems unlikely that the remaining companies would have enough idle capacity to take up the slack.
Veeam's ANZ head of systems engineering Nathan Steiner took up the challenge of addressing these issues.
Like any strategic and aligned business decision, a cloud strategy should follow the fundamental principle of applying due diligence using a risk matrix. This requires clearly identifying and understanding the risks, their likelihood of occurrence, and the consequence and overall impact of losing a system, service, application or dataset, irrespective of where it resides or is planned to be placed.
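As a rough illustration of the risk-matrix idea, the sketch below scores each risk as likelihood multiplied by consequence and ranks the results. The 1-5 scales, the rating thresholds and the two example risks are assumptions made up for illustration; they do not come from Veeam or from the Lloyd's report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)
    consequence: int  # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Classic risk-matrix score: likelihood multiplied by consequence.
        return self.likelihood * self.consequence


def rating(score: int) -> str:
    # Hypothetical thresholds; a real matrix would set its own bands.
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def rank(risks: list[Risk]) -> list[Risk]:
    # Highest-scoring risks first.
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative entries only; the numbers are placeholders, not measurements.
risks = [
    Risk("whole-provider outage", likelihood=1, consequence=5),
    Risk("single-region outage", likelihood=3, consequence=3),
]

for r in rank(risks):
    print(f"{r.name}: score {r.score} ({rating(r.score)})")
```

A simple product tends to under-rate rare but severe events such as a whole-provider outage, which is one reason some organisations assess consequence separately for their most critical systems.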
AWS offers a highly resilient, scalable and rich set of architectural options and services for customers to meet a significant portion of their functional requirements. It also holds the lion's share of the public cloud market. AWS's application of that risk matrix has made public cloud services so attractive at so many levels.
Veeam's N2WS makes it possible to back up systems running on AWS and then quickly and easily restore them to a different region if an outage at your 'home' region is likely to go on for longer than you can tolerate.
But what if multiple AWS regions failed simultaneously, or if AWS were taken down in its entirety? Could the other public/managed cloud service providers in combination take up the slack?
It's a scenario that needs respectful consideration, but it is one that could only be invoked by a particularly large Chaos Gorilla.
The other public cloud service providers should have already assessed their ecosystems - interconnected or otherwise - and identified and protected the most critical parts. Even if those providers can continue to operate, AWS's market share is so large that its sudden absence would leave a big capacity hole that could at best be only partly filled by those remaining.
If a complete AWS outage took place, only the businesses, organisations and government service agencies that have focused on highly effective hybrid cloud services and developed business continuity plans for their environments would be able to manage through the impact.
So keep your strategy, planning and execution framework simple in order to minimise the overall business impact of a large-scale AWS outage. Specifically:
1. Apply a 3-2-1-0 philosophy to system, data, application and service availability across a hybrid cloud ecosystem: three instances of your data, across two different storage media, with one offsite, and with zero errors.
2. Ensure you always have control and recoverability over your data, irrespective of its growth, sprawl and criticality.
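The 3-2-1-0 rule in the first point can be expressed as a simple check over an inventory of data copies. This is a minimal sketch, not a Veeam or N2WS API: the `DataCopy` fields, the location names, and the reading of "zero errors" as "no errors found when the copy was last verified" are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str       # hypothetical label, e.g. "aws-primary"
    media: str          # storage medium or type, e.g. "block", "object", "tape"
    offsite: bool       # True if held away from the primary site or provider
    verify_errors: int  # assumed: errors found at the last recovery verification


def check_3210(copies: list[DataCopy]) -> dict[str, bool]:
    # Evaluate each part of the 3-2-1-0 rule separately so gaps are visible.
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "zero_errors": all(c.verify_errors == 0 for c in copies),
    }


# Illustrative inventory only; names and values are placeholders.
inventory = [
    DataCopy("aws-primary", media="block", offsite=False, verify_errors=0),
    DataCopy("aws-other-region", media="object", offsite=False, verify_errors=0),
    DataCopy("on-premises", media="tape", offsite=True, verify_errors=0),
]

result = check_3210(inventory)
print(result)
print("compliant" if all(result.values()) else "gaps found")
```

In this sketch a copy in a second AWS region is deliberately not counted as offsite, reflecting the article's concern that a whole-provider outage would take out every region at once.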
The question isn't whether the remaining public cloud service providers could manage such an impact, but rather how important it is to plan effectively and to adopt a strategy that incorporates diversity and flexibility. Make no mistake, being "all in" with a single public cloud service provider creates bias. And bias is a blind spot that creates elevated risk, consequence and impact.
It is difficult, perhaps practically impossible, to move 'native' applications that call directly on AWS (or Azure, or Google Cloud) services to another platform. Virtual machines are more easily transferred, but are relative heavyweights. Containers are perhaps the most scalable and viable option, but provider-specific container services involve a degree of stickiness. So there is a trend towards the open source Kubernetes platform that is extensible across all clouds.
However it is achieved, organisations need to ensure that their most critical applications, services and data are managed with availability and service continuity in mind, and this typically means adopting the hybrid cloud model.
A whole-of-AWS outage would have far-reaching ramifications for all services, even across other hyperscale providers and hybrid cloud services, because the Digital Era is underpinned by an interconnected ecosystem.