A December outage that affected mission-critical applications raises questions about the risks of overreliance on a handful of cloud providers.

by Joao-Pierre S. Ruth
Date Published March 17, 2022 - Last Updated January 20, 2023

This article was originally published on InformationWeek.

On December 7, 2021, which should have been AWS Innovation Day at re:Invent 2021, Amazon Web Services instead was contending with yet another regional outage that affected vast segments of the internet. Analysts with Forrester and Gartner say that while the issue was significant, it is not a reason to backslide on cloud migration, nor would doing so be realistic.

According to updates from AWS, the cause of the outage was resolved for the most part after some seven hours. Recovery of services continued after that. Beyond questions about how it happened, concerns turn to what systemic breakdowns in the cloud of this scale mean in a world dominated by a small group of hyperscalers.

AWS indicated the latest outage stemmed from “an impairment of several network devices” that affected the company’s Northern Virginia, US-East-1 Region. The outage struck EC2, DynamoDB, Athena, and Chime, as well as other AWS APIs and services. This caused issues and downtime for third parties such as Disney Plus and Netflix. It also affected Amazon’s own resources, such as its package delivery management software and the Alexa virtual assistant.

If this seems a bit like déjà vu, it should. About one year ago, in late November 2020, the US-East-1 Region of AWS saw an outage that the company attributed to issues that arose as more capacity was added to the front-end servers for its Kinesis data streaming service.

While the frequency of such cloud outages has not necessarily increased, their overall impact has, says Sid Nag, vice president of cloud services and technologies research for Gartner. “This was one of the largest since AWS started conducting business.”

Mission-Critical Apps More Susceptible

Back when organizations mostly ran non-mission-critical applications on the cloud, outages could be taken in stride more readily. The migration to the cloud has meant more mission-critical apps are susceptible to such disruptions, Nag says.

“The cloud is a multitenant model,” he says. “Many different organizations were affected, not just IT services.” For example, the latest outage also cut off customers of Amazon Prime Video and Ring home monitoring service. “We’re seeing a bigger impact because of reliance on the cloud,” Nag says.

Consolidation of the cloud landscape has put the responsibility of maintaining this resource on the shoulders of a shrinking set of providers. That concentration may be a point of concern. “When they get impacted, it’s almost like ‘too big to fail,’” Nag says. “That kind of thing worries me.”


Joao-Pierre S. Ruth has spent his career immersed in business and technology journalism, first covering local industries in New Jersey, later as the New York editor for Xconomy delving into the city's tech startup community, and then as a freelancer for such outlets as TheStreet, Investopedia, and Street Fight. Joao-Pierre earned his bachelor's in English from Rutgers University. Follow him on Twitter: @jpruth.

Tag(s): supportworld, cloud, cloud computing

