
AWS suffered a massive outage in the US. (Source – Shutterstock)

AWS outage: What happens when the world’s largest cloud service provider goes offline?

  • AWS suffered an outage.
  • Customers had trouble updating their websites and apps.
  • The outage lasted nearly four hours, with customers unable to reach AWS Support.

For most organizations around the world, having an online presence is essential to daily operations. Most companies have invested a lot in perfecting their online platforms. This includes relying heavily on cloud service providers to ensure their sites function well and can handle daily traffic.

In fact, cloud infrastructure service revenue in the first quarter of 2023 reached US$63 billion, according to Synergy Research Group. Amazon Web Services (AWS) holds 32% of the global market, making it the biggest cloud service provider. It is followed by Microsoft Azure at 23% and Google Cloud at 10%. China’s Alibaba Cloud completes the top four with a 4% market share.

However, what happens when the world’s biggest cloud provider goes offline? Chaos could describe what happened recently, when AWS went down for a few hours.

A check on the outage-tracking site DownDetector showed nearly 11,500 reports at the peak of the outage. Users could not access websites and apps while IT engineers could not connect to AWS to get updates on the situation.

AWS is the world’s largest cloud service provider (Source – Statista)

The AWS outage

According to AWS Health Dashboard, there were increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. AWS stated that engineering teams were immediately engaged and began investigating.

“We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use of other AWS services. Additionally, customers may have experienced authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS,” stated AWS in an update.

The company also noted that customers may have experienced issues when attempting to initiate a call or chat with AWS Support. Both the problem with contacting AWS Support and the underlying issue with the subsystem responsible for AWS Lambda were eventually resolved.

“At that time, we began processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services. As of 3:37 PM, the backlog was fully processed. The issue has been resolved and all AWS Services are operating normally,” the update said.

With much of Asia asleep, businesses in the US bore the brunt of the downtime. Social media channels were filled with screenshots of companies announcing their unavailability. The companies affected by the outage included streaming platforms Netflix and Disney+. Fast food chains like McDonald’s, Taco Bell and Burger King, as well as news organizations like The Verge and The Boston Globe, were also affected.

New York’s Metropolitan Transportation Authority similarly reported on Twitter that its website and app were “temporarily unavailable because of an Amazon Web Services outage.” The transit agency said it would post alerts about train and bus service disruptions on Twitter.

While AWS resolved the outage and services came back online, the damage had already been done. There has been no official report yet on how much the downtime cost, but in 2017, when AWS experienced a four-hour outage, the downtime incurred losses of US$150 million for S&P 500 companies and US$160 million for the financial services companies affected.

Screenshot of AWS outage (Source – Twitter)

So what can businesses do when their cloud service provider experiences an outage?

An outage at a cloud service provider pretty much cripples business operations, as was made evident during the recent outage. Online services were not available, bookings couldn’t be made, and information couldn’t be accessed – these are just some of the challenges consumers faced.

For businesses, the loss of income from the outage could now see them take a more serious approach to their backup and recovery operations. Here are several steps businesses can take to mitigate the impact and ensure business continuity:

  • Internal assessment: Organizations first need to evaluate the impact of the outage. This includes identifying critical services and applications that are affected and prioritizing their restoration.
  • Activate business continuity plans: When the main service provider is offline, businesses need to be able to switch to a backup, such as alternative systems or processes. Some businesses can also consider having alternative cloud service providers as a backup.
  • Offline operations: Depending on the nature of the business, some companies can consider operating offline temporarily, switching to manual processes or local servers where possible for tasks such as flight check-ins and food orders. This approach may not be viable for all businesses, especially those heavily reliant on real-time data or online services.
  • Monitor service updates: Stay updated with the cloud service provider’s announcements and status updates. If official announcements are sparse, online forums, Reddit and Twitter feeds can also help gauge the extent of the outage. Once the service is restored, gradually transition back to normal operations and validate the integrity of data.
  • Communication: Keep stakeholders informed about the situation, including employees, customers, and partners. Transparency is essential, so provide regular updates through various channels, such as email, social media, or company websites. For example, fast food chains were quick to update their social media on the AWS outage.
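
The "switch to a backup" step above can be partly automated. What follows is only a minimal sketch in Python, assuming a business exposes a health-check URL on its primary provider and on each backup. The function names and the example URLs are hypothetical and do not come from AWS or from any company mentioned in this article.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_health_check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def choose_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first healthy endpoint, in priority order.

    `endpoints` lists the primary provider first, then the backups.
    Returns None when nothing responds, which is the signal to fall
    back to offline or manual processes.
    """
    for url in endpoints:
        if is_healthy(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    candidates = [
        "https://primary.example.com/health",
        "https://backup.example.com/health",
    ]
    active = choose_endpoint(candidates)
    print(active or "No provider reachable: switch to offline operations")
```

A real deployment would more likely rely on DNS failover or a load balancer than on a script like this, but the decision logic is the same: check the primary provider, fall through to backups in order, and treat "nothing is healthy" as the trigger for offline operations.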

Ultimately, businesses need to be confident that they can maintain business continuity should incidents like this occur in the future.