AWS US East Data Center DNS Failure Triggers Global Internet Service Paralysis: AI Platforms and Financial Systems Hit Hard

October 21, 2025

Summary

On October 20, 2025, a major outage in the Amazon Web Services (AWS) US-EAST-1 region on the US East Coast disrupted thousands of websites and applications worldwide for several hours. The impact was broad, affecting AI platforms such as ChatGPT and Perplexity, financial services such as Robinhood and Venmo, and social applications including Snapchat and Signal. The outage stemmed from a DNS resolution issue in US-EAST-1, and more than 6.5 million outage reports were received worldwide.


In the early hours of October 20, 2025, Amazon Web Services (AWS), the world's largest cloud service provider, experienced a severe outage, causing widespread internet service disruptions. This incident once again highlighted the risks of modern digital infrastructure's over-reliance on a single cloud vendor.

Outage Timeline and Scope of Impact

According to the AWS Health Dashboard, the outage was first reported at 12:11 AM ET (12:11 PM Beijing Time) on October 20, primarily affecting AWS's US-EAST-1 region in Northern Virginia.

In the initial stages of the outage, AWS confirmed "significant error rates" and latency issues across multiple services. By 1:26 AM ET, the company confirmed that the problem was related to a DNS resolution failure for its DynamoDB database service. DNS (the Domain Name System) translates domain names into IP addresses; when resolution for DynamoDB failed, numerous applications could no longer connect to AWS-hosted databases.
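To make the failure mode concrete, the sketch below shows how a DNS resolution failure looks from an application's point of view. It is a minimal illustration, not AWS's or any affected company's code: the helper `resolve_addresses` and the two simulated resolvers are hypothetical names introduced here, the IP addresses come from a documentation-only range, and the hostname is the public regional DynamoDB endpoint. The only real library call is Python's standard `socket.getaddrinfo`, which raises `socket.gaierror` when a name cannot be resolved.

```python
import socket


def resolve_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return sorted unique IP addresses for hostname, or [] if the lookup fails."""
    try:
        results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # socket.getaddrinfo raises gaierror when the name cannot be resolved.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({result[4][0] for result in results})


def simulated_dns_outage(hostname, port, proto=0):
    """Hypothetical stand-in resolver that fails the way a DNS outage would."""
    raise socket.gaierror("simulated resolution failure for " + hostname)


def simulated_healthy_dns(hostname, port, proto=0):
    """Hypothetical stand-in resolver returning two documentation-range addresses."""
    return [
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("192.0.2.10", port)),
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("192.0.2.11", port)),
    ]


ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

# Healthy DNS: the application gets addresses it can connect to.
print(resolve_addresses(ENDPOINT, resolver=simulated_healthy_dns))
# DNS outage: no addresses come back, so no connection can even be attempted.
print(resolve_addresses(ENDPOINT, resolver=simulated_dns_outage))
```

The database servers themselves may be healthy in the second case; without an address, the client simply has nowhere to send its request.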

By 3:35 AM ET, AWS announced that the core DNS issue had been "fully mitigated," but service recovery efforts continued until approximately 6:00 PM ET. From the first report to the final all-clear, the outage lasted over 17 hours, with some services experiencing intermittent issues long after the DNS fix.

AI Services and Financial Platforms Hit Hard

The outage significantly impacted artificial intelligence services. OpenAI's ChatGPT experienced Single Sign-On (SSO) problems, preventing users from logging in. Aravind Srinivas, CEO of AI search engine Perplexity, confirmed on social media platform X: "Perplexity is down now, root cause is AWS issues. We are working on it."

FinTech platforms also suffered major disruptions. Mobile payment app Venmo, digital bank Chime, cryptocurrency exchange Coinbase, and stock trading platform Robinhood all reported service interruptions. Customers of several UK banks reported being unable to make card payments, with Bank of Scotland apologizing to customers on social media.

Social, Gaming, and Education Sectors Fully Affected

Social media and communication applications experienced widespread outages. Snapchat users continuously encountered technical problems, and Meredith Whittaker, President of encrypted messaging app Signal, confirmed service disruptions were linked to the AWS outage. Video conferencing platform Zoom, collaboration tool Slack, and design platform Canva all reported connectivity issues.

The gaming industry was not spared. Popular titles Fortnite, Roblox, and Pokémon GO, along with the Epic Games Store, all reported login and connectivity failures. Canvas, an online learning platform used by thousands of universities and K-12 schools in the US, was also inaccessible. It displayed an "AWS ongoing event" warning until 2:30 PM ET, leaving students unable to submit assignments or access course materials.

Smart Devices and Enterprise Services Halt

Amazon's own smart assistant, Alexa, became completely unresponsive, preventing users from voice-controlling smart home devices. Services like Ring smart doorbells and Amazon Prime Video also experienced problems. Self-check-in systems at New York's LaGuardia Airport went down, leading to long queues of passengers.

UK government websites, including HM Revenue & Customs (HMRC) and the official government website, experienced access issues. Ride-hailing service Lyft, the McDonald's mobile app, and dating app Hinge were among hundreds of other services affected.

According to outage tracking website Downdetector, over 11 million outage reports were received globally, with peak daily reports exceeding 50,000.

Technical Root Cause and Recovery Process

AWS later disclosed in updates that the root cause of the outage was a problem with an "underlying internal subsystem responsible for monitoring network load balancer health." The failure of this core component triggered a chain reaction, first leading to DynamoDB's DNS resolution failure, and subsequently affecting the launch of EC2 (Elastic Compute Cloud) instances.

By 8:43 AM ET, AWS stated it had "narrowed down the root cause of the network connectivity issues." To prevent further load, the company implemented throttling measures for new EC2 instance launch requests. During the recovery process, AWS gradually lifted the throttling, but the backlog of requests for the Lambda serverless computing platform required additional time to process.
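AWS has not described how its launch throttling works internally, so the following is only a generic illustration of the idea: a token bucket, one common way to cap the rate of incoming requests and then relax the cap gradually. The class `TokenBucket`, its parameters, and the chosen rates are all assumptions made for this sketch. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Generic rate limiter: each admitted request consumes one token."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec    # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = now             # time of the last refill

    def set_rate(self, rate_per_sec):
        """Change the refill rate, e.g. raise it to lift throttling step by step."""
        self.rate = rate_per_sec

    def allow(self, now):
        """Refill for the elapsed time, then admit the request if a token is left."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Throttled phase: at most a burst of 2, refilled at 1 request per second.
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, now=0.0)
print([bucket.allow(0.0) for _ in range(3)])   # third request is rejected
print(bucket.allow(1.0))                       # one token has been refilled

# Recovery phase: the operator raises the rate, so rejected callers succeed sooner.
bucket.set_rate(10.0)
print(bucket.allow(1.5))
```

Rejected requests do not disappear; callers typically retry or queue them, which is one reason a backlog can take additional time to drain after throttling is lifted.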

In its final update at 6:00 PM ET, AWS confirmed: "Services have returned to normal operation," and stated that EC2 instance launch throttling had returned to pre-event levels.

Industry Reaction and Warning

Cybersecurity expert Christian Espinosa noted: "This massive outage affecting AWS and major UK platforms is a stark reminder of the astonishingly fragile foundations upon which our digital world is built. Cloud service concentration – where a few providers host most critical systems – creates single points of failure. When one data region or provider goes down, the ripple effect impacts everything from retail and finance to logistics and communications."

Mehdi Daoudi, CEO of internet performance monitoring company Catchpoint, stated that the economic cost of the outage was yet to be assessed but was likely "immense."

During the outage, Tesla CEO Elon Musk posted mocking content on platform X, emphasizing that his social platform was unaffected, and shared satirical memes targeting Amazon founder Jeff Bezos.

AWS holds approximately 30% of the global cloud computing market and, together with Microsoft Azure and Google Cloud, dominates the industry. This incident occurred in AWS US-EAST-1, one of the critical hubs for global internet traffic. Analysts pointed out that many enterprises had not adequately implemented cross-region or cross-cloud redundancy, which amplified the impact of this single point of failure.
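The cross-region redundancy that analysts refer to can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not a production pattern: `call_with_failover`, `AllRegionsFailed`, and `simulated_request` are hypothetical names, the outage is simulated rather than real, and the region names are used only as labels.

```python
class AllRegionsFailed(Exception):
    """Raised when every region in the list has been tried without success."""


def call_with_failover(regions, request):
    """Try each region in order and return (region, response) from the first that works."""
    errors = {}
    for region in regions:
        try:
            return region, request(region)
        except ConnectionError as exc:
            # Remember why this region failed, then fall through to the next one.
            errors[region] = exc
    raise AllRegionsFailed(errors)


def simulated_request(region):
    """Hypothetical request function: us-east-1 is down, every other region answers."""
    if region == "us-east-1":
        raise ConnectionError("simulated outage in " + region)
    return "response from " + region


served_by, body = call_with_failover(["us-east-1", "us-west-2"], simulated_request)
print(served_by, body)
```

A client-side fallback like this only helps if the data and services it needs are already replicated to the second region, and if nothing on the request path still depends on the failed one.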

This was not the first major outage in AWS US-EAST-1. In 2020, 2021, and 2023, the region experienced incidents that led to widespread service disruptions.

Future Outlook

This outage is expected to accelerate enterprises' transition towards multi-cloud and hybrid cloud strategies to mitigate the risks of relying on a single cloud service provider. Industry insiders predict a potential increase in demand for business interruption insurance that specifically covers cloud service outages.

AWS stated it would conduct a comprehensive investigation into the incident and committed to improving system redundancy and disaster recovery mechanisms. As of the evening of October 20 ET, all services had returned to normal, but this hours-long global disruption once again sparked discussions about the risks of over-centralization in internet infrastructure.