While only a single-digit percentage of customers were affected by the outage, the size of AWS's customer base meant that a large number of users felt it. Netflix, Instagram and Pinterest were among the customers hit. Netflix was partially down for portions of the period between 8 and 11 p.m. PDT on Friday, prime time for movie watching on the Pacific Coast.
Netflix Cloud Architect Adrian Cockcroft, who has in the past praised AWS for powering the company's operations, posted something of a play-by-play of the outage on his Twitter feed on Friday night and into Saturday. The company, he says, has architected its systems to AWS's specifications and uses multiple availability zones (AZs). That didn't seem to work on Friday, though. On Saturday, Cockcroft tweeted, "We only lost hardware in one zone, we replicate data over three. Problem was traffic routing was broken across all zones."
Shahin Pirooz, CTO and CSO of cloud provider CenterBeam, says AWS certainly shares some blame in this outage. "It seems like they had a house of cards that went down on them," he says. Pirooz says he's surprised so many systems went down at once for AWS. "Amazon failed, their failover systems failed, AWS does own some responsibility in this," he says.
One way to prevent this type of situation in the future, he says, is to use load balancing, Domain Name System (DNS) and disaster recovery offerings from third parties rather than from AWS. A variety of companies offer such services, including New Start Systems, Akamai and DynDNS. The "nirvana" situation, he says, would be giving customers the ability to federate services across multiple public cloud providers. That, he predicts, is still five to 10 years away, because industry providers do not yet have commonly agreed-upon, supportable migration standards.
OpenStack is attempting to create such standards with its project, but open source competitors such as Citrix's Apache CloudStack are coalescing around AWS as the de facto standard.