Who gets blame for Amazon outage?

Patrick Thibodeau | April 26, 2011

Amazon has promised to provide a "detailed post-mortem" on the root causes of the prolonged outage of its cloud services in recent days. Users of the Amazon services, meanwhile, may also have to explain how they got caught up in the outage.

Users see the promise of cloud technology as a way to reduce costs and be greener, but "that [also] means concentrating processing in fewer, bigger places," said Brill. Thus, when something goes wrong, "it has a bigger impact."

Meanwhile, the promise of reliable cloud uptime is putting protection advocates -- the IT people who champion more internal reliability and safeguards -- at a disadvantage, he added. "There will always be an advocate for how it can be done cheaper, [but] if you haven't had a failure for five years -- who is the advocate for reliability?

"My prediction is that in the years ahead, we will see more failures than we have been seeing, because people have forgotten what we had to do to get to where we are," Brill added.

AppNeta runs its company on Amazon's cloud technology and was affected by the outage. However, its problems were short-lived because its service is architected to respond to a data center failure in Amazon's cloud.

Matt Stevens, chief technology officer at AppNeta, said its system was able to fall back to an alternative availability zone in another data center in Amazon's cloud.

"You still need to plan for worst-case scenarios," said Stevens, who noted Amazon advises its customers to plan for a potential data center interruption. "It was actually their guidance that helped us [prevent] this from being more painful."

Amazon has built the system with multiple levels of disaster recovery, including a design for high availability across virtual infrastructures within a zone, such as the ability to fail over between servers, as well as planning to fail over to another data center, as AppNeta did.

AppNeta has redundant mirroring of its data in Amazon's S3 storage service, which allowed the company to pull that data into a second data center. AppNeta's problem was limited to a couple of hours Thursday morning, said Stevens.
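The kind of fallback Stevens describes can be pictured with a minimal sketch. This is not AppNeta's implementation and makes no calls to Amazon's APIs; the zone names, the `pick_zone` helper and the health check are all hypothetical, chosen only to show the idea of trying zones in order of preference and moving on when one is down.

```python
from typing import Callable, Optional, Sequence


def pick_zone(zones: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first zone that passes the health check, or None if all fail."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None


# Hypothetical zone names, listed in order of preference.
ZONES = ["zone-a", "zone-b"]

# Simulate an outage in the primary zone (an assumption for illustration).
down = {"zone-a"}

# The service would restore its mirrored data into whichever zone is chosen.
active = pick_zone(ZONES, lambda zone: zone not in down)
print(active)
```

In a real deployment the health check would probe the service itself, and the move would only help if, as in AppNeta's case, the data had already been mirrored somewhere the second zone could reach.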

He believes that Amazon's outage will cause people to step back and ask some questions about their internal architectures, as well as consider whether to adopt a multicloud strategy to help mitigate the risk. "That's certainly got to be top of mind for a lot of CIOs today," Stevens said.


