An internal hack
If a vulnerability affects the AWS cloud, the company tries to hack itself.
"We generate a test scenario to determine if we can trigger the vulnerability," Schmidt said. AWS then runs extensive tests to determine whether the vulnerability has been used against it.
Meanwhile, other teams of security engineers are already building a patch and testing it across all the variants of Xen that AWS runs to ensure it meets security and performance requirements.
Sometimes installing a patch requires a reboot, as it has twice in the past six months. As on an ordinary PC, some updates and patches require a reboot and others don't. Most of the patches AWS applies do not, and AWS has architected its systems to minimize the reboots needed to patch its services.
"We try very hard not to reboot," Schmidt said. If his team finds it "technically infeasible" to install a patch without a reboot, it notifies customers of which services will be restarted.
The dreaded reboot
"It was very straightforward," Schmidt said, referring to the September issue. "We couldn't find a way to patch the service without rebooting, so we had to do it."
Complicating matters in situations like this, AWS has to tell customers that some of their EC2 instances need to be rebooted, but it can't say why. AWS can't announce the vulnerability publicly without exposing itself and other Xen users.
Customers should nonetheless be ready for a reboot at any time, and there are steps they can take to ensure their systems can withstand a reboot or VM failure. One is to design systems to be stateless, so that if a reboot or VM failure occurs, the application fails over to healthy VMs without skipping a beat.
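As a rough illustration of that stateless, fail-over design, the minimal Python sketch below routes each request to the first healthy VM in a pool. It is not AWS's mechanism. The `VM` class, its `healthy` flag, and the `dispatch` helper are all hypothetical names. Because each request is handled from its own contents alone, with nothing kept in a particular VM's memory, any healthy VM can serve it when another is rebooting.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical worker VM; `healthy` is False while it is rebooting."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        # Stateless: the result depends only on the request itself,
        # not on data held in this VM, so any VM can produce it.
        return f"{self.name} processed {request}"


def dispatch(request: str, pool: list[VM]) -> str:
    """Hypothetical router: try each VM in turn, skipping unavailable ones."""
    for vm in pool:
        try:
            return vm.handle(request)
        except ConnectionError:
            continue  # fail over to the next VM in the pool
    raise RuntimeError("no healthy VMs available")


if __name__ == "__main__":
    pool = [VM("vm-a"), VM("vm-b")]
    pool[0].healthy = False  # simulate vm-a being rebooted for a patch
    print(dispatch("order-42", pool))  # prints "vm-b processed order-42"
```

In a real deployment this role is usually played by a load balancer with health checks, and any session data lives in a shared store rather than on the VMs, so losing one VM loses no state.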
Back in September, Network World spoke with a handful of AWS users, and most survived the reboot without major issues. Born-in-the-cloud apps tend to be resilient to failure; legacy apps that have been migrated tend to have more trouble.
Schmidt said AWS is always looking to improve, both technically, so that it doesn't have to reboot VMs, and in keeping customers better informed. Part of that effort includes sponsoring academic research, including some of the leading studies into how Xen servers can be hot-patched without a reboot.