"How do I run a scalable version of my application stack. And that comes down something that we learned very early on: You've got to be able to silo your application stack, so that at any given time if you do have a failure - as you would in any environment - you can still continue to function beyond that failure."
If an instance dies, then you need to have enough additional capacity and supporting infrastructure, such as load balancers, to keep things going. "That's something that you've really got to engage with early on," Harrison says.
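As a rough illustration of that capacity planning, the Python sketch below estimates how many silos to run so that peak load is still served after a failure. The function names, the idea of measuring load in requests per second, and the figures used are all assumptions made for this example; none of them come from Harrison or from Amazon's APIs.

```python
import math


def silos_needed(peak_load: float, silo_capacity: float,
                 tolerated_failures: int = 1) -> int:
    """Return how many silos to run so that peak load is still served
    after `tolerated_failures` silos are lost (hypothetical helper)."""
    if peak_load < 0 or silo_capacity <= 0 or tolerated_failures < 0:
        raise ValueError("load must be >= 0, capacity > 0, failures >= 0")
    # Silos required just to carry the peak, plus spares for failures.
    return math.ceil(peak_load / silo_capacity) + tolerated_failures


def survives_failure(silo_count: int, silo_capacity: float,
                     peak_load: float, failed: int = 1) -> bool:
    """Check whether the remaining silos can still absorb the peak load."""
    remaining = max(silo_count - failed, 0)
    return remaining * silo_capacity >= peak_load


if __name__ == "__main__":
    # Assumed figures: 1,000 requests/sec at peak, 300 requests/sec per silo.
    count = silos_needed(peak_load=1000, silo_capacity=300)
    print(count, survives_failure(count, 300, 1000))
```

With these assumed figures, four silos carry the peak and a fifth covers the loss of any one of them. A load balancer in front of the silos would still be needed to route traffic away from the failed one.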
APIs are really the "killer feature" of Amazon's cloud, he adds. "They've got virtualization, great; they've got lots of different instance sizes, great. But they've got a way of actually programming for the platform, and that's the really amazing part."
"You can then tie that into how you do your scalable architecture," he explains. "So once you've established what your silos look like - once you've thought about your caching architecture and whether you want to shard or whether you want to cluster or whether you just want to use a really, really big instance - once you've established that, then you can actually start to automate. And that's where it really, really gets interesting. And a lot of fun."
Architecting for failure is nothing new for software engineers, Harrison says. If you're running your own physical hosts, you still have to deal with hardware failure. "The difference is, you also had to care about the power supplies, and disk failures, and the racking, and the cabling, and data centres," he says.
"You'd still have to think about what happens when that disk dies and you've got to worry about your EOLs on your hardware. And you've got to talk to your data centre guys and make sure they've got an extra diesel generator."
The difference with cloud is that although you need to design for resilience, "you don't have to worry about all that underlying stuff."
Instead you can spend more time working on the infrastructure itself, "without having to worry about the guts of it in terms of 'Ah okay, so we need to put in another order of Xeon processors' or whatever.
"If you take that out of the equation, what you actually get is the ability to spend more time on the patterns and shape of your infrastructure. And those patterns are really what let you scale."