"You need to make every piece of application data open, encourage people to share data, to share graphs," he says. "The more deeply the team understands the application as a whole, the [fewer] defects you'll have."
Testing in Production: An Idea That Just Might Work
Microsoft's Karthik Ravindran suggests running tests in production continuously, in order to catch errors immediately after release.
Karthik Ravindran, a director of product management at Microsoft who works on Visual Studio, believes the fix is to consider production just another part of the lifecycle and to keep testing. Microsoft's Azure service includes the Global Services Monitor, which lets programmers take automated tests, register them with Azure, and run them in production all the time. When a service goes down, the operations team can be notified immediately. This approach takes Keyes' idea of merging testing and monitoring one step further.
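To make the idea concrete, here is a minimal Python sketch of a synthetic check runner. It is not Microsoft's tooling or any Azure API: the names `run_checks` and `notify` and the check functions are hypothetical, chosen only to show how an automated test can double as a production monitor that alerts operations when something fails.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: run_checks, notify, and the check
# callables are illustrative and not part of any vendor's API.


def run_checks(checks: Dict[str, Callable[[], bool]],
               notify: Callable[[str], None]) -> List[str]:
    """Run each synthetic check once and notify on every failure.

    A check fails if it returns a falsy value or raises an exception.
    Returns the names of the failed checks, in the order they ran.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:
            # A check that crashes is treated the same as one that fails.
            ok = False
            detail = f"{name}: raised {exc!r}"
        else:
            detail = f"{name}: returned a failing result"
        if not ok:
            failed.append(name)
            notify(detail)
    return failed
```

In practice a scheduler would call `run_checks` every minute or so, with each check performing a real transaction against the live service (logging in, placing a test order) and `notify` paging the operations team.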
The last piece of the puzzle may be testing those pesky configurations: The data files that are (and should be) different between test and production. Some open source tools, including Puppet, do support automated tests of configuration, but most companies keep those tests internal. As a result, there's no great body of open source examples.
A new generation of companies is emerging to provide configuration testing tools, though. Scriptrock, for example, provides a visual layer to design infrastructure, along with the tools to write tests against that infrastructure. You might write a test against a config file, then run that test to make sure the files that should not change do not change with a new rollout.
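A test of that kind can be sketched in a few lines of Python. This is an illustration only, not how Scriptrock or Puppet works: the helper names `fingerprint`, `snapshot` and `changed_files` are hypothetical. The assumed approach is to record a hash of each protected file before the rollout, then compare afterward.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Hypothetical helpers for illustration; not taken from any vendor tool.


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(paths: List[Path]) -> Dict[str, str]:
    """Record a fingerprint for each file that must not change."""
    return {str(p): fingerprint(p) for p in paths}


def changed_files(baseline: Dict[str, str]) -> List[str]:
    """List files that are missing or differ from the baseline."""
    changed = []
    for name, digest in baseline.items():
        p = Path(name)
        # A deleted file counts as changed, as does any content difference.
        if not p.exists() or fingerprint(p) != digest:
            changed.append(name)
    return sorted(changed)
```

Taking the snapshot before a rollout and asserting that `changed_files` returns an empty list afterward turns "these files should not change" into a check that can fail a deployment.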
Between limiting risk with a strangler pattern, improving reaction and fix time, and testing in production with synthetic transactions, tools do exist to reduce risk in production. The challenge, as always, is to make good choices. Having a rich collection of options won't solve the problem, but it's a nice place to start.