

How to create a company culture that can weather failure

Mary Branscombe | Aug. 16, 2017
In technology, things go wrong all the time, sometimes catastrophically. But if you stop paying attention after you fix the immediate problem, you’re missing out on the benefit of learning from experience.


Do you change processes after handling an incident, or do you just carry on and wait for the next problem? Instead of dealing with individual failures, think about creating a culture in your IT department that can not only handle problems but truly learn from them.

Cloud providers are routinely better at learning from failure than most enterprises - because they have to be. It's critical that they are transparent about failures to keep the trust of their customers, but it also hits the bottom line if they take too long to solve problems. When AWS, Google, Azure or GitHub has a major outage, you'll see regular updates, and once the problem has been fixed, a public incident report will cover what changes are being made to make sure the same thing doesn't happen again.

For example, when an engineer at GitLab accidentally deleted the production database earlier this year (while trying to recover from load issues caused by spammers), the service was down for several hours. Worse still, nearly all of the backup tools GitLab was using turned out not to have been creating backups, and six hours of production data across some 5,000 projects was lost. The engineers documented what was happening in real time (on Twitter, YouTube and in a shared public document), then followed up with a blog post covering the key details and a full post-mortem. The post-mortem explained not just the sequence of what went wrong but also the misconfigurations and other complications that left GitLab with no up-to-date backups, giving the team a clear list of the ongoing changes that needed to be made.

Or consider Target's data breach, which did a lot more damage to the company. After discovering at the peak of the 2013 Christmas shopping rush that hackers had installed malware in their credit card terminals, the retailer found that the details of around 40 million debit and credit cards had been stolen, as well as names, addresses and phone numbers for up to 70 million customers. The data breach cost the business over $100 million in settlements with banks, Visa and a federal class action suit, and Target CEO Gregg Steinhafel resigned in 2014.

Fast forward three years, though, and "Target has become a role model for other retailers," Wendy Nather, former CISO and principal security strategist at security firm Duo, told

"They made a huge turnaround after their breach; they really built up their security program to the point where they really have a lot of transparency. They host security events. They were one of the organizations that helped found R-CISC, the Retail Cyber Intelligence Sharing Center. They really have led the charge to start exchanging threat intelligence amongst retailers."


