By contrast, when an engineer mistyped a command that took down the AWS S3 service for several hours, along with many other services that depended on it, such as Quora and file sharing in Slack, Amazon's explanation avoided the phrase "human error" and concentrated on explaining the flaws in the tools and process that allowed the mistake to be made.
Lambert maintains that "human error doesn't really exist. Providing that you hire good people who want to do the right thing, they will usually do the right thing. It's rare that you can say a person discarded all the good information they had and just did what they wanted and that's why we had this issue."
The real problem is tools and processes that don't prevent (or at least warn about) the inevitable mistakes people make, or a lack of automation that means someone is typing commands by hand in the first place.
"It's a lazy approach to say people did the wrong thing," says Lambert. "A better approach is to assume that everyone did the right thing with the information they had, so you need to take away the blame and look at what information they had at each stage and what was missing, and what additional tools or processes you need to get better next time."
Do reward reporting
The tale of a developer fired for confessing to deleting the production database on day one at a new job may be apocryphal, but the account on Reddit was certainly plausible and led many to point out that the fault lay not with the new developer, but with the documentation that included the details of the production database in a training exercise.
In contrast, when a chemistry student at the University of Bristol in the UK accidentally made an explosive and reported it, the dean of the Faculty of Science, Timothy C. Gallagher, praised the student for acting responsibly, even though the emergency services had to carry out a controlled detonation. He pointed out "the value of investing in developing and fostering a culture in which colleagues recognise errors and misjudgements, and they are supported to report near misses."
In the airline industry, the International Confidential Aviation Safety Systems Group collects confidential, anonymous reports of near misses, cabin fires, and maintenance and air traffic control problems to encourage full disclosure. Similarly, when the US Forest Service conducts Learning Reviews after serious fires, the results can be used only for preventing accidents, not for legal or disciplinary action.
You want your team to feel safe enough to report the problems that haven't yet led to a failure.
"Whether formalized in a policy or not, the team must be well aware that mistakes are tolerated, but concealment and cover-up are not," says Burgess. "Personnel must clearly understand they will never be penalized for volunteering any and all information regarding any failure."