Subscribe / Unsubscribe Enewsletters | Login | Register

Pencil Banner

Real-world devops failures -- and how to avoid them

Bob Violino | June 28, 2016
To deliver on the promise of devops, heed these hard-earned lessons of devops gone wrong

Too much accessibility -- not enough education

Back in 2006, when professional content sharing website SlideShare (now part of LinkedIn) was a small startup with fewer than 20 employees, it launched a devops model to speed processes and stay ahead of its competition.

"The [development] team was actually split between San Francisco and New Delhi, and the infrastructure was quite complicated," says Sylvain Kalache, co-founder of Holberton School, an institution that trains software engineers, who worked at SlideShare at the time.

The goals of devops were to achieve maximum efficiency within the engineering team and to spread technical knowledge as much as possible, so that if someone went on vacation or left the company, there would be limited impact.

"Working in a devops environment pushes every contributor to work and contribute to different parts of the product," Kalache says. "Having a cohesive team is super important, and this happens by making people interact and help each other."

One of the main ideas behind devops is a greater sense of ownership of work responsibilities, "and for that you need to give access to part of the infrastructure that developers do not generally have access to," Kalache says. While working at SlideShare, engineers had access to production servers and production databases.

A software engineer was working on a database-related project and trying out a tool that offered the ability to explore a MySQL database graphically. "He decided to reorganize the database columns' order in that tool so that the data would make more sense to him," Kalache says. "What he did not know was that it was also actually changing the columns' order in production on the actual database, locking it, which brought down SlideShare.net."

When it happened, the person responsible did not realize that the tool was actually performing actions. It took 15 minutes of collective effort to figure out the source of the problem.

"There were two takeaways from this failure," Kalache says. "First, while devops is pushing for everyone to have an impact on any step of the product/service cycle, [it's] good practice to take a step back every time you give access to something and make sure it is actually valuable. In this specific situation of the database outage, we realized that giving access to production data was actually not useful at all and was very dangerous. The developer could have extracted the same exact value by using a staging database, but with a much more minor impact on the company."

The second takeaway is to better educate developers on the workings of infrastructure. "Many of them have never been exposed to production infrastructure," Kalache says. "Devops is based on a way of working, which obviously is more about human interaction. You can't expect everyone to naturally know ‘the hidden rules.' That's why onboarding is mandatory and critical."

 

Previous Page  1  2  3  4  Next Page 

Sign up for CIO Asia eNewsletters.