Data lakes should be managed as a highly valuable corporate asset, Hockenberry says. “In many cases, executives look at this as a ‘tech problem,’” he says. “However, a data lake should be seen as corporate IP [intellectual property] and if someone gains access to it, they could see strategic information that could affect shareholder value, compromise [research and development], and reveal plans and intentions that can create issues for a company.”
How to protect data lakes
- appropriate access and authorization controls
- strong identity management
- audit processes
- robust and well-tested incident response plan
- data encryption
The best way to address these issues is to understand what data the enterprise is collecting, how it’s being analyzed, protected and disseminated, Hockenberry says. Business, IT and security executives need to build data-centric risk management strategies to ensure information is protected no matter where it resides, he says.
Hackers, cyber criminals and other bad actors are sure to go after large data stores if they think there is something to gain from these resources and if they sense they are not adequately protected.
“Because of the data they contain, they may be seen as a great target—someone could steal much of the most important and sensitive data that a company owns by stealing the contents of a data lake,” Hockenberry says. As such, one of the biggest risks companies need to be aware of is ransomware, which brings the possibility of costly denial-of-service attacks. “The denial of use of corporate data can be far more damaging than simply stealing it,” he says.
The most important security functions with regard to data lakes are authorization and access. Research firm Gartner has warned companies not to overlook the inherent weaknesses of data lakes. Data can be placed into a data lake with no oversight of the contents, Gartner analyst Nick Heudecker noted at the firm’s Business Intelligence & Analytics Summit last year.
Many data lakes are being used by organizations for data whose privacy and regulatory requirements are likely to represent risk exposure, Heudecker said. The security capabilities of central data lake technologies are still emerging, and the issues of data protection will not be addressed if they’re left to non-IT personnel, he said.
Many of the current data lake technologies on the market “don’t have fine-grained security controls that allow for multi-faceted control at the object level,” Hockenberry says.
The promise of data science and the data lake can only be realized by the free flow and joining of very large data sets. “This freedom creates opportunity, but is also harder to manage from a security perspective,” Hockenberry says. “Executives should ask questions about access, encryption, and tracking of data throughout its lifecycle in the enterprise.”