Information lifecycle management (ILM) faces a new frontier when it comes to big data. The core challenges are threefold: the sheer, unbounded size of big data, the ephemeral nature of much of the new data, and the difficulty of enforcing consistent quality as the data scales along any or all of the three Vs (volume, velocity, and variety).
That's my takeaway from a recent article by Loraine Lawson. What she says is consistent with my general thinking on the topic. However, I disagree with her assertion that ILM "matters more" with big data than with smaller-scale data analytics environments. Keeping all of your business data assets secure, governed, and managed matters just as much in this new era as it ever did — no more, no less.
What has changed is that comprehensive ILM has grown more difficult to ensure in big data environments, given rapid changes in the following areas:
- New big data platforms: Big data is ushering a menagerie of new platforms (Hadoop, NoSQL, in-memory, and graph databases) into enterprise computing environments, alongside stalwarts such as MPP RDBMSs, columnar databases, and dimensional databases. The chance that your existing ILM tools will work out of the box with all of these new platforms is slim. Also, to the extent that you're doing big data in a public cloud, you may be required to use whatever ILM features (strong, weak, or middling) are native to the provider's environment. To mitigate your risks in this heterogeneous new world and to maintain strong confidence in your core data, you'll need to examine new big data platforms closely to ensure they have ILM features (data security, governance, archiving, retention) commensurate with the roles for which you plan to deploy them.
- New big data subject domains: Big data has not altered enterprise requirements for data governance hubs where you store and manage official systems of record (customers, finances, HR). This is the role of your established EDWs, most of which run on traditional RDBMS-based data platforms and incorporate strong ILM. But these system-of-record data domains may have very little presence on your newer big data platforms, many of which focus instead on handling fresh data from social, event, sensor, clickstream, geospatial, and other new sources. These new data domains are often "ephemeral" in the sense that there may be no need to retain the bulk of the data in permanent systems of record.
- New big data scales: Big data does not mean that your new platforms support infinite volume, instantaneous velocity, or unbounded varieties. The sheer magnitudes of new data will make it impossible to store most of it anywhere, given the stubborn technological and economic constraints we all face. This reality will deepen big data managers' focus on tweaking multitemperature storage management, archiving, and retention policies. As you scale your big data environment, you will need to ensure that ILM requirements can be supported within your current constraints of volume (storage capacity), velocity (bandwidth, processor, and memory speeds), and variety (metadata depth).
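To make the idea of multitemperature storage and retention policies a little more concrete, here is a minimal Python sketch of how such a policy might be expressed. Everything in it is a hypothetical assumption for illustration: the `DataSet` record, the `assign_tier` function, the tier names, the domain labels, and the time windows are not features of any particular platform or ILM product. It simply ages data from "hot" to "warm" to "cold" storage based on how recently it was accessed, purges ephemeral domains (such as clickstream or sensor data) after a fixed retention period, and never purges system-of-record domains.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataSet:
    """Hypothetical metadata record for a managed data set."""
    name: str
    domain: str            # e.g. "finance", "clickstream", "sensor"
    created: datetime
    last_accessed: datetime


# Hypothetical policy thresholds; real values depend on your platforms,
# storage budget, and regulatory retention requirements.
HOT_WINDOW = timedelta(days=7)     # recently used data stays on fast storage
WARM_WINDOW = timedelta(days=90)   # older data moves to cheaper storage
EPHEMERAL_RETENTION = {            # domains with no long-term retention need
    "clickstream": timedelta(days=30),
    "sensor": timedelta(days=30),
}
SYSTEM_OF_RECORD = {"customers", "finance", "hr"}


def assign_tier(ds: DataSet, now: datetime) -> str:
    """Return 'hot', 'warm', 'cold', or 'purge' for a data set."""
    limit = EPHEMERAL_RETENTION.get(ds.domain)
    # Systems of record are only ever re-tiered by this policy, never purged.
    if ds.domain not in SYSTEM_OF_RECORD and limit is not None:
        if now - ds.created > limit:
            return "purge"
    idle = now - ds.last_accessed
    if idle <= HOT_WINDOW:
        return "hot"
    if idle <= WARM_WINDOW:
        return "warm"
    return "cold"
```

For example, with `now = datetime(2024, 1, 31)`, a finance ledger last touched a year earlier would be moved to "cold" storage but kept, while a clickstream data set created two months earlier would be flagged for purging even if it had been read yesterday. Keeping the thresholds as plain data, rather than burying them in the logic, is one way to let the same rules be reviewed against the volume, velocity, and variety constraints of each platform.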