Here, data cleansing and mastering are critical to big data success. "Contrary to some beliefs, this requirement does not go away," says Joe Caserta, founder and CEO of Caserta Concepts, a data management and big data consulting firm. "If the big data paradigm is to become the new corporate analytics platform, it must be able to align customers, products, employees, locations, etc., regardless of the data source."
In addition, known data quality issues that have long jeopardized the credibility of data analyses will have the same impact on big data analytics if not properly addressed, he says.
On a typical big data project, data management is often "deprioritized" by development staff and can go unresolved, DRC's Chabot notes. Effective data management involves ensuring mature techniques -- process and automation -- are put in place to address model management, metadata management, reference data management, master data management, vocabulary management, data quality management, and data inventory management, he says.
People are discovering what works and what doesn't when it comes to managing big data and analytics. When they are employed by the same organization, why not share this knowledge?
One way to do this is by creating a big data COE (center of excellence), a shared entity that provides leadership, best practices, and in some cases support and training.
Typically, COEs have a dedicated budget and are designed to analyze issues; define initiatives, future state, and standards; train users; execute plans; and maintain progress, says Eliot Arnold, co-founder of Massive Data Insight, a consulting firm that specializes in big data and analytics programs. Getting a COE started requires an audit of available resources and a senior executive sponsor, he says.
While a big data COE is a good idea on paper, its effectiveness will be determined by how well it's implemented in practice, DRC's Chabot says.
There are many basic challenges with a COE covering the entire data lifecycle, Chabot says, including authoring and identifying the best practices; vetting them in a nonbiased fashion; properly documenting their applicability; overseeing their adoption; and modernizing them over time.
DRC has defined a big data maturity-level model similar to the CMMI (Capability Maturity Model Integration), a process improvement framework used by organizations. The model maps out relevant best practices in four groups (planning/management, project execution, architecture, and deployment/runtime/execution) for organizations to adopt incrementally over time. This avoids the pitfalls of trying to be too sophisticated too quickly, Chabot says.
Big data is a business initiative, not just a technology project, so it's vital that business and IT leaders are on the same page with planning, execution, and maintenance.