"Data discovery occurs in three parts - capture, curation and analysis," he said. "Analysis is where we need to do more work as the progress is slow. We don't do a good job at curation, which is 'cleaning the house' the boring part."
"Gartner last year said 90 percent of deployed data lakes - or reservoirs - will be useless through to 2018," Brobst said. "Managing what data goes in to avoid duplication and inefficiencies is a problem of governance. One of the most neglected aspects of curation is the keeping track of the provenance (aka audit trail, lineage, or, pedigree) of both the internal and external data - as well as the manipulations used to create derived data assets."
He said startups in Silicon Valley had been developing a new generation of metadata tools over the last six months to help reinvent metadata and continue the move to automation, which was the key to scalability. "We are underinvesting in curation. Curation will become more complicated as more data comes in from other sources, which may be uncontrolled, such as social media data."
"You can't control the process outside your enterprise," said Brobst. "There will be dirty or mangled data such as from external sensors through wireless communication."
"A single system is better but world is more complicated now with these different data types and locations and multiple technologies," he said. "Teradata will now have to help you with handling the interoperability with other platforms and locations with software such as QueryGrid, co-developed with the SAS Institute, and platforms such as Unified Data Architecture, which encompasses the data reservoir, integrated data warehouse, discovery platform and real time stream processing."
"The bottom line is there is now a shift in the industry to where data is being created and collected," Brobst said. "Tackling multiple locations will need right ecosystem. As data shifts to cloud (such as sensors and social channels), this may mean shifting analytics to the cloud. Analytics follows the data."