Business data is growing so fast that the task of managing it all is becoming nearly as complicated as indexing the Web, and new technologies are needed to help enterprises cope.
That's the message from Microsoft researcher Andrew Conrad, who is leading the company's "Project Barcelona" to create a metadata information server to help businesses "understand and facilitate management of data across the enterprise." The project will provide crawlers that extract metadata from Microsoft products, plus an index server with an API for querying that metadata.
Introducing Project Barcelona earlier this month, Conrad compared the vast web of enterprise data to the World Wide Web.
"The modern Web is [a] vast and decentralized topology of websites and services connected via an almost infinite amount of links," Conrad writes. "Fortunately as the Web has grown more complex, tools for understanding and leveraging the Web have kept pace."
Web crawlers index the Web, helping us discover sites and information through search engines "that we could not possibly find outside of random chance," he notes, adding that "by contrast, as the modern enterprise has trended towards becoming more Web-like, the tools for understanding and leveraging the enterprise data topology have been almost nonexistent."
Although relational databases have become the "corporate standard for storing data," Conrad says several trends have made the current model inefficient. These include the low cost of acquiring and storing data; the ease with which data can be moved and changed; the proliferation of self-service technologies, such as databases and Web portals, that lead non-developers to build and maintain data-producing services; and "Excel hell," which Conrad describes as "the great proliferation of Excel (and Access, SharePoint) as the enterprise data management tool."
These trends have led to big productivity gains but also "made even the simplest DBA and ETL developer tasks increasingly complex and error prone," he says. "On top of that, it is almost impossible for information workers to know anything about enterprise data outside of their specific data silos."
The Microsoft team working on a solution to these problems is jokingly calling it the "Marauder's Map," after the magical map in "Harry Potter" that shows the location of every person in Hogwarts.
Project Barcelona will provide multiple crawlers for Microsoft products, including SQL Server, Excel and SharePoint, among others. The crawlers will extract metadata and "enterprise dataflow information" for indexing in the Barcelona Index Server. Some sources can't be crawled; for those, Barcelona "will provide a declarative way of describing the metadata and dataflow information."
Sign up for CIO Asia eNewsletters.