The Index Server will cache all the harvested data and "expose an API for querying, augmenting, and annotating the metadata and dataflow information." There will also be tools for administrators to manage the crawlers and Index Server, and database administration tools to handle advanced tasks.
Rather than relying on a centrally controlled metadata repository, Project Barcelona's "overall design embraces the decentralized and web-like nature of the modern enterprise," Conrad writes.
Conrad declined an interview request, saying, "We anticipate being able to do those once we firm up release plans," something that should happen in late summer. Conrad also said his team will answer technical questions on the Project Barcelona blog and Twitter feed.
The project team will also seek community input through a series of technology previews.
"Although we are designing the first iteration of the product to be a DBA/ETL developer solution, we believe that the long term value will grow significantly beyond this," Conrad writes. "Hence, from the start, the base platform for the product will be completely open. For example, developers can plug in their own crawlers or metadata providers. They can also access the harvested metadata and dataflow information via the query API. Finally, we will support metadata augmentation and have rich annotation support (both crawler support and via server API) which will allow producers and consumers of the system to leverage the crawlers and Index server in ways we haven't even thought about."
Microsoft is also tackling the big data problem with new data warehousing appliances using SQL Server.