Information management architectures have evolved constantly over the past decades. In the early days of computers, data was at the core of mainframe systems, and programs were essentially designed to process specific data sets (the official terminology then was "data processing," not "information technology"). The data center was the place to be.
The advent of so-called open systems, client-server, and n-tier architectures shifted the emphasis toward the application and the business logic, and the data itself started to be perceived primarily as a resource for the application. This transition from a data-centric world to an application-centric world created a perception of data as a commodity, without much intrinsic value. Applications had become trendy.
What was this commoditised data? Primarily transaction records: purchases, shipments, accounts, contracts, and so on. Little attention was paid to consolidation or to consistency. For example, you could have five different contracts with an insurance firm, but they still would not have a consolidated view of these five contracts or a single record of you as a customer. Applications, built on top of the commodity data, would perform the necessary calculations on an ad-hoc basis.
Still, this consolidated view of contracts, this single customer record, was needed. In an application-centric world, the answer was -- quite logically -- application-based. Business intelligence and master data management projects were based on the deployment of new applications and new business processes, with their own data resources: data warehouses and master data hubs. These applications catered to reporting needs and to marketing needs such as customer segmentation, and quite often became critical parts of the company's IT infrastructure. However, only in rare cases did they close the loop and actually resolve the siloed and fragmented nature of enterprise data in an application-centric world.
Today, the pendulum is swinging back. Purchase records become infinitely more valuable when enriched with navigation logs, cart abandonment data, social media posts, and sentiment analysis. Manufacturing records get a precision boost from production chain sensors, road traffic and weather information, and customer support systems.
On the flip side, this massive influx of new data sources makes it impossible for applications to assume ownership of all this data anymore. A new paradigm is needed, one that puts the emphasis (back) on data.
Enter big data, and especially the associated data infrastructure: Hadoop, the next-generation data platform, and its derived "architecture," the infamous data lake. Why these quotation marks and derogatory adjective? Because industry scholars consider the data lake to be more a data dump than a proper architecture.
They are right: It's not the lake that counts, it's what you do in it. The data lake provides a venue to deploy a new data-centric architecture. Properly handled, the data lake becomes a reservoir, a pool of resources from which applications retrieve the data that they are unable to handle and own directly.