

How to derive real, actionable insights from your data lake: Five best practices

Keith Kohl, VP of Product Management, Trillium | April 27, 2017
By taking steps to ensure the quality of assets within the data lake, organizations can prevent their lakes from becoming data swamps


Don’t look at data integration and data quality as sequential steps. Wherever possible, it’s best to cleanse the data as it’s ingested into the data lake.

Checking data quality as the data is ingested, whether hourly, daily or weekly, saves time and frustration later on. This holds in most cases, but there are exceptions: finding duplicate customer information, for example, means comparing incoming records against those that may already exist within the lake.
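As a rough illustration of cleansing during ingestion, the sketch below normalizes each incoming record and sets aside any whose email already exists in the lake. It assumes the lake is a plain in-memory list and that email is a usable match key; the names `normalize`, `ingest` and `seen_emails` are hypothetical, and real matching would typically be fuzzier than an exact comparison.

```python
def normalize(record):
    """Return a cleaned copy: strings trimmed, email lowercased."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def ingest(batch, lake, seen_emails):
    """Cleanse each record on the way in and set aside likely duplicates."""
    duplicates = []
    for raw in batch:
        record = normalize(raw)
        email = record.get("email")
        if email and email in seen_emails:
            # Already in the lake: hold it for review instead of loading it.
            duplicates.append(record)
            continue
        if email:
            seen_emails.add(email)
        lake.append(record)
    return duplicates


# Hypothetical data: one existing customer, one incoming batch.
lake = [{"name": "Ana Lee", "email": "ana@example.com"}]
seen = {r["email"] for r in lake}
batch = [
    {"name": " Ana Lee ", "email": "ANA@Example.com "},
    {"name": "Raj Patel", "email": "raj@example.com"},
]
dupes = ingest(batch, lake, seen)
```

Here the first incoming record only matches the existing customer after normalization, which is why the duplicate check has to look at what is already in the lake rather than at the batch alone.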


Enlist the help of third-party databases to find and add the missing information to create a single view of customers.

Organizations want to use data lakes to create a single, 360-degree view of the customer, whether for marketing purposes or otherwise. But common “dirty data” issues, like duplicate records or mismatched email addresses, undermine the effort and reduce the ROI of the entire data lake initiative. One way to add missing information and build a complete picture of a customer is through third-party databases. To get the most out of these third parties, select a partner that has both a worldwide view of data and expertise in both B2B and B2C targeting.
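One way to picture this enrichment step is as a merge that fills only the gaps. The sketch below assumes the third-party data is available as a lookup keyed by email; `enrich`, `reference_db` and the sample values are all hypothetical, and a real provider would be reached through its own interface.

```python
def enrich(record, reference):
    """Fill empty or missing fields from a reference record; never overwrite."""
    match = reference.get(record.get("email"), {})
    merged = dict(record)
    for field, value in match.items():
        if not merged.get(field):
            merged[field] = value
    return merged


# Hypothetical third-party data keyed by email address.
reference_db = {
    "ana@example.com": {"name": "A. Lee", "phone": "+1-555-0100", "city": "Boston"},
}
customer = {"name": "Ana Lee", "email": "ana@example.com", "phone": ""}
enriched = enrich(customer, reference_db)
```

The existing name is kept, while the empty phone and the missing city are filled in. Leaving populated fields untouched is a design choice made here for safety, not a rule from the article.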

For a complete, accurate and detailed view of prospects and customers, companies should consider adding the following data:

  • Email Services to confirm that email addresses worldwide are valid, active and deliverable, so that only accurate, usable addresses are targeted.
  • Phone Services to ensure a phone number is valid, in service, and matches the subscriber name.
  • IP Services to identify the location of an IP address and whether it’s a proxy.
  • Address Services to determine whether a postal address is a P.O. Box or a single- or multi-unit dwelling.
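Some of these checks have a cheap local counterpart that can run before a record is sent to an external service. The sketch below uses only the Python standard library and is deliberately shallow: it tests the shape of an email address, whether an IP address is publicly routable, and whether an address line mentions a P.O. Box. It cannot tell whether a mailbox is deliverable, a phone line is in service, or an IP is a proxy; those answers are assumed to come from the third-party services listed above. The function names and patterns are illustrative only.

```python
import ipaddress
import re

# Rough shape check only: something@something.something, no spaces.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Matches "PO Box", "P.O. Box", "p.o.box" and similar spellings.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*Box\b", re.IGNORECASE)


def email_looks_valid(email):
    """True if the string is shaped like an email address."""
    return bool(EMAIL_SHAPE.match(email))


def ip_is_public(text):
    """True if the text parses as an IP address that is globally routable."""
    try:
        return ipaddress.ip_address(text).is_global
    except ValueError:
        return False


def is_po_box(address):
    """True if the address line appears to refer to a P.O. Box."""
    return bool(PO_BOX.search(address))
```

Records that fail these local checks can be corrected or rejected early, which leaves the paid lookups for records that are at least well formed.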

By taking steps to ensure the quality of assets within the data lake, organizations can not only prevent their lakes from becoming data swamps but also truly reap the benefits of the troves of data available to them. By harnessing all types of data – from legacy to newer sources – companies can rest assured that their decisions are based on complete, enterprise-wide data sets and that they have a complete view of their customers.


