In these heady days of big data, a lot of organizations treat data collection like a Pokémon game: Gotta catch it all. But Dane Atkinson, CEO of cross-platform marketing analytics specialist SumAll, says most organizations need to think wide, not big, when it comes to data.
"The real impact of data always comes from the intersection of different data sets," Atkinson says. "You don't get into the earth shattering discoveries until you start to link disparate data sets."
To illustrate this idea, Atkinson points to oceans and their tides. To understand how tides act on the oceans, you need to understand the correlation between the oceans and the moon.
"You can't look in isolation and find causes," he says.
Moreover, when you have data of appropriate width (i.e., enough disparate data sources), the volume of data doesn't necessarily have to be large to provide you with effective results. For example, through SumAll.org, a nonprofit dedicated to using data for social good, SumAll is helping New York City and nonprofit organization CAMBA combat homelessness in a pilot program.
Eviction notices are among the primary signifiers that a family is about to become homeless — though not all evictions lead to homelessness. About 200,000 households get evicted in New York City each year. In big data terms that's not an exceptionally large number of records. But identifying which of those 200,000 are most at risk of becoming homeless as a result of eviction proceedings is a challenge.
Before SumAll got involved, CAMBA, which focuses its efforts in Brooklyn, would manually go through the list of roughly 5,000 new eviction cases in Kings County Housing Court each month and then send letters about its services to those in the areas it serves, about 400 a month. With SumAll's help and some targeting techniques borrowed from data-driven marketing, CAMBA was able to narrow the list considerably.
First, it geo-coded all the cases to determine which were in neighborhoods CAMBA served. Then it went "wide" with its data, pulling in data from different data sets that indicated a family was "at risk": past experience with the shelter system, past experience with the foster care system, education level, employment status and age. By correlating these disparate data sets, SumAll was able to help CAMBA identify the 30 to 50 most at-risk cases. CAMBA, in turn, was then able to leverage its resources more efficiently to help those families.
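For readers who want to picture the mechanics, the filter-then-widen approach described above can be sketched in a few lines of Python. This is only an illustration, not SumAll's actual method: the article does not say how the data sets were correlated, so the unweighted count of risk indicators used here is an assumption, and every name, neighborhood and record (Case, rank_cases, the sample case IDs) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    neighborhood: str  # assumed to be the result of an earlier geo-coding step


def risk_score(case_id, risk_sets):
    """Count how many separate data sets flag this case (assumed, unweighted)."""
    return sum(1 for flagged in risk_sets.values() if case_id in flagged)


def rank_cases(cases, served, risk_sets, top_n):
    """Keep cases in served neighborhoods, then rank them by risk score."""
    in_area = [c for c in cases if c.neighborhood in served]
    ranked = sorted(
        in_area,
        key=lambda c: risk_score(c.case_id, risk_sets),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical eviction cases, already geo-coded to a neighborhood.
cases = [
    Case("A1", "Flatbush"),
    Case("A2", "Flatbush"),
    Case("A3", "Crown Heights"),
    Case("A4", "Park Slope"),
]

# Hypothetical service area.
served = {"Flatbush", "Crown Heights"}

# Each "wide" data set is modeled as the set of case IDs it flags.
risk_sets = {
    "shelter_history": {"A1", "A3"},
    "foster_care_history": {"A3"},
    "unemployed": {"A1", "A3", "A4"},
}

top = rank_cases(cases, served, risk_sets, top_n=2)
print([c.case_id for c in top])
```

In this toy data, case A4 is flagged as at risk but falls outside the service area, so it is dropped before scoring. A real system would presumably weight the indicators rather than simply count them.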
The end result was that CAMBA was able to provide 50 percent more families in the pilot neighborhood with eviction prevention services.
"It really is the power of wide data, of seeing correlations in spots that have never been connected before," Atkinson says.