But with the right data, you can statistically recreate an experiment, Dalessandro says, and his experience in the advertising world equipped him with the skills to do just that. Only a few years ago, his team at m6d figured out how to estimate the causal impact of ads by analyzing impression logs. But approaching the city's problem wasn't cut-and-dried. After all, while the city had been collecting lots of data, it had been collecting it for reporting purposes, not for actionable insight.
Data Collection Is a Key Issue
"Their data had not been designed in a relational way," Dalessandro says. "They weren't really thinking about joining the data sets together."
For example, the data sets had different levels of granularity. Data on past pruning work was recorded on a block-by-block level, while data on cleanups was recorded at the address level.
"One of the biggest challenges of this is determining the fundamental unit of analysis," Dalessandro says. "As a statistician, you divide the world into entities. What would be the equivalent of a single row? They don't give a unique identifier to every tree. It's a balance between having the data as granular as possible while also having the greatest coverage possible."
Eventually, they settled on the city block as the fundamental unit of analysis. With the blessing of m6d's CEO, Dalessandro devoted a few work hours here and there to downloading, cleaning, merging and analyzing the data. He was even able to use the firm's high-powered server infrastructure to run some intensive modeling. He found the answer to the city's question: pruning trees for certain types of hazards caused a 22 percent reduction in the number of times the department had to send a crew for emergency cleanups.
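To make the idea of a block-level unit of analysis concrete, here is a minimal sketch of rolling address-level cleanup records up to the block and joining them to block-level pruning records. It is an illustration only, not the team's actual pipeline: every field name, block ID and record below is invented, and it assumes each address has already been mapped to a block, which the article suggests was itself part of the work.

```python
from collections import defaultdict

# Hypothetical records. Field names and values are invented for illustration.
pruning_by_block = [
    {"block_id": "B001", "year": 2010, "pruned": True},
    {"block_id": "B002", "year": 2010, "pruned": False},
]

# Assumption: each address-level record already carries the ID of its block.
cleanups_by_address = [
    {"address": "12 Elm St", "block_id": "B001", "year": 2011},
    {"address": "14 Elm St", "block_id": "B001", "year": 2011},
    {"address": "3 Oak Ave", "block_id": "B002", "year": 2011},
    {"address": "5 Oak Ave", "block_id": "B002", "year": 2011},
    {"address": "9 Oak Ave", "block_id": "B002", "year": 2011},
]


def cleanups_per_block(cleanups):
    """Aggregate address-level cleanups into counts keyed by (block, year)."""
    counts = defaultdict(int)
    for record in cleanups:
        counts[(record["block_id"], record["year"])] += 1
    return dict(counts)


def build_block_rows(pruning, cleanups):
    """Return one row per block: pruning status plus next-year cleanup count."""
    counts = cleanups_per_block(cleanups)
    rows = []
    for record in pruning:
        following_year = record["year"] + 1
        rows.append({
            "block_id": record["block_id"],
            "pruned_year": record["year"],
            "pruned": record["pruned"],
            # Blocks with no recorded cleanups get 0 rather than being dropped.
            "cleanups_following_year": counts.get(
                (record["block_id"], following_year), 0
            ),
        })
    return rows


block_rows = build_block_rows(pruning_by_block, cleanups_by_address)
for row in block_rows:
    print(row)
```

Each resulting row is the "single row" Dalessandro describes: one block, its pruning status in a given year, and the cleanups recorded on it the following year. Estimating a causal effect from such rows requires further modeling that this sketch does not attempt.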
"A block that is pruned this year will have 22 percent fewer hazardous problems the following year," Dalessandro says. "We were told this is the first time this number has ever been generated."
Using Analytics to Build a Risk Profile
While an important first step, this number is only a beginning. After all, the city already has a pruning program in place. But even a city as large as New York doesn't have the resources to prune every block every year. The department has to choose which blocks to target for pruning.
"The first thing is to create a baseline so the Parks Department can work with supervisors to determine how much of their resources they can allocate," Dalessandro says. "The second phase will be intelligent pruning. Part of my vision for this is I want to help them set up the analytics on their end so they can start asking different questions and getting the answers themselves. That's all part of building the risk profile of a block: the number of trees, types of trees, whether the block is in a flood zone or storm zone. These are all the types of questions that can be answered."