How to explain to Big Data newbies why correlation doesn't equal causation

Mark Gibbs | June 23, 2014
It's easy to assume that because two data sets appear to be linked, they are.

With the explosion of interest in Big Data, everyone in every department is looking for actionable intelligence. That's great, but there's a downside: you'll have to explain to, say, your VP of sales that although sales of barbecue sauce might appear to be connected to the selling price of beef, you can't say for certain that the connection is real, and that it would be inadvisable to act on that conclusion without deeper analysis.

"What?!" she'll say. "I can see with my own eyes that the curvy things go up and down together." "Ah," you can reply, "let me show you something ..." And then you show her the Spurious Correlations web site.

This site is a treasury of examples that demonstrate, very clearly, that correlation does not prove causation. For example, the correlation between US spending on science, space, and technology and suicides by hanging, strangulation and suffocation is a remarkable 99.2%, yet no one in their right mind would say that one causes the other.

Correlation Example

Similarly, the per capita consumption of cheese in the US correlates 94.7% with the number of people who died by becoming tangled in their bedsheets and is just as easily rejected as not causative even though there's a very high degree of correlation.
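If your VP wants to see the effect for herself, a few lines of Python will do it. The sketch below is only an illustration: the barbecue sauce and beef price figures are invented (the names `sauce_sales` and `beef_price` and all the numbers are assumptions, not real data), and each series is built to drift upward on its own with independently drawn noise. Neither series knows the other exists, yet the Pearson correlation comes out very high simply because both trend in the same direction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
years = range(10)        # ten annual data points

# Hypothetical, made-up series: each one rises over time for its own
# reasons, and the noise terms are drawn independently of one another.
sauce_sales = [100 + 5.0 * t + rng.uniform(-2, 2) for t in years]
beef_price = [20 + 0.8 * t + rng.uniform(-0.5, 0.5) for t in years]

r = pearson(sauce_sales, beef_price)
print(f"correlation between the two unrelated series: {r:.3f}")
```

The point to make is that a shared upward trend is enough to produce a correlation above 90%, which says nothing about whether one thing drives the other.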

Published by Tyler Vigen, the Spurious Correlations site currently contains 27,724 correlations, many of which are very amusing (for example, the marriage rate in New York has an 87.9% correlation with murders by blunt objects), and Tyler's mini-lecture on correlation and causation is worth putting in front of the unwashed to get 'em up to speed.

 
