We got a kick out of Tyler Vigen’s blog, which demonstrates that mining data doesn’t mean a whole lot if you don’t know what you’re doing.
He combs through large databases to find strong correlations between completely unrelated variables. See if you can come up with a theory to explain why divorce rates in Maine correlate almost perfectly with America’s per-capita consumption of margarine. Or why the number of deaths caused by falling into a swimming pool tracks so closely with the release of Nicolas Cage films.
The message, of course, is that you can’t just compare two sets of numbers and assume one causes the other just because they’re correlated. But we still wonder whether, if we think hard enough, we could come up with a valid theory to explain why the national revenue generated by skiing facilities correlates so well with the number of lawyers in Georgia.
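For the curious, here’s a minimal sketch of why these coincidences are so easy to find. It assumes two made-up series (not Vigen’s data) that each just drift upward over ten “years” with small, unrelated wiggles. Because both share an upward trend, their raw Pearson correlation comes out close to 1. Compare the year-over-year changes instead, which strips out the shared trend, and the correlation drops to near zero. The function names `pearson` and `differences` are our own, purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def differences(values):
    """Year-over-year changes; this removes a shared straight-line trend."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical, made-up wiggles with no relationship to each other.
wiggle_a = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
wiggle_b = [-1.0, 0.5, 1.2, -0.8, 0.0, 0.9, -0.5, -1.1, 0.6, 0.3]

years = range(10)
# Both series simply grow over time, plus their own unrelated noise.
series_a = [2 * t + w for t, w in zip(years, wiggle_a)]
series_b = [100 + 5 * t + w for t, w in zip(years, wiggle_b)]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw series:      {r_levels:.3f}")
print(f"correlation of yearly changes:  {r_changes:.3f}")
```

Anything that happens to grow steadily over the same stretch of years, whether margarine sales or ski revenue, will “correlate” this way, which is a big part of why so many of these charts look so convincing.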
Photo credit: http://www.tylervigen.com/