One of the joys of data science / Big Data / the quantitative turn* in my area is the chance to recycle ancient maths jokes. So, for instance, take this old chestnut:
Theorem: All positive integers are interesting.
Proof: Assume the contrary. If that is the case, there must be a positive integer x such that x is the smallest positive integer that is not interesting in any way. But that makes x a pretty interesting number – contradiction! Therefore all positive integers are interesting.
We can apply that to data dredging – the process of ploughing through a large dataset without much of an idea of what you’re looking for, in search of ‘interesting’ findings – along with some mischief about probability, and we get:
Theorem: All data dredging exercises yield interesting findings.
Proof: Assume the contrary. If that is the case, a data dredging exercise x could find no significant correlations despite testing many hundreds, if not thousands, of potential correlations. But even if none of the underlying relationships were real, we’d expect about 5% of those tests to come out significant at the p < 0.05 level by pure chance. To look so hard and find so few would be extremely unusual. So that would be a pretty interesting finding from exercise x – contradiction! Therefore all data dredging exercises yield interesting findings.
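For anyone who wants to check the mischief, here is a minimal Python sketch of the arithmetic behind that proof. It assumes the tests are independent and that every null hypothesis is true, so each p-value is uniform on [0, 1]; real dredged correlations usually share variables and so are not independent. The function names and the seed are made up for illustration.

```python
import random


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when nothing real is going on."""
    return n_tests * alpha


def prob_no_false_positives(n_tests, alpha=0.05):
    """Chance that not one of n_tests independent null tests reaches significance."""
    return (1 - alpha) ** n_tests


def simulate_dredge(n_tests, alpha=0.05, seed=0):
    """Simulate one dredging exercise on pure noise.

    Assumption: under a true null hypothesis each p-value is uniform on [0, 1],
    so we draw the p-values directly instead of generating data and testing it.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(
            f"{n} tests: expect {expected_false_positives(n):.0f} spurious hits, "
            f"P(no hits at all) = {prob_no_false_positives(n):.2e}, "
            f"one simulated run found {simulate_dredge(n)}"
        )
```

Under those assumptions, 100 tests give an expected 5 spurious hits and well under a 1% chance of finding none at all, which is what makes an empty-handed exercise x so interesting.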
May your data analysis in 2014 be at least that interesting.
* If nobody else has used ‘the quantitative turn’ as a phrase to characterise what’s happening with analytics at the moment, particularly in education, I am totally claiming bagsies. Although actually I’m more keen on an empirical turn than a quantitative one per se.
This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.