One of the joys of data science / Big Data / the quantitative turn* in my area is the chance to recycle ancient maths jokes. So, for instance, take this old chestnut:

Theorem: All positive integers are interesting.

Proof: Assume the contrary. If that is the case, there must be an integer x such that x is the smallest integer that is not interesting in any way. But that makes x a pretty interesting number – contradiction! Therefore all positive integers are interesting.

We can apply that to data dredging – the process of ploughing through a large dataset without much of an idea of what you’re looking for, in search of ‘interesting’ findings – along with some mischief about probability, and we get:

Theorem: All data dredging exercises yield interesting findings.

Proof: Assume the contrary. If that is the case, a data dredging exercise x could find no significant correlations despite testing many hundreds, if not thousands, of potential correlations. But we’d expect 5% of correlations to be significant at the p < 0.05 level by pure chance. To look so hard and find so few would be extremely unusual. So that would be a pretty interesting finding from exercise x – contradiction! Therefore all data dredging exercises yield interesting findings.
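To put a rough number on “extremely unusual”, here is a minimal sketch. It assumes, purely for illustration, that every test is independent and that there is no real effect anywhere, so each test has a 5% chance of coming up “significant”. Under those assumptions the count of significant results follows a binomial distribution. The function name `prob_at_most` and the figure of 1,000 tests are hypothetical choices for this example.

```python
from math import comb

def prob_at_most(k, n, alpha=0.05):
    """P(X <= k) for X ~ Binomial(n, alpha).

    Assumes (hypothetically) n independent tests where every null
    hypothesis is true, so each test is 'significant' with probability alpha.
    """
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
               for i in range(k + 1))

n_tests = 1000                 # hypothetical number of correlations tested
expected = n_tests * 0.05      # false positives expected by chance alone
p_none = prob_at_most(0, n_tests)   # chance of finding nothing at all
p_few = prob_at_most(20, n_tests)   # chance of 20 or fewer

print(f"Expected significant results: {expected:.0f}")
print(f"P(no significant results):    {p_none:.2e}")
print(f"P(20 or fewer):               {p_few:.2e}")
```

With 1,000 tests you would expect around 50 chance “findings”, and the probability of turning up none at all is 0.95 to the power 1,000, which is vanishingly small. Real correlations in a dredged dataset are rarely independent, so treat this as a back-of-envelope illustration rather than a proper calculation.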

May your data analysis in 2014 be at least that interesting.

* If nobody else has used ‘the quantitative turn’ as a phrase to characterise what’s happening with analytics at the moment, particularly in education, I am totally claiming bagsies. Although actually I’m more keen on an empirical turn than a quantitative one per se.

–

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.

No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.