New data: old jokes

One of the joys of data science / Big Data / the quantitative turn* in my area is the chance to recycle ancient maths jokes. So, for instance, take this old chestnut:

Theorem: All positive integers are interesting.

Proof: Assume the contrary. If that is the case, there must be a smallest positive integer x that is not interesting in any way. But that makes x a pretty interesting number – contradiction! Therefore all positive integers are interesting.

We can apply that to data dredging – the process of ploughing through a large dataset without much of an idea of what you’re looking for, in search of ‘interesting’ findings – along with some mischief about probability, and we get:

Theorem: All data dredging exercises yield interesting findings.

Proof: Assume the contrary. If that is the case, there is a data dredging exercise x that finds no significant correlations despite testing many hundreds, if not thousands, of potential correlations. But even if nothing real is going on, we’d expect about 5% of those correlations to be significant at the p < 0.05 level by pure chance. To look so hard and find so few would be extremely unusual. So that would be a pretty interesting finding from exercise x – contradiction! Therefore all data dredging exercises yield interesting findings.
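
For anyone who wants to check the mischief, here is a rough sketch of the arithmetic in Python. It assumes the tests are independent and that every null hypothesis is true, so each p-value is uniform on [0, 1); correlations dredged from a single dataset are rarely independent, so treat this as an illustration only. The function names are made up for this example.

```python
import random

ALPHA = 0.05  # the significance threshold used in the proof above


def prob_no_hits(n_tests, alpha=ALPHA):
    """Chance that none of n independent null tests comes out significant."""
    return (1 - alpha) ** n_tests


def simulate_hits(n_tests, alpha=ALPHA, seed=0):
    """Count 'significant' results when there is nothing real to find.

    Assumption: under a true null the p-value is uniform on [0, 1),
    so one uniform random draw stands in for each test.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


if __name__ == "__main__":
    for n in (100, 1000):
        print(f"{n} tests: {simulate_hits(n)} chance hits, "
              f"P(no hits at all) = {prob_no_hits(n):.3g}")
```

With 100 tests the chance of finding nothing at all is already under 1%, and with 1,000 tests it is vanishingly small, which is what lets the proof call an empty-handed exercise interesting.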

May your data analysis in 2014 be at least that interesting.

* If nobody else has used ‘the quantitative turn’ as a phrase to characterise what’s happening with analytics at the moment, particularly in education, I am totally claiming bagsies. Although actually I’m more keen on an empirical turn than a quantitative one per se.

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.


Author: dougclow

Academic in the Institute of Educational Technology, the Open University, UK. Interested in technology-enhanced learning and learning analytics.