New data: old jokes

One of the joys of data science / Big Data / the quantitative turn* in my area is the chance to recycle ancient maths jokes. So, for instance, take this old chestnut:

Theorem: All positive integers are interesting.

Proof: Assume the contrary. If that is the case, there must be an integer x such that x is the smallest integer that is not interesting in any way. But that makes x a pretty interesting number – contradiction! Therefore all positive integers are interesting.

We can apply that to data dredging – the process of ploughing through a large dataset without much of an idea of what you’re looking for, in search of ‘interesting’ findings – along with some mischief about probability, and we get:

Theorem: All data dredging exercises yield interesting findings.

Proof: Assume the contrary. If that is the case, a data dredging exercise x could find no significant correlations despite testing many hundreds, if not thousands, of potential correlations. But even if none of the underlying relationships were real, we’d expect about 5% of correlations to be significant at the p < 0.05 level by pure chance. To look so hard and find so few would be extremely unusual. So that would be a pretty interesting finding from exercise x – contradiction! Therefore all data dredging exercises yield interesting findings.
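For anyone who wants to check the mischief, here is a rough sketch of the arithmetic in Python. It rests on two assumptions that are mine rather than part of the joke: that every test is independent of the others, and that there is no real effect anywhere, so each p-value is uniformly distributed between 0 and 1. The function names are made up for illustration.

```python
import random

ALPHA = 0.05  # the conventional significance threshold used in the proof


def prob_no_significant(n_tests, alpha=ALPHA):
    """Chance that none of n_tests independent null tests comes out significant."""
    return (1 - alpha) ** n_tests


def simulate_null_dredge(n_tests, alpha=ALPHA, seed=0):
    """Fraction of tests that look significant when nothing real is going on.

    Assumption: under the null hypothesis a p-value is uniform on [0, 1),
    so we draw p-values directly rather than generating data and testing it.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values) / n_tests


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, "tests, chance of zero significant results:", prob_no_significant(n))
    print("simulated share significant:", simulate_null_dredge(10_000))
```

With a thousand tests the chance of finding nothing at all comes out vanishingly small, which is the sense in which a barren dredge would be "extremely unusual". Real datasets rarely have independent variables, so treat the numbers as indicative only.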

May your data analysis in 2014 be at least that interesting.

* If nobody else has used ‘the quantitative turn’ as a phrase to characterise what’s happening with analytics at the moment, particularly in education, I am totally claiming bagsies. Although actually I’m more keen on an empirical turn than a quantitative one per se.

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.
