One of the joys of data science / Big Data / the quantitative turn* in my area is the chance to recycle ancient maths jokes. So, for instance, take this old chestnut:

**Theorem**: All positive integers are interesting.

**Proof**: Assume the contrary. If that is the case, there must be a positive integer *x* such that *x* is the smallest positive integer that is not interesting in any way. But that makes *x* a pretty interesting number – contradiction! Therefore all positive integers are interesting.

We can apply that to data dredging – the process of ploughing through a large dataset without much of an idea of what you’re looking for, in search of ‘interesting’ findings – along with some mischief about probability, and we get:

**Theorem**: All data dredging exercises yield interesting findings.

**Proof**: Assume the contrary. If that is the case, a data dredging exercise *x* could find no significant correlations despite testing many hundreds, if not thousands, of potential correlations. But even if there were no real relationships in the data at all, we’d expect about 5% of those correlations to be significant at the *p* < 0.05 level by pure chance. To look so hard and find so few would be extremely unusual. So that would be a pretty interesting finding from exercise *x* – contradiction! Therefore all data dredging exercises yield interesting findings.
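For anyone who wants to check the mischief numerically, here is a rough sketch in Python. It is an illustration under stated assumptions, not a rigorous treatment: the tests are assumed to be independent, the data are pure Gaussian noise, and the p-value comes from the large-sample Fisher z approximation rather than an exact test. The function names (`prob_no_hits`, `dredge` and so on) are made up for this example.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05  # significance threshold used in the post


def prob_no_hits(n_tests, alpha=ALPHA):
    """Chance that none of n_tests independent null tests comes out significant."""
    return (1 - alpha) ** n_tests


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (an assumption; fine for largish n)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def dredge(n_tests=1000, n_points=50, seed=2014):
    """Correlate pairs of unrelated noise columns; return the fraction 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if approx_p_value(pearson_r(xs, ys), n_points) < ALPHA:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    print(f"fraction significant by chance: {dredge():.3f}")
    print(f"P(no hits in 100 tests):  {prob_no_hits(100):.4f}")
    print(f"P(no hits in 1000 tests): {prob_no_hits(1000):.2e}")
```

Under those assumptions, the fraction of spuriously significant correlations should land somewhere near 5%, while the chance of finding none at all shrinks as 0.95 raised to the number of tests. Real dredging exercises usually test correlated variables, so the independence assumption flatters the arithmetic.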

May your data analysis in 2014 be at least that interesting.

* If nobody else has used ‘the quantitative turn’ as a phrase to characterise what’s happening with analytics at the moment, particularly in education, I am totally claiming bagsies. Although actually I’m more keen on an empirical turn than a quantitative one per se.

–

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.

No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.
