@stephenfry 20x better than BBC News

As I mentioned in yesterday's post, we launched the Evolution Megalab in the UK yesterday.  It was on the Today programme (weekly audience around 6.5m), on the BBC News website (weekly audience around 14m), and on various regional broadcasts including BBC Scotland. We were hoping for a bigger splash on the BBC – we promised them an exclusive in return – but they cancelled the larger-scale broadcasts at the last minute.  No matter – it was a big trad-media splash, and it kept at least four-and-a-half people at the OU busy for most of the day (Kath the media contact, Jonathan and Jenny the media faces, Richard the programmer, plus me and others spending some time).

We got about 2400 unique visitors as a result.   Which is by far our busiest day … yet.

(We got about 1000 when we launched similarly on German national TV last month.)

Devolve Me is a related project – which I had nothing to do with – and is a bit of silliness that lets you morph a photo of yourself (or a loved one, or indeed a hated one) to look like an early hominin.  The site was pootling around nicely at about 1500 hits a day, and then a certain Stephen Fry tweeted:

Indebted to @iRobC once more: See how you’d have looked as an early human – OU site http://tinyurl.com/9tacnt #devolve Coolissimo

… and it got 52,500 hits. As our press release about this points out,

The spike in traffic to the OU website illustrates the growing influence that social media is having in today’s communications, with people increasingly sharing links and sourcing their news feeds online.

A single tweet by a single power user on a single social network gets you more than twenty times more exposure than mass broadcasts to tens of millions.

Cut another way, of the order of 1 in 1000 people who heard about the Evolution Megalab via the BBC visited the site, but about 1 in 7 people who saw Stephen Fry’s tweet visited that site. (He has over 360,000 followers at the time of writing.)

It is, of course, all about audience and targeting. I’d bet the majority of people following Stephen Fry on Twitter would be mildly interested in a cool website about evolution; and I’d bet that the overwhelming majority of the Today audience isn’t.

I suppose I shouldn’t be surprised, but I am – at least at the scale of the difference.  The old ways of getting messages out are being superseded while we watch. Sometimes dramatically.

Evolution Megalab launches

Today is the big launch day for one of my projects, the Evolution Megalab.

There’s an OU press release:

Snails, often the unloved blight of gardeners, are being put under the microscope with a new public science project being launched today (Monday 30 March) by The Open University. The Evolution MegaLab is a mass public research programme which is investigating how ordinary banded snails – found in back gardens, river banks and parks – have evolved over the last 40 years, by comparing data supplied by members of the public with a database of more than 8,000 historical records.

And a feature on Platform, the OU’s social site, including some videos featuring hot snail hunting action:

And some minor news outlets have picked up the story, including BBC News online, the Today programme, and some more substantial and trustworthy sources like John Naughton’s blog.

A lot of hard work has gone in behind the scenes – mostly from other people, not me, I hasten to add – and the big credit should go to Richard Greenwood, our ace programmer, who even came in last night (i.e. Sunday) to fix a technical hitch which, in crude terms, amounted to persuading some people who had pulled the network plug out of the server that plugging it back in again would be a good idea.

One of the things I particularly like about this project is that you can get value out of it at lots of levels.  Young kids can have fun going out and spotting and counting up snails – the video above includes a 4-year-old.  At the other end of the scale, the clear phenotypic indications of the underlying genotype make it a useful case study for population genetics in a third-level (i.e. third year undergraduate) course on Evolution.

Wild woods vs walled gardens

On the way back from the discussion of the Future of the Internet earlier today, I walked through campus and saw yet again two contrasting ways of growing plants:
[Photo: Wild wood (OU campus)]

The wild wood is a fairly heavily-wooded area in the middle of campus, with trees above an understorey of largely annual shrubs, herbs and grasses.
[Photo: Walled garden (OU campus)]

This walled garden is just outside Walton Hall, the Georgian manor house from which the campus takes its name.

I’m not the first one to note that these two contrasting approaches seem to have strong parallels with different ways of organising technical or educational projects (Martin Weller is fond of an ecosystem/succession model of HE institutional use of technologies, for instance). Or indeed with any human endeavour.

But it’s on my mind, and a good excuse to be outside in the Spring sunshine, and to (implicitly) sing the praises of lightly-managed endeavours like woods, meadows and hugely successful applications of generative technologies.

With a walled garden, you plan the layout of the garden, and spend intensive effort making sure that everything is neat and tidy and fits with the plan you’ve made.  It’s largely about control.  With a wild wood, things are a lot less tidy and constrained.

However, it’s a serious mistake to think that the two approaches are entirely at odds.

A walled garden needs regular attention, but with good planning that can be minimised, and if you imagine that you can control precisely what the plants do there … you obviously haven’t been gardening very long. You have to work with what actually grows, not your ideal of what should grow and how.

A wild wood also needs management and attention – it may not look it but it does – even when mature.  Certainly, over-management is generally a bigger danger than prolonged periods of what the medics call ‘expectant management’.  But if you don’t promptly remove invasive species you can easily end up with a stand of nothing but Japanese knotweed.  And you probably need some mechanism to keep the woody plants from over-dominating:  either natural grazing – a problem if you’ve fenced out all grazing animals in an effort to protect your wood – or something else. (This particular wood gets an annual cut of the meadow-like herbaceous lower layer – meadows are another of my favourite habitats.)

Creating a wood is a hard and long job.  You can do some things to speed up the process – though not much if you’re going for semi-natural woodland – but you’re still looking at many decades, not a few years, before it looks anything like mature.  It can require planting and weeding as intensive – at least in the early years – as in any walled garden.  It certainly won’t look precisely like you imagined it at the start, if you’re even around to see it.

And all that is assuming that the local environment supports the sort of woodland you’re wanting. If it doesn’t, you’re looking at tremendously intensive inputs, if you can achieve it at all.

Walled gardens can be spectacularly beautiful and peaceful places.  But I much prefer woods.

Future of the Net

Liveblog from a seminar on The Future Of The Net (Jonathan Zittrain’s book – The Future of the Internet and How to Stop It.), 20 March 2009, by John Naughton.

Update: Listen to the MP3 and see the useful concept map from John Naughton himself.

Audience small but quite high-powered (eight, including Tony Walton, Paul Clark, Andy Lane). OU Strategy Unit trying to reach out to academic units and others.

[Image: train tracks with points set to go off a cliff]

John  lost his physical copy … but rightly guessed it’d be available online as Creative Commons-licensed text.

Jonathan Zittrain was employed sight-unseen as a Unix sysadmin at 13, then by some process (probably involving Larry Lessig) became a lawyer.

Part of an emerging canon – Lessig’s Code 2.0, Benkler’s Wealth of Networks – heavyweight academic stuff. Two sorts of people – trailblazers and roadbuilders; Lessig is the first. Our role in the OU (including the Relevant Knowledge Programme) is to follow and be roadbuilders, which is an honourable activity.

Core argument of book: Internet’s generative characteristics primed it for success, and now position it for failure. Response to failure will most likely be sterile tethered appliances.

Transformation of the Internet in the blink of an eye from being thought of as just the “CB de nos jours” to being taken for granted. John’s message is: don’t take this for granted.

Three parts: 1 rise & stall of generative network, 2 after the stall (including a long and good analysis of Wikipedia), 3 solutions.

Conjunction of open PC and open Internet created the explosion of creativity, but contains within it the seeds of its own destruction. Parallel with T171 You, Your Computer and the Net (Martin did the PC, John did the net) – but didn’t study what happens when you put them together, which Zittrain does here. Not about proprietary versus open source – PC was an open device, if you could write code you could program the device.

John says people don’t understand what we’ve got in the current Net. Knowing the history helps. Design problem (Vint Cerf, IETF etc) – design for apps that haven’t yet been dreamed of, given distributed ownership. If you’re designing for the future, you don’t optimise for the present. Architectural solution has two key points: anyone can join (permissiveness); dumb network, clever apps (end-to-end principle). The openness is a feature, not a bug. Contrast with the case of the Hush-a-Phone.

Zittrain equation: Open PC + surprise generator = generative system

Thought experiments from James Boyle – gave two talks recently, at the RSA and John’s Cambridge programme. Almost everybody has a bias against openness: when something free and unconstrained is proposed, we see the downsides. (Because you can imagine those, whereas you by definition can’t imagine what hasn’t been invented yet.)  Imagine it’s 1992 and you have to choose between: approved sites with terminals at the end (like teletext/Minitel); or a dumb, unfiltered, permissive network (the Internet) with general-purpose computers at the end. Who would invest in the latter? Second question, still in 1992: you have to design an encyclopedia better than Britannica – broader coverage, more current. Options: 1 – strong content, vast sums of money, strong editorial control, DRM. 2 – I’d like to put up a website and anyone can post stuff. Who’d pick the latter?

Posits tension – or indeed tradeoff – between generativity and security. Consumers will become so worried about this that they’ll (be encouraged to) favour tethered appliances and heavyweight regulation.

(I wonder if I can’t bring myself to believe in the Net being locked-down out of all recognition because I’ve always had it around in my adult life. It’s probably easier for people who really knew a world without it to imagine it going away.)

Part 2 explores our likely response to these problems, then Wikipedia. “With tethered appliances, the dangers of excess come not from rogue third-party code, but from […] interventions by regulators into the devices themselves.”

Criticism of book – it underestimates the impact of Governments on the problem. Remembering 9/11, like JFK assassination. (John was on the phone to a friend who was there at the time!). John wrote in his blog on that day that this was the end of civil liberties as we knew them, and in many ways was right. (My memory was that it was the first huge news story that I got almost entirely from the web.) But – one day the bad guys will get their act together and we’ll see a major incident. Dry-runs with what happened to Estonia. But there will be something huge and coordinated, and that’ll evoke the same sort of response.

Rise of tethered appliances significantly reduces the number and variety of people and institutions required to apply the state’s power on a mass scale. John thinks it’s like the contrast between Orwell and Huxley – likelihood of being destroyed by things we fear and hate, or things we know and love.

Dangers of Web 2.0, services in the cloud – software built on APIs that can be withdrawn is much more precarious than software built under the old PC model.  Mashups work (except they’re always breaking – see Tony Hirst’s stuff, just like links rot). Key move to watch: Lock down the device, and network censorship and control can be extraordinarily reinforced.

iPhone is the iconic thing: it puts you in Steve Jobs’ hands. It’s the first device that does all sorts of good things and could be open but isn’t.  (What about other mobile phones?) The Pew Internet & American Life survey – Future of the Internet III – predicted that the mobile device will be the primary connection tool to the internet for most people in the world in 2020. So this could be a big issue.

Wikipedia analysis in the book is extensive.  Looks at how it handles vandalism and disputes – best treatment John’s seen. How it happens is not widely understood. Discussion about whether Wikipedia or Linux is the more amazing phenomenon. (My argument is that Linux is in some ways less startling, because you have some semi-independent arbitration/qualification mechanism for agreeing who’s a competent contributor and which code works.)

Part 3 – solutions to preserve the benefits of generativity without the downsides. “This is easier said than done”. The way Wikipedia manages itself provides a model for what we might do. (I think not – I think Wikipedia works because it can afford to piss off and exclude perfectly good and competent contributors.) Create and demonstrate the tools and practices by which relevant people and institutions can help secure the Net themselves instead of waiting for someone else to do it – badwarebusters.org.

Barriers – failure to realise the problem; collective action problem; sense that system is supposed to work like any other consumer device.

Nate Anderson’s review in Ars Technica – three principles: the IT ecosystem works best with generative tech; generativity instigates a pattern; ignore the downsides at your peril.

Criticisms: too focused on security issues and not on commercial pressures; not enough on control-freakery of governments; too Manichean – mixed economies; too pessimistic about frailties (and intelligence and adaptability) of human beings; over-estimates security ‘advantages’ of tethered appliances.

Discussion

Parallel with introduction of metalled roads. Crucial to economic development, move people and stuff around as a productive system.  Early days were a free-for-all, anyone could buy a car (if rich enough) and drive it, no need for a test.  Then increased regulation and control.  (Also to cars – originally fairly easily tinkerable with, now not/proprietary engine management systems.)  Issue about equity, as much as open/closedness.

Lessons of Wikipedia and the creators of malware. Malware creators only need to be small in number. To take down Wikipedia and make it undependable would take too much effort and coordination. (I disagree – a smart enough distributed bot attack would do it.)

I can’t imagine no Internet/generative/smart programmable devices because never not had them. Grew up on ZX81 onwards, had the CPU pinout on the connector.  Helps to have smart people around who have known the world before that.

South Korea got taken out by SQL Slammer, bounced back though – system is pretty resilient.

Manhattan Project perhaps a bad parallel for an effort to help here – it was the ultimate in top-down command-and-control project, with a clearly-defined outcome. And it was constrained and planned so tightly that it couldn’t actually work until people like Feynman loosened things up a bit to allow some degree of decentralisation.

How do you sign people up? People won’t do anything about e.g. climate change – not until their gas bills shoot up. Science and society stuff: it’s well known that people only become engaged when something becomes real to them. A liberal is a conservative who’s been falsely arrested; a conservative is a liberal who’s been mugged.

Surveillance – the likelihood of major public outrage leading to a reaction is small, because most people don’t realise their clickstream is monitored. It’s only if something happened that made people realise it that they’d say no.  Hard to imagine that scale of community engagement happening.

Case a few months ago – Wikipedia vs the Internet Watch Foundation. A ready-made community leapt into action immediately.  But that’s less likely where you don’t have such an articulate, existing community. Also the crackdown on photographers – though they do have access to the media. Danger of the Niemöller scenario, where they come for small groups one at a time.

It’s an argument about the mass of technology users, not the small cadre of techies – the iPhone can be jailbroken if you know what you’re doing. And there are more, not fewer, opportunities for techies, and more techies than ever before. Most PC users in the 80s only used what they were given. In 1992 I could write an app for the PC and send it to anyone on the Internet – except hardly anyone was on the Internet then, and even though most techies were among them, most Internet users couldn’t write their own stuff, or even install software off the net.  Techies are still a small proportion (even though bigger in number than before), so we’re still vulnerable to this sort of attack.

Mobile devices are key here, consumerism. People just want stuff that works, generally.

Google is another example – they build very attractive services, but on the basis of sucking up all our data.  Harness the amoral self-interest of large corporations in this direction. Also the (enlightened?) interest of Western Governments in promoting openness.

John uses the example of a bread mix and a recipe to illustrate open source. Parallels with the introduction of the car (wow, I can go anywhere), the PC (wow, I don’t have to ask people for more disk quota) and the Net (wow, I don’t have to ask for more mail quota). These things have an impact on society, and can damage it. So, for instance, if you have an open machine you could damage other people’s computers, hence the need to regulate ownership and operation. With a car, there’s an annual check that you have road tax, insurance and an MOT; with a PC the surveillance needs to be continuous.

The 9/11 disaster scenario is instructive: why didn’t we have the same response to the Troubles? Because those weren’t transnational/non-State actors. The Provisional IRA had tangible, comprehensible political objectives that could be taken on, whereas 9/11 terrorism is more vague.  And malware is different again: it wasn’t a problem when it had no business model … but now it has. Can we take it on now?

Is the Internet just (!) an extension of civil society – and of how you should regulate it – or is it something quite different?  Motor traffic law introduced absolute offences (no mens rea – it’s an offence to drive over the speed limit regardless of whether you know you are going that fast or what the limit is) because it was a quite different threat.  The Internet is at least as new, so it is likely to spur an at least as revolutionary – and shocking – change to our legal system.  Ok, now I’m scared, so that’s a result.

But we’re only eighteen (nineteen?) years in to the web.  It’s idiotic for us to imagine we understand what its implications are.  So the only honest answer is that we don’t know. John argues we’re not taking a long enough view. Imagine a MORI pollster in 1455, eighteen years after the introduction of the printing press, asking: do you think the invention of printing will undermine the authority of the Catholic Church, spur the Reformation and science, create whole new classes, change the concept of childhood?  The web is a complex and sophisticated space, so regulating it right can’t be done overnight.  There’s a tendency for people to make linear extrapolations from the last two years’ trends.

In the long run, this won’t look like such a huge deal in the history of humanity. It’ll be a bit like what happened with steam. It looks like the biggest deal ever to us only because we’re in the middle of it.

So what do you do when you know that on a 20-year horizon you’re blind?

My answer: get moving now, plan to change and update regularly.  Expect to have to fiddle with it, throw great chunks of things away because they’re no longer relevant. Challenge to OU course production model! (Actually, I’m wrong to say throw away – more expect that things will become eclipsed and superseded – old technologies die very hard.)

We’ve become more open/diverse in our offer to bring in enough people. Which is hard – costs and scale versus personalisation.

iSpot and taxonomy

Work on the Biodiversity Observatory – to be called iSpot to the public – is proceeding apace. One of the things we want to offer, to help people get scientific names, is a mapping between common names and scientific names: once you know the scientific name of a species, you can find much more information than if you only know the common name. We also want to help people get scientific names right – it’s easy to get them wrong – so we want to provide facilities like ‘did you mean X?’ when someone mistypes a name.
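To give a flavour of the ‘did you mean X’ side of things, here’s a toy sketch in Python using the standard library’s difflib – the handful of names is invented for illustration, and this isn’t the actual iSpot code:

    # Toy sketch of a 'did you mean X?' lookup for mistyped scientific names.
    # The name list is a tiny invented sample, not the real species data.
    import difflib

    SCIENTIFIC_NAMES = [
        "Cepaea nemoralis",
        "Cepaea hortensis",
        "Cornu aspersum",
        "Arion ater",
    ]

    def did_you_mean(typed_name, names=SCIENTIFIC_NAMES, max_suggestions=3):
        """Return the closest matches to a (possibly mistyped) scientific name."""
        return difflib.get_close_matches(typed_name, names, n=max_suggestions, cutoff=0.6)

    print(did_you_mean("Cepea nemoralis"))   # ['Cepaea nemoralis']

In practice we’d want something a bit cleverer (and a much bigger name list), but the principle – fuzzy matching against the list of known scientific names – is the same.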

To make that work, we need a database behind the scenes that has a list of correct scientific names, a mapping between common names and scientific names, and information about the taxonomic tree: each Species is part of a Genus, which is part of a Family, which is part of an Order, which is part of a Class, which is part of a Phylum (or Division, for plants), which is part of a Kingdom, which is part of Life.

This gets reasonably complicated even if everybody agreed on what goes where.  There’s all sorts of messing around with sub-Families and super-Classes and things like that on top of the basic tree structure. But of course people don’t agree. And even if everybody agreed now, new information about species’ relationships to each other is becoming available all the time – especially as genetic sequencing becomes cheaper and easier to do, and cleverer ways of mining genetic information to reveal evolutionary history are devised. So as we learn more, species get renamed, merged, split, and relocated in the taxonomic tree. And it’s not just obscure species that most iSpot users will never see that get changed around like this – the common garden snail has now been given at least four different scientific names (Helix aspersa, Cryptomphalus aspersus, Cantareus aspersus, and Cornu aspersum), and which is ‘correct’ or ‘preferred’ has been a matter of sometimes vigorous debate over time.
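To make the shape of that data concrete, here’s a minimal sketch (my own illustration, not our actual schema – the field names and IDs are invented) of the three kinds of record involved: taxa with links up the tree, synonyms mapping superseded names to the preferred one, and common names mapping to points on the tree.

    # Illustrative sketch only: invented field names and example IDs,
    # not the real iSpot schema.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Taxon:
        taxon_id: int
        scientific_name: str       # the preferred scientific name
        rank: str                  # 'species', 'genus', 'family', ...
        parent_id: Optional[int]   # link up the taxonomic tree (None at the top)

    TAXA = {
        1: Taxon(1, "Helicidae", "family", None),   # tree truncated here for brevity
        2: Taxon(2, "Cornu", "genus", 1),
        3: Taxon(3, "Cornu aspersum", "species", 2),
    }

    # Superseded scientific names map to the preferred name's taxon.
    SYNONYMS = {"Helix aspersa": 3, "Cantareus aspersus": 3}

    # Common names map to a point on the tree - not always a species.
    COMMON_NAMES = {"Garden snail": 3}

    def ancestry(taxon_id):
        """Walk up the tree: species -> genus -> family -> ..."""
        chain = []
        current = TAXA.get(taxon_id)
        while current is not None:
            chain.append(current.scientific_name)
            current = TAXA.get(current.parent_id)
        return chain

The ancestry walk is what would let us show something like ‘Cornu aspersum (family Helicidae)’ and support browsing by group; the synonym and common-name mappings are what make search work for people who don’t know (or mis-remember) the preferred name.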

We really don’t want to do this work ourselves. It’s a whole discipline in itself, and we can’t hope to duplicate or exceed it. And one of our central development principles is to build on or link to existing work, rather than duplicating effort.

Luckily, there are two important databases that have (some of!) the information we want.

The first is the National Biodiversity Network‘s NBN Species Dictionary. This is as close as we can get to a complete, definitive list of species in the UK. Different parts of it (checklists) are maintained by different groups of specialists, and updated as those specialist groups decide. New versions are published roughly four times a year (although there’s a backlog of new information to check in for the latest update, so that’s somewhat delayed). It includes a scientific name and an NBN species ID, and that species ID can be used to access the lovely web services that the NBN makes available via an API. It also has some mapping to common names for classification groups: a controlled list of about 160 names (e.g. ‘terrestrial mammals’, ‘higher plants’) that map on to the scientific names for points on the taxonomic tree but are (hopefully) comprehensible to ordinary people – or at least, ordinary people who start to get a little bit interested in nature. Even better, within each checklist there is definitive hierarchical information – which Order, Family, Genus and so on each species belongs in – for each preferred scientific name.

However, combining all these to give a single consensus tree is a huge amount of work. The Natural History Museum (who do a lot of the work looking after the Species Dictionary for the NBN) did this work once, but then gave up because maintaining it was so hard. So the Species Dictionary can give us a definitive list of preferred scientific names (and also other scientific names and how they map on to preferred scientific names). It can also give us a broad-brush top-level classification for all of these. There is taxonomic hierarchy information, but not in a form that’s easily combinable. (I want to have a look to see whether we can use the checklist-level hierarchy data to support browsing at that level.) But the Species Dictionary doesn’t give us very comprehensive mappings between common names and scientific names: there’s some, but not a lot.

The second source of data is the Natural History Museum’s Nature Navigator. This is a lovely website for browsing the taxonomic tree, switching back and forth as you like between scientific and common names. Nature Navigator contains everything that has a common name. Some common names map on to scientific species names, but others map on to other parts of the tree – so ‘Pea family’ maps on to ‘Family Fabaceae’. It also contains complete and reasonably definitive hierarchical information (all keyed from the scientific names rather than the common ones, but you can generate the common-name version on the fly). This looks much more promising for our purposes, since it has so many more common names and complete, usable hierarchical data.

However (there has to be at least one ‘but’ in taxonomy, I’m learning): it only covers things which have a common name, and lots of things don’t, including things that people will want to spot on iSpot – insects and spiders, for instance. It’s also been frozen in stone since the funding ran out in 2004, and things have changed since then. And the taxonomic data it uses differs from common UK usage in many important regards – for instance, the bird data is quite different to what most UK birders use.

In rough order-of-magnitude figures:

NBN Species Dictionary: contains 250,000 scientific names, which reduces to about 80,000 preferred scientific names for species once synonyms and so on are taken into account.  Some patchy common-name mappings. All classified into just over 100 ‘comprehensible’ taxonomic groupings. Updated regularly.

Nature Navigator: contains 140,000 common names, mapped on to appropriate preferred scientific names/points on the taxonomic tree.

Just to add to the fun, there is an international effort well underway to create a definitive list of all species across the world, called the Catalogue of Life, merging work by ITIS in the US and Species 2000 at the University of Reading. The aim is to create a globally unique identifier for each species – a Life Science Identifier, or LSID. Thankfully, though, we as a project can leave the coordination and mapping between that and the NBN Species Dictionary to others.

The Right Answer would be to include the Nature Navigator data as a checklist within the NBN Species Dictionary, which would fold all of the Nature Navigator common-name data into the definitive Species Dictionary. That’s a fair amount of work, but may well be within the scope of what the Taxonomic support project within OPAL (Open Air Laboratories – the parent project of the Biodiversity Observatory) will do. We’ll be pressing them to do that.

Of course, that almost certainly won’t happen in time for the launch of iSpot in the Summer, so we’ll need a stopgap solution of some sort … somehow I think converting taxonomists, biologists and field studies experts to a loose, Web 2.0 folksonomy approach is going to be beyond the scope of this project!

Backup on XP (geeky)

I asked the crowd via Twitter (and thence Facebook) about backup solutions for Windows XP, and got several responses, plus a few requests to hear what I found out, so this is to summarise that.

The particular problem I want to solve is backing up on to a huge external hard disk.  This post gets a bit long and techie, but the short answer is that I went with NTBackup, the backup tool built into XP.

As the saying goes, backing up is a bit like flossing, in that everybody knows you ought to do it regularly but most people don’t. Except people who’ve been burned in the past.

Luckily, my then-technophobic mother taught me that particular lesson at an early age, when she wiped my first ever full-scale program by accidentally knocking the power cable out of the back of the ZX Spectrum.  (I was trying to get her to test how user-friendly I’d managed to make it, and so I also learned the valuable lesson that real users can create whole categories of problems you did not anticipate.)

Backup is one of those things that in my head is a known solved problem.  There are two interesting problems to solve – the main one is how to back up the minimum amount of stuff but still cover everything; a secondary one is how to structure the backups to make it easy to get things back.

The ‘back up the minimum amount of stuff’ problem is essentially the problem that the rsync algorithm solves: how to find the minimum amount of data to cover the changes between an original and an updated chunk of data.  So any GNU/Linux installation can use rsync as the basis for an automated (or any degree of semi-automated) backup system.

And Unix-like file systems have another property that makes the secondary problem easy: hard linking. This essentially means you have a single file on the disk, but appearing in more than one place in the directory tree (folder hierarchy, if you prefer).  This is really, really useful for backup, because it means you can do a full backup – copying everything – into one directory on your backup disk, and then subsequently do an incremental backup (just the stuff that has changed) to another directory, hard-linking the unchanged files back to the previous backup.  And you can keep doing incremental backups like that. The clever bit is that each time you do a backup, the directory looks like a complete copy of whatever you are backing up, but the extra disk space taken up is only the difference between that backup and the last one.  Even better, you can delete (unlink) arbitrary backups without losing any other data. So, for instance, you could create a backup every hour, and delete backups on a rota so you end up with backups every hour for the last day, every day for the last fortnight, every fortnight for the last few months, etc.

(If you don’t have this system, you have to keep everything between the last full backup and the last incremental backup, or you’ve effectively lost your backup.  This is very fiddly to get right, and is a common cause of problems restoring from backups.)
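If you want to see the hard-link trick in miniature, here’s a toy Python sketch – it shows the idea (on a Unix-like filesystem), not a production backup tool, and the example paths are invented:

    # Toy sketch: build a snapshot where unchanged files are hard links to the
    # previous snapshot, so each snapshot looks complete but only changed files
    # take up new space. Not robust: no error handling, symlinks, permissions, etc.
    import filecmp
    import os
    import shutil

    def snapshot(source_dir, prev_snap, new_snap):
        for root, _dirs, files in os.walk(source_dir):
            rel = os.path.relpath(root, source_dir)
            os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
            for name in files:
                src = os.path.join(root, name)
                dst = os.path.join(new_snap, rel, name)
                old = os.path.join(prev_snap, rel, name) if prev_snap else None
                if old and os.path.exists(old) and filecmp.cmp(src, old, shallow=False):
                    os.link(old, dst)        # unchanged: hard link, costs no extra space
                else:
                    shutil.copy2(src, dst)   # new or changed: a real copy

    # e.g. snapshot("/home/me/docs", "/backup/2009-03-28", "/backup/2009-03-29")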

If you’re a half-decent Linux geek, you can easily roll your own backup system with cron, rsync and a short shell script.  If you have a Linux box but that’s more fuss than you can be bothered with, there are umpteen Open Source graphical front ends to essentially the same system. These are of variable beauty and usability.
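The heart of such a script is usually a single rsync invocation with --link-dest, which does the hard-linking against the previous snapshot for you. Sketched here as a Python wrapper (to keep these examples in one language) with made-up paths:

    # Sketch of the usual rsync-based snapshot: --link-dest hard-links files that
    # haven't changed since the previous snapshot. Paths are invented examples.
    import datetime
    import subprocess

    SOURCE = "/home/me/"                      # trailing slash: copy the contents
    BACKUP_ROOT = "/mnt/external/backups"

    def run_backup(previous_snapshot):
        target = f"{BACKUP_ROOT}/{datetime.date.today().isoformat()}"
        subprocess.run(
            ["rsync", "-a", "--delete",
             f"--link-dest={previous_snapshot}",
             SOURCE, target],
            check=True,
        )
        return target

    # Typically run from cron, passing in the path of the previous day's snapshot.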

If you have a Mac, you get Time Machine, which has Apple’s beauty and usability built in to its interface, and the power/efficiency of the Unixy approach underneath. If you have an external drive to devote to it, it really is as simple as saying ‘Time Machine, do your thing on this drive’ and remembering to plug the drive in from time to time.  This is my dream backup system.

Alas, Windows XP doesn’t have this option. And my existing backup strategy (burn DVDs at pseudorandom times, keeping manual notes of what’s been backed up and what’s not) left a lot to be desired.

The problems run moderately deep, though.  Windows doesn’t come with rsync (there are multiple ports, but you usually have to go half-way to a second operating system (Cygwin) to make them work properly), and it doesn’t really do hard links (actually, it can, but not in a way that’s simple and straightforward to the user, and so hardly any software does). It has its own system of flagging changed files (the archive attribute), which is fraught with problems.

So what’s to do?

The first solution suggested (thanks @andrew_x!) was to convert the Windows machine to a dual-boot system with Linux (e.g. Ubuntu), and use that to back up the Windows data.  That has the mathematician’s appeal of reducing it to a known solved problem.  If I wanted a dual-boot system anyway and planned to spend most of the time in Linux, it’d be the top choice. But I don’t (I have other machines with Linux on).  And any backup regime that has ‘reboot into a different operating system’ as step one is unlikely to be pursued as rigorously and regularly as it should be.

The next set of solutions (thanks @elpuerco63, @hockeyshooter and others) is to buy some backup software.  There are plenty, from EMC/Dantz Retrospect (which is aimed at people with several Windows boxes to back up) and similar server-based packages, to the straight-up standard consumer packages like Symantec Norton Ghost, or Symantec Norton Save and Restore. (Those two are the standard paid-for offline backup tools that Which? apparently rates as Best Buys.)  All of these, however, cost actual money, which I am very keen not to spend – partly because I have very little spare cash at the moment, partly because it seems silly to spend money on something when there are good Free/Open Source Software solutions, and partly because it’d mean I couldn’t get a backup done this weekend.

There’s a plethora of back-it-up-to-the-cloud solutions. I wasn’t interested in any of those because I have:

  • a) 120 GB to back up and a capped Internet connection,
  • b) some nervousness about sending every last drop of my personal data in to the network,
  • c) a degree of skepticism about the reliability of such services, and
  • d) a vague, woolly echo of Richard Stallman’s political objection to cloud computing – though this is usually balanced by a similarly vague, woolly echo of David Brin’s argument that a transparent society would be a good thing, and utterly outgunned by the siren call of Convenience.

Plus they cost real money for more than a few GB, and my first two objections apply.  (If you do only have a few GB of files to back up, I can heartily recommend Dropbox – free for up to 2 GB, and it syncs multiple machines and platforms easily.)

You also often get simple backup software bundled in with other things: Nero (the CD-burning package) apparently has a backup feature, and many external hard drives come with some toy backup software thrown in. Mine didn’t.

What I did manage to put my hands on, though, was NTBackup, the backup tool built into Windows XP.  (In XP Home, it’s not installed by default – you need to get your original media and find and run NTBackup.msi in \Valueadd\Msft\Ntbackup.) It lives in Start | Accessories | System Tools.

It’s not world-class stuff: you can tell it was written for the original Windows NT 3.51.  Charmingly it defaults to writing the backup to A:\BACKUP.BKF (off the top of my head, I make it that I’d need over 80,000 floppy disks to back up my data, which would be a little tedious to insert). And the interface is almost wilfully ugly.

But (a) it didn’t cost me any more money, (b) it was to hand, (c) it has a handy option for backing up the system state (including the Registry), (d) it groks Volume Shadow Copy so can copy in-use files, and (e) it worked.