iSpot and taxonomy

Work on the Biodiversity Observatory – to be called iSpot to the public – is proceeding apace. One of the things we want to be able to offer to help people in getting scientific names is to be able to map between common names for things and the scientific names. Once you know the scientific name of a species, you can find much more information than if you only know the common name. We also want to be able to help people get scientific names right – it’s easy to get them wrong – so we want to be able to provide facilities like ‘did you mean X’ when people mistype a name.

To make that work, we need a database behind the scenes that has a list of correct scientific names, a mapping between common names, and information about the taxonomic tree: each Species is part of a Genus, which is part of a Family, which is part of an Order, which is part of a Class, which is part of an Order, which is part of a Division, which is part of a Kingdom, which is part of Life.

This gets reasonably complicated even if everybody agreed on what goes where. There’s all sorts of messing around with sub-Families and supra-Classes and things like that on top of the basic tree structure. But of course people don’t agree. And even if everybody agreed now, new information about species’ relationships to each other is becoming available all the time – especially as genetic sequencing becomes cheaper and easier to do and cleverer ways of mining genetic information to reveal evolutionary history are devised. So as we learn more, species get renamed, merged, split, and relocated in the taxonomic tree. And it’s not just obscure species that most iSpot users will never see that get changed around like this – the common garden snail has now been given at least four different scientific names (Helix aspersa, Cryptomphalus aspersus, Cantareus aspersus, and Cornu aspersum) and which is ‘correct’ or ‘preferred’ has been the matter of debate, sometimes vigorous, over time.

We really don’t want to do this work ourselves. It’s a whole discipline in itself, and we can’t hope to duplicate or exceed it. And one of our central development principles is to build on or link to existing work, rather than duplicating effort.

Luckily, there are two important databases that have (some of!) the information we want.

The first is the National Biodiversity Network‘s NBN Species Dictionary. This is as close as we can get to a complete, definitive list of species in the UK. Different parts of it (checklists) are maintained by different groups of specialists, and updated as those specialist groups decide. New versions are published roughly four times a year. (Although there’s a backlog of new information to check in for the latest update so that’s somewhat delayed.) It includes a scientific name and an NBN species ID. This species ID can be used to access the lovely web services that NBN make available via an API. It also has some mapping to common names for classification groups – it has a controlled list of about 160 names (e.g. ‘terrestrial mammals’, ‘higher plants’) that map on to the scientific names for points on the taxonomic tree but are (hopefully) comprehensible to ordinary people – or at least, ordinary people who start to get a little bit interested in nature. Even better, within each checklist, there is definitive hierarchical information – what Order, Family, Genus etc each species belongs in – for each preferred scientific name. However, combining all these to give a single consensus tree is a huge amount of work. The Natural History Museum (who do a lot of the work looking after the Species Dictionary for the NBN) did this work once, but then gave up because maintaining it was so hard. So the Species Dictionary can give us a definitive list of preferred scientific names (and also other scientific names and how they map on to preferred scientific names). It can also give us a broad-brush top-level classification for all of these. There is taxonomic hierarchy information, but not that’s easily combinable. (I want to have a look to see if we can use the checklist-level hierarchy data to support browsing at that level.), But the Species Dictionary doesn’t give us very comprehensive mappings between common names and scientific names. There’s some, but not a lot.

The second source of data is the Natural History Museum’s Nature Navigator. This is a lovely website for browsing the taxonomic tree, switching back and fro as you want between scientific and common names. Nature Navigator contains everything that has a common name. Some common names map on to scientific species names, but others map on to other parts of the tree – so ‘Pea family’ maps on to ‘Family Fabaceae’. It also contains complete and reasonably definitive hierarchical information (all keyed from the scientific names, rather than the common ones, but you can generate the common one on the fly). This looks much more promising for our purposes, since it has so many more common names and complete, usable hierarchical data.

However (there has to be at least one ‘but’ in taxonomy, I’m learning): it only covers things which have a common name, and lots of things don’t, including things that people will want to spot on iSpot – including insects and spiders. And it’s been frozen in stone since the funding ran out in 2004, and things have changed since then. And the taxonomic data it uses differs from common UK usage in many important regards – for instance, the bird data is quite different to what most UK birders use.

In rough order-of-magnitude figures:

NBN Species Dictionary: contains 250,000 scientific names, which reduces to about 80,000 preferred scientific names for species when synonyms and so on are taken in to account. Some patchy common name mappings. All classified in to just over 100 ‘comprehensible’ taxonomic groupings. Updated regularly.

Nature Navigator: contains 140,000 common names, mapped on to appropriate preferred scientific names/points on the taxonomic tree.

Just to add to the fun, there is an international effort well underway to create a definitive list for all species across the world, called Catalog of Life, merging work by ITIS in the US and Species 2000 at the University of Reading. The aim is to create a globally-unique identifier for species – a Life Sciences ID or LSID. Thankfully, though, we as a project can leave the coordination and mapping between that and the NBN Species Dictionary to others.

The Right Answer would be to include Nature Navigator data as a checklist within the NBN Species Database, which would fold all of the Nature Navigator common name data in to the definitive Species Database. That’s a fair amount of work, but may well be within the scope of what the Taxonomic support project within OPAL (Open Air Laboratories – the parent project of the Biodiversity Observatory) will do. We’ll be pressing them to do that.

Of course, that almost certainly won’t happen in time for the launch of iSpot in the Summer, so we’ll need a stopgap solution of some sort … somehow I think converting taxonomists, biologists and field studies experts to a loose, Web 2.0 folksonomy approach is going to be beyond the scope of this project!

Author: dougclow

Data scientist, tutxor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries. View all posts by dougclow

2 thoughts on “iSpot and taxonomy”

Martin Harvey says:

5 March 2009 at 11:51 am

> “However (there has to be at least one ‘but’ in taxonomy, I’m learning)”

Good to see you’re entering into the spirit of taxonomy by using “however” and “but” as synonyms, even in the same sentence!

Thanks for these comprehensive notes from yesterday’s meeting. We may end up using different solutions for different parts of iSpot, e.g. the Nature Navigator taxonomy may work fine for mammals and plants, but we use NBN dictionary for invertebrates, fungi etc. And for at least some of those latter groups we may be able to use one of the ‘recommended checklists’ within the NBN dictionary.

Another consideration is that we do need to be able to link to the NBN taxon_version_key, can’t remember whether the Nature Navigator list does that directly.
dougclow says:

5 March 2009 at 12:50 pm

Using different checklists in different parts of iSpot would be a bit of a challenge – not impossible, but would be more work.

I’m currently thinking a good starting point for would be to work with the Species Dictionary ‘comprehensible’ taxa for now, mapping them down from our eight top-level groups, and create groupings in iSpot for each of those. I think these get far enough down the taxonomic tree to identify groupings that (some? many?) schemes and societies might be interested in, and will be at a level where it’s sensible to talk about ‘how to spot things like this’, and ‘where to get good keys/species data/pictures/other info for this taxon’.

I think we can manage without Nature Navigator’s taxonomy tree until someone (else!) does the work of integrating it in to the Species Dictionary. We probably can’t do without the common names mapping, though – I think that’s going to be crucial for helping people move from casual interest to the beginnings of expertise.

Comments are closed.

Share this:

Related

Author: dougclow

2 thoughts on “iSpot and taxonomy”