Digital Scholarship Hackfest

A bunch of us got together yesterday and today at Bletchley Park for a Digital Scholarship Hackfest:  Martin Weller, Will Woods, Kev McCleod, Juliette Culver, Nick Freear, Antony McEvoy (a designer newly joined Other Parts of the OU spending a couple of days with us), and Lynda Davies.

[Photo: P1010242]

The setting is great for doing techie things, although I always feel slightly awed being at Bletchley Park for work.  When I’m just in tourist gawking mode it’s fine, but when I’m doing techie things I always feel a bit of an impostor. Alan Turing and the other wartime codebreakers were doing really, really clever stuff, and here I am trying to do things in that line … it’s a tall order to live up to.  The place was open to visitors while we were working, and we fondly imagined that the tourists peering in through the window would think that we, hunched over our laptops (almost exclusively Macs), were busily engaged in modern-day codebreaking.

The one major downside to the venue was the terrible wifi.  It was desperately slow, and not up to supporting a roomful of geeks.  I’m sure it’s been better at other meetings I’ve been to – but it may be that something special was laid on then.  It was just enough to keep us going, but I think we’d have been a lot more productive with more.

Digital Scholarship

The Digital Scholarship project is an internal one, led by Martin, with two aims:

  1. Get digital outputs recognised. (Martin’s posted about this sort of stuff already)
  2. Encourage academics to engage with new types of output and technologies.

There’ll be a programme of face to face events and sessions, but we want an online system to help support and encourage it all, and that’s what we’re here to do.

Principles:

  1. Based around social media principles
  2. Easily adaptable
  3. Incorporate feeds/third party content easily
  4. Not look and feel like a (traditional!) OU site

On that last one, we don’t want it to feel like a carefully honed, current-standard, usual-topbar OU site.  But we do want it to look like what the OU might become – what we’d like it to become – in the future.

The audience is OU academics (and related staff), but we (may?) want to make it more open to those outside later.

What we did

We spent the first day thrashing through what we meant by digital scholarship, and what the site might do for users, and what we could build easily and quickly.  We spent the second day getting down and dirty with building something very quickly.  I say ‘we’, but it was mostly the developers – Juliette and Nick – plus Antony, our designer.  Martin and I floated around annoying them and pointlessly trying to make the wifi work better.

Sadly, Martin rejected my offer to invent a contrived and silly acronym for the project, but my suggestion to call the site ‘DISCO’ (for DIgital SChOlarship) seemed to be reasonable enough to run with.  We had a bit of light relief thinking about visuals for the site – all browns and oranges and John Travolta pointing at the sky – but I suspect Antony was too sensible to take on our wackier suggestions, and the final site will not feature a Flash animation of a rotating glitter ball in the middle of the page sending multicoloured sparkles all over the screen.

Digital Scholarship Profiles

While we were beavering away, Tony Hirst (armed with a better net connection, no doubt) was musing on what we might be measuring in terms of metrics.  Well, here goes with how far we got.

One aspect of the project I worked on particularly was a Digital Scholarship Profile (a Discopro?).  The idea of this is some way of representing digital scholarship activity, and working towards some metrics/counts.

What we want to be able to do is to show – for each person – the range, quantity and quality of their digital scholarship outputs.

This would serve several purposes.  Firstly, it’s a stab at representing activity that isn’t captured in conventional scholarship assessments.  Secondly, by showing the possibilities, and letting you link through to see people who’ve done good stuff, you make it easier for people to develop in new ways.

We could show, for each area of digital scholarship output, what each person was doing, and how ‘good’ that was (more of which later).  On your DISCO profile the site would show your best stuff first, and as you went down the page you’d get to the less impressive things, and then (perhaps over a line, or shaded out) would be the areas where you had no activity yet.  For each area, we’d have links to:

  • suggestions for how to improve in that area (and on to e.g. Learn About guides)
  • links to the profiles of people with very ‘good’ activity in that area
  • information about how we do our sums

[Photo: P1010240]

Of course, metrics for online activity are hugely problematic.  They’re problematic enough for journal articles, but at least there you have some built in human quality checking: you can’t guarantee that a paper in Nature is better quality (for some arbitrary generic definition of ‘quality’) than one in the Provincial Journal of Any Old Crap, but it’s not a bad assumption.  And any refereed journal has a certain level of quality checking, which impedes the obvious ways of gaming the metrics by simply churning out nonsense papers. (Though I wouldn’t claim for a moment that there has been no gaming of research metrics along these lines.)

How do you measure, say, blog activity, or Slideshare?  You can get raw production numbers: total produced, average frequency of production, and so on.  However, there are only negligible bars to publishing there, and any half-techie worth their salt could auto-generate semi-nonsense blog posts.

But this is relatively straightforward to measure, and nobody in academia is going to be so stupid as to simply equate quantity of stuff produced with quality, so I think we can do that without too much soul-searching.

How can we assess quality? One approach would be to take a principled stand and say that peer review is the only valid method.  This view would see any metrics for academic output as irretrievably problematic at best, and highly misleading at worst.  That stance is one that might appeal particularly in disciplines which outright rejected a metrics-based approach for the REF.  The downside, of course, is that peer review is hugely expensive – even for the most selective stuff academics do (journal articles), the peer review system is creaking at the seams.  There’s no way that we could build a peer review system for digital scholarship outputs.

There are, however, some – very crude – metrics for assessing (something that might be a proxy for) quality of online resources.  You can (often) get hold of statistics like how many times things have been read, number of comments made, number of web links made to the resource, and so on.  As with the production numbers, you can game these.  It’s not entirely trivial to do more than a handful – but most academic blog posts (in the middle of the distribution) will be getting handfuls of pageviews anyway, so getting your mates to mechanically click on your posts would probably have a noticeable effect.  And these are proxy measures for quality at the very best.  The sort of stuff that’s likely to get you large amounts of online attention is not (necessarily) the sort of stuff that is of the highest academic quality.  I can guarantee that a blog post presenting a reasoned, in-depth exposition and exploration of some of the finer points of some abstruse discipline-specific theory will get far fewer views than, say, a blog post promising RED HOT PICS OF BR*TN*Y SP**RS N*K*D.  Less starkly, the short, simple stuff tends to get a lot more link love than long, heavyweight postings – which is, alas, an inverse correlation with academic rigour (though not a perfect one).

There’s also an issue that statistics in this domain will almost certainly be very highly unequal – you’ll get a classic Internet power-law distribution, where a small number of ‘stars’ get the huge overwhelming majority of the attention, and a long tail get next to nothing.  We can probably hide that to some degree by showing relative statistics – either a rank order (competitive!) or perhaps less intense by showing, e.g. quintiles or deciles, with some nice graphic to illustrate it.  We mused about a glowing bar chart, or a dial that would lean over to ‘max’ as you got better.
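As a rough illustration of the quintile idea, here is a minimal Python sketch. The function name `quantile_band` and the page-view numbers are made up for the example, and this is only one of several reasonable ways to band people by rank rather than by raw count.

```python
from bisect import bisect_left

def quantile_band(value, all_values, bands=5):
    """Return the band (1 = lowest, `bands` = highest) that `value` falls into.

    The band depends on how many people have a strictly lower value,
    so it reflects rank rather than the raw size of the number.
    """
    if not all_values:
        raise ValueError("need at least one value to compare against")
    ordered = sorted(all_values)
    below = bisect_left(ordered, value)  # count of strictly lower values
    # Integer arithmetic avoids floating-point surprises at band edges.
    return min(below * bands // len(ordered) + 1, bands)

# Hypothetical page-view totals for ten people: one 'star' and a long tail.
views = [3, 5, 8, 12, 15, 20, 40, 90, 400, 25000]
```

With these numbers, `quantile_band(25000, views)` and `quantile_band(400, views)` both land in band 5, which is exactly the sort of flattening of the power-law curve that relative statistics would give us.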

[Photo: P1010231]

This is an experiment, and we want to explore what might work, so we don’t have to solve this problem.  And in a two-day hackfest to get something going, we’re going to shortcut all that careful consideration and just see what can be done quickly – knowing that we’ll need to tweak and develop it over time.  Or even throw it away entirely.

So what could we possibly measure, easily?

The model we’re running with is that there are several categories of things you might produce (research papers, blog posts, photos, etc), and for each category, there’ll be one or more service that you might use to host them – so, for instance, you might put presentations on Slideshare or on Prezi.com.  And then for each service, we can measure a whole range of statistics.

Here’s an outline of what we’re thinking:

[Photo: P1010230]

Categories:

  • Research paper repositories: Open Research Online and/or other institutional repository, subject-specific repository, and so on
  • Learning resources: repositories – e.g. OpenLearn, MERLOT, etc etc
  • Documents:  Scribd, Google Docs, OU Knowledge Network
  • Websites: Wikipedia (contributions – not your biography for fear of WP:BLP), resources, etc
  • Blogs: individual personal blogs, group blogs (could get feed for each one), etc
  • Presentations: Slideshare, Prezi, etc
  • Lifestream: Twitter, Tumblr, Posterous, FriendFeed, etc
  • Audio/video: podcasts, iTunesU, YouTube, Blip.tv, etc
  • Links/references: Delicious, Citeulike, Zotero, etc
  • Photos/images: Flickr, Picasa Web Albums, etc

The idea for these categories is that they’re a level at which it makes some sort of sense to aggregate statistics.  So, for instance, it makes some sense to add up the number of presentations you’ve put up on Slideshare and on Prezi … but it probably makes no sense at all to add up the number of photos you’ve posted to Flickr and the number of Tweets you’ve posted on Twitter.
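To make the aggregation rule concrete, here is a small Python sketch, assuming a simple lookup from service to category. The `SERVICE_CATEGORY` table, the function name and the counts are all hypothetical; the only point is that sums happen inside a category and never across categories.

```python
from collections import defaultdict

# Hypothetical mapping from service to category, following the list above.
SERVICE_CATEGORY = {
    "Slideshare": "Presentations",
    "Prezi": "Presentations",
    "Flickr": "Photos/images",
    "Twitter": "Lifestream",
}

def count_by_category(service_counts):
    """Sum per-service resource counts within each category only."""
    totals = defaultdict(int)
    for service, count in service_counts.items():
        totals[SERVICE_CATEGORY[service]] += count
    return dict(totals)

# Made-up counts for one person.
example = count_by_category(
    {"Slideshare": 12, "Prezi": 3, "Flickr": 240, "Twitter": 5100}
)
```

Here the Slideshare and Prezi counts are added together under Presentations, while the Flickr and Twitter numbers stay in their own categories.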

Statistics – production statistics:

  • Count of number of resources produced
  • Frequency of resources produced (multiple ways of calculating!)
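The 'multiple ways of calculating' point is easy to show with a sketch. Below are two possible frequency measures in Python, both invented for illustration: a lifetime average and a recent-window average. The function names, the 30-day unit and the 90-day window are assumptions, not decisions we have made.

```python
from datetime import date

def lifetime_rate(dates, today):
    """Resources per 30 days, averaged from the first resource until today."""
    if not dates:
        return 0.0
    span_days = max((today - min(dates)).days, 1)  # avoid dividing by zero
    return len(dates) * 30 / span_days

def recent_rate(dates, today, window_days=90):
    """Resources per 30 days, counting only those in the last `window_days`."""
    recent = [d for d in dates if 0 <= (today - d).days < window_days]
    return len(recent) * 30 / window_days

# Made-up publication dates: one early post, then a recent burst.
posts = [date(2010, 1, 1), date(2010, 6, 1), date(2010, 6, 10), date(2010, 6, 20)]
today = date(2010, 6, 30)
```

The same four posts give a lower lifetime rate than recent rate, so which measure we show changes the story the profile tells.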

Statistics – impact/reception statistics:

  • Total reads/downloads/views of resources (sum of all we can find – direct, embed, etc) (also show average per resource)
  • Count of web links to resource (we generate? via Google API)
  • ‘Likes’/approval/star rating of resources (also show average per resource)
  • Count of comments on the resource (also show average per resource)
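A minimal sketch of the 'total plus average per resource' idea, in Python. The record layout (a `views` dictionary keyed by source such as direct or embed) is an assumption made for the example, as are the titles and numbers.

```python
def impact_summary(resources):
    """Total views across all sources, plus the average per resource."""
    per_resource = [sum(r["views"].values()) for r in resources]
    total = sum(per_resource)
    average = total / len(per_resource) if per_resource else 0.0
    return {"total_views": total, "average_views": average}

# Made-up presentations with views split by where they were seen.
slides = [
    {"title": "Talk A", "views": {"direct": 120, "embed": 30}},
    {"title": "Talk B", "views": {"direct": 40, "embed": 10}},
]
```

The same shape would work for likes or comments by swapping the key that gets summed.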

Statistics – Composite statistics

  • h-index (largest number h such that you have h resources that have achieved h reads? links? likes?)

I really quite like the idea of tracking the h-index: it takes a bit of understanding to suss how it’s calculated, so not everybody instantly understands it.  But it’s moderately robust and it’s a hybrid production/impact type statistic.  The impact component needs a little thought, and it might well vary from service to service.  There’s less symmetry in online statistics than there is in citations: if you get a few hundred citations for a paper, you’re doing really very well, but it’s not that hard to get a few hundred page views for an academic blog post.  A few hundred links, however, might be equivalently challenging.
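For anyone who hasn’t met the h-index before, here is a short Python sketch of the calculation. The `unit` parameter is my own assumption for handling the asymmetry just described (for example, counting page views in hundreds while counting links one at a time); it is not part of the standard definition, and the numbers are made up.

```python
def h_index(scores, unit=1):
    """Largest h such that at least h resources each score h * unit or more.

    With unit=1 this is the usual h-index. A larger unit (an assumption
    here, not a standard feature) makes each step harder to reach.
    """
    ranked = sorted(scores, reverse=True)
    h = 0
    for position, score in enumerate(ranked, start=1):
        if score >= position * unit:
            h = position
        else:
            break
    return h

# Made-up numbers for one person's blog posts.
link_counts = [25, 8, 5, 3, 3, 1, 0]
view_counts = [1200, 450, 300, 90]
```

Here `h_index(link_counts)` is 3, because three posts have at least three links each but there are not four posts with at least four. Counting views in hundreds, `h_index(view_counts, unit=100)` is also 3.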

We’re imagining some sort of abstraction layer for the profile, so we can plug in new services – and new categories – fairly easily.  One key point we want to get across is that we’re not endorsing a particular service or saying that people ought to use them: we’re trying to capture the activity that’s going on where it’s going on.
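One way such an abstraction layer could look is a simple registry that services plug themselves into. This is a Python sketch under my own assumptions, not what Juliette and Nick actually built: every name below is hypothetical, and the fetchers return placeholder data instead of calling any real service.

```python
from typing import Callable, Dict

# Registry: category name -> service name -> function returning stats.
# All names are hypothetical; no real service API is called.
StatsFetcher = Callable[[str], Dict[str, int]]
REGISTRY: Dict[str, Dict[str, StatsFetcher]] = {}

def register(category: str, service: str):
    """Decorator that plugs a stats fetcher into the registry."""
    def decorator(fetcher: StatsFetcher) -> StatsFetcher:
        REGISTRY.setdefault(category, {})[service] = fetcher
        return fetcher
    return decorator

@register("Presentations", "Slideshare")
def fake_slideshare_stats(username: str) -> Dict[str, int]:
    # Placeholder data standing in for a real lookup.
    return {"count": 12, "views": 3400}

@register("Presentations", "Prezi")
def fake_prezi_stats(username: str) -> Dict[str, int]:
    # Placeholder data standing in for a real lookup.
    return {"count": 3, "views": 150}

def profile(username: str) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Collect stats from every registered service, grouped by category."""
    return {
        category: {name: fetch(username) for name, fetch in services.items()}
        for category, services in REGISTRY.items()
    }
```

Adding a new service, or a whole new category, is then just one more decorated function, and nothing in `profile` needs to know which services exist.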

We’ll need to keep a history of the statistics, and also careful notes about our calculation methodologies and when they change (as they no doubt will).  Nice-to-have down-the-line features could then include graphs, charts, trends (trend alerts!) and so on.
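Keeping the methodology alongside each stored number could be as simple as tagging every snapshot with a version label. The Python sketch below assumes a `Snapshot` record and a `trend` helper, both invented for illustration, and only compares values that were calculated the same way.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Snapshot:
    taken_on: date
    statistic: str
    value: float
    method_version: str  # which calculation recipe produced this value

def trend(history, statistic, method_version):
    """Change from the earliest to the latest snapshot for one method version."""
    comparable = sorted(
        (s for s in history
         if s.statistic == statistic and s.method_version == method_version),
        key=lambda s: s.taken_on,
    )
    if len(comparable) < 2:
        return None  # not enough like-for-like data to talk about a trend
    return comparable[-1].value - comparable[0].value

# Made-up history: the recipe changes in May, so that value stands alone.
history = [
    Snapshot(date(2010, 3, 1), "h_index", 2, "v1"),
    Snapshot(date(2010, 4, 1), "h_index", 3, "v1"),
    Snapshot(date(2010, 5, 1), "h_index", 7, "v2"),
]
```

The jump from 3 to 7 in this example comes from the change of recipe, not from anything the person did, which is why `trend` refuses to compare across versions.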

There’s no way that we can get all of these things up and running in two days of hacking – highly skilled as our developers are.  So we’re going for a couple of example ones to get the idea across, and will add others later.

We want to produce feeds of all this stuff and/or expose the raw data as much as possible.  But again, that’s one for later rather than the proof-of-concept hack we’re putting together just now.

Sadly, the wifi connection at the venue was a bit flaky and slow, so we did the hacking on local machines rather than somewhere I can point you to right now – but expect a prototype service to be visible soon!  Unless, alas, you’re outside the OU … one design decision we made early was to keep it behind the OU firewall at least initially until the system is robust enough to stand Internet scrutiny – both in terms of malicious attacks and in terms of getting our ideas about what this should be thrashed through.

There’s the eternal online educational issue of open-ness versus security: making things more open generally makes them more available, and (with the right feedback loops in place) better quality; but on the other hand, people – especially people who don’t live in the digital world, like our target audience – often appreciate a more private space where they can be free to take faltering steps and make mistakes without the world seeing.  We’re trying more up the walled garden end to start with, but will revisit as soon as the site has had more than two academics look at it.

Next steps

We didn’t quite have a working site when we finished, but ended up with this list of things to do to get the site up and working:

  • order URL (disco.open.ac.uk?)
  • get Slideshare embeds working (problem with existing)
  • put on server – integrate design, site (Juliette), profile (Nick)
  • integrate with SAMS
  • finish coffee functionality – Juliette
  • finish barebones profile functionality – Nick
  • allow users to add link (in Resources)
  • check of site (and extra content added) by Martin
  • put ‘alpha’ on the site

And this list of longer term actions:

  • support
  • extended profile/statistics – API/feed/data exposure
  • more integration with OU services
  • further design work
  • tag clouds / data mining
  • review of statistics/profile
  • review the (lack of) open-ness
  • get more resource to do more with the site

For now, though, the best picture of the site I can give you is this:
[Photo: P1010238]

(There are more photos of our flipcharts and the venue in this photoset on Flickr.)

Medium and message (liveblogging debate round 2)

Today, the OU’s Vice Chancellor is giving a speech to Council (the OU’s governing body – like the Board of a company) and an internal audience on ‘Scholarship in the Digital Age’.  She will be speaking “about the impact of new technologies, including Web 3.0, on the University’s business: how it affects teaching, research and the student learning experience”.

I’m really encouraged: this is exactly the focus we as a University need to be taking.  And I’m not just saying that because it’s my area and so I obviously think it’s important. (Although there probably is an element of that.)  I’ve said before that the VC is reading the right stuff (e.g. Here Comes Everybody) and there was further proof in the internal publicity for the talk – it was accompanied by a big copy of the xkcd Online Communities map:

But I’m also worried about whether to bring my laptop or not.  Last time I liveblogged from a talk by the VC, people complained – entirely reasonably – that I was disturbing them with my typing – which I was.  I did a post arguing (not entirely clearly) that the fact that such typing was disruptive showed that we have a mountain to climb in getting the OU to where we need to go. The discussion generated more traffic – and blog links – on here than anything else I’ve written, and ended up with me realising that by (inadvertently) reinforcing the stereotype of laptop users as antisocial inconsiderate types, I’d set things back, not forward.

Just to be totally clear: I am not saying that people are wrong when they say they are being disturbed.  They are being disturbed, and that impairs their ability to hear and understand what’s going on.  And with this particular presentation, people who aren’t (yet) natural laptop users and bloggers are the ones who really need to hear it: us techies are the choir the VC is preaching to here.

So do I bring it or not?  If I do, it’ll disturb people.  If I don’t, we lose (some of) the benefits of doing just what (I expect) the VC will be exhorting us to do.

Happily, our Comms team have come up with an excellent compromise: there’ll be a blogger area outside, with a screen showing the VC, and we can hammer away on keyboards to our hearts’ content without disturbing people who find such things distracting.  (There may even be sufficient power sockets!)

It’s not ideal – to exclude this group from the room itself is a little unfortunate.  But I’d rather leave the people inside free of distractions from technology so that they can come to love (appropriately used) technology.

… and I’m also going to try to keep the meta-discussion about liveblogging separate from the actual stuff, at least on here.  I’m sure there’ll be all sorts of stuff on Twitter.