The murky issue of licensing

Over on the excellent new-look Cloudworks, there’s a debate going on about what to do about licensing of content on the site.  There are two questions: one is which licence to choose, and the other is what to do about the stuff that’s already on the site (!).  The latter I’m going to discuss over on that article, since it really only applies to Cloudworks, but which licence is the Right One is a bigger and more general question.

This is far from a settled issue in the educational community.  There’s reasonable consensus that Creative Commons licences are Good, rather than more restrictive ones, but as you probably already know, there are multiple Creative Commons licences.  The details are set out nicely on the Creative Commons Licenses page.  As someone releasing material, you have basically four conditions you can choose to apply:

  • Attribution (BY – must give credit)
  • Share Alike (SA – can make derivative works but only if licensed under a similar licence)
  • Non-Commercial (NC – only if not for commercial purposes)
  • No Derivatives (ND – can only distribute verbatim copies, no derivative works)

You can combine (some of) these to create six distinct licences – six because Attribution is always included, and Share Alike and No Derivatives can’t be combined (there’s a quick sketch of the arithmetic just after this list):

  • Attribution (CC:BY)
  • Attribution Share Alike (CC:BY-SA)
  • Attribution No Derivatives (CC:BY-ND)
  • Attribution Non-Commercial (CC:BY-NC)
  • Attribution Non-Commercial Share Alike (CC:BY-NC-SA)
  • Attribution Non-Commercial No Derivatives (CC:BY-NC-ND)
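
(If you want to convince yourself of that arithmetic, here’s a throwaway Python sketch – purely illustrative, nothing to do with Cloudworks itself:)

```python
from itertools import combinations

# Attribution (BY) is always included; the other three conditions are
# optional, except that Share Alike and No Derivatives can't be combined
# (you can't insist derivatives are share-alike while forbidding them).
extras = ["NC", "SA", "ND"]

licences = []
for r in range(len(extras) + 1):
    for combo in combinations(extras, r):
        if "SA" in combo and "ND" in combo:
            continue  # incompatible pair
        licences.append("CC:BY" + "".join("-" + c for c in combo))

print(licences)
# ['CC:BY', 'CC:BY-NC', 'CC:BY-SA', 'CC:BY-ND', 'CC:BY-NC-SA', 'CC:BY-NC-ND']
```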

There’s also a newish licence called CC0, which is intended as a way of unambiguously releasing a work into the public domain, free of any restraint.

So – assuming your aim is to promote the widest possible access to the material – which licence should you choose?

There is a big, fundamental argument going on in the Open Educational Resources (OER) and wider online educational community about this, with Stephen Downes and David Wiley perhaps the most articulate/notable exponents of two different positions.  To caricature horribly (and my apologies to both), Stephen Downes’ position is that the most effective licence is CC:BY-NC-SA, and David Wiley’s is that simple CC:BY is better (or even CC0).  This is overlaid (or perhaps underpinned) by a difference of approach, between a strongly principled approach and a pragmatic one.  (If you’re at all interested, I really do recommend digging in to their ongoing conversation about this.  A good starting place might be this post by Stephen Downes, and this one by David Wiley, mostly on NC – or these lists of the responses of one to the other.  If you’re going to Open Ed in Vancouver, they’re planning to debate each other face-to-face, which should be very illuminating.  A recent contribution to the debate is that Wikipedia has recently moved to CC:BY-SA.)

The argument for minimal licensing (CC:BY or less) is in essence that the other conditions create unnecessary barriers to reuse and distribution.  So, for instance, insisting on Non-Commercial would stop a company distributing printed copies of the work for profit, which might make it more available than it would otherwise be.  The arguments for more restrictive licensing include a fear that commercial interests will crowd out the free sources, using their greater marketing leverage, and that requiring Share Alike keeps the ‘open-ness’ attached to the work.

There are obvious parallels with the Free/Open Source Software debate: there, the ideologically pure line (what you might call the Richard Stallman view) has not been anything like as widely adopted as the more flexible one (what you might call the Linux view).  Being widely used, of course, does not mean that the approach is right.

For educational resources, my own current personal view is that CC:BY is the licence of choice, where possible.

It’s the least restrictive licence and presents the lowest barrier to sharing.  All the other CC licences put speedbumps (or worse) in the way of people who want to use or remix material.

We know that re-use is not widespread default practice in the educational community, and adding in extra barriers seems entirely the wrong tack to me.  If you’re wanting to use something, the extra conditions create headaches that make it – for most practical purposes – at least look easier and quicker to just re-create your own stuff.  It’s hard enough persuading colleagues that it’s a good idea to re-use material where possible rather than re-creating it, never mind if they also need a long course in Intellectual Property Rights to understand what they can and can’t do with it.  Each of the qualifications to a simple CC:BY adds extra questions that the potential reuser needs to think through.

We can dismiss ‘No-derivatives’ fairly easily: it’s an explicit barrier to remixing or editing.   As a potential user, you have to think about things like how much you’re allowed to quote/reuse as fair use/comment.  And if you are prepared to simply copy it verbatim, what constitutes verbatim?  What if you change the font?  Or print it out from an online version?  Put a page number, heading or links to other parts of your course at the top or bottom?  Can you fix a glaring and misleading typo?

‘Non-commercial’ is also full of tricky questions.  Most universities are not commercial for these purposes … except not all university activities are covered.  What about using it on a website with ads on it?  Like, say, your personal academic blog that’s hosted for free in exchange for unobtrusive text adverts?   What about a little ‘hosted for free by Company X’ at the bottom?  A credit-bearing course where all the students are funded by the State is clearly not commercial in this sense … but what about one where (in the UK context) they’re all full fee-paying foreign students?  Or a CPD-type course where there’s no degree-level credit and the learners all pay fees?

‘Share-alike’ means you have to worry about whether the system you’re wanting to use the material on allows you to use a CC licence or not.  Does, say, your institutional VLE have a blanket licence that isn’t CC-SA compatible?  And what if you want to, say, produce a print version with a publisher who (as most do) demands a fairly draconian licence?

For any given set of circumstances, there are ‘correct’ answers to most of these questions.  (And they’re certainly not all ‘No you can’t use it’ in many situations that obtain in universities.)  But you need to be pretty savvy about IP law to know what they are.  And even then, a lot of it hasn’t been tested in the UK courts yet, so you can’t be certain. Worse, what you want to do with the stuff when you’re reusing it may change in future – you might start off making a free online course, but then it might take off and you want to produce a book … but you can’t because some of the stuff you used had NC attached.  Or you might want to transfer your free non-assessed online course to a more formal for-credit version in your University on the institutional VLE … but you can’t because some of the material had SA attached.

You can be a lot more confident about future flexibility if you stick to CC:BY material, and there’s a lot less to worry about in terms of whether you’re doing it right.  So my view is that if you want to release material to be re-used as widely as possible, CC:BY makes your potential audience’s life much easier.

Complete public domain release would – on this argument – be even better, except that as an academic, I see attribution as crucial and fundamental, so I can’t let go of that!

I’m not overwhelmingly ideologically committed to this position: it’s very much a pragmatic view of what is most likely to get the best outcome.  I certainly don’t dismiss the counter-arguments about the dangers of commercial, closed pressures: they are real.  But I think on the balance of probabilities that the ease-of-reuse argument outweighs those, and CC:BY is usually the licence of choice.

Information Use on the Move

Another IET Technology Coffee Morning, this one presented by Keren Mills, from the Open University Library.

Keren spent 10 weeks at Cambridge through the Arcadia Programme, funded by the Arcadia Trust.  It’s a three-year programme aimed at improving library services, especially moving research libraries into the information age.  She wanted to find out what people actually wanted.

When you talk about mobile libraries … people think about vans full of books.  But there’s a widespread perception that mobile internet is slow and expensive.

Students are into texts, though – 58% of OU student respondents to Keren’s survey already receive text alerts from their bank or whatever (and continue to receive them).  A student services pilot in sending texts was successful, sending prompt SMSs to students to remind them about study, upcoming TMAs, and so on.  Students felt the university cared about them and was thinking about them – even if they didn’t need the reminder they appreciated the communication.  A feedback survey showed most students wanted exam date notification and results.

Mobile-friendly websites: AACS noticed people accessing our websites from mobile devices.  50% of student respondents access the mobile internet via their phones; 26% once a week or more.  Very little interest from Cambridge students – they might be younger than OU ones (on average), but they’re local to the University.

The perception is that mobile browsing is expensive – it’s better than it was, but it still costs.  Some networks are better than others – Virgin currently cap 3G data at 30p/day for up to 25Mb.

Only 26% of student respondents have downloaded apps to their phone and would do so again – higher than the overall figure, but not by much.  The iPhone might be changing that.  (E.g. apps being developed by KMi – the Virtual Microscope project and some others.)

Use of media on phones – students view photos most (75%)!  Staff listen to music more (60%), and have more exposure to podcasts/journal articles/e-books.  Students don’t, probably because we don’t prompt them to.

(An interesting discussion ensued about authentication to get access to e-journals.)

OU Library have been working to make their site more mobile-friendly. They’re using autodetecting reformatting software, which tries to suss the resolution, strips out the pictures, and reformats it.  It’s the same content, navigation and so on.

Students were particularly interested in location details and opening hours, and being able to search the catalogue. So they’re trying to make that easier. Moving towards a more CSS-based system in the future.

Safari – information skills site – has recently been overhauled.  Developed some mobile learning objects for reinforcement and revision – cli.gs/mSafari. Using their LO generator developed in-house.

Also – iKnow project – mobile learning objects, currently under evaluation.

About 33% of OU respondents have used text reference services (e.g. rail enquiries); a further 26% said they might, having heard about it through the survey.

General pattern of greater interest among OU students than others, probably because our students are so geographically distributed.

There are a range of mobile devices and emulators available in the Digilab.

Discussion

The autodetect-and-reformat software doesn’t work well with the mobile version of Safari – so the Library site treats iPhones and iPod touches as ordinary browsers.  Best practice is to give people the option of using either the mobile or the standard version.
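
(As a minimal sketch of that general approach – emphatically not the Library’s actual software, and the browser strings below are just illustrative:)

```python
# Sketch only: guess from the User-Agent string, but always let an
# explicit user preference override the guess. The hint lists are
# examples, not an exhaustive or authoritative set.
MOBILE_HINTS = ("Nokia", "BlackBerry", "Windows CE", "SymbianOS", "Opera Mini")
FULL_BROWSER_HINTS = ("iPhone", "iPod")  # happier with the standard site

def use_mobile_site(user_agent, user_choice=None):
    if user_choice is not None:
        return user_choice == "mobile"  # explicit preference always wins
    if any(hint in user_agent for hint in FULL_BROWSER_HINTS):
        return False  # treat iPhone/iPod touch as an ordinary browser
    return any(hint in user_agent for hint in MOBILE_HINTS)

print(use_mobile_site("Mozilla/5.0 (iPhone; ...)"))  # False
print(use_mobile_site("NokiaN95/21.0.016"))          # True
```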

Digital Scholarship Hackfest

A bunch of us got together yesterday and today at Bletchley Park for a Digital Scholarship Hackfest:  Martin Weller, Will Woods, Kev McCleod, Juliette Culver, Nick Freear, Antony McEvoy (a designer who’s newly joined Other Parts of the OU, spending a couple of days with us), and Lynda Davies.

[Photo: P1010242]

The setting is great for doing techie things, although I always feel slightly awed being at Bletchley Park for work.  When I’m just in tourist gawking mode it’s fine, but when I’m doing techie things I always feel a bit of an impostor.  Alan Turing and the other wartime codebreakers were doing really really clever stuff, and here I am trying to do things in that line … it’s a tall order to live up to.  The place was open to visitors while we were working, and we fondly imagined that the tourists peering in through the window would think that we, hunched over our laptops (almost exclusively Macs), were busily engaged in modern day codebreaking.

The one major downside to the venue was the terrible wifi.  It was desperately slow, and not up to supporting a roomful of geeks.  I’m sure it’s been better at other meetings I’ve been to – but it may be that something special was laid on then.  It was just enough to keep us going, but I think we’d have been a lot more productive with more bandwidth.

Digital Scholarship

The Digital Scholarship project is an internal one, led by Martin, with two aims:

  1. Get digital outputs recognised. (Martin’s posted about this sort of stuff already)
  2. Encourage academics to engage with new types of output and technologies.

There’ll be a programme of face to face events and sessions, but we want an online system to help support and encourage it all, and that’s what we’re here to do.

Principles:

  1. Based around social media principles
  2. Easily adaptable
  3. Incorporate feeds/third party content easily
  4. Not look and feel like a (traditional!) OU site

On that last one, we don’t want it to feel like a carefully honed, current-standard, usual-topbar OU site.  But we do want it to look like what the OU might become – what we’d like it to become – in the future.

The audience is OU academics (and related staff), but we (may?) want to make it more open to those outside later.

What we did

We spent the first day thrashing through what we meant by digital scholarship, and what the site might do for users, and what we could build easily and quickly.  We spent the second day getting down and dirty with building something very quickly.  I say ‘we’, but it was mostly the developers – Juliette and Nick – plus Antony, our designer.  Martin and I floated around annoying them and pointlessly trying to make the wifi work better.

Sadly, Martin rejected my offer to invent a contrived and silly acronym for the project, but my suggestion to call the site ‘DISCO’ (for DIgital SChOlarship) seemed to be reasonable enough to run with.  We had a bit of light relief thinking about visuals for the site – all browns and oranges and John Travolta pointing at the sky – but I suspect Antony was too sensible to take on our wackier suggestions, and the final site will not feature a Flash animation of a rotating glitter ball in the middle of the page sending multicoloured sparkles all over the screen.

Digital Scholarship Profiles

While we were beavering away, Tony Hirst (armed with a better net connection, no doubt) was musing on what we might be measuring in terms of metrics.  Well, here goes with how far we got.

One aspect of the project I worked on particularly was a Digital Scholarship Profile (a Discopro?).  The idea of this is some way of representing digital scholarship activity, and working towards some metrics/counts.

What we want to be able to do is to show – for each person – the range, quantity and quality of their digital scholarship outputs.

This would serve several purposes.  Firstly, it’s a stab at representing activity that isn’t captured in conventional scholarship assessments.  Secondly, by showing the possibilities, and letting you link through to see people who’ve done good stuff, you make it easier for people to develop in new ways.

We could show, for each area of digital scholarship output, what each person was doing, and how ‘good’ that was (more of which later).  On your DISCO profile the site would show your best stuff first, and as you went down the page you’d get to the less impressive things, and then (perhaps over a line, or shaded out) would be the areas where you had no activity yet.  For each area, we’d have links to:

  • suggestions for how to improve in that area (and on to e.g. Learn About guides)
  • links to the profiles of people with very ‘good’ activity in that area
  • information about how we do our sums

[Photo: P1010240]

Of course, metrics for online activity are hugely problematic.  They’re problematic enough for journal articles, but at least there you have some built in human quality checking: you can’t guarantee that a paper in Nature is better quality (for some arbitrary generic definition of ‘quality’) than one in the Provincial Journal of Any Old Crap, but it’s not a bad assumption.  And any refereed journal has a certain level of quality checking, which impedes the obvious ways of gaming the metrics by simply churning out nonsense papers. (Though I wouldn’t claim for a moment that there has been no gaming of research metrics along these lines.)

How do you measure, say, blog activity, or Slideshare?  You can get raw production numbers: total produced, average frequency of production, and so on.  However, there are only negligible barriers to publishing there, and any half-techie worth their salt could auto-generate semi-nonsense blog posts.

But this is relatively straightforward to measure, and nobody in academia is going to be so stupid as to simply equate quantity of stuff produced with quality, so I think we can do that without too much soul-searching.

How can we assess quality?  One approach would be to take a principled stand and say that peer review is the only valid method.  This view would see any metrics for academic output as irretrievably problematic at best, and highly misleading at worst.  That stance is one that might appeal particularly in disciplines which outright rejected a metrics-based approach for the REF.  The downside, of course, is that peer review is hugely expensive – even for the most selective stuff academics do (journal articles), the peer review system is creaking at the seams.  There’s no way that we could build a peer review system for digital scholarship outputs.

There are, however, some – very crude – metrics for assessing (something that might be a proxy for) the quality of online resources.  You can (often) get hold of statistics like how many times things have been read, the number of comments made, the number of web links made to the resource, and so on.  As with the production numbers, you can game these.  It’s not entirely trivial to do more than a handful – but most academic blog posts (in the middle of the distribution) will be getting handfuls of pageviews anyway, so getting your mates to mechanically click on your posts would probably have a noticeable effect.  And these are proxy measures for quality at the very best.  The sort of stuff that’s likely to get you large amounts of online attention is not (necessarily) the sort of stuff that is of the highest academic quality.  I can guarantee that a blog post presenting a reasoned, in-depth exposition and exploration of some of the finer points of some abstruse discipline-specific theory will get a lot fewer views than, say, a blog post promising RED HOT PICS OF BR*TN*Y SP**RS N*K*D, for instance.  Less starkly, the short, simple stuff tends to get a lot more link love than long, heavyweight postings – which is, alas, an inverse correlation with academic rigour (though not a perfect one).

There’s also an issue that statistics in this domain will almost certainly be very highly unequal – you’ll get a classic Internet power-law distribution, where a small number of ‘stars’ get the overwhelming majority of the attention, and a long tail get next to nothing.  We can probably hide that to some degree by showing relative statistics – either a rank order (competitive!) or, perhaps less intense, showing e.g. quintiles or deciles, with some nice graphic to illustrate it.  We mused about a glowing bar chart, or a dial that would lean over to ‘max’ as you got better.
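
(Here’s a minimal sketch of the decile idea, assuming all we have is a list of raw view counts – the numbers are made up:)

```python
def decile(value, all_values):
    """Which decile a value falls in: 1 = bottom 10%, 10 = top 10%."""
    below = sum(1 for v in all_values if v < value)
    return min(10, 1 + (below * 10) // len(all_values))

# A caricature of a power-law distribution of pageviews:
views = [3, 7, 12, 15, 40, 90, 250, 800, 5000, 120000]
print([decile(v, views) for v in views])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The point being that the ‘star’ on 120,000 views and the solid performer on 800 end up only a band or two apart, rather than the latter looking like a rounding error.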

[Photo: P1010231]

This is an experiment, and we want to explore what might work, so we don’t have to solve this problem.  And in a two-day hackfest to get something going, we’re going to shortcut all that careful consideration and just see what can be done quickly – knowing that we’ll need to tweak and develop it over time.  Or even throw it away entirely.

So what could we possibly measure, easily?

The model we’re running with is that there are several categories of things you might produce (research papers, blog posts, photos, etc), and for each category, there’ll be one or more services that you might use to host them – so, for instance, you might put presentations on Slideshare or on Prezi.com.  And then for each service, we can measure a whole range of statistics.

Here’s an outline of what we’re thinking:

[Photo: P1010230]

Categories:

  • Research paper repositories: Open Research Online and/or other institutional repository, subject-specific repository, and so on
  • Learning resources: repositories – e.g. OpenLearn, MERLOT, etc etc
  • Documents:  Scribd, Google Docs, OU Knowledge Network
  • Websites: Wikipedia (contributions – not your biography for fear of WP:BLP), resources, etc
  • Blogs: individual personal blogs, group blogs (could get feed for each one), etc
  • Presentations: Slideshare, Prezi, etc
  • Lifestream: Twitter, Tumblr, Posterous, FriendFeed, etc
  • Audio/video: podcasts, iTunesU, YouTube, Blip.tv, etc
  • Links/references: Delicious, Citeulike, Zotero, etc
  • Photos/images: Flickr, Picasa Web Albums, etc

The idea for these categories is that they’re a level at which it makes some sort of sense to aggregate statistics.  So, for instance, it makes some sense to add up the number of presentations you’ve put up on Slideshare and on Prezi … but it probably makes no sense at all to add up the number of photos you’ve posted to Flickr and the number of Tweets you’ve posted on Twitter.

Statistics – production statistics:

  • Count of number of resources produced
  • Frequency of resources produced (multiple ways of calculating!)

Statistics – impact/reception statistics:

  • Total reads/downloads/views of resources (sum of all we can find – direct, embed, etc) (also show average per resource)
  • Count of web links to resource (we generate? via Google API)
  • ‘Likes’/approval/star rating of resources (also show average per resource)
  • Count of comments on the resource (also show average per resource)

Statistics – Composite statistics

  • h-index (largest number h such that you have h resources that have achieved h reads? links? likes?)

I really quite like the idea of tracking the h-index: it takes a bit of understanding to suss how it’s calculated, so not everybody instantly understands it.  But it’s moderately robust and it’s a hybrid production/impact type statistic.  The impact component needs a little thought, and it might well vary from service to service.  There’s less symmetry in online statistics than there is in citations: if you get a few hundred citations for a paper, you’re doing really very well, but it’s not that hard to get a few hundred page views for an academic blog post.  A few hundred links, however, might be equivalently challenging.
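
(Here’s roughly what that calculation looks like in practice – a minimal sketch using per-resource view counts as the impact measure, which is only one of the options mooted above:)

```python
def h_index(counts):
    """Largest h such that at least h resources have a count of at least h."""
    counts = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Six resources with these view counts -> h = 4
# (four resources have at least 4 views each, but not five with at least 5)
print(h_index([100, 50, 8, 5, 4, 2]))  # 4
```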

We’re imagining some sort of abstraction layer for the profile, so we can plug in new services – and new categories – fairly easily.  One key point we want to get across is that we’re not endorsing a particular service or saying that people ought to use them: we’re trying to capture the activity that’s going on where it’s going on.
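
(To give a flavour of what that abstraction layer might look like – every class, method and number below is hypothetical, sketched for illustration rather than lifted from our actual code:)

```python
class Service:
    """One plugin per hosting service; each knows its category and
    how to fetch statistics for a given user."""
    category = None

    def fetch_stats(self, user):
        raise NotImplementedError

class SlideshareService(Service):
    category = "presentations"
    def fetch_stats(self, user):
        return {"items": 12, "views": 3400, "comments": 9}  # stubbed numbers

class PreziService(Service):
    category = "presentations"
    def fetch_stats(self, user):
        return {"items": 3, "views": 220, "comments": 1}  # stubbed numbers

def category_totals(services, user):
    """Aggregate statistics per category across all registered services."""
    totals = {}
    for service in services:
        stats = totals.setdefault(service.category, {})
        for key, value in service.fetch_stats(user).items():
            stats[key] = stats.get(key, 0) + value
    return totals

print(category_totals([SlideshareService(), PreziService()], "martin"))
# {'presentations': {'items': 15, 'views': 3620, 'comments': 10}}
```

Adding a new service would then mean writing one more small plugin class, and adding a new category would just be a new label.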

We’ll need to keep a history of the statistics, and also careful notes about our calculation methodologies and when they change (as they no doubt will).  Nice-to-have down-the-line features could then include graphs, charts, trends (trend alerts!) and so on.

There’s no way that we can get all of these things up and running in two days of hacking – highly skilled as our developers are.  So we’re going for a couple of example ones to get the idea across, and will add others later.

We want to produce feeds of all this stuff and/or expose the raw data as much as possible.  But again, that’s one for later rather than the proof-of-concept hack we’re putting together just now.

Sadly, the wifi connection at the venue was a bit flaky and slow, so we did the hacking on local machines rather than somewhere I can point you to right now – but expect a prototype service to be visible soon!  Unless, alas, you’re outside the OU … one design decision we made early was to keep it behind the OU firewall, at least initially, until the system is robust enough to stand Internet scrutiny – both in terms of malicious attacks and in terms of getting our ideas about what this should be thrashed through.

There’s the eternal online educational issue of open-ness versus security: making things more open generally makes them more available, and (with the right feedback loops in place) better quality; but on the other hand, people – especially people who don’t live in the digital world, like our target audience – often appreciate a more private space where they can be free to take faltering steps and make mistakes without the world seeing.  We’re starting towards the walled-garden end, but will revisit as soon as the site has had more than two academics look at it.

Next steps

We didn’t quite have a working site when we finished, but ended up with this list of things to do to get the site up and working:

  • order URL (disco.open.ac.uk?)
  • get Slideshare embeds working (problem with existing)
  • put on server – integrate design, site (Juliette), profile (Nick)
  • integrate with SAMS
  • finish coffee functionality – Juliette
  • finish barebones profile functionality – Nick
  • allow users to add link (in Resources)
  • check of site (and extra content added) by Martin
  • put ‘alpha’ on the site

And this list of longer term actions:

  • support
  • extended profile/statistics – API/feed/data exposure
  • more integration with OU services
  • further design work
  • tag clouds / data mining
  • review of statistics/profile
  • review the (lack of) open-ness
  • get more resource to do more with the site

For now, though, the best picture of the site I can give you is this:
[Photo: P1010238]

(There’s more photos of our flipcharts and the venue in this photoset on Flickr.)