LAK11: Learning Analytics And Knowledge, Banff

I’m at the 1st Learning Analytics and Knowledge conference, at the Banff Centre in Banff, Canada.

Today – Sunday 27th February 2011 – is the day of the Pre-Conference Workshops. It’s being streamed live. The hashtag is #LAK11, which is also the tag for the pre-conference open course on learning analytics.

We are right in the middle of the Rocky Mountains, and the views all around are stunning.

This is the first of (probably) six or more liveblog notes – my plan is one per half day.

Overview – George Siemens

There are many definitions/understandings of learning analytics. George is keen to use learning analytics to create an adaptive curriculum, using technologies like semantic web, linked data.

Also keen on a multidisciplinary approach.

A commercial announcement: conference sponsors. TEKRI – are exclusively online. They have the data that many universities will have in five years’ time – what each student clicks, what they engage with, who accesses recordings, and much more. That puts online education systems ahead of traditional ones. Trailblazing around learning analytics will come from online universities, but the traditional sector has opportunities too. Universities need to become more intelligent, and the process of learning needs to become less of a black box.

What are Learning Analytics?

An example of why there’s an interest. Heritage Healthcare organisation – wanted to ask, who is going to be admitted into a hospital? We’ll give you $3m if you can answer. (Like Netflix prize) That would be a fraction of the billions they could save if they could identify people who might be admitted. What would you want to know?

Suggestions from the floor: Some kind of health data trail from their past. Dietary patterns. Population, epidemiological data. Exercise. Income and education data. Load and capacity data on the hospital. Distance of the people from the hospital. Unemployment.

This is why analytics can be very complex. Each new data point amplifies the value of the whole dataset. E.g. unemployment within a region – might mean higher issues if higher, link to diet. The other ones build up a potential pattern structure – you could extract meaning from the data. With just a few, it’s guessing; with a large enough dataset – e.g. big data, scientific data – you can begin to look for patterns and correlations you couldn’t define or explore in a traditional hypothesis testing model.

In education, you could ask any number of questions. Which students are going to fail? Could say, we have this attrition rate, the peak is at 4 weeks, then it stabilises – could get aggregate data. But doesn’t tell you about individuals. Analytics – need masses of data to spot patterns; but also need specific information about the individual. A potential dichotomy. Who is going to succeed, and fail? Not just percentages, individuals.

Another example – healthcare – life insurance. Marketing data can predict life spans for life insurance to assess individual’s health risk. (Car loan companies pay to get hold of your credit history, and so on. Each new node amplifies the value of the whole dataset.) Things like home ownership, credit history, length of commute (inferred) – can get from traditional data sources. Also stuff you could pick up from social media – e.g. food, TV viewing. History of purchasing e.g. diet and weight loss supplies. So company might select who to invite to apply for insurance.

The insurance company’s concern is that doing a blood test is expensive. Creating the data themselves costs large sums; getting it from existing sources might be much cheaper.

Ethics isn’t big on the agenda at this conference, but will likely be a huge question in the future. Initiatives like Microsoft’s Azure marketplace – proprietary data as well as free/open datasets. Data is becoming a commodity. Infochimps is another example of an online data marketplace. Also OECD, World Bank making their data publicly available. Also StatsLink showing you the original source data for a diagram or chart in reports.

There is a growing employment market for data analysts and similar things. In education in general, his advice is: go data. Not necessarily be a statistician, but need to be able to structure and make sense of data. Provocative statement: from the trends in the past, analytics will have a much more profound and dramatic impact than any other technology in education in the last 15 years. Because it runs the gamut, from senior decision-makers in universities, through Deans, down to faculty members and students. And informal learning as well.

Background to analytics is four trends:

1. Business intelligence. Sociologist in the 40s – we need to understand who talks to whom, about what, and to what effect. That’s what we need to know too. IBM is a corporate leader in effective use of analytics – e.g. in email analysis, who’s connected in what projects. Changing the way they work through use of their online collaboration systems. Also supply-chain value analysis.

2. Data-driven decision making in the public sector, and call for increased accountability for education, from K-12 to university. We want the data to support decisions. Tied to international ranking systems.

3. Technological change and increase in: scale (e.g. Map/Reduce, Hadoop), computational power (speed), digital data (data exhaust, many activities happen online already). Also sensors/Internet of Things.

4. Quantification of business and society. Pronounced after WW2, Taylorism, Ford, McNamara: not just managing how you want to do it, but getting data and making decisions based on that. But counter-example of LTCM – best data, algorithms and brain-intensive organisation, but misjudged human elements. Education is 30 years late to the quantification party.

Academic analytics – helps address the public’s desire for institutional accountability with regard to student success, given the widespread concern over the cost of higher education and the difficult economic and budgetary conditions.

Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. – Definition from online LAK11.

Most commonly-cited illustration is the Signals project – purchased by SunGard. Blackboard just announced Blackboard Analytics. Also Desire2Learn. Aiming to increase value of LMS. An LMS is hard to sell, have to sell a connected, integrated structure. Also IBM Darwin. Texas A&M analytics gone awry – want to quantify their faculty: how many students do you teach, how much does it cost to keep us here? But some faculty, best known, are deemed as being a net loss to the university – WSJ had a good article on it. UK also has analytics model gone awry in the shape of the RAE and REF.

Knowledge analytics – inseparable from learning analytics. Advanced approaches (text/data mining, information retrieval, machine learning, linked data), for processing data to provide representations in forms from which conclusions can be drawn in an automated and domain-aware way.

Presents a model: learners’ off-put data – anything that’s digital. Intelligent curriculum – bring in data mapped to a knowledge domain: what concepts a student needs to have mastered to pass a course. Then combine those in analysis, connect with their profile – inferred from existing data. Then make predictions – then what do they need to get further help. The prediction leads you to adapt, personalise, intervene.

Martin Weller: Ethics thing. What’s the purpose of these things? This is stuff being done to the learner, not for the learner. You get different reactions depending on the purpose. We may defer the ethics question, but not the purpose.

George: One challenge is some individuals think this is the next step in industrialisation of education – one more structured process. So here have a combination of multiple disciplines – social sciences/humanities, computer science, organisation leaders. Criticism is pronounced. Do you sit and rant, or engage and try to shape.

Martin: There is an Orwellian interpretation. But we can define our principles – e.g. analytics for learners, rather than analytics of learners. We can say what we think is good. When you measure, you start to change behaviour. If it’s to get benefit for themselves, they may be more honest.

George: A good distinction – analytics for rather than of learners.

Dragan Gašević – Semantic Technologies and Learning Analytics

The social web is here! So what now?

He’s not a big user of social networks – can’t keep up with the huge volume of data. A key question is how to deal with these huge volumes of data. One possibility – use of semantic technologies.

Semantic technologies are not exclusively semantic web. Aim is to create a universal medium for the exchange of data.

The links are made using an ontology. A broad definition – basic structure or armature around which a knowledge base can be built. Can be costly to build. And it’s defined using RDF.

The Semantic Web is not opposite to the Social Web. They can’t exist separately.

Linked Data. DBpedia is Wikipedia transformed into RDF – used in real applications. The data is exposed through SPARQL endpoints, which enable data mash-ups. Web services standards – SOAP or RESTful. Example – BBC Linked Data. Wildlife Finder.

Chris: Semantic web, social web tie together. Another entity: no people at all, e.g. LSA, completely automated, scraping, making deductions. How do those tie in?

Dragan: Will touch on some of those points later. Deep learning analytics require e.g. discourse analysis.

Learning Ecosystems

Today’s idea of learning is like a pizza delivery model.

Ecosystem automated – authoring tool, repositories/reusability, packaging, to provide some learning and collaborating experience. But not just LMS – more tools and services too.

Learning (life) is not in a box – it is an ecology of systems. A software engineer’s life covers a huge range of tools and services. Blur boundaries between personal and social data – these isolated islands are not connected, it’s not ecological.

Semantics and learning analytics

A crazy problem requires a crazy solution (Griff Richards) – we have lots of things coming from lots of different places. Answer: Learning Context Ontology – LOCO. A learning activity, content associated, and user -> profile. Integrates several ontologies – user model, content type, content structure, learning design, and domain ontologies. Domain ontology no longer a block, thanks to DBpedia.

Education Social Semantic Web – was called educational feedback, but is analytics. Combines a content packaging tool/LMS, connected to an RDF-based semantic data of the learning objects. Built a tool called LOCO-Analyst. Visualisation of when students are learning particular concepts. And some social graphs, who is interacting. And finally, leveraged collaborative tagging to see how much of the whole group learning an individual knew. Best predictors of perceived quality of learning analytics tool for educators: information about interactions of students; students’ comprehension of content; and intuitiveness and ease of understanding.

IntelLEO – Harmonization of personal and organizational goals. System where you create your learning goals, can see mappings between them and learning activities.

Challenges

Some clues – little semantics goes a long way. And DBpedia has big promise.

Open challenges around providing learning analytics for students – personal planning, etc. For organisations. And discourse analysis.

Ubiquitous learning analytics – tool and format independent. And visual learning analytics.

Workshop on Linked Learning coming up, and special issues coming up.

Final point:

Learning analytics will be more meaningful and ubiquitous with semantic technologies.

Steve, CMU: Evaluation of LOCO-Analyst, perceived utility – social interactions are major predictors of it. Seems very plausible – if people interact a lot with the online environment, they’re receiving utility. How do perceptions of utility link to learning? Is it a predictor of learning?

Dragan: Good question. This was perceived by educators, not students. The tool was for educators.

Katrien Verbert – Research and Deployment of Analytics in Learning Settings

From ARIADNE/K.U.Leuven.

1. Visualizing activities for self-reflection and awareness.

Online, have less face-to-face interaction. Hard to get feedback on how you’re doing. So visualise data that we track – activities in online learning activities, have tools to support self-reflection and awareness.

Self-monitoring for learners; awareness for teachers; learning resource recommendation.

Tool visualising data from LAK11 online course preceding this conference – online here http://ariadne.cs.kuleuven.be/monitorwidget-lak11/

Someone from U Phoenix: students really like seeing how their stuff compares to others – can only show the mean/average, because of privacy. But they’d like more than that.

Katrien: Have tried anonymising by showing the lines for individuals, but without the names.

George: How real-time is the data?

Katrien: It’s pre-loaded at the moment. Done every day, mainly for performance issues.

Someone: Time – distinguish between active time and time getting a coffee?

Katrien: No, these are estimates. If they’re downloading a document, or viewing a page, can see that, but they

Someone: Integration with keystroke analysis?

Katrien: No.

Someone else: Only on the Moodle? Idea of MOOC is it’s all round the web.

Katrien: I’ll come on to that in later slides.

Someone from U Saskatchewan: Would be interesting to compare altruistic learning scenarios – I just want to learn; versus degree credit where grades are on a curve where you want to be ahead of the others. Especially in STEM classes, e.g. pre-med students, very competitive.

Katrien: Would be interesting to evaluate.

Griff from Kamloops: A mesh of the knowledge, let people check themselves off, see where the knowledge is, who’s absorbed. This is participation, doesn’t tell you about learning. Perhaps some self-evaluation. A map of the content, the new ideas, would be useful.

Dragan: Agree with Griff. On regular courses, often don’t use forums.

Someone (?from Gates Foundation): We’re doing work with Centre for Community College Student Engagement (and another) – an interview with students, self-reports of what they did, how they were involved – matching up outcomes with activity. Also survey of faculty – when students need intervention, what are you referring them to do. Going to match up those and see how they’re aligned.

They use Contextualised Attention Metadata (CAM) data – deployed in ROLE PLE, RWTH-Aachen engineering, Moodle.

Evaluation results – usability and user satisfaction, 12 CS students, task based interview with think aloud (after 1 week), user satisfaction (after 1 month). They understand the visualisations, but many issues uncovered. Want more input on how to improve them.

Questionnaire at http://bit.ly/laksurv

They can put your data into the tool from other systems – they can transform your data, put it into their tool. Keen to evaluate their work, get in touch.

2. dataTEL initiative

EU work collecting, sharing data so can do research on how to modify or customise existing recommendation engines for learning. Experimenting, need large datasets.

Theme Team within STELLAR network of excellence (EU project).

Explore questions around sharing data sets – including privacy and legal protection; and expertise in collating datasets and working with recommender systems.

Issued a call for TEL datasets – http://bit.ly/ieqmWW – got several, including Mendeley, Melt

More to come on Tuesday.

3. ROLE project

Visualised usage of widgets and services in PLEs.

ROLE – responsive online learning environment – empowering learners to build their own PLE. Open learning environments, responsive. Tools to support awareness and self-reflection.

Created many widgets for learners to assemble to make their own PLE, and then generated dashboard showing their usage, so could get insights in to which are used, and how the use develops over time – e.g. used at the beginning, then another takes over.

Slides will be here: http://www.slideshare.net/kverbert

Tony Hirst – Messing with Data

Data in the general sense – not just learning analytics.

Key question – how can non-programmers do the things that we might normally think we need a developer to do? Lowering the barriers to e.g. producing interactive visualisations.

Data is a dish best served raw. If tabulated in a PDF, it’s essentially impossible to do anything with it without copying it out by hand to extract actual numbers. Wants bits you can manipulate at will.

Process: Discovery – challenge to find it, in a format you can use. Acquisition as data (not, e.g. PDF). Then representation, cleansing (tidying, making it consistent, correcting errors), and (visual) analysis. In a cycle.

A good way of getting data is through APIs. Many social networks are releasing these. Tony plays with Twitter data a lot. (Twitter API, but also Google Social Graph API which doesn’t require an API key and lets you hammer on it harder.) Other routes include screenscraping – loading the page and extracting the information from it, e.g. grabbing data from HTML tables. ScraperWiki is a tool for this. Also, document formats that are harder for humans to read, e.g. XML.

Representation – can be structured or unstructured, human readable (text) or opaque.

Examples of ways of interacting with data and getting it flowing through tools.

Google Spreadsheets – can use it as a database and as a screenscraper. A mash up built around the Winter Olympics last year. Found a table of the medals awarded by country, as an HTML table on a Wikipedia page. Use the ‘importHTML’ function in Google Spreadsheets – say whether to grab a table or a list, identify the table by number (e.g. 3rd table, 1st list, etc). And then can run queries using the ‘QUERY’ command in Google Spreadsheets, mapping it to the ISO country codes. Then use with the Google Countries Heat Map widget. Had to map from 3-letter to 2-letter codes. Then can publish the data from this as a CSV.
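For anyone who does want to see the "grab the Nth table from a page" step in code, here is a minimal sketch in Python, using only the standard library. It is not how Google Spreadsheets does it. The class and function names (`TableGrabber`, `import_html_table`) and the sample page are made up for illustration, the numbers are placeholders rather than real medal counts, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of every <table> in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return the index-th table on the page (1-based, counted in page order)."""
    grabber = TableGrabber()
    grabber.feed(html)
    return grabber.tables[index - 1]


# Made-up sample page: country codes and counts are placeholders.
SAMPLE = (
    "<table><tr><td>navigation</td></tr></table>"
    "<table>"
    "<tr><th>Country</th><th>Gold</th></tr>"
    "<tr><td>AAA</td><td>3</td></tr>"
    "<tr><td>BBB</td><td>1</td></tr>"
    "</table>"
)

for row in import_html_table(SAMPLE, 2):
    print(row)
```

Identifying the table by number is fragile if the page layout changes, which is the same trade-off as in the spreadsheet approach.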

The published CSV can then be used in e.g. Yahoo Pipes, which can also publish CSV. So could grab RSS, feed through Yahoo Pipes, then export as CSV to Google Spreadsheets, and process it there. Yahoo Pipes can map a CSV with lat/long in to KML formatting, for Google Earth, so you can generate maps automatically.
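The CSV-to-KML step can be sketched in a few lines of Python. This is not Yahoo Pipes' implementation. The function name `csv_to_kml` and the column names `name`, `lat` and `lon` are assumptions for the example, and the coordinates in the sample row are only approximate.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_kml(csv_text):
    """Turn CSV rows with name, lat, lon columns into a KML document string."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = row["name"]
        point = ET.SubElement(placemark, "Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, "coordinates").text = f'{row["lon"]},{row["lat"]}'
    return ET.tostring(kml, encoding="unicode")


# Approximate, illustrative coordinates only.
SAMPLE_CSV = "name,lat,lon\nBanff Centre,51.17,-115.56\n"
print(csv_to_kml(SAMPLE_CSV))
```

The main thing to watch is the coordinate order: spreadsheets usually hold latitude then longitude, while KML expects longitude first.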

JSON format very popular, consumed by libraries that need programming skill.

Another tool – IBM’s Many Eyes. Takes in CSV data – can cut and paste from a spreadsheet in Many Eyes, many interactive chart options. Simple line-based charts, line graphs, histograms, can choose which columns, and so on. Also Tree Maps – representation of hierarchically-structured data. You don’t have to work out the hierarchical structure, just paste in the CSV. Can get matrix charts, and others. Are all interactive, create a URL where you can embed the interactive widgets in your own pages so other users can interact with the data, and generate further charts.

Visualisation of networks – appear in many strange places. E.g. maps from the OU’s Linked Data source – data.open.ac.uk. For each course, can see courses related to it (courses that people who took this course also took). Took a dump from the SPARQL end point, generated a map, can zoom in. Took SPARQL data, transformed it in to CSV-like format – two columns: course, and course it’s related to. Loaded in to Gephi – a tool that lets you analyse networks. The OU course catalogue can be hard to navigate to see what’s related to what – but can see it as a network, a macroscopic view, you as the user can explore and interact with it.
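The "two columns: course, and course it’s related to" step is simple enough to sketch. This assumes the SPARQL results have already been reduced to (course, related course) pairs. The course codes below are invented, and the `Source`/`Target` headers are what Gephi's spreadsheet import looks for as far as I know, so treat that as an assumption too.

```python
import csv
import io


def edges_to_csv(pairs):
    """Write (course, related_course) pairs as a two-column edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    # Drop duplicate pairs and sort so the output is reproducible.
    for course, related in sorted(set(pairs)):
        writer.writerow([course, related])
    return buffer.getvalue()


# Invented course codes, for illustration only.
pairs = [("X101", "X102"), ("X101", "X102"), ("X101", "X200")]
print(edges_to_csv(pairs))
```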

Another example from BBC Linked Data, information around programmes. One example is In Our Time, which features academics from institutions. So Tony scanned for OU academics on these panels – not in structured/linked data form.

Griff: Can you correlate signups for courses with these public announcements on programs?

No. This program is a BBC production, but the OU doesn’t have direct involvement. The OU does with other productions. There, the traffic is watched to see OU traffic coming round. OU website uses site analytics, watch people moving through, to see whether they go on to request a prospectus and then sign up to a course.

Griff: Someone could say, I did this programme, led to this number of signups …

Almost there, but not quite.

Informal social networks. Extent to which users use consistent usernames across services, e.g. Twitter, delicious, and so on. Knowing one username makes it possible to identify others. A map between different networks – delicious, Twitter, github. Look up your Twitter ID, find the hashtags and URLs you’ve used. And then match hashtags to URLs. Including tags, and other people’s tags. Can discover all sorts of useful information – identify projects, users, and so on. Mining across networks.

Graph generated using GraphViz, using its graph description language, called DOT.
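To give a feel for what GraphViz consumes, here is a small Python sketch that writes a list of edges out as DOT text. The function name `to_dot` and the usernames are hypothetical, and the sketch does not escape quotation marks inside names.

```python
def to_dot(edges, name="friends"):
    """Render directed (follower, followed) pairs as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so characters such as '-' are accepted.
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical usernames.
print(to_dot([("alice", "bob"), ("bob", "carol")]))
```

The resulting text can be saved to a file and passed to a GraphViz layout program such as `dot` to draw the graph.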

Resource discovery in special interest areas. Hashtags on Twitter show who’s interested in it. Has created a mapping of people who’ve used the LAK11 hashtag, mapping the friendship relationships. http://www.flickr.com/photos/psychemedia/5482891802/

More general map – TweetMinster following social media behaviour of UK politics people – media, politicians, and so on. Gephi map of links between the clusters – they map on to the political parties and media groupings.

Further work – putting the data to work using a Google Custom Search Engine and blogroll. Personal profile data from Twitter includes a URL, can grab that list of URLs, dump in to Google CSE. From LAK11 hashtag, get homepages from those people, dump in to Google CSE – can search people who have expressed an interest in learning analytics.

Delicious lets you pull down the last 15 URLs tagged with a resource. So then can pull down the other tags on those URLs, and map the co-tagging.
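Mapping the co-tagging boils down to counting how often two tags appear on the same URL. A minimal sketch, assuming the tags for each URL have already been fetched into a dictionary. The URLs and tags here are made up, and `cotag_counts` is just an illustrative name.

```python
from collections import Counter
from itertools import combinations


def cotag_counts(tags_by_url):
    """Count how many URLs each pair of tags appears on together."""
    counts = Counter()
    for tags in tags_by_url.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Made-up bookmarks.
bookmarks = {
    "http://example.org/a": ["lak11", "analytics", "data"],
    "http://example.org/b": ["analytics", "lak11"],
}
for pair, n in cotag_counts(bookmarks).most_common():
    print(pair, n)
```

The pair counts can be used directly as edge weights in a network tool.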

A service that lets you see who’s tweeted a URL. Take that list, then grab list of all their followers – followers of three or more people who tweeted a link. Who had seen it about three times or more? That was the overview, get a sense of the audience size on Twitter.
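The "followers of three or more people who tweeted a link" count can be sketched as below, assuming the follower lists have already been pulled down from the API. The account names are invented and `repeat_audience` is an illustrative name, not part of any Twitter library.

```python
from collections import Counter


def repeat_audience(followers_by_tweeter, minimum=3):
    """Return accounts that follow at least `minimum` of the tweeters."""
    seen = Counter()
    for followers in followers_by_tweeter.values():
        # Use a set so a follower counts once per tweeter.
        seen.update(set(followers))
    return {account for account, count in seen.items() if count >= minimum}


# Invented accounts: three people tweeted the link.
followers_by_tweeter = {
    "tweeter1": ["ann", "bob", "cat"],
    "tweeter2": ["ann", "bob"],
    "tweeter3": ["ann", "cat"],
}
print(repeat_audience(followers_by_tweeter))
```

This only says the link appeared in someone's stream that many times, not that they actually read it.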

Issue around cleansing data. Useful tool is Google Refine. You load in a CSV file, it lets you look for things within a certain string distance of each other – so e.g. ‘PA CONSULTING’ and ‘PA CONSULTING GROUP’. Helps with stray commas, semicolons. Identifies potential errors, can clean them for you.
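As a rough illustration of the string distance idea, here is a plain Levenshtein edit distance in Python, used to flag pairs of names that are within a chosen distance of each other. This is a sketch of the general technique, not Google Refine's own clustering code. The function names and the threshold of 6 are assumptions for the example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def near_pairs(names, max_distance):
    """Return pairs of distinct names within max_distance edits of each other."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if edit_distance(a, b) <= max_distance
    ]


names = ["PA CONSULTING", "PA CONSULTING GROUP", "ACME LTD"]
print(near_pairs(names, max_distance=6))
```

Comparing every pair is fine for a few hundred values but slow for large columns. A human still has to decide whether two flagged names really are the same organisation.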

Also Stanford Data Wrangler. Lets you put in less structured data, not quite CSV, then get it in a tabulated form. Can click on a row, then gives you options for things to do – and shows you results before you do it – e.g. delete all empty rows, or delete rows where that column is null, fill from the left, and so on.

Finally: Google Analytics. Tool for tracking website behaviour. Can export data, so can do your own further work.

All written up on ouseful.info

Griff: Have you been contacted by the Egyptian Secret Service?

You feel what it’s like playing with this data. On one occasion have explored individuals, can feel quite uncomfortable, see patterns that you may or may not want to know – people they are connected to. No, they haven’t contacted me, but I have looked at those tweets – and you worry about invasion of privacy, mining structure out of what was originally amorphous. Friends-of-friends can seem quite intrusive. We can’t avoid this question. Google Analytics gives Google information about where your users have gone on your site – and Google knows who your users are.

–
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.