LAK14 Tuesday pm (4): Machine learning workshop

Liveblog from the second day of workshops at LAK14 – afternoon session.

Machine Learning & Learning Analytics Workshop – afternoon session

Silk Factory Machinery


Discussion of papers in previous session – small groups

Three themes:

  • Data driven vs theory driven and the role of ML
  • Moving focus from learning outcomes to learning process
  • Incorporating pedagogical and contextual information into ML models

My group: on process.

ML models predict pass/fail; more interested in figuring out which ones I can’t predict. Factors you can do something about vs ones you can’t. Sensitivity analysis? Changing the pattern of activity. Metacognitive ability. Is directing a student to an effective resource the same thing as a student finding their way to it? Possibly not (underlying variable of learning ability). Exogenous variables too – hard to know what to do with them. Prior knowledge. ML still heavily content-centric; not so much what students’ current knowledge graph is; maybe start by knowing the learner better. Need to get in to more detail of the learner’s process. What if the process is the outcome? What do you tell students to do when they go to a resource? Not just the individual process, but the social one too?

Feedback from groups:

We haven’t hit the goldmine yet – what’s the reason? Process vs outcomes – what scope of outcome? Learning outcomes are the same as learning processes, just at a different level of granularity. (!) Behaviourism, ways of perceiving things.

Phil: Learning styles! [was mentioned in feedback] It’s clear people believe they have them, also clear that what they believe has little correspondence to what they do or achieve. So phrase it in terms of what people believe about their learning styles, and how that may affect performance.

Chris: Learning styles for a digital age: clustering activity of student behaviour, say these people are alike.

Doug: Approaches to study, and how they are contextual, not a stable orientation across contexts. But great to do data-driven empirical enquiry.

Q: It’s widely known, very actionable. Does the LA community imagine something that could reach the same level of success, about the wide range of knowledge?

George: This is things to think with, like the digital natives stuff. Once you’re done thinking with it, it’s a limiting factor. Developing and advancing the learner. With learning styles research, you play to strengths. The ideas of improving instruction, advancing learners, as long as it’s data driven and we get feedback on how well we’re doing it.

Chris: Will LA and DM lead to theories that are like learning styles in that they are broadly applied and understood? Not necessarily learning styles. But a lot of LA couches things in current educational thought and theories; it could instead use analytic qualitative methods like grounded theory. Even if the implication for research is we’ve shown this works in the MOOC, we build a theory from the data, rather than pulling in a theory from elsewhere and saying it partially confirms it – maybe it partially doesn’t confirm it.

George: Issue simmering, haven’t got in to it yet. Role of theory and data. There’s a camp that argues that maybe we don’t frame our data exploration through a construct in advance of it; we devote more effort to what emerges. It’s discomfiting. Peter Norvig, power of data – the unreasonable effectiveness of data. Anderson in Wired – maybe the balance of data/theory needs to shift towards more data. Still a bit far off from that; we’re still arguing about learning styles.

Q2: Do we need taxonomies of different types of courses? Learning styles are so easy to interpret. If you can find a learning approach that takes in to account all the contextual factors, the size of the course – do we have something like that?

Dragan: Lots of research on learning design, some people documenting it. Martin Weller presentation at the MRI conference on coding types of MOOCs. Still not much understanding of capturing the pedagogical context, and intent, in guiding students. Data driven; from educational evidence we have some stuff – for learning styles the effect size is so small. It’s already been shown it’s not effective. Metacognitive skills, motivational strategies, approaches to learning.

Q3: Pulling up learning styles is a straw man. Data driven is so much better! But the straw man sucks! One of the constraints of a data-driven perspective – seeing a goldfish and a dog, given instruction to climb a tree, see some of them are not doing it. If the goal isn’t effective, the data isn’t helpful about how to change it. There is a level of IDing educational theories not proven to be ineffective, that’d be (useful to deploy).

Automated Cognitive Presence Detection in Online Discussion Transcripts
Vitomir Kovanovic, Srecko Joksimovic, Dragan Gasevic and Marek Hatala

Vitomir presenting.

Automating analysis of content of discussion transcripts. Specifically, cognitive presence – community of Inquiry framework. Goal to see how learners are doing, could be good to construct interventions, give feedback to learners on their contributions.

Asynchronous online discussions. Community of Inquiry framework – three constructs – social presence, cognitive presence, teaching presence. Cognitive presence is phases of cognitive engagement and knowledge construction. Well established, extensively researched and validated; content analysis (?by hand) for assessment of presences. Cognitive presence – “extent to which participants are able to construct meaning through sustained communication”.

Four phases – triggering event (issue, dilemma, problem); exploration; integration; resolution. Coding scheme for content analysis; requires expertise. Extensive guidance on the coding instrument, need domain knowledge too.

Text mining as an approach; text classification. Work on mining latent constructs. Most commonly used features, e.g. lexical and part-of-speech N-grams, dependency triplets. Commonly Support Vector Machines and K Nearest Neighbours.

1747 messages, 81 students. Manually coded by two human coders, 98.1% agreement, Cohen’s kappa 0.974. (Wow, that’s impressive for any human coding task.)
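That reported agreement is striking; as a reminder of what the statistic measures, here’s a minimal pure-Python sketch of Cohen’s kappa – observed agreement corrected for chance agreement – on invented toy labels (the messages and labels are illustrative, not from the study):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labelled the same.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy example: two coders labelling six messages with CoI phases.
a = ["trigger", "explore", "explore", "integrate", "resolve", "explore"]
b = ["trigger", "explore", "explore", "integrate", "explore", "explore"]
print(round(cohens_kappa(a, b), 3))  # 0.727
```

With 98.1% raw agreement across four-plus categories, a kappa of 0.974 follows directly – hence the surprise at such near-perfect human coding.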

Same corpus. Feature extraction – range of N-gram techniques. Dependency triplets – captures connections across larger ranges of words. Also ‘backoff’, where you move one part of an N-gram to the part of speech – e.g. is to <verb>. Other features too – number of named entities, first in discussion, reply to the first. Linear SVM classifier. Java, using Stanford CoreNLP toolkit. Classification using Weka and LibSVM. Java Statistical Classes.
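The actual pipeline was Java with Stanford CoreNLP, Weka and LibSVM; as a toy Python sketch of the feature-extraction idea, here is word N-gram extraction plus the ‘backoff’ variant, where each position in an N-gram is replaced in turn by its part-of-speech tag (the tags below are hand-supplied for illustration, not from a tagger):

```python
def word_ngrams(tokens, n):
    """All contiguous word n-grams from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def backoff_ngrams(tokens, tags, n):
    """'Backoff' n-grams: replace each position in turn with its POS tag,
    e.g. ('is', 'to', 'go') -> ('is', 'to', '<VB>')."""
    out = []
    for i in range(len(tokens) - n + 1):
        words, pos = tokens[i:i + n], tags[i:i + n]
        for j in range(n):
            out.append(tuple(words[:j] + ['<%s>' % pos[j]] + words[j + 1:]))
    return out

tokens = ['is', 'to', 'go']
tags = ['VBZ', 'TO', 'VB']
print(word_ngrams(tokens, 2))  # [('is', 'to'), ('to', 'go')]
print(backoff_ngrams(tokens, tags, 3))
```

Backing off to parts of speech generalises across vocabulary, which is why the backoff trigrams did well – at the cost of a very large feature space.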

Results: got accuracies up to high 50s%; Cohen’s kappa around 0.4. Backoff trigrams were best; but they had lots & lots of features there. But entity count, is first, or is reply to the first were moderately good, and are only a single feature each.

Working on nested cross validation for model parameters. Best accuracy 61.7%, kappa 0.43.
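Nested cross-validation keeps hyperparameter tuning honest: an inner loop picks parameters on training folds only, and an outer loop scores the whole procedure on untouched data. A minimal scikit-learn sketch of the idiom (on a stand-in dataset, not the discussion corpus):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Inner loop: grid search picks the SVM cost parameter C on training folds only.
inner = GridSearchCV(LinearSVC(dual=False), {"C": [0.01, 0.1, 1, 10]}, cv=3)

# Outer loop: held-out folds estimate accuracy of the whole tuning procedure,
# so the hyperparameter choice never sees the evaluation data.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

Reporting the outer-loop mean avoids the optimistic bias of tuning and evaluating on the same folds.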

Plans for future work too, to improve the accuracy. Move away from SVMs, which give no clues for interpretation; good for classification but not interpretation, and you can’t get probabilities for the classification. Logistic regression, boosting, random forests – which features are important.

All the features are surface-based.


Dragan: Why did you remove the resolution phase?

The focus of the course was on integration, so removing the resolution phase made sense. It was not really important in this context.

Q: Accuracy?

Yes, human coders had coded the messages, so we could compute classification errors and kappa. Used the prior probabilities of each class as the baseline.

Developing predictive models for early detection of at-risk students on distance learning modules
Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova, Jakub Kuzilek and Martin Hlosta

Jakub talking.

Identifying students potentially failing, and give timely assistance. Using all available student data (Demographic, VLE clicks), predict result of next milestone or final result.

Problem specification: start with demographic data, during progress of the course, gather data from interaction with VLE, and scores from tutor marked assignments, and then have the final exam.

Found that students who fail the first assignment in the fourth week have a high probability of course failure (>95%). Have to start predicting before the first TMA. (Aha! This is what I hadn’t understood earlier. Yes, if you fail the first assignment you’re likely to drop out – but a lot of people drop out despite passing it, too.)

Demographic data – static – gender, age, index of multiple deprivation, new vs continuing, student workload during the course, number of previous course attempts. VLE data – from virtual learning environment. Currently one-day summary data, updated daily. Forum activity, Resource, view test assignment, web subpage, online learning materials, homepage activity.

Predictive model – need data from previous years to predict this year. Using three models – decision tree, k-nearest neighbours, naive Bayes.

Decision tree – CART – on VLE and demographics. K-nearest neighbours: the 3 closest students from the previous presentation affect the final decision; applied separately on VLE clicks and on demographic data. Also naive Bayes, with discretisation of continuous variables (AMEVA); assumes they are independent.

Results. 10-fold cross validation. Prediction for TMA2 in the week before it (10w data). Precision 66% and 68%, recall 53% and 55% – good for CART and k-NN on VLE; but k-NN on demographics is bad at this stage. Naive Bayes: 47% precision, 73% recall.
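In this setting precision is ‘of the students flagged at risk, how many truly were’, and recall is ‘of the truly at-risk students, how many were flagged’. A tiny sketch with invented confusion counts chosen only to roughly match the figures above:

```python
def precision_recall(tp, fp, fn):
    """Precision: of students flagged at risk, how many truly were.
    Recall: of truly at-risk students, how many were flagged."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts: 33 true positives, 17 false alarms, 29 missed students.
p, r = precision_recall(tp=33, fp=17, fn=29)
print(round(p, 2), round(r, 2))  # 0.66 0.53
```

The trade-off is visible in the numbers reported: naive Bayes catches more at-risk students (higher recall) at the cost of more false alarms (lower precision).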

The models vote, and if more than two votes for a student, predict that they’re at risk, if not, say Ok. From the at-risk students, generate a list, student support team makes an intervention.
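The voting step is a simple threshold over model predictions; a minimal sketch, with the model names and the threshold of two as my reading of “more than two votes” made into a parameter (both are assumptions, not from the paper):

```python
def at_risk_vote(predictions, threshold=2):
    """Flag a student as at risk when at least `threshold` models agree.
    `predictions` maps model name -> 1 (predicted not to submit) / 0 (ok)."""
    return sum(predictions.values()) >= threshold

# Hypothetical votes from the four models for one student.
votes = {"cart": 1, "knn_vle": 1, "knn_demog": 0, "naive_bayes": 1}
print(at_risk_vote(votes))  # True: three of four models flag the student
```

Equal votes is the current scheme; as discussed below, weighting the models differently is future work.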

How to evaluate the predictions? How do the interventions affect the models for predictions?

Preliminary results: percentage of students who were predicted to not submit but who did submit (precision): average error over each week was 7%±0.4%.

Have dashboard – OU Analyse. Several features.


Ugochi: Four models, but they didn’t all work – could throw away demographic, in the cross-validation. Then determine by voting – why give a weighting for that?

Now giving them equal votes, but Martin working on determining which models should have larger weight.

Martin: This was first step.

Ugochi: Why not drop the demographic?

Student support teams like the idea that you have this. We need to compute it anyway because they want to see students from previous presentation, so why not put it there.

Someone: Validation – have old datasets?

Yes. We have lists of students we predicted, we knew which in each week. We have students who submitted their first TMA. From those lists, can compute the error of our prediction.

Someone: Must have old datasets?

Cross-validation performed on old datasets. For current presentation we only have this so far.

Q2: How do you know after you make the intervention, how do you plan to evaluate it?

That’s the big question. We were thinking about some blind study, split students, but how can you decide – if you know that students were at risks, how do you choose which students you don’t intervene. We have data from previous presentations where it performed better.

Q3: How about two sections? One has the system, the other doesn’t and doesn’t know about it. Will see what happens at the end.

Who should decide who will be in one group?

Q4: In for-profit ed we do this all the time. We also did red/amber/green – amber are the recoverable ones. Red and green, ignore. We do controlled studies all the time. We realised there’s a motive for us to prove this.

When we speak about this with teams, they say we’re crazy. Why should we not intervene with all of them?

Q4: It’s like for the greater good – like with pharmaceuticals. You have to do this.

I have my arms tied.

Ugochi: Can also do crossover design.

Q5: Problem with these models. We come up with the model, predicts, kappa 0.8, but student adviser, if we can’t interpret it to give meaningful strategy, you’re going to hit a wall.

Modelling student online behaviour in a virtual learning environment
Martin Hlosta, Drahomira Herrmannova, Lucie Vachova, Jakub Kuzilek, Zdenek Zdrahal and Annika Wolff


Martin speaking.

Primary goal, to find at-risk students and advise them of the best learning steps. Paper goal is to understand the factors of student behaviour in the VLE that determine failure or success, and find a model that’s easy to interpret by course staff and tutors.

The data are the VLE activity and student assessments, didn’t include the demographics.

1st TMA is a strong predictor of success/failure in the course: P(FAIL|FailTMA1) > 95%. Activity in the VLE before the 1st TMA. So only modelling before the first TMA.

Secondly, need to understand the course, separately, because each of them is designed in a different way. Some fully online, some they get books by post.

The analysis process was taking a course, ID important activity types/content types on the VLE, then student behaviour modelling, gives a course fingerprint.

Important activity type identification is by Bayesian analysis – as time flows, there is a greater difference between success and failure. Forum activity (F), resource activity (R), view test assignment (O – OUcontent), web subpage (S – subpage). Two models – GUHA (General Unary Hypotheses Automaton) and Markov chain based analysis. GUHA: discovery of new hypotheses from the data. Markov chain – stochastic, next state based only on current state; graphical model representation. Better interpretation for the target user.

Two types of analysis – activity intensity in the week, and content type in the week.

So four important activities, identified through Bayesian analysis, give sixteen possible states. Map journeys through those states. Show graphical form of paths through the states over time.
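A first-order Markov chain over those states can be estimated just by counting transitions in each student’s weekly state sequence. A minimal sketch, with a hypothetical encoding where a week’s state is the set of active types among F, R, O, S (the toy sequences are invented):

```python
from collections import defaultdict

def transition_probs(sequences):
    """First-order Markov chain: P(next state | current state),
    estimated by counting week-to-week transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

# Toy weekly states per student: which of F, R, O, S were active that week.
students = [
    ["", "F", "FR", "FR", "FRO"],
    ["F", "F", "FR", "FRO", "FRO"],
    ["", "", "F", "F", "FR"],
]
probs = transition_probs(students)
print(probs["F"])  # e.g. from forum-only weeks, students mostly add resources
```

Such a transition structure is what makes the graphical representation readable for course staff: each arrow is just an estimated probability of moving between behaviour states.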

Two approaches. Graphical is more intuitive. For future, have cumulative states, and enrich the current ensemble predictor.


Ugochi: Seems like you use a categorical way of looking at VLE data. What if it were more continuous – e.g. amount of resource activity vs forum activity?

With continuous data, you have to discretise it. In the prediction models we use, this is what we have; the binary information is what’s important.

Jakub: We performed experiments with continuous data; it showed us there is not much more information than in the weekly summary. Some students only engage on specific days – you get too much information.

Tea break
Machinery Hall 1

Small group discussions

  • Scaling up qual methods with ML
  • Prediction vs description with ML
  • Evaluating ML results and implications

My group: Evaluating base rates. Some with kappa over 0.8, which is amazing. How well do these models do over human estimates? Some people doing just this. Simon had a paper at LAK11; like these essay grading systems, how close to human raters. Challenge of human-human kappas being lower than human-machine. Follow/not follow advice from recommendations; does someone who didn’t follow advice and did badly follow it next time? What’s the tracking of that, the receptivity? Longitudinal study needed. Hold up gold standards of grades – but these are not so good/reliable. Interpretation of failing first assignment. Shouldn’t be on the course, vs should support. What about a weeding course? What variables would you want to add for the 5% who fail the first assignment but still pass. Interest in the grey area, the false positives and false negatives. What’s the impact of adding a new variable – amplification beyond that. What’s overfitting in this context? Want to affect the practice – prediction is valuable and interesting, but the point is to change the outcome.


Not all participants have the same goals. Can have much data, but if you don’t have the important data, it’ll suck and so will your ML. Do we need predictions only, or explanations too? What is a qualitative method? – putting judgements on things as humans, using those. We need to make predictive models sharable objects in themselves; can build on each other’s work. Prediction vs description – not really ‘versus’, have to do both. Without description, don’t know what to do with the prediction. Results of probability, what do you do? – depends on audience. “We’re Ok with qualitative because we’re going to just force it in to quantitative anyway”; comfortable with that, but will it be strong enough?

Wrap up

George: Becoming more aware of machine learning concepts in learning analytics, learning sciences. Based on some of the conversations you’ve had today, spend time on what you think would be a next step to bridge ML and LA as disciplines. Personal – want to become better aware of ML, or of the LA literature. Or work on pulling together a paper with people you met here. Discussion based on your experiences today: what are actionable next steps you’d be happy to pursue? Being dazed and confused is Ok as a strategy too. (Laughter.)

My group:

Having staff – machine learning people – get money!

One possibility (me!) – write a simple primer on ML techniques and what they can and can’t do, perhaps with an example(s) for each that make sense for educationalists. An overview.

Phil: People take the data that’s available because the software has that feature. Instead, ask how the software could be designed by asking what data would be useful. Also, confidence slider/radio buttons after questions – how confident are you that you’re right? What should we be measuring? That drives the instrumentation question.

Dragan: More people talking to each other. Repeat workshop like yesterday with Phil and Zach. Publications – tutorial type papers – special issue in the journal? ML primers for LA people.

Like ‘statistics without tears’.

Hands-on activity is good.

Shane: Want to make the data in his university available to the ML people.


Bridge HE and K12. Developing open standards in terms of the data – what metrics, what analysis can be done. Making dashboards available, and evaluation frameworks. Education provides questions they want to answer; formulate them in ML terms. People don’t realise they need these techniques. But – there has to be genuine dialogue going on.

George: This is where we torture LA folks, and in Doha we do something similar to the ML folks. So if you feel it’s of relevance, you want to introduce it to others. But it’s not a one-way street. Here it’s what can LA learn from ML, but the other half is important too. Confusion isn’t a good thing, but going through it is the route to learning.

State of the field type paper, what are the big RQs. Reference implementations and results. Moving beyond learning styles – identifying clusters empirically. List of things each community want from each other.

George: It’s trying to understand the world in a way that isn’t based on metaphor and narrative, like when we first encountered e.g. a new coding tool or stats work. As you get deeper, see the value as a way to understand patterns you can’t capture through metaphor and narrative. We’re in a similar space now for LA and ML. Lots of people at a similar stage. As you progress, there’s the prospect that you can gain a different level of insight in to large-scale methods.

Making the ideas more accessible, e.g. wrapped around a problem statement – it’s easier to grasp an idea then than when it’s expressed as a process or algorithm; so problem-oriented. Software redesign, optimised to capture the data you want. Continued interaction between the communities. Accessible concepts – translating the ideas so they answer ‘What can I do with that?’. Finally, making data more accessible – ML experts haven’t had access to educational data.

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.


Author: dougclow

Academic in the Institute of Educational Technology, the Open University, UK. Interested in technology-enhanced learning and learning analytics.