LAK12: Session 5B Empirical Studies (1)

Liveblog notes from Tuesday afternoon at LAK12: Session 5B on Empirical Studies.


Jóhann Ari Lárusson and Brandon White:
Point of Originality

Johann starts.

A tool for automated evaluation of student progress through analysis of their written assignments.

Take a course like, say, English 101 – two sets of stakeholders: learners, teachers.

A potential problem – the trend towards larger courses, particularly gateway courses like Psych 101. These environments are less useful for higher-order thinking; students are mere participants, firehosed with information from the stage. At the same time, good things are happening – technology playing a more crucial role, potential for new kinds of learning activities, and the chance to recreate the elements we like of face-to-face teaching. This helps the students; the teachers’ main responsibility is to monitor. But if we move learning online, what do the teachers see? A black box. (Nothing.)

Picture 150 students who are blogging twice a week: you’ve met them, you have an idea of where they stand. Then you start an iterative online writing activity without any sense of its effects on learning, and see nothing until the final product. But it’s not just the final product – the process and the in-between products matter too. Without the right tool, all of that is intangible to the instructor.

Relatedly, there’s a volume problem for the teachers: 300 blog posts a week on a single course, plus comments – continuous monitoring is daunting or impossible.

Their project: the Point of Originality. Automated evaluation of content mastery in writing. It’s not an automated grading tool! Essentially it looks at writing for recasting – the point in time when a student graduates from paraphrasing to more originality with the same materials. The primary audience is the students; in parallel, it enables the instructor to make timely interventions – when students aren’t understanding key concepts, say. If all the students flatline, maybe it’s not that they’re all lazy or stupid; perhaps the instructor needs to adjust.

It’s a web-based application which visualises students’ writing samples from left to right in chronological order. Writing samples are taken in automatically; the instructor manually enters key query terms – the topics or subtopics they want to focus on – and can then see how comprehension is working out.

Methods and meaning

Brandon takes over. Imagine he’s an instructor; as he builds the course, some core concepts should become apparent. If students are preparing responses to the material, he wants to be sure they’re reflecting on those materials. Focus on higher-order thinking around those core concepts; with feedback, you can spot a problem as an instructor and do something. Students might not use the same terms as I would, and that’s not a bad thing – we want them to develop as writers, and mastery is likely to be shown in new expressions. So we want to take into account all possible versions of those terms. Recasting: putting materials into your own words.

They use WordNet, a lexical database – a hierarchical thesaurus. Synonyms and grouping/hierarchy – hypernyms, hyponyms, holonyms, meronyms, etc. – ‘is a part of’, ‘is a kind of’. They define a closeness measure based on the number of jumps.

The instructor chooses a search word, and the system expands this into a tree, which it then uses to search for broad matches. That gives a composite originality value. You can see when students were redeploying recast versions of taught material.
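As a rough sketch of the jump-based closeness idea: the toy relation graph and the 1/(1+jumps) scoring below are my assumptions for illustration, not the authors’ actual WordNet traversal.

```python
from collections import deque

# Toy relation graph standing in for WordNet's hypernym/hyponym links
# (illustrative terms only, not real WordNet synsets).
EDGES = [
    ("dog", "canine"), ("canine", "carnivore"),
    ("cat", "feline"), ("feline", "carnivore"),
    ("carnivore", "animal"),
]

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)

def jumps(start, goal):
    """Breadth-first search: number of relation links between two terms."""
    if start == goal:
        return 0
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        for nxt in GRAPH.get(node, ()):
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None  # no path between the terms

def closeness(a, b):
    """Closeness falls off with the number of jumps."""
    d = jumps(a, b)
    return None if d is None else 1.0 / (1 + d)
```

So ‘dog’ to ‘canine’ is one jump and scores high; ‘dog’ to ‘cat’ takes four jumps through the hierarchy and scores lower – a recast term still registers, but more distantly.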

Demo from a 2008 course on Internet and Society, an entry-level CS course at Brandeis. Students produced blog posts; 9 weeks in, a summative essay on a specific topic.

Paper set after blog posts. Students may have blogged on those before they wrote the paper. The 9-week window, the lead-in period, is when they could have developed mastery of those concepts. As an instructor, could explore whether students were prepared for the assignment paper.

Tool set up with the keywords/query terms, it produces a series of circles for each student’s writing, larger and redder depending on their use of the core concepts in the query terms. You can see students writing on topics, but not those relevant to the assignment (because they didn’t know it yet). Another example – a student with a bigger circle before the assignment, i.e. more original in terms of the query topics before the assignment – and they were one of the top performers. (Clicking on a circle gives you the text sample in question.)

Post on ‘commons’, but only talking about ‘group of people’, ‘community’, ‘volunteer’, and ‘support’.

Direct relationship between the originality variance before the paper and the grade achieved for the paper.

Update from the paper’s figures: they now include a decay factor, so multiple uses of the same term don’t get you ever-increasing results. This improved their results. Those with the highest average originality are most likely to succeed, but so are those who have only brief bursts of highly original activity. Blogging was lumpy; students might delve in depth, or go for breadth.

The ideal use case for the tool is to provide opportunities for support.

Predictive modeling can interrogate learning, creates opportunities to make teaching better.


Someone 1: The blog in the wild is something that facilitates a dialogue between an author and an audience. I didn’t see any attention to that social, dialogical nature of the blogging text. As a writing teacher, it concerns me that this becomes drudgery like a problem set.

Brandon: From an English department perspective, we often give students iterative exercises on what they’re reading. We’re using blogging here as representative of that type of activity. We did run separate tests (in the proceedings) – found students with the highest originality in the lead-in period were also taking fullest advantage of the collaborative nature of the activity. There’s an opportunity to relate the content analysis to the interactivity, but it’s a lot to compress into one window.

Someone 2: And are there limitations in terms of specialised terms – concepts that weren’t appropriately mapped?

Brandon: It does run into trouble with specialised scientific and medical terminology. When we started, WordNet was on 2.0; now it’s 3.0. It’s maintained at Princeton.

Someone 2: If you had a class, are there mechanisms to add relationships to WordNet?

Brandon: Conceivably. In psychology, ‘hippocampus’ might not have many associations. But when they talk about it, they talk about what it does – its functionality.

Someone 3: Could students draw concept maps instead of writing something? A more direct measure?

Johann: An interesting suggestion. We haven’t pursued it; the work came from the overload of writing samples. We wanted to explore existing textual content.

Brian McNely, Paul Gestwicki, John Hill, Philip Parli-Horne and Erika Johnson
Learning Analytics for Collaborative Writing: A Prototype and Case Study

Brian talks in a pre-recorded video. Have worked on many projects together. Work supported by multidisciplinary team.

Project motivated by an interest in genre theory. Writing as a powerful mediating means for learning and development. Shared interest in formative assessment. Engeström; Vygotsky’s Zone of Proximal Development.

Real-time collaborative writing environments for software development: Google Wave, then Etherpad, then Google Docs. Seeing collaborators in real time might help learners.

Can learning analytics foster greater metacognition?

Uatu, their tool, interfaces with Google Docs, working on collaboratively-written documents. It uses Google Docs’ suite of APIs, especially the Docs GData and Google Visualisation APIs.

Uatu is named after the fictional being tasked with watching over the Earth. It watches Google Docs shared with a special user account and stores data about changes to the document. Basic visualisations as a front end – things like number of saves, and document revisions over time. Users can adjust the visualisation. Simple visualisations, but this is a prototype – not just a learning analytics prototype, but a way of understanding its effect in use.

Started use in a class in January 2011. Systematic qualitative case study over 15 weeks; ethnographic fieldwork. Classroom observations (20), usability tests, stimulated recall of use, observations of pair and group programming, semi-structured interviews – 24 interviews in total. 42k words of fieldnotes and analytic memos.

Student working on a requirements doc – heavily annotated by peers and instructors; found the commentary helpful.

Three themes in the findings.

Theme 1 – minor – as they worked, students preferred collocated collaboration. This varied from the researchers’ expectations: they thought Google Docs would ease collaborative writing since students could work in distributed locations. Instead, a preference for ad hoc and planned face-to-face collaboration. Used e.g. diagrams on chalkboards as a shared orienting object. Also alleviates programming anxiety. Convenience: writing face to face was more efficient.

Theme 2 – minor – how they generated prose. Most updates were from one person, Roy: ‘Roy copied stuff down from the whiteboard’. He was the designated typist. Often only one or two members of a group made saves to a collaborative document. The researchers naively expected more distributed and asynchronous contributions. Many important contributions were ephemeral and therefore invisible to Uatu.

Theme 3 – what counts as writing? Roy didn’t contribute a corresponding portion of the documentation’s substance; it was more complex and nuanced. Other group members sketched ideas on paper and meaningfully contributed to the document orally. Gestural and oral contributions are invisible in the edit history; they cannot be measured with traditional analytics.

Limitations obvious – if they’re contributing, but not in writing in the system, can’t analyse it.

The real utility is in visualising large, complex documents written by distributed participants over a period of time – e.g. a project manager tracking a complex policy document, or online education. Even there, important ephemeral forms of work would contribute to deliverables, and be difficult to capture.


Brian is live on Skype!

Someone 1: Was Uatu able to capture efforts that happened in parallel, at the same time?

B: Yes. The API logs commits and saves consecutively, as if they happen in order, even though we’re capturing it essentially in real time.

Daniel Roberge, Anthony Rojas and Ryan Baker

Does the Length of Time Off-Task Matter?

Looking at data at a micro timescale – at the mouse-click level. They now have data over a whole term.

Narrated by Ido Roll, who wasn’t on the author list.


Focus on off-task behaviour. It’s a problem known for a long time (since 1884), and associated with poorer learning (Bloom 1976 onwards). It continues with modern educational technology – again associated with poorer learning (Baker et al 2004, Gobel 2008).

Defined as when a student completely disengages from the learning environment and task to engage in an unrelated behaviour – e.g. surfing, games, sleeping, reading a magazine. The data is from 2006, when the students were mad about Transformers.

Carroll’s time-on-task hypothesis is a linear relationship – you simply lose the time spent off task. Spend 5 min of a 50-min session off task, lose 10% of the learning. That’s the basic hypothesis. However! It may not be so simple. Maybe short breaks improve cognition; longer ones may be worse, since you completely lose working memory. Maybe there’s a common cause for both. Changing the length of observation periods affected the strength of the relationship.
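The linear hypothesis as stated is a one-liner – this is a sketch of the worked example above, not the paper’s fitted model:

```python
def carroll_learning(off_task_min, total_min, max_gain=1.0):
    """Carroll-style time-on-task: learning scales linearly with the
    fraction of session time actually spent on task."""
    return max_gain * (1 - off_task_min / total_min)
```

So 5 minutes off task in a 50-minute session gives 0.9 of the maximum gain – exactly the 10% loss in the example.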

Investigated relationship between off-task behaviour and outcomes. Linear and non-linear; percent time off-task and number of brief and lengthy off-task episodes.

Context is Cognitive Tutors, an ITS system, mainly maths and sciences – 500k students in US middle/high schools. The student does problems; the Cognitive Tutor gives feedback and help based on task analysis and a model of student cognition. Misconceptions are built into the system, which gives very exact feedback tailored to the error the student made. The system traces their skill and knowledge, and can adapt the problems to the students.


186 students in one north-eastern US city. Cognitive Tutor on scatterplots for 80 minutes. Field observations, 2-3 observers with 0.74 kappa. Observation by peripheral vision – to reduce the effect of perceived observation on off-task behaviour. Coded 20s chunks as off task, gaming the system, asking for help, or working in the system. Some talk with other students or teachers can be on task, even though nothing is showing in the system. Gaming the system is on-system, but not on-task.

An automated off-task detector was built: a Latent Response Model. Used field observations as the gold standard for how much gaming students were engaging in. Took metrics for each student, then a vector of each student’s percentage of time spent gaming. Searched the model space to find a good one. Correlation to off-task behaviour of 0.55; only 0.16 to on-task conversation – so it can distinguish off-task behaviour from off-computer but on-task behaviour (e.g. talking to the teacher).

Sometimes detects off-task action a couple of actions after it occurs.


Pre-test correlated with post-test (r = 0.3, p < 0.001); off-task behaviour negatively correlated with post-test (r = -0.3, p < 0.01) – so a similar effect size.

The linear model is good – significantly better than chance (p = 0.017) for the off-task term.

The non-linear model used off-task time squared; also significant, but very similar. Used the Bayesian Information Criterion – the two models are not significantly different, and both are significantly better than chance.
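The linear-vs-quadratic comparison can be sketched like this – the synthetic data and effect sizes are invented for illustration, not the paper’s data; BIC here is the usual n·ln(RSS/n) + k·ln(n) form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: post-test score falls with fraction of
# time off task (assumed shape, not the study's real data).
off_task = rng.uniform(0, 0.4, size=100)
post_test = 0.8 - 0.5 * off_task + rng.normal(0, 0.05, size=100)

def fit_bic(X, y):
    """Ordinary least-squares fit plus Bayesian Information Criterion."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n), rss

ones = np.ones_like(off_task)
X_lin = np.column_stack([ones, off_task])              # intercept + linear term
X_quad = np.column_stack([ones, off_task, off_task**2])  # + squared term

bic_lin, rss_lin = fit_bic(X_lin, post_test)
bic_quad, rss_quad = fit_bic(X_quad, post_test)
```

Because the models are nested, the quadratic fit always has residual sum of squares at most that of the linear fit; BIC then asks whether the extra term earns its ln(n) complexity penalty – here, as in the paper, it generally doesn’t.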

Median split on the length of off-task episodes: no relationship between long and short pauses. No evidence that either long or short episodes affect outcomes differently. Similarly with a quartile split.


Percentage of time off-task: both linear and non-linear models predicted learning successfully. Unclear whether there’s a difference between them.

No difference in prediction between ‘brief’ and ‘lengthy’ episodes of time off task.

No grounds to reject Carroll’s (linear) time-on-task hypothesis. May be worth further investigation in additional datasets.


Jon Dron: The task itself is fascinating. Any evidence on whether different tasks – e.g. some reflective or more creative learning processes – might even benefit from daydreaming?

I: I work on more creative tasks. Pauses are extremely important. Not sure whether daydreaming is more important than being focused. In a very constrained setting, we looked at different interface elements – the type of question is a very significant predictor of whether students are off task. The more challenging, the more likely they are to be off task.

Someone 1: In school, during class, if you’re off task you’re not working through the tutor, and you end up short. If you had a longer time – had to sit there until you completed – would you find the same relationship?

I: Great question. We haven’t looked, but we could. It’s mastery-based – you only move on when you’ve completed. We haven’t compared to a time-fixed system, or adaptive vs non-adaptive systems.

Someone 2: Look at time not just in the big picture, but within the learning experience. Look at when they take off-task breaks: breaks at the beginning may mean thinking; at the end, they may mean phasing out.

I: I completely agree – the timing is very important, especially when the task is more reflective or creative. When it’s time to execute, you’re more focused. In a constrained environment, we studied gaming the system – guessing, clicking through help. Not all students who gamed were hurt by it; some learned. Some did it on skills they already knew – things that were easy for them, they gamed. Those who gamed on the difficult questions suffered for it. It’s not just that gaming is bad; it’s the interaction between gaming and challenge level.

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.


Author: dougclow

Academic in the Institute of Educational Technology, the Open University, UK. Interested in technology-enhanced learning and learning analytics.

