LAK13: Monday morning (discourse-centric learning analytics)

Liveblog notes from the first workshop at LAK13.

1st International Workshop on Discourse-Centric Learning Analytics

Programme and papers.

Shared wiki-like workspace at: bit.ly/dcla13notes


Simon Buckingham Shum: Welcome

Welcomes everyone.

Rationale: Lots of work is about writing text, reading, sharing, communicating. A focus on discourse is an obvious one for the community. Many ‘approximate proxies’ for engagement – best he’s seen in a mainstream VLE/LMS is “longest post”. Want better proxies for ‘deeper learning’.

Introduces workshop co-organisers – many people from the UK Open University. Also the speakers.

David Shaffer: Keynote: How Research into Epistemic Network Analysis Might Inform DCLA

Slides (PDF).

It’s about what happens if we combine ENA and DCLA – A CANDLE! Acknowledges his collaborators and funding.

Traditional paradigm: students interact with teacher; teacher has lesson plan, sets assessments for students, which provide information back to the teacher. How do learning analytics and discourse analysis change this picture?

Example of an epistemic game. The platforms we’re working on are often the core of how we think. Shows a promotional video. It’s a game about Nephrotex, a company that manufactures membranes. A simulation at the centre. Activity: develop a prototype to a brief, then prepare a presentation justifying the design choices. A built-in chat tool to talk to design advisers. Used with 16-18 year olds, in high school and university intro classes.

One argument – there’s an alignment between the ideas behind the pedagogy and the kinds of learning analytics we want to do.

There are pre/post tests on the topic; there is an increase. But the main aim is about the disproportionately low numbers of women in engineering fields – they disproportionately drop out after the first year. The goal of the game is to give an authentic engineering experience, to help them persist. Then a survey of commitment and confidence in engineering – no change in the experimental group, but the control group goes down. Big effect size, particularly for women who start with low confidence.

Paradigm changed – teacher becomes mentor. The chat hub becomes a centre for the activity. Use the chat log as a data source. The “traditional” (last 3-4y) approach is data mining. Has a GIGO issue (garbage in, garbage out). When you do real mining, you don’t just stick a shovel in; you have some understanding of what’s below and what you will find. “We can only meaningfully measure based on theory”. Data mining is a mistake – we should do data geology: have some theories about what we’ll find and why.

Architects, lawyers etc solve problems in meaningful ways. To prepare people to engage in complex, innovative thinking, we have to build schools to give them those kind of experiences. Imagine the future – not just to prepare for specific roles, but because those are the techniques they need to work in the world.

There’s a myth of inspiration and innovation – you sit and you are inspired. Occasionally there’s a flash of individual genius. But for the most part, people solve complex problems as part of a community of practice (Jean Lave). The communities take the stuff of the world around them and approach it from a particular cultural point of view – with basic facts and skills. But also a set of values, and an identity. These are tied together by epistemology – broadly, how people make decisions and justify actions. We can measure this in games! ‘identity/skills/knowledge/values around epistemology’ makes an epistemic frame.

Complex thinking happens in cultures of practice.

IBM experiment – leadership in a distributed world – seasoned leaders, in a game environment, to see if it’s like a real environment. Set in World of Warcraft. It’s a complex system: you manage a team, face obstacles, have a dashboard of information, track things in real time. Hypothesis – there’d be an overlap. 75% of the factors can be applied from the MMORPG to the corporate environment. But no correlation – 61% said they couldn’t transfer. Cultures are characterised by the pattern of thinking. Schön’s reflection-in-action. Ways of doing / ways of knowing -> practicum. Architects, doctors, etc. Take action, step back, reflect on that action. The cycle of action/reflection builds the epistemic frame. Builds the process of acculturation.

Epistemic Network Analysis explores this. Builds on Social Network Analysis (SNA). Example of a party. Snapshot: A and B are talking to each other, C and D are, E and F are not talking. Then later snapshots show different links. Then look across all those configurations and sum them up. Then can represent on a diagram with the people as nodes with lines between indicating the strength of the links between them. Measuring the ‘centre of mass’ of the interactions.  Want to characterise not just the presence of knowledge, identity, etc, but their relationships.

They have much information about how these are used by players in the game. For today they mostly focus on the chat. It could well be applied to the other data, but they haven’t yet. They take the lines of discourse and use semantic coding. It’s hard – an AI-complete problem. People talk in ways more complex than machines can analyse. They break it up into structural functional linguistic codes (SFLCs) – skills/knowledge/identity/values/epistemology. Also domain codes – constrained by the game. There are something like 37,000 common English nouns, but in this particular game more like 400 are actually used – which is much more computationally tractable. Set up a content matrix. Use regular expressions to match. SFLCs are matched the same way – e.g. ‘why’ vs ‘why don’t’. Text is broken into sentences, then verbs are identified, then nouns (and adjectives), then people (from a list of names, and pronouns). Combine the SFLCs and domain codes to do conjunctive coding. Drives down false positives – drives the kappa up. Not all conjunctions are meaningful: there is the identity of an engineer, but not the value of a surfactant. Compare automated connections with expert coders; the automated coding does fairly well, especially when tweaked.
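
[A rough sketch of how this kind of regex-driven conjunctive coding might look – in Python rather than their R, and with the codes, patterns and ‘meaningful conjunction’ whitelist all invented for illustration:]

```python
import re

# Hypothetical domain codes and SFLCs, each defined by a regular expression.
# The real code lists (hundreds of domain nouns) are much larger.
DOMAIN_CODES = {
    "membrane":   re.compile(r"\bmembranes?\b", re.I),
    "surfactant": re.compile(r"\bsurfactants?\b", re.I),
}
SFLC_CODES = {
    "epistemology": re.compile(r"\bwhy\b(?!\s+don)", re.I),  # 'why' but not 'why don't'
    "values":       re.compile(r"\bwe should\b", re.I),      # unjustified 'we should X'
}

# Not all conjunctions are meaningful (there is an identity of an engineer,
# but not a value of a surfactant) - whitelist the meaningful ones.
MEANINGFUL = {("values", "membrane"), ("epistemology", "membrane"),
              ("epistemology", "surfactant")}

def code_utterance(text):
    """Return the meaningful (SFLC, domain) conjunctions found in one utterance."""
    sflcs = {name for name, pat in SFLC_CODES.items() if pat.search(text)}
    domains = {name for name, pat in DOMAIN_CODES.items() if pat.search(text)}
    return {(s, d) for s in sflcs for d in domains} & MEANINGFUL

print(code_utterance("We should pick this membrane for the prototype."))
# -> {('values', 'membrane')}
```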

The trick is to do all this on the fly, rather than having a team of graduate students working on it for months. Coding is done in R (stats), integrated with their games. An utterance can be coded in about 1.5-1.7 s. Pretty remarkable, and picking up some fairly subtle differences – “I think this is a better choice …” versus “This is a better choice …”. But it codes differently for “… so we should choose it”. Values are defined as unjustified assertions – e.g. ‘we should X’, ‘we should try to do Y’ with no justification offered.

If you run 50 queries, also takes about 2s. The overhead is firing up the coding engine, not doing the coding.

Q: If the delay in chat is more than the coding time, it’s not a problem? Close to real time?

More complex than that – designed to feed back to the mentor who takes action. It’s on the right order of magnitude – not taking weeks of coding to get back.

What do we do with all this live-coded data?

Make an adjacency matrix – show when things are co-present, accumulate them over time – gives you a model of the structure of the thing you’re analysing (the party, or the discourse in the game). Convert the matrix back into a vector (you only need the top half because the matrix is symmetric). Create a map of the structure represented as a vector (of adjacencies) at each moment in time. Gives you a vector space (all pointing in the positive quadrant). Pointing in the same direction means the same semantics; the length is the length of discussion. So they norm the vectors to a unit circle. Then slice the vectors to group them. Can look at individual trajectories over time.
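
[A minimal sketch of the matrix-to-vector step as I understand it – assuming one co-presence count per utterance, which is a simplification:]

```python
import numpy as np

CODES = ["skills", "knowledge", "identity", "values", "epistemology"]
IDX = {c: i for i, c in enumerate(CODES)}

def ena_vector(coded_utterances):
    """Accumulate code co-presence into an adjacency matrix, flatten the upper
    triangle (the matrix is symmetric) and norm to unit length, so direction
    carries the semantics and length of discussion is factored out."""
    n = len(CODES)
    adj = np.zeros((n, n))
    for codes in coded_utterances:          # each utterance -> set of codes present
        for a in codes:
            for b in codes:
                i, j = IDX[a], IDX[b]
                if i < j:
                    adj[i, j] += 1          # co-present codes strengthen the link
    vec = adj[np.triu_indices(n, k=1)]      # upper triangle as a vector
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

print(ena_vector([{"values", "knowledge"},
                  {"values", "knowledge", "epistemology"}]).round(2))
```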

An epistemic frame can be quantified using a network model.

Shows a visualisation with dots (for people) in a 2D frame – they redesigned the game and novices became more like experts.

Factor loadings – if there are 20 underlying epistemic elements, you have about 20^2 dimensions – a lot of factors. Each represents the conjunction of two elements. Equiload projection onto the space. Can see how the epistemic nature of the discussion develops over time as a line moving across a 2D space.
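
[ENA’s actual ‘equiload’ projection isn’t something I can reproduce from the talk; as an illustrative stand-in, an SVD-based projection gives the same kind of 2D-trajectory picture:]

```python
import numpy as np

def trajectory_2d(vectors):
    """Project a time series of normed adjacency vectors onto their first two
    principal directions, so the discussion traces a line across a 2D space.
    (ENA's actual 'equiload' projection differs; SVD is a stand-in here.)"""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                    # centre before decomposing
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                       # one (x, y) point per time slice

points = trajectory_2d(np.random.rand(10, 190))  # 10 slices, C(20,2)=190 code pairs
print(points.shape)                              # (10, 2)
```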

They break the talk into stanzas and identify topics. Accumulate across the topic – not all within one turn of talk. There’s a theoretical problem in isolating an individual in a discussion.

Q: How do you handle e.g. someone listening to a long utterance and responding “right” vs “right [repeat/revoice of first utterance]”?

The system handles the revoice case fine – it assigns that to their epistemic frame. For a generic ‘right’, you can decide they’ve endorsed it – or not, on the grounds that they’d have said it themselves if it was important. Also, text chat isn’t cleanly interleaved – you can’t always be sure what the ‘right’ refers to.

Q: What about irony, emoticons? ‘Yeah, right’ vs ‘Yeah, right’.

This chat system doesn’t have emoticons. If it did, you could just code the semantics of them. When we started, we were worried people would talk in chat-ese, but in this professional-like environment they tend to write in straight-up English. Irony – you are what you establish. Even ironically, you are establishing a connection between two things.

The outcome measure they cared about was commitment over time. Want to tie the discourse analysis to the commitment measure score. Use the ENA tool. Broke students down by quartile (for commitment/confidence in engineering). Looked at the upper quartile – they fall in the bottom half of the space, lower down than the non-experts. Can then look up the structure of those individuals’ discourse, and see that different (and more) connections are made.

[Strikes me that correlation and causation aren’t clear here – but it doesn’t matter: the epistemic network analysis is a leading indicator of the final thing you care about so you can do something about it sooner.]

So – they use the utterance table (data log) as the source for ENA, which creates a student model, which feeds into an auto mentor. With suggestions created by rules – scope/trigger/action. When they’re used, can analyse how they work, so the model is incorporated into the data. The ENA creates a summative assessment, but using it as feedback makes it into formative feedback. So it’s fummative assessment, or sormative assessment (!).

The model makes some important assumptions vs traditional psychometrics. Traditional is focused on individual nodes – does the learner really have this concept, this skill? You have multiple measures of the same thing, trying to triangulate on some hypothesised underlying thing. ENA says the action is in the connections – to understand X is to understand its connections. It’s not that you warrant X by being sure it exists in some place, but that you warrant it by its connections. So you don’t have to define the nodes this way. Can take anything where you can identify discourse tokens, and ENA still applies. Need an automated coding system to do it on the fly. But this looks at connections between things – acculturation into practice, however anyone’s defining practice. Can use brain scans, teacher interviews – doesn’t have to be games, or the skills/knowledge/values framework. It’s a way of looking at learning as acculturation – the adoption of particular patterns of discourse.

Q: When you created your first vector space and normalised – if the same direction with a different length is just a matter of time, are you losing something? Is it the same topic? Are issues about confidence and agreement lost if you normalise?

Yes, of course you lose information whenever you normalise. If you hypothesise that length of discussion is significant, you are losing that. An obvious consequence – assume you have two groups: group A says almost nothing, group B talks and talks about all the aspects in great detail. The locations of those two points in vector space will coincide! So they’re looking at recovering that information – the total length – and teasing it out along another dimension. Will be in the next build of the toolkit. Any tool has theoretical assumptions behind it. Which is why you need to do data geology, not plain mining!

Q: In practice in chat, people make typos, etc – how vulnerable is the coding?

We were very worried at the beginning. It’s an empirical question. In this context, the grammatical slips are far less frequent than in other chats. Partly because the mentors write in full sentences. Yes, whatever coding system you use has that issue.

Q: Compared to tagging text manually, or throwing it at LSA – is it any worse?

We’ve looked at it, more for the coding. LSA makes different assumptions about the relationships – happy to discuss. The more direct comparison is to say forget the connections, look at whether they’re talking about X, Y, Z and do the counts. That does give different results.

Q: How much chat has to take place to get an accurate model for feedback? You could mislead. How ethical is that?

We’ve dodged that because there’s no automated feedback direct to students – it’s just suggestions to a human mentor. So we’re dodging that issue for now. More important is the stability of the parameter estimates out of ENA – use jackknife analysis: throw away portions of the data, see how stable it is. Can throw away 5-10% and confidence is OK. Sensitivity analysis. So far they seem fairly robust to these deletions – e.g. deleting just one turn of talk.
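
[A sketch of the kind of jackknife sensitivity check described – drop a slice of turns, re-estimate, measure the drift; the estimator below is a stand-in:]

```python
import numpy as np

def jackknife_stability(turns, estimate, drop_frac=0.05, trials=20, seed=0):
    """Repeatedly delete a fraction of the turns, re-run the estimator, and
    report how far the re-estimates drift from the full-data estimate."""
    rng = np.random.default_rng(seed)
    full = estimate(turns)
    drifts = []
    for _ in range(trials):
        keep = rng.random(len(turns)) > drop_frac
        subset = [t for t, k in zip(turns, keep) if k]
        drifts.append(float(np.linalg.norm(estimate(subset) - full)))
    return np.mean(drifts), np.max(drifts)

# Stand-in estimator: mean of per-turn feature vectors.
turns = [np.random.rand(5) for _ in range(200)]
print(jackknife_stability(turns, lambda ts: np.mean(ts, axis=0)))
```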

Q: How much training is needed for the mentors to understand the nonsense you’re sending them, for it to be useful to the students?

At the moment, it’s specific suggestions for what to say. Want to visualise the results of ENA so mentor can see if conversation is going on or off track. But at the moment we just say “you might want to say [this]”.


Greg Dyke et al: Analysis of Discourse and the Importance of Time 

Gregory Dyke, Elijah Mayfield, Iris Howley, David Adamson, Carolyn P. Rosé (University of Lyon, FR and Carnegie Mellon University, USA)

Research précis PDF

Three strands:

  • 1. Souflé framework for analysing discourse
  • 2. Challenging assumptions with sliding window visualisations
  • 3. Enhancing scientific reasoning and discussion with conversational agents.

Assumptions:

  • We can find learning in discourse, in the linguistic context.
  • It’s a collaborative activity
  • The processes happen over time, have to look at time in analysis.

Idea of a discourse analysis cycle – support effective participation in conversation as the end goal – from facilitating researchers’ analysis, to asynchronous feedback, to real-time feedback.

1. Souflé framework for analysing discourse

Many coding schemes exist; very few are strictly linguistic. The theoretical learning sciences stance: all theories are enacted through participation, which is enacted through discourse – so it makes sense to look at the language. The pragmatic language technologies stance: look at words and sentences; understanding the deeper, cognitive things is more difficult than finding linguistic elements, which are directly there in the language. Linguistic analysis is easy – low-hanging fruit for automated analysis.

Systemic Functional Linguistics – Souflé. Multi-dimensional – three axes: negotiation (positioning with respect to knowledge, agency); engagement (positioning with respect to other participants and ideas – e.g. absolute vs alternative viewpoints); transactivity (positioning in the conversation – using previous utterances to move forward with reasoning, or presenting new ideas).

Negotiation. Simplified – am I a source or a sink, and are we interacting about knowledge or about action: four possibilities. Get a measure of authoritativeness from the ratio of the summed scores. Authoritativeness measures show how students respond to aggressive behaviour, and predict learning and self-efficacy. Readily detectable.
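
[A toy version of the authoritativeness measure – the exact Souflé scoring isn’t given in the talk, so the source/sink ratio below is my assumption:]

```python
from collections import Counter

def authoritativeness(coded_turns):
    """Ratio of turns where the speaker is a source of knowledge/action to
    turns where they are a sink (the exact Soufle scoring is an assumption)."""
    counts = Counter(role for role, _ in coded_turns)
    return counts["source"] / max(counts["sink"], 1)

# Each turn coded as (source|sink, knowledge|action) - the four possibilities.
turns = [("source", "knowledge"), ("sink", "knowledge"),
         ("source", "action"), ("source", "knowledge")]
print(authoritativeness(turns))  # 3.0 - mostly giving, rarely receiving
```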

Engagement. How ideas are presented to others – ‘Nuclear is a good choice’ vs ‘I consider nuclear to be a good choice’ vs ‘There’s no denying nuclear is a good choice’. Again doing well on measuring it automatically. Human coding is difficult (inter-rater reliability is low). But it correlates with reasoning. When a computer agent uses sentences that are more heteroglossic than monoglossic, it causes more articulation of reasoning in group chats.

Transactivity. Building on an idea expressed earlier in a conversation, using a reasoning statement. Has a moderating effect on learning and knowledge sharing. Can detect automatically (not quite so well as the others).

Example of applying all three. Conversational agents in 9th grade biology, discussing in groups of 3. Three conditions – two supported conditions (direct, indirect) and a control condition. Learning was better in the supported conditions. Transactivity – overall low reasoning, particularly poor in the indirect condition. Engagement – monoglossia particularly high in the indirect condition; it correlates negatively with reasoning. When you make monoglossic statements, it tends towards low reasoning. Reactions to the agent in the indirect condition were along the lines of “this is stupid”.

2. Challenging assumptions with sliding window visualisations

Rather than counting codes up over the whole time, use a sliding window to see how they change over time. An issue is scaling up to large data.

E.g. several phases, where the computer intervenes with knowledge construction dialogues – a cognitive path. Used to move from 1:1 agents to collaborative settings. Three conditions – no, low and high social participation (smileys, encouragement). Coded turns for social, offtask, and tutor abuse. Found low social better than high and no social on learning gain. Low+high social -> more student social turns. No social -> more student offtask turns. High social -> most student tutor-abuse turns (“it’s really stupid”).

Interesting visualisations of turns over time based on a sliding window analysis – can see more social participation by the students – and can also see that it happens mainly at the beginning, and doesn’t happen in the middle. And in no-social there’s lots of offtask. (Some offtask is good, but too much is bad.) Not distinguishable during the task but clearly towards the end. Tutor abuse – can see it increasing – e.g. at the end when it says ‘OK, you can wrap up now’. A bit disappointing from the intervention perspective – it’s all happening at the start and end. But interesting from the analysis perspective.
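
[A minimal sliding-window counter over coded turns, of the sort that could drive these visualisations:]

```python
def sliding_window_counts(codes, window=20, step=5):
    """Count each code inside a window sliding over the turn sequence, so you
    can see when (not just whether) social or offtask talk happens."""
    kinds = sorted(set(codes))
    profiles = []
    for start in range(0, max(len(codes) - window, 0) + 1, step):
        chunk = codes[start:start + window]
        profiles.append({k: chunk.count(k) for k in kinds})
    return profiles

codes = ["social"] * 8 + ["ontask"] * 30 + ["offtask"] * 6 + ["abuse"] * 6
for i, profile in enumerate(sliding_window_counts(codes)):
    print(f"window {i}: {profile}")
```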

3. Enhancing scientific reasoning and discussion with conversational agents.

Real-time feedback. Take what they’re saying, analyse it, and use it to have agents intervene appropriately. Working on agents adapted from Cognitive Tutor agents (trying to guide students through a line of thinking). Add in performance of social turns (Kumar, 40 papers!) – increases credibility and student attention, and improves student social participation. Improves learning if there’s a “right amount” of social participation. But! It doesn’t leverage creativity. Domain transfer is hard [as always!]. It takes years to write a 20-minute script. Different aspects of the agent are difficult.

Use ‘Academically Productive Talk’ (APT) – what good teachers do in the classroom. Lessens the teacher’s role as a knower, more as someone who arbitrates between students and doesn’t tell them the answer; empowers students to listen, build on each other’s contributions, and relate their reasoning to evidence. Productive moves – revoicing, asking to restate, asking to apply one’s own reasoning to another’s statement, prompting participation, asking to explicate reasoning.

Have found some learning gains (APT vs non-APT). ‘Could you explain more’ and ‘move on’ get confusing when they happen from the agent at the same time. So tried a different approach in year 2. Take sentences that make sense (are true), compare student sentences with those, and try to use them as prompts. Many theoretical issues, but practically it seems to work effectively. Also worked on coordination – so different aspects of the agent coordinate: allowing time for a response, what can cancel what, etc. Now, adaptation – the prompts are usually OK to say; at worst they add in something true about the situation. To adapt to a new domain you just need to write a bunch of true sentences about that domain – so that’s exciting. Bazaar framework for agent coordination.
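
[A guess at what matching student utterances against a bank of true sentences could look like – bag-of-words cosine similarity here; the real system’s matching isn’t described, and the sentence bank is invented:]

```python
import math
from collections import Counter

TRUE_SENTENCES = [  # hypothetical bank of true domain sentences
    "Enzymes speed up reactions by lowering activation energy",
    "Cells use diffusion to move molecules across membranes",
]

def bow(text):
    """Bag-of-words vector for a sentence."""
    return Counter(text.lower().strip(".?!").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    mag = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return dot / mag if mag else 0.0

def revoice(student_utterance, threshold=0.3):
    """Pick the closest true sentence as a revoicing prompt, if close enough."""
    best_score, best = max((cosine(bow(student_utterance), bow(s)), s)
                           for s in TRUE_SENTENCES)
    return f"So you're saying: {best}?" if best_score >= threshold else None

print(revoice("enzymes make reactions faster by lowering the activation energy"))
```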

Conclusion

Trying to analyse what students are saying, how collaboration is happening, and give feedback in real time – Souflé framework for automated socio-linguistic coding. Dealing with temporality, and coordination between agents.

Can’t always automate analysis. Need more knowledge about which discourse is productive and how to detect it, and how to scaffold effective discourse. And more integration of the analysis with coordination.

Q: You said ‘feedback behaviour bad’?

In the year 2 study, we had the revoice behaviour, plus another – if we see an explanation, we say ‘nice explanation’. That didn’t produce good learning; we haven’t understood why. It wasn’t beneficial.

Q: Maybe they didn’t understand the feedback – ‘nice explanation’ doesn’t tell you if it’s right or wrong. ‘The stuff went in the bag’ – maybe try something like ‘what do you mean by stuff’?

Because that’d be orders of magnitude more difficult. This is very simplistic – just using ‘the stuff went in the bag’.

Q: Eliza wasn’t that complex and it did that.

We’re trying to avoid telling students when they’re right or wrong. Public schools in Pittsburgh, students find that sort of feedback frustrating.

Q: There was a lot of stuff in your talk not in the précis?

Hopefully it’s all in the references.

Q: You talked about teams across time. You had high social, low social. Can you talk about how the teams performed? The goal of collaborative learning is not to be social – that might influence how a team develops and how it learns. But as an educator, the performance is important. I missed that aspect of performance across time.

With the representations (sliding window), taking time into account we can discover things we didn’t see when we looked at it overall. Took an existing paper with learning gains over time, and analysed it. Related conditions to social participation. Complex effects with learning. Phases – participation in different phases in different conditions had different effects on learning.

Q: E.g. tutor abuse can be a measure, but it might be an over-representation of poorly-performing teams, who are blaming the tutor. Without taking performance into account, I don’t get a good picture.

We took performance into account and found almost nothing. We found students who talk while the tutor is telling them stuff do worse than the ones who keep quiet and listen.

Alyssa Friend Wise: Leveraging CSCL Research Analyzing Online Discussion to Inform DCLA

Alyssa Friend Wise (Simon Fraser University, Canada)

Research précis PDF

This area is huge. Two projects. One we’ve turned into learning analytics, one not yet. Examples of CSCL to inform DCLA.

Online discussion environment – Phorum. Threaded. Discourse and dialogue, but always talking about online discussion. There are other interfaces – trad threaded vs network/spider diagram/concept map type visualisation.

One central concern: Connecting the development and use of DCLA with theoretical understandings of the qualities of discourse. Theoretical models of online discourse.

Two key points: First, there’s a lot of research in CSCL looking at online discourse. Not straightforward – need additional novel knowledge base to leverage this to inform DCLA. DCLA > LAK + CSCL.  Second – creating a clear thread between goals for discourse, actions by students, and analytics is critical to use DCLA to support productive discourse. Have to share logic with learners and teachers. Some will be automated, but goal to inform (human) participants. Transparency and integration.

DCLA > LAK + CSCL.

There are many well-developed models for analysing online discourse. Most are process models, though some analyses don’t take that into account. Temporal analyses are gaining ground; this is critical – dangerous to leave it out. Trying to support the development. Most are group models, not individual models. We need something more to do DCLA.

Example – the Knowledge Construction Model, widely used. A theoretical pattern: sharing information, exploring dissonance, negotiating meaning, testing and modifying, agreeing and applying. Most research has aggregated this – e.g. this group had more posts in this phase than that one, so it’s better, etc. That misses a piece – the notion of time and flow. The original theorisations aren’t too specific on this. Graph it out over time – it could be stepwise going up, or other patterns; groups could be out of sync, go back, skip stages.
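
[A tiny sketch of graphing the phases over time rather than aggregating them – phase names taken from the model as described:]

```python
# Phases of the Knowledge Construction Model, as levels 1-5.
PHASES = ["sharing", "dissonance", "negotiating", "testing", "agreeing"]
LEVEL = {p: i + 1 for i, p in enumerate(PHASES)}

def phase_trajectory(coded_posts):
    """Map each post's phase code to its level, keeping the time order, so the
    flow can be graphed rather than collapsed into per-phase counts."""
    return [LEVEL[p] for p in coded_posts]

# A 'failure to launch' shape: the group never gets past sharing/dissonance.
print(phase_trajectory(["sharing", "sharing", "dissonance", "sharing", "sharing"]))
# -> [1, 1, 2, 1, 1]
```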

[Note to self: check this out for Fairy Rings parallel!]

What do students actually do? They trend to the higher levels, but with some popping back. One pattern is ‘failure to launch’ – only clear if you expect stepwise development. Another pattern – ‘quick consensus’ – again not so productive.

Issues: Automating post coding, in real time. Evaluating ‘approximations’ of good discourse patterns – we know about desirable end states, but not so much about paths for change from a bad trajectory to a good end state. Evaluating discourse patterns in progress – the quality of a post pattern is partially determined by what is yet to come. How do we make these analytics actionable? Group vs individual levels. Pivotal posts. Interpretability – some graphs are tough for teachers and learners to work with. Also looking at posts in context – sequential analysis; time is critical. Some CSCL work heading in this direction.

Transparency and integration.

Students don’t have the perceptions we do (necessarily). Learners (and teachers) need to see connections between goals of discourse, online activity, and analytics feedback.

Need to move from a theoretical model to discourse guidelines, then related analytics, goal setting, data, reflection. Students don’t see the calculation, but they do see why we’re doing what we’re doing.

Example.

Theoretical model – social constructivist, sharing, supporting ideas. Two basic processes: Speaking (externalisation), Listening (taking up others’ externalisation). So avoids some of the semantic coding. Dan Suthers has done work on evidence of uptake. We’re looking more directly.

Learners have control over the timeline and pace, but challenges of time management; need to help students.

Speaking – sharing ideas with others. Value in speaking that is: recurring, responsive, rationaled; temporally distributed; moderately proportioned. Listening – often invisible. Broad, integrated, reflective.

Develop metrics showing students acting in distinct ways, according to their idea of the goal of the discussion.

Embedded analytics – information on what’s going on within the interface itself. Visual discussion forum (like a concept map) – node/connection diagram. When students come in, can see general scope of the discussion, see where it’s active, where they’ve been active. Idea to improve metacognitive responsibility.

Extracted analytics – metrics based on clickstream data. One is breadth – % of posts read. Often students don’t read the posts in the first place. That’s different to what you see in the visualisation – a post turns from red to blue if you click on it, but this metric also requires you to have spent enough time to actually have read it. Some take the time to slow down; some say they read faster than the metric assumes.
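
[A sketch of a breadth-style metric – the reading-speed threshold and names below are my assumptions, not the actual metric:]

```python
def breadth(clicks, post_lengths, wpm=300):
    """Fraction of posts plausibly read: opened for at least the time needed
    to read them at a given words-per-minute rate (an assumed threshold)."""
    read = sum(1 for post_id, seconds in clicks.items()
               if seconds >= post_lengths[post_id] / wpm * 60)
    return read / len(post_lengths)

post_lengths = {"p1": 150, "p2": 300, "p3": 80}  # words per post
clicks = {"p1": 40, "p2": 10}                    # seconds each post was open
print(f"{breadth(clicks, post_lengths):.0%}")    # p1 read, p2 skimmed, p3 unopened -> 33%
```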

Discourse guidelines issued at the start – a little set of guidelines, with how it connects to the interface. Rationale for metrics and why you should care. Students have sense of agency around the analytics. Work with many metrics, diverse, so not a single goal to optimise. Students set them themselves.  Create a space of possibilities rather than trying to constrain you to optimise to a single outcome.

Got a lot of buy-in to the metrics; different students found different things interesting. No ‘big brother’ issues, because it was transparent from the start. Students had an understanding of when they were or were not making progress.

Q: Comment. I’m following up on an earlier question. Strikes me there are two different ways in which analytics are evaluated. We determine whether they’re good on the process – dialogic analytics, whether we can see changes for the better in the quality of the discourse itself – or good in terms of an outcome outside the discourse: did they solve the problem. Both are important, but it’s hard to escape needing something that brackets the discourse and says, by some measure, it achieved some end. Are there lessons from CSCL that might apply?

I agree it’s problematic. Being able to dialogue intelligently in a field is in itself a goal. As with epistemic frames, see development from first to last – an indication of improved ability in that discourse. So discourse itself could be an outcome measure – the key piece there is time. But if you think about discourse, you can also look at this idea of transfer. Not in this course, but in another, look for the discourse pattern. After engaging with others, how would you do it by yourself? Students learning to argue in a rational way, debating a dilemma with evidence to reach a conclusion. Then a solo essay. Try to honour the notion of discourse not just as a means to an end – for those people who want it, it is something that stands apart. There are some definitions where you can look at discourse processes as one track. We have to be clear about process vs outcome measures. Scale – is very much a process, not an outcome measure.

Q: In your examples, one discussion didn’t lift up and one did. They look kinda similar early on, so you can’t intervene too early. You can have three turns where it doesn’t take off, comes back to level 1 (the lowest), which may not necessarily detract from higher-level discussion. Is it even possible for a human to tell the difference until it’s too late? How can you differentiate them?

I don’t want to take humans out of the picture. In this case, the goal is getting students to think about their dialoguing. Two things are going on – they’re participating in the conversation themselves. The point is, in the ‘pretty good’ pattern, the perfect lockstep pattern isn’t expected. In terms of intervening too soon – put it in the tutor’s and students’ hands; that’s powerful because you don’t choose the time of intervention, you’re asking the students to do it and take action. Teaching them how to monitor what’s going on.

Simon Knight and Karen Littleton: Discourse, Computation and Context – Sociocultural DCLA Revisited

Simon Knight and Karen Littleton (The Open University, UK)

Paper PDF

Comes from a philosophy of education background, in a school-based context. The full paper has references – though it is missing many that are relevant; they are indicative.

1. Research around discourse for learning

Breaks down a range of these and what you may want to track: individual learning – subject language; psychological development – argument behaviours, structures, language; group understanding (small group, whole class) – social interaction; sharing of ideas that can be improved together – co-constructive artefact development.

Common knowledge: a shared perspective built through discourse and joint action – common knowledge is the environment we’re working in, but it changes as we talk. Temporal context is important, but often glossed over. Education is not a set of discrete events; dialogue mediates development over time. The temporal is less empirically studied and theorised, particularly over longer periods.

Two types of common knowledge – Background: historic, based in communities of practice; Dynamic: fluid, built on co-construction within groups.

2. Particular emphasis on context

Exploratory Talk and Accountable Talk are associated with improved outcomes. And other models broadly aligned. Critical constructive engagement, justifications given, reasoning is visible, talk is accountable – participants interthink. Co-constructing, creating joint artefacts.

Example (fictitious!)

3. Current work in DCLA

Taking dialogue from e.g. classroom or CMC settings, can map various things – subject vocabulary use, rhetorical marker use, social network analysis, and temporally driven discourse analysis.

4. Remaining challenge

Want domain-general analytics. For peer-peer talk, not peer/teacher or peer/agent. With focus on common knowledge, exploratory talk.

Combining DCLA – want to get from here to then. ID the concepts, the network of interlocutors and contributions, and understanding how they are related across time.

Want to move beyond understanding where someone is, to understanding how they got there (and with whom), and the joint artefacts they created.

Disambiguation is a basic challenge in natural language processing. Image search for DCLA is great. Context is a problem – the same utterance from a child or a teacher, or at different points in the lesson, can mean profoundly different things.

Documents versus conversations is a key distinction. Analyse sequences of replies to understand word clusters – how long they’re talked about, how they emerge, split, co-occur – rather than modelling what’s happening based on a corpus or dictionary.
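
[A sketch of reply-chain term co-occurrence, grounding word clusters in conversational structure rather than in a flat corpus – vocabulary and threads invented:]

```python
from collections import defaultdict
from itertools import combinations

def thread_cooccurrence(threads, vocab):
    """Count how often pairs of terms co-occur within the same reply chain,
    so clusters come from the conversation structure, not a corpus-wide model."""
    counts = defaultdict(int)
    for thread in threads:                  # a thread = ordered list of reply posts
        words = set()
        for post in thread:
            for w in post.lower().split():
                w = w.strip(".,?!")
                if w in vocab:
                    words.add(w)
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return dict(counts)

threads = [["Is nuclear viable?", "Nuclear needs less land than solar."],
           ["Solar costs are falling.", "Wind and solar pair well."]]
print(thread_cooccurrence(threads, vocab={"nuclear", "solar", "wind"}))
# -> {('nuclear', 'solar'): 1, ('solar', 'wind'): 1}
```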

The methods we use are a context in their own right – the data we train on, the classification processes. Also for students’ understanding, especially if we give feedback on this basis. Also the outcome measures we give them. Make sure the things we are measuring are the things we know are educationally important. This building of common knowledge, exploratory talk, is a domain-general framework associated with positive outcomes. When we ask humans to code things, the computer has to understand that, which has an impact on how we ask them. Document splitting may gloss over temporal aspects.

Two things – transactivity, how knowledge flows (domain-specific, also possibly social network specific), and exploratory talk markers (domain-general). Both are important.

5. Moving forward

Not just ‘you are here’, but ‘your route/landscape’, and how you might go forward.

Parallels in the Pragmatic Web. Move from syntax (logical forms), to semantics (standardisation based on ontologies), to pragmatics (evolving contexts and practices around epistemic artifacts). Language in action – a shift from language as a representation of the world to language as a means of interaction with the world.

Caution! DCLA has potential for new assessment practices. But caution in developing tools with limited but perhaps unstated views on the nature of language use for learning. Assessment is incredibly important. And once we can detect it – how do we support it?


This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.
