LAK13: Thursday afternoon (9) Discourse analytics

More liveblogging from LAK13 conference – Thursday afternoon.

Discourse Analytics

Harmless Lover's Discourse
(cc) Iñigo Elimay on Flickr

Analyzing the Flow of Ideas and Profiles of Contributors in an Open Learning Community

Iassen Halatchliyski, Tobias Hecking, Tilman Göhnert,  H. Ulrich Hoppe

Ulrich? presenting.

  • Problem – characterising evolution of knowledge in knowledge creating communities.
  • Approach – modification of ‘Main Path Analysis’
  • Case – Wikiversity
  • Technology embedded in an analytics workbench
  • Zone of Proximal Development – next steps.

Not an empirical paper. Development of a computational method.

All this is on open data – Open Source Software developers. Collaborative knowledge construction at KMRC. EU project SISOB on science analytics. And network analysis techniques.

Knowledge building – epistemic artifacts – Bereiter and Scardamalia 2003 – modality of learning is the same as in science. Main argument to look more deeply in to methods to analyse science production, transfer to the learning area. Re-uses scientometric methods to analyse knowledge building in communities.

Relations around epistemic artefacts – entities are actors and artefacts; links are  direct (create, cite, communicate); induced/mediated links [inferred?] e.g. between two documents  if there’s shared authorship.

Main path analysis: given a graph of dependencies between items, how to identify the main pathways through which the knowledge evolved. It’s not there in scientometrics; there’s a question of validation of these heuristics. In concrete terms, use citation links (references in publications) as relations between publications. Generally acyclic graphs, no cross references (mostly – except in e.g. chapters in edited books cross-referencing). Hummon and Doreian 1989 on DNA papers – derives a main line of development.

Citation networks are directed acyclic graphs. Links are inherently noted with time – sources are oldest, sinks are newest (have no successors). DAGs have at least one source and one sink. Add one virtual source and one virtual sink. Then can calculate edge weight according to a weighting scheme. Draw all trajectories from the artificial source to the sink. Count of times a link was passed; select the path with the highest edge weights. The paths may be multithreaded. Then cut off artificial source and sink – so could be disconnected.
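A minimal sketch of the procedure just described, as I understood it. Assumptions that are mine, not from the talk: edge weights use the search path count (number of source-to-sink trajectories passing through a link), the main path is chosen as the single trajectory with the largest total weight (a "global" search), and the names `main_path`, `SOURCE` and `SINK` are made up for illustration. Ties, which would give a multithreaded path, are broken arbitrarily here.

```python
from collections import defaultdict

SOURCE, SINK = "__source__", "__sink__"  # virtual nodes; the names are arbitrary


def main_path(edges):
    """edges: (older, newer) pairs forming a DAG. Returns (path, weights)."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for older, newer in edges:
        succ[older].append(newer)
        pred[newer].append(older)
        nodes.update((older, newer))

    # One virtual source feeds every real source; every real sink feeds one virtual sink.
    for n in sorted(nodes):
        if not pred[n]:
            succ[SOURCE].append(n)
            pred[n].append(SOURCE)
        if not succ[n]:
            succ[n].append(SINK)
            pred[SINK].append(n)

    # Topological order (Kahn's algorithm), starting from the virtual source.
    indegree = {n: len(pred[n]) for n in nodes | {SOURCE, SINK}}
    order, stack = [], [SOURCE]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)

    # Count trajectories from the source to each node, and from each node to the sink.
    from_source = defaultdict(int, {SOURCE: 1})
    for u in order:
        for v in succ[u]:
            from_source[v] += from_source[u]
    to_sink = defaultdict(int, {SINK: 1})
    for u in reversed(order):
        for v in succ[u]:
            to_sink[u] += to_sink[v]

    # Search path count: how many source-to-sink trajectories pass through each link.
    weights = {(u, v): from_source[u] * to_sink[v] for u in order for v in succ[u]}

    # Global search: trajectory with the largest total weight (assumed selection rule).
    best = {SINK: (0, [])}
    for u in reversed(order):
        if u != SINK:
            best[u] = max((weights[u, v] + best[v][0], [v] + best[v][1]) for v in succ[u])

    return best[SOURCE][1][:-1], weights  # cut off the virtual sink again


path, weights = main_path(
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("B", "E"), ("D", "E")]
)
print(path)
```

In this toy graph the link A→B lies on two of the three trajectories and D→E on two, so the selected path runs A, B, D, E.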

Example from Wikiversity. Use the author names from the pages and their separate versions. 228 pages in philosophy. Complication compared to citation networks: they’re not acyclic! Web pages often cross-reference bi-directionally. So no longer DAGs! So for each version of the page, introduce a new node – even when the operation is only to link to a new page. That then becomes a DAG again. Use swim lane diagrams. There are some redundancies: a link persists, but isn’t repeated. If links are deleted we’d need to do something, but we haven’t seen that; so we just accumulate them. Filter out the redundant edges; can then run main path analysis.
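A rough sketch of the version-as-node idea. The input layout (a list of `(timestamp, page, links)` revisions) and the function name `revisions_to_dag` are hypothetical, and the edge direction is my assumption: a link from page P to page Q is treated like a citation, so the edge runs from the latest existing version of Q to the new version of P. Each page-to-page link is recorded only once, which is one way of dropping the redundant edges.

```python
from collections import defaultdict


def revisions_to_dag(revisions):
    """revisions: iterable of (timestamp, page, links), links = pages linked from that revision."""
    latest = {}                 # page -> most recent version node (page, version_number)
    counts = defaultdict(int)   # page -> number of versions seen so far
    seen_links = set()          # (linking_page, linked_page) pairs already recorded
    edges = []
    for _, page, links in sorted(revisions, key=lambda r: r[0]):
        node = (page, counts[page])
        counts[page] += 1
        if page in latest:
            edges.append((latest[page], node))  # previous version -> new version
        for target in sorted(links):
            is_new = (page, target) not in seen_links
            if target != page and target in latest and is_new:
                edges.append((latest[target], node))  # linked page -> linking version
                seen_links.add((page, target))
        latest[page] = node
    return edges


# Pages A and B link to each other, which would be a cycle at page level.
edges = revisions_to_dag(
    [(1, "A", set()), (2, "B", {"A"}), (3, "A", {"B"}), (4, "B", {"A"})]
)
print(edges)
```

Every edge points from a node created earlier to one created later, so the result is acyclic even though the two pages cross-reference each other.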

Medicine is rich on Wikiversity. Two different types of application of the method – one with minimal branching, only when ambiguity forces a branch; other times you might want richer threading and so allow for branching.

What insights do we get? A better understanding of the contributions in context. Show by example of user/author profiling.

Show user ID, with number of contributions – three users are about equal. So can’t distinguish. But the method gives indicators to distinguish – number on thread of the main path (which can be threaded) – can see that some have more on the main path.

Three descriptions – Connector, creates many contributions on the main path, receives many refs but not on the main path. Inspirator creates many contributions on the main path, receives many refs including on the main path; can be self-cites, but Ok if on the main path. Worker creates many contributions, fewer on the main path, very few references received.
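The indicators behind these three profiles could be counted along these lines. This is only a sketch: the data layout and the name `author_indicators` are hypothetical, and no thresholds for "many" or "few" were given in the talk, so the code stops at the counts and does not assign roles.

```python
from collections import Counter


def author_indicators(author_of, citations, main_path):
    """author_of: contribution -> author; citations: (cited, citing) pairs;
    main_path: ordered list of contributions on the main path."""
    on_path = set(main_path)
    path_edges = set(zip(main_path, main_path[1:]))
    stats = {author: Counter() for author in set(author_of.values())}
    for node, author in author_of.items():
        stats[author]["contributions"] += 1
        stats[author]["on_main_path"] += node in on_path
    for cited, citing in citations:
        received = stats[author_of[cited]]
        received["refs_received"] += 1
        received["refs_received_on_main_path"] += (cited, citing) in path_edges
    return stats


stats = author_indicators(
    author_of={"a1": "u1", "a2": "u2", "a3": "u1", "a4": "u3"},
    citations=[("a1", "a2"), ("a2", "a3"), ("a1", "a4")],
    main_path=["a1", "a2", "a3"],
)
print(stats["u1"])
```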

Technology: SISOB Workbench.

Used for many different purposes – workflows. Visual representation, based on pipes and filters approach (Yahoo pipes). Filters as agents (mostly in R). General architecture for cascaded processing.

Looks very like Yahoo pipes. Have a data source, processing, outputs, transformation.

Next steps – Zone of Proximal Development.

Have used MPA as heuristic method. Question – what dataset do you use to find main paths – e.g. all CSCL, or AIed; there isn’t one main path, there’s thematic subnetworks. Need a specific seed. Can start from the end nodes, backwards fan, then from old to new, can get a more coherent picture. Want to explicitly manage the seed.

Then extending the range of the method – in Masters thesis, applying it to chat data. Replicating Dan Suthers’ analysis on Tapped In and added main path analysis.

Blend method with other network analysis methods – improve ‘link prediction’ approach in scientometrics (predicting future co-authorship, or future topics for authors); main path analysis could further elaborate certain conditions. Can interpret figures, adding main path figures, that’ll help you profile productivity, use as role indicators like from SNA (e.g. brokers, betweenness centrality).


Q Thanks for interesting presentation. In both examples, citation and Wikiversity – are connectors much more likely to be interdisciplinary folk? If viewing this, in citations they’re likely to be interdisciplinary folks?

Depends on your dataset. Here (in Wikiversity) it’s relating gynaecology and diabetes, so it’s interdisciplinarity within medicine.

Q Working backwards – it strikes me you’re more likely to miss out some small paths that die out quickly and go nowhere. Could work forwards to be sure you don’t miss out.

Usually you’d do both. Preselection problem. If you don’t preselect, you get lots of pathways, it falls apart in to very different pathways, each one a large branching graph. If you want to ask more focused questions, that’s the seeding question, preselection. So work backwards or forwards.

Epistemology, Pedagogy, Assessment and Learning Analytics

Simon Knight, Simon Buckingham Shum, Karen Littleton

Simon Knight presenting.

Wrote this in first two months of his PhD.

Thinking about this middle space: LAEAP-ing the divide. LA buy in to ways of thinking about epistemology, assessment and pedagogy. For theoretical, practical and ethical reasons we should address that. Those considerations have practical application, and that’s where the middle ground is.

Introduction to epistemology, pedagogy and assessment.

A triad – foreground the relationships between them. What does it mean to know? How do I know someone knows something or not? And how do we get people to come to know? (to learn)

Truth, accuracy and fairness have a very practical purchase in the context of assessment. LA buys in to ways of thinking about this, flexibly enough that we may be able to move forward.

LA is data mining, on digital traces, deployed for pedagogic purposes. Thinking about LA as assessment – AFL – assessment for learning, formative assessment.

Example – Danish case

Video. New system in Copenhagen, taking an exam, allowed to use the internet. Can use Facebook, but not communicate beyond the classroom. If you send a question and get an answer back, that’s cheating. Educate students about that. Penalty: expulsion. Spot checks by invigilators. 28 students in a room, can’t watch all at once. Some will try, some will be caught, some will get away with it. Questions make it hard to cheat – not facts and figures, but analysis. Danish government proud to be first to allow it, other countries will follow. Is the UK system out of date? Stephen Heppell saying don’t cap them, arm them to be ready for the C21st, right now they’re not ready for the C19th.

The policy documents (mostly in Danish), they’re making epistemological claims. “If you allow communication, discussions, searches and so on, you eliminate cheating because it’s not cheating any more. That is the way we should think”. In university context, this would also allow auto-grading, area for LA.

Looking at student’s own judgements. Knowledge communicative, discursive. Picking out tokens of knowledge in decontextualised ways may not assess knowledge but something else.

Assessment should nurture knowledge development and practices (through AfL). Particularly where pedagogic assumptions on the subject, not the assessment practices. Schools teaching to the assessment.

Other side of the coin

That was theoretical stance. The other side is what students do, apply it to a context. Epistemology in action – what they do.

Interested in things in action, learning in action, what they do. Analytics gives us unprecedented access to that. Shift from standardised assessment to knowledge in action, not in a vacuum.

The lens of epistemic beliefs or behaviours – how they conceptualise certainty, simplicity, source, justification for knowing. Information seeking behaviour is particularly salient. Not suggesting that probing behaviour unveils beliefs. In action is key, language to do things. Psychometrics are useful, but standardisation is problematic (and they’re poor in this area now, poor reliability).

Exploring various behaviours – how students choose sources, which experts they turn to, how knowledge flows between agents in a domain.

Probing it by looking at structured knowledge building – e.g. in OU Cohere tool, building on CSCL tool on making things explicit in structured environments like Knowledge Forum, Belvedere, etc. Students tag, make semantic ties; can analyse this.

The next class of those tools – that LAEAP the divide – are built on this E/A/P framework, emphasise dialogue around knowledge building (see DCLA paper from Monday). How dialogue is used as a frame for decisions. Provide a structured environment, with a time flow. Understanding how students co-construct the frame. Pragmatic stance. Will use NLP tools – MALLET, WEKA – exploring the discursive context. Discourse not just as a context, but as a creator of context. Subject domains, fluid property that discourse allows. The structured environments mean we can mark off sections and give feedback – e.g. say of a node, you might want to consider this, or use ideas from another node, etc.

SOCL – Microsoft social research platform, released the dataset. In SOCL a search creates a post or a node, you pin objects to it, in a public space. We can access the tags, what they’ve accessed, who they’re linked to – the usual social network data. There’s also commenting. It doesn’t allow us to see how they were framed prior to the search. Example of two brothers debating whether aliens built the pyramids. Will explore this dataset using exploratory talk, and accountable talk, to see how people build knowledge together. How they engage in interthinking – Littleton and Mercer term. That kind of talk has particular epistemic implications. It doesn’t preclude disagreement, but they must be accountable.

Analytics questions addressed: How do users interthink to construct and solve problems? How do they co-construct their information needs? (In structured and free-text environments) And how do they use each other?

LA and assessment regimes implicate epistemologies. Use of data for particular conversations/assessment/AfL is key; it’s not context-free. View of LA to start conversations, encourage learners to sense-make around their own data. Highlight some design implications of exploring that epistemology, assessment, pedagogy triangle.


Chair: Looking at explicit and implicit behaviours?

In the context of the structured platforms, get students to make explicit their thinking. Those kind of platforms encourage students to think in particular ways, prompt them towards accountable talk, or making particular sorts of connections, explicitly. Implicit, we now have access to some of that data, can trace that much more. Interestingly, in structured systems that ask to make thinking explicit, it’s harder to track, it encourages thought in particular ways and to interact with others.

Jim Simon(?): Reference frame on epistemic beliefs and behaviours. Environment that encourages certain kinds of collaborative or intentional interactions. My work has a model to encourage classroom to work as a community. Not uncommon in Toronto. Intrinsic interest in getting students to believe and act that way. Challenge walking in the door – they’re not coming with that epistemic set. Some want to beat another kid and get a better grade. Implications in this LA to help them adopt new epistemic positions to learn about why they might be coming in to the classroom and repurpose themselves in the learning narrative? Not so much how we measure epistemic beliefs, but how can we change them? Need to understand first or they are misfit to the design. Why they think they’re in science class. If you have this measure, you should be able to use that as input in to an epistemic treatment – that’s what I’m after.

Answer depends on whether what we’re looking for is LA that are radically disruptive, or are evolutionary. There are answers for both. On the former, it could be that the assessment system is wrong – they come in thinking that way because the system is set up that way. Our job is to challenge that. I think that has big legitimacy, but others might not think that. Fits well with the triad. On the latter, the evolutionary side: exploring providing prompts – e.g. you’re talking about this, have you thought about that; or previously, you engaged in this kind of discussion, maybe you could go for that. The lit around exploratory talk and associated dialogues – Rebecca may talk about this – it’s pretty strong as a pedagogic tool. Students learn better when their talk is accountable to each other, when they explain their reasons, they try to build common knowledge. That’s the case now within the current educational system. Any system that’d support and encourage that more widely would be fantastic. Trying to encourage more dialogic education within the current system.

An Evaluation of Learning Analytics To Identify Exploratory Dialogue in Online Discussions

Rebecca Ferguson, Zhongyu Wei, Yulan He, Simon Buckingham Shum

Focusing on how dialogue, speech is used not just to convey knowledge, but to build and construct it.

We’re looking at discourse. Why? From a socio-cultural perspective, it’s crucial in the development of knowledge. How people engage, relate and explore ideas. Neil Mercer’s work, began in schools in f2f, now going online. They identified three main ways of talking:

  • Disputational dialogue – don’t want to see it, no meeting of minds
  • Cumulative dialogue – more positive, put ideas in, come away having learned something, quite happy to see
  • Exploratory dialogue – most educationally valuable – sharing ideas but also challenging, combining, exploring them.

How can you find exploratory dialogue automatically, and how can you encourage it?

Paper to LAK 2011, took chat from a 2-day online conference in Elluminate. Manual analysis, finding cue phrases that might be indicative. We could distinguish some parts, but very labour-intensive. Not scalable, which created a problem. So talked to computational linguists, asked them how we could move on.

They spotted three problems: The dataset is too limited. Text classification problems are typically topic driven, but this isn’t. Nevertheless, both dialogue features and topical features need to be taken in to account. It is topic-based, could have a good exploratory dialogue in context, but not useful for followup.

Need a self-training framework. One you can do self-training from instances – give examples to classifier, say go find things like this. It starts to go wrong. “I think …” is fine “I think that this is … a proof” vs “I think you’re stupid”. So need to use features. Look in more data, split in to unigrams, bigrams, and trigrams. Average across examples, get more detailed. More reliable. But not taking topic in to account.
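The unigram/bigram/trigram split is simple to sketch. The whitespace tokenisation and the function name `ngrams` are my assumptions; the paper's actual preprocessing may well differ.

```python
def ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n, shortest first."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


print(ngrams("I think that"))
# ['i', 'think', 'that', 'i think', 'think that', 'i think that']
```

With longer n-grams as features, "I think that this is" and "I think you're" share the bigram "i think" but differ on the trigrams, which is the extra detail the plain cue phrase misses.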

Taking account of the nearest neighbours of every turn; it’s the ones with the most unigrams, bigrams and trigrams in common. Explore the k nearest neighbours. Calculate value for support for the pseudolabels generated.
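A sketch of the neighbour check, under loud assumptions: similarity is simply the count of shared n-grams, and "support" is taken to be the share of the k nearest neighbours whose label agrees with the pseudolabel. The real formula is in the paper, not in my notes, and the example turns below are invented.

```python
def ngram_set(text, max_n=3):
    """Set of word n-grams (as tuples) of length 1..max_n."""
    tokens = text.lower().split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def support(turn, pseudo_label, labelled, k=3):
    """labelled: non-empty list of (text, label) pairs.
    Returns the share of the k nearest neighbours that agree with pseudo_label."""
    target = ngram_set(turn)
    ranked = sorted(
        labelled,
        key=lambda example: len(target & ngram_set(example[0])),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(label == pseudo_label for _, label in nearest) / len(nearest)


# Invented example turns, for illustration only.
labelled = [
    ("i think that is because of the data", "exploratory"),
    ("i think that depends on the method", "exploratory"),
    ("hello everyone", "non-exploratory"),
    ("thanks bye", "non-exploratory"),
]
print(support("i think that is a good point", "exploratory", labelled, k=2))
```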

Also look at cue phrases from pilot work – 94 – very precise, but recall bad (human coders missed examples).

Needed more data. Picked up chat data, 10,500 turns, about 9 words per turn, from MOOCs. Got two postdocs to code it. Label obviously exploratory, and obviously non-exploratory. And subcategories for exploratory. Subcategories not useful, they overlap, and it depends how far you’ve got in the conversation. Coders did agree on exploratory/non-exploratory (mostly); only took examples when they agree.

Applied method to that dataset: train initial classifier, apply the trained classifier to un-annotated data (swapping around through all combinations); use self-learned features to find exploratory dialogue, use cue-phrase matching, k nearest neighbours etc – redid it repeatedly.

  • Accuracy – Pilot 0.5, new 0.8
  • Precision – Pilot 0.9 (manually coded), new 0.8
  • Recall – Pilot 0.4, new 0.9
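For reference, the three measures come from the usual confusion counts, with "exploratory" as the positive class. The counts in the example are made up to show the arithmetic and are not from the paper.

```python
def scores(tp, fp, fn, tn):
    """tp/fp/fn/tn: true positives, false positives, false negatives, true negatives."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # share of all turns labelled correctly
        "precision": tp / (tp + fp),  # of turns flagged exploratory, how many really are
        "recall": tp / (tp + fn),     # of truly exploratory turns, how many were found
    }


# Illustrative counts only.
print(scores(tp=9, fp=2, fn=1, tn=8))
```

High precision with low recall, as in the pilot, means the cue phrases rarely flag the wrong turn but miss most of the exploratory ones.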

Fine in computational linguistics … but what does it mean in LA terms? We can take a big dataset – 2, 3, 4 hours of conference. Want to know where the interesting things are happening. Could act as a reflection tool for people involved.  Example vis showing exploratory/non-exploratory over time, bunch 10 turns at a time. At the end, the last 20 turns are not exploratory – that’s all the goodbye/thanks. Similar at the start – lots of hello/good to see you. Bumpy bit in the middle, then a long exploratory bit towards the end.
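The bunching behind that kind of plot is easy to sketch. The per-turn labels would come from the classifier; the function name `exploratory_share` and the toy labels are mine.

```python
def exploratory_share(labels, size=10):
    """labels: per-turn booleans in chat order (True = exploratory).
    Returns the share of exploratory turns in each consecutive bunch of `size` turns."""
    bunches = [labels[i:i + size] for i in range(0, len(labels), size)]
    return [sum(bunch) / len(bunch) for bunch in bunches]


# Toy session: greetings, a mixed stretch, then a sustained exploratory stretch.
labels = [False] * 10 + [True] * 5 + [False] * 5 + [True] * 10
print(exploratory_share(labels))
# [0.0, 0.5, 1.0]
```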

Another vis – individuals plotted exploratory turns vs total turns – shows who’s individually doing more exploratory dialogue. Issues though – may be discouraged, might e.g. have had an issue with a microphone. Or saying I was the nominated welcome person, I did my job perfectly. There are nuances within this. So need to know how to use this to motivate and guide, not discourage. Also need to build visual literacy. It takes some getting in to to understand them. Also participatory design – not just researchers making decisions but learners and teachers.

Working in the middle space. I come to this from discourse analysis and education, others from computational linguistics. We found this tough. It’s not just tough framing the questions, it’s tough to interpret what you’ve done. Finding out how to present that to a mixed audience is a challenge. It was difficult, but really worthwhile.

We’ve developed and tested a self-training framework. Built annotated corpus. Ideas for how to support educators with it.


Greg: I used to be at CMU, did a lot of research on this. Difficult to compare, very different methods. Hard to replicate. In paper have kappa for agreement, seems kind of low. What kappa did that get for the precision, recall figures, what kappa does that translate to?

The only kappas I know are in the paper. The linguists were happy to go for the exploratory/non-exploratory, but wanted it higher. Didn’t take all their examples, only where two coders agreed.

Greg: If take computer as a third rater, what kappa does it get?

I don’t know what kappas it’d get.
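For what it's worth, the computation Greg is asking about is Cohen's kappa between two sets of labels. A sketch, assuming the classifier's output is treated as one rater and the agreed human codes as the other; the labels below are invented, not results.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length, non-empty label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels: 1 = exploratory, 0 = non-exploratory.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
# 0.5
```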

Greg: Like essay grading where computers are doing as well as humans because humans aren’t very good at it.

Yes. Could be like that. Room for improvement in the human coders.

Q: Using markers other than dialogue for classification? You might use language differently to me. Might be things about the way that Rebecca’s talk seems more exploratory, not just n-grams but phrasing. Vs tone of my messages might be very different. Considered those?

Yes, that’s a possibility for the future. When Neil Mercer and team have worked on this, was across several countries. Including Mexico. When worked on developing exploratory dialogue, grades do go up. Though it’s to some extent cultural, to some extent it’s about more. It’s in English at the moment. We’re pleased that often in language processing you have to take quite formal correctly-spelled language. But this has nonstandard spelling, usage, emoticons, but still gets significant results. But take your point.

Chair: In Mercer’s work, does he talk about how other sorts can lead to exploratory talk? Or are there elements in the others that lead to exploratory talk?

Neil would say cumulative is valuable in and of itself. Doesn’t have to progress. Happens a lot. I can sometimes see value to disputational dialogue, others can’t. I say sometimes you need disputational dialogue, others strongly disagree. Usually Neil’s team have found you need specific support to develop exploratory talk – a few guidelines can move learners and teachers on.

Chair: Automatically detecting those other sorts of talk?

Not for now.

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.
