LAK12: Tuesday afternoon – 6A Empirical Studies

Liveblog notes from session 6A – Empirical Studies (2) at LAK12

Vancouver B.C. Aquarium "Beluga Whale - Making contact"

Germán Cobo, David García-Solórzano, Jose Antonio Morán, Eugènia Santamaría, Carlos Monzo and Javier Melenchón:
Using agglomerative hierarchical clustering to model learner participation profiles in online discussion forums

Germán talking, from the Universitat Oberta de Catalunya (UOC). Distance education, adult learners, web-based environment; continuous assessment; asynchronous.

Looking at workers (active, visible), lurkers (active, invisible) and shirkers (inactive, invisible). From SNA, a great diversity of parameters are useful. Chan et al. (2010) clustered users: popular initiators, popular participants, etc.

Goal of this project is to identify participation profiles among learners. Number of patterns is unknown, so use Agglomerative Hierarchical Clustering Algorithm to find them.

Two stages: first, group learners with similar activity, separately for reading and for writing. Second, group the clusters across both domains.
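A minimal sketch of the second stage, under the simplest assumption: each learner's pair of stage-one labels (reading cluster, writing cluster) defines a joint profile, and identical pairs form one group. The talk didn't say exactly how the clusters were joined, and the label names and learner ids below are hypothetical.

```python
from collections import Counter


def combine_profiles(reading_labels, writing_labels):
    """Pair each learner's reading-cluster and writing-cluster labels.

    Both arguments map learner id -> stage-one cluster label. The result maps
    learner id -> (reading label, writing label), i.e. a joint profile.
    """
    if reading_labels.keys() != writing_labels.keys():
        raise ValueError("both stages must label the same learners")
    return {
        learner: (reading_labels[learner], writing_labels[learner])
        for learner in reading_labels
    }


# Hypothetical stage-one output for five learners.
reading = {"s1": "R-high", "s2": "R-high", "s3": "R-low", "s4": "R-none", "s5": "R-high"}
writing = {"s1": "W-some", "s2": "W-none", "s3": "W-none", "s4": "W-none", "s5": "W-some"}

profiles = combine_profiles(reading, writing)
# Group sizes per joint profile, e.g. ("R-high", "W-some") covers s1 and s5.
profile_sizes = Counter(profiles.values())
print(profile_sizes.most_common())
```

In practice some joint profiles would be rare enough to merge by hand or by a further clustering step; this sketch stops at the raw combinations.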

Key in the clustering algorithm is how to decide distance between two clusters. Output is a dendrogram-type graph. Not a final partition of the data – can still see the structure. Can cut or process the dendrogram for the final partition. Need to manually set thresholds and/or parameters.
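The "distance between two clusters" is the linkage criterion. A minimal pure-Python sketch, assuming Euclidean distance and average linkage (the talk didn't say which linkage was used): it records every merge with its distance, which is the information a dendrogram draws, and then cuts that history at a manually chosen threshold. The toy points are made up.

```python
import math


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the closest pair.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(
                    math.dist(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def cut(merges, n_points, threshold):
    """Flat partition: replay merges until one exceeds the threshold.

    Average-linkage merge distances never decrease, so stopping at the
    first merge above the threshold is safe.
    """
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, d in merges:
        if d > threshold:
            break
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


# Two tight pairs and one outlier.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
merges = average_linkage(points)
print(cut(merges, len(points), threshold=5))
print(cut(merges, len(points), threshold=15))
```

The two cuts give three and two clusters respectively, which shows why the threshold is the manual choice the talk complains about. The brute-force loop is only for illustration; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` is the usual choice.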

Example of clustering on a synthetic dataset. It’s hard to justify a cut automatically. One option is to calculate the inconsistency of the links and set a threshold for that, which generates the clusters; but again, the threshold has to be set manually. Their approach: calculate the consistency of links within the dendrogram, identify the most consistent cluster, isolate that cluster and recalculate, select the next most consistent cluster, and cut again. That isolates the original grouping without manually setting thresholds.
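The link-inconsistency idea can be sketched as follows. This uses one common definition, assumed here rather than taken from the talk: compare each link's height with the mean and sample standard deviation of the heights of that link and the links directly below it, so a tall link sitting on short ones scores high. The merge history is hand-written in a SciPy-style id scheme (a hypothetical format for this sketch). It does not implement the authors' iterative most-consistent-cluster method.

```python
import statistics


def inconsistency(links, n_leaves, depth=2):
    """Inconsistency coefficient for each link in a merge history.

    links[k] = (left, right, height). Ids below n_leaves are data points;
    id n_leaves + k is the cluster created by links[k]. Each link is compared
    with the links up to `depth` levels down, counting itself as level one.
    """
    coefficients = []
    for k, (_, _, height) in enumerate(links):
        heights = []
        frontier = [n_leaves + k]
        for _ in range(depth):
            next_frontier = []
            for node in frontier:
                if node >= n_leaves:  # data points have no height; skip them
                    left, right, h = links[node - n_leaves]
                    heights.append(h)
                    next_frontier += [left, right]
            frontier = next_frontier
        spread = statistics.stdev(heights) if len(heights) > 1 else 0.0
        if spread > 0:
            coefficients.append((height - statistics.mean(heights)) / spread)
        else:
            coefficients.append(0.0)
    return coefficients


# Hand-written merge history for five points: two tight pairs, then the
# pairs join, then an outlier joins last.
links = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 10.0), (4, 7, 20.0)]
for k, c in enumerate(inconsistency(links, n_leaves=5)):
    print(f"link {k}: inconsistency {c:.3f}")
```

Here the link joining the two pairs scores higher than the final link to the outlier, because the latter is compared with a link that is already tall. Choosing a cut-off on these scores is still a manual decision, which is the limitation described above.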

Dataset – 672 students, 3 semesters, Telecoms Engineering. Participation in discussion boards was not mandatory. Withdrawal rate about 37%.

Defined parameters to explore the reading and writing domains, mostly ratios. (Not clear how they were selected.)

Found two dendrograms: one for reading (5 clusters), one for writing (3 clusters). The main writing cluster was those who had done no writing at all. The second stage joined students in the same clusters to get a final set of clusters, labelled as workers (high, mid and low level), lurkers (high and low level) and shirkers, who did very little. These match clearly to withdrawal and passing rates. (High is good.)
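Matching profiles to outcomes amounts to a cross-tabulation: for each profile, the share of its learners who passed, failed or withdrew. A minimal sketch; the learner ids, labels and outcomes below are made-up toy values, not the study's data.

```python
from collections import Counter, defaultdict


def outcome_rates(profiles, outcomes):
    """Share of each outcome within each profile.

    profiles: learner id -> profile label (e.g. "worker", "lurker").
    outcomes: learner id -> outcome (e.g. "passed", "failed", "withdrew").
    """
    tallies = defaultdict(Counter)
    for learner, profile in profiles.items():
        tallies[profile][outcomes[learner]] += 1
    return {
        profile: {outcome: n / sum(tally.values()) for outcome, n in tally.items()}
        for profile, tally in tallies.items()
    }


# Toy data for illustration only.
profiles = {"a": "worker", "b": "worker", "c": "lurker", "d": "shirker", "e": "shirker"}
outcomes = {"a": "passed", "b": "passed", "c": "passed", "d": "withdrew", "e": "failed"}
for profile, rates in outcome_rates(profiles, outcomes).items():
    print(profile, rates)
```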

The two-stage analysis strategy proved suitable. Different kinds of workers and lurkers were identified.

Want to extend this to more domains, such as rhythm and latency, and to develop the tool.


Someone 1: How quickly can you start on the clustering after people start working? When can you cluster them?

G: We consider the whole semester. You could take the first week, and then try again. (They haven’t analysed the difference between those.) Interested in an as-soon-as-possible predictor.

Someone 1: Can you find the trolls? People who write the bad things.

G: [laughter]

Someone 2: Similar to what we’re doing in our research group. 1. With cluster analysis, we found a lot of student activity differs not just in quantity but in quality: the breadth vs depth of messages shows different understandings of what students saw their purpose as. What do they think they’re getting out of it? Are you examining these clusters as different ways of approaching it?

G: We could take other parameters, like length of message; we haven’t done that, because we have no automatic access to the content of a message. We chose the parameters because they were interesting from an SNA perspective. It would be interesting to add more qualitative parameters.

Someone 2: If you look at quantitative parameters, some indicate different things: deep vs broad participation can be interpreted in different ways. Might be something interesting in there.

Someone 3: Curious why you consider this an experiment? You weren’t manipulating anything. Also, how much measurement error is there, when participation is recommended rather than required?

G: It’s not mandatory, but due to the nature of the subjects, we propose activities, and students ask and answer questions. If you don’t take part, you miss the rich information in the classroom. People who don’t take part usually fail.

Someone 3: It would be more of an experiment if you had three different cases, e.g. recommend and highly recommend, and compared them.

G: We have more subjects we could explore.

Agathe Merceron:
Investigating the Core Group Effect in Usage of Resources with Analytics

Motivation comes from the LMS: if we know our learners better, and how they use the learning resources, we can improve our teaching and their learning. Building a tool to analyse the usage data stored by the LMS. It is independent of any particular LMS, with its own data model, and should be useful for different stakeholders, including non-computer-science specialists. They need to understand the analysis so that they draw the right conclusions.

The tool is in the process of development.

For many courses, usage of optional self-test exercises drops over the semester. This has been observed elsewhere in the literature (Hershkovitz and Nachmias), and the drop is not uniform.

Investigate the core group effect: is it the case that those who use an exercise are a perfect subset of those who did the previous one? (Equivalently: do any students skip an exercise and then come back?) Is the usage of the exercises predictable or not?
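The core group question can be checked directly on usage sets. A minimal sketch, assuming usage is recorded as one set of student ids per exercise, in semester order (a hypothetical format): the effect is perfect when every set is contained in the one before it, and any student in exercise k but not in exercise k-1 skipped and came back (or started late).

```python
def is_perfect_core_group(usage):
    """True if each exercise's users are a subset of the previous exercise's."""
    return all(later <= earlier for earlier, later in zip(usage, usage[1:]))


def returners(usage):
    """For each exercise k >= 1, students who used k but not k - 1."""
    return [usage[k] - usage[k - 1] for k in range(1, len(usage))]


usage = [{"ann", "bob", "cy"}, {"ann", "bob"}, {"ann", "cy"}]
print(is_perfect_core_group(usage))
print(returners(usage))
```

Because set inclusion is transitive, passing this check for every consecutive pair also guarantees that each exercise's users did every earlier exercise.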

Pragmatic hypothesis: no need to check all the rules; just check local rules (this exercise vs the previous one). There’s no mathematical justification, except that the global rule will give a lower bound.
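The local and global rules can be sketched as association rules with a confidence score. The definitions here are assumptions, not taken from the talk: the local rule for exercise k is "used k => used k-1", the global rule is "used k => used every earlier exercise", and confidence is the share of exercise-k users for whom the right-hand side holds. The toy usage sets are made up.

```python
def rule_confidences(usage):
    """Local and global rule confidences for exercises 1 .. n-1.

    usage[k] is the set of students who used exercise k (semester order).
    Local rule for k:  used k => used k-1.
    Global rule for k: used k => used every exercise 0 .. k-1.
    Confidence is None when nobody used exercise k.
    """
    local, global_ = [], []
    did_all_previous = set(usage[0])  # copy, so the input is not mutated
    for k in range(1, len(usage)):
        users = usage[k]
        if users:
            local.append(len(users & usage[k - 1]) / len(users))
            global_.append(len(users & did_all_previous) / len(users))
        else:
            local.append(None)
            global_.append(None)
        did_all_previous &= users  # now the students who used 0 .. k
    return local, global_


usage = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "d"}]
local, global_ = rule_confidences(usage)
print(local)
print(global_)
```

Under these assumed definitions a global confidence can never exceed the local one for the same exercise, since its condition is stricter; that is one way to read the lower-bound remark above. In the toy data, student d skips exercise 1 and returns, which lowers the global confidence for exercise 3 but not the local one.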

Dataset: two courses, Introductory Programming and Formal Basics of CS (57); 46 students were in both. The drop-off is visible.

Local rule confidence about 0.6; predictability about 60% to 80%. One course (60%) doesn’t show a monotonic decrease: there is a small increase for some exercises, so that rule needs to be reversed.

Global rule confidence about 0.5 to 0.7; students adopted different strategies in the two courses. One course showed quite a strong core group effect; the other less so.

Whether students persist with the exercises depends on the course: it’s a state, not a trait. Individual students taking both courses persisted with the exercises in one course but not the other.

When should the teacher intervene? These rules are not enough; also need to look at the impact of using resources on success in learning.

Someone 1: When do students have access to these tests? Any time during the semester? Can they catch up late?

A: In principle, in one course possibly, but they were only open in a time window.

Hanan Ayad: [I didn’t hear]

A: Looked at the impact of the self-evaluation tests on the final mark. In the course where there was the trend, the impact of the self-tests is higher than in the other course. Maybe students also adjust their strategy to how they feel the

HA: Perceived benefit by the students, if they feel there’s an impact during the course progression, that may motivate them to continue.

A: There were assignments on one course and not the other.

HA: You mentioned viewed vs submitted vs finished. Do they get something in return when they submit it? Do they get correct answers?

A: They get feedback and good explanations, only if they submit it.

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.

No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.
