LAK13: Tuesday morning

Liveblog notes from Tuesday morning 9 April at LAK13.

John Stamper – Tutorial: Learning Curve Analysis using DataShop

Programme. DataShop.

John’s the technical director of DataShop, with a team of people working on it.

[Image: The Village Shop (Explored). (cc) Alison Christine on Flickr]

DataShop is two things: a repository, and a set of tools to analyse the data in that repository.

Repository was conceived as a store for PSLC research activity, but now holds data from many other sources. Three types of research – primary analysis of study data, exploratory analysis of course data, secondary analysis of any data set. Many datasets are public; some are not, but you can request access – DataShop acts as a broker for contact/permissions.

Analysis and reporting tools focus on student-tutor data – log data. Learning curve reports, error reports, aggregate tools (performance profiler). Export through web services to tab delimited and XML formats. Also open to suggestions for new things.
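A minimal sketch of reading a tab-delimited export with nothing but the standard library. The column names and rows here are made up for illustration; they are not DataShop's actual export headers, so check a real export before relying on them.

```python
import csv
import io

# Hypothetical tab-delimited export; the column names are illustrative only.
EXPORT = (
    "student\tproblem\tstep\toutcome\n"
    "s1\tp1\tfind-area\tincorrect\n"
    "s1\tp1\tfind-area\tcorrect\n"
    "s2\tp1\tfind-area\tcorrect\n"
)

def read_export(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

rows = read_export(EXPORT)
n_correct = sum(row["outcome"] == "correct" for row in rows)
print(len(rows), "transactions,", n_correct, "correct")
```

For a real file you would pass an open file handle to `csv.DictReader` instead of the `io.StringIO` wrapper.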

They have 410 datasets, 104m transactions, 250,000 hours of student data. Biggest open repository of transaction-level data. Some people may have more data, but they’re not making it available. Focused on intelligent tutors and similar, but open to other stuff.

Sales pitch: it’s free tools, free researchers (especially if you make your data public), great place to collaborate (with private datasets – you have control over access), can validate ideas across multiple data sets.

Hands On

Need an account but easy to set up. Agreement includes promising not to try to de-anonymise the data!

Q Is it just ITS data? Or if it’s not a tutoring system – a CMS, say – does it still make sense?

Yes, it does make sense, and it’s not just tutor data now. We have a lot of tutor data, and the data model was conceived from tutor log files. But it’s pretty much transaction data, so if you have that, it’ll fit in. We have a lot of OLI data, it’s more tutor-lite than most LMSes are. But it’s still transaction logs. We’re getting online game data – those are so varied.

Q Are these PSLC games?

Some, but mostly others.

Public datasets (left-hand-side menu) – many to browse. For Private datasets, you can see the project name and its individual datasets, and the PI. There’s a ‘request access’ button. For the most part, you’ll only want to ask for view access; you only need edit access if you’re involved in the project. Generates an email to the PI, who can grant the request or contact you by email. DataShop gets a report on how long requests have been sitting there, and prods people if it’s been a few weeks so they at least reject the requests. Steve Ritter has a lot of projects up, mostly math, really large, not for an experiment but for exploration.

There are external tools that use data in DataShop export format (or with a little editing). Only one up at the moment.

Walkthrough – go to Geometry Area dataset.


Problem – a task for student that can involve multiple steps. A step is an observable part of the solution to a problem. A transaction is an interaction between the student and the tutoring system. KC – Knowledge Component – piece of information that can be used to accomplish task. KC model – cognitive model or skill model – mapping between correct steps and knowledge components.

Student working through an intelligent tutor – illustrated with a video. (Exercise on the area of the end of a can stamped from a square of metal.) Each student action creates a transaction (tx) – has row ID, student ID, problem name, the step, count of the attempt, what they input, evaluation (as correct/incorrect – can be others, but DataShop tools assume correct/incorrect), and whether it matches a KC. Student-steps table – an opportunity is a chance for a student to demonstrate that they have learned/achieved a KC. An observation is the set of txs for a student working on a step – which is what this table holds.
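The roll-up from transactions to student-step rows can be sketched as below. This is a rough illustration, not DataShop's code: the field names are hypothetical, and it assumes the first attempt decides whether a step counts as correct.

```python
from collections import defaultdict

# Hypothetical transactions in time order: (student, problem, step, kc, outcome).
transactions = [
    ("s1", "p1", "area", "circle-area", "incorrect"),
    ("s1", "p1", "area", "circle-area", "correct"),
    ("s1", "p2", "area", "circle-area", "correct"),
    ("s2", "p1", "area", "circle-area", "correct"),
]

def student_steps(txs):
    """Roll transactions up into one row per (student, problem, step).

    Assumption: the first attempt on a step decides its outcome, and the
    opportunity number counts how many steps tagged with that KC the
    student has met so far (starting at 1).
    """
    rows = []
    seen_steps = set()
    opportunity = defaultdict(int)  # (student, kc) -> opportunities so far
    for student, problem, step, kc, outcome in txs:
        key = (student, problem, step)
        if key in seen_steps:
            continue  # later attempts belong to the same observation
        seen_steps.add(key)
        opportunity[(student, kc)] += 1
        rows.append({
            "student": student,
            "problem": problem,
            "step": step,
            "kc": kc,
            "opportunity": opportunity[(student, kc)],
            "first_attempt_correct": outcome == "correct",
        })
    return rows

step_rows = student_steps(transactions)
```

Here s1's two transactions on the same step collapse into one row (first attempt incorrect), and the same KC in problem p2 becomes s1's second opportunity.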

Next, go for Learning Curve (this bit still has a tabbed interface). Select ‘all data’ – you have to choose it explicitly, because some datasets are so large you wouldn’t want ‘all data’. Shows the learning curve for that data, and how it fits with various different KC models. This example has 20 or so models; most have more like 4 or 5, or just the one they were generated with.

Display has error rate on the y axis, opportunity on the x axis. Would expect that it will smoothly, exponentially, die down. Typically see a U-shape – it’s an adaptive system, so the ones who have learned it drop out of the dataset, and by the higher number of opportunities, it’s the ones who really aren’t getting it. Typically see a set of students who are going to get kicked out. Most adaptive tutoring systems have a fall-out route – a small number of students last to that limit. Can vary the y axis from error rate (most common) – some prefer ‘success rate’ so it goes up (this is in a feature request!). Also step duration, number of hints.
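The curve described here is just the error rate at each opportunity, alongside how many students are still in the data at that point. A minimal sketch with made-up rows (the data and the function name are mine, for illustration only):

```python
from collections import defaultdict

# Hypothetical student-step rows: (student, opportunity, first_attempt_correct).
# s1 and s3 master the skill early and drop out of the data after opportunity 2.
rows = [
    ("s1", 1, False), ("s2", 1, False), ("s3", 1, True), ("s4", 1, False),
    ("s1", 2, True), ("s2", 2, False), ("s3", 2, True), ("s4", 2, True),
    ("s2", 3, True), ("s4", 3, True),
    ("s2", 4, False),
]

def learning_curve(rows):
    """Return {opportunity: (error_rate, n_students)} in opportunity order."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for _student, opportunity, correct in rows:
        counts[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: (errors[opp] / counts[opp], counts[opp]) for opp in sorted(counts)}

curve = learning_curve(rows)
for opp, (error_rate, n) in curve.items():
    print(opp, error_rate, n)
```

In this toy data the error rate falls from 0.75 to 0.0 and then jumps back to 1.0 at opportunity 4, where only one struggling student is left, which is the U-shape in miniature.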

Q Can you accommodate e.g. hints when you time out, or hint when you hit the hint button?

No. Can have custom fields on the repository side – completely flexible. But on the tool side, it’s specific to their schema and hints means that. Can create your own tool!

Q Good fit, bad fit idea?

We use a bunch of fitting values. AIC, RMSE, cross validation – stratified/unstratified. He has a paper to EDM on which is best – but they’re pretty similar, really really close. AIC is probably the best if you’re doing something fast, most closely reflects the other RMS errors. In the EDM community most people use BIC over AIC.
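For reference, the standard textbook definitions of these fit measures for a model that predicts a probability of a correct answer can be sketched as follows. This is not DataShop's implementation, and the predictions and parameter count below are invented for the example.

```python
import math

def bernoulli_log_likelihood(predicted, observed):
    """Log-likelihood of 0/1 outcomes given predicted P(correct)."""
    return sum(math.log(p if o == 1 else 1 - p) for p, o in zip(predicted, observed))

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def rmse(predicted, observed):
    """Root mean squared error between predictions and 0/1 outcomes."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

# Hypothetical one-parameter model that predicts 0.5 for every step.
predicted = [0.5, 0.5, 0.5, 0.5]
observed = [1, 0, 1, 0]
ll = bernoulli_log_likelihood(predicted, observed)
print(aic(ll, 1), bic(ll, 1, len(observed)), rmse(predicted, observed))
```

BIC penalises extra parameters more heavily than AIC once there are more than a handful of observations, which matters when comparing KC models with very different numbers of KCs.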

Q Is it true by definition that the number of students monotonically declines? (as ‘opportunity’ increases)

(debate about terms) Yes. Can’t leap in at opportunity number 5, can only drop out of the dataset. [here drop out is good! means the student has succeeded]

You can explore individual points – click, and it gives you the data – x/y but also e.g. how many students, which KC. Useful for when you get e.g. a spike in the learning curve.

Can overlay a secondary model too, so can compare the two.

Play/explore session.

[Putting the error bars on is good – almost always instantly explains strange spikes as low sample size.]
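To see why low sample size explains the spikes, here is a rough sketch using the binomial standard error of a proportion. That is my assumption for illustration; I don't know exactly how DataShop computes its error bars.

```python
import math

def error_rate_with_se(n_errors, n_students):
    """Return (error_rate, standard_error) for a binomial proportion."""
    p = n_errors / n_students
    se = math.sqrt(p * (1 - p) / n_students)
    return p, se

# Same error rate, very different certainty.
print(error_rate_with_se(50, 100))
print(error_rate_with_se(2, 4))
```

With 100 students the standard error is 0.05; with 4 students at the same error rate it is 0.25, so a point that far out on the curve can swing wildly on one or two answers.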

All the models starting with ‘LFA’ are machine-generated models, and they tend to be rather good, and better than the human-created ones.

Q A model is a coding of the steps – e.g. this step represents this KC, this step represents that KC. When you say a better model, you mean better coding of the experiment?

Yes, better coding of the steps. What is the right granularity of those Knowledge Components.

Q Doesn’t necessarily break up what the original design had as different steps into smaller ones, with different transactions? Say I coded it as 6 txs to 1 step. These machine models, might they reformulate them as 2 steps rather than 1, or just recode?

Just recode the steps. You may have two different steps coded as one KC, it might say one of those is different, or have a different KC. Within any model that’s created, include two KC models, created automatically. One is the single KC model – only one skill in all of the steps. Usually doesn’t fit well. The other extreme is to say every step is its own KC – the unique-step model. Those represent the limits of the KC modelling space. The real model is probably somewhere between those. Tricky because unique-step is a 1:1 mapping, but there could be more than 1 KC per step, so could go further. Run the additive factors model on the unique-step model and it turns into IRT. Trying to find the right mapping within that space.
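The two automatic extremes are easy to picture as mappings from steps to KC labels. A small sketch, with hypothetical step names and KC labels of my own choosing:

```python
# Hypothetical steps, identified by (problem, step name).
steps = [
    ("p1", "find-radius"),
    ("p1", "find-area"),
    ("p2", "find-area"),
    ("p3", "find-circumference"),
]

def single_kc_model(steps):
    """Every step maps to the same one skill."""
    return {step: "single-kc" for step in steps}

def unique_step_model(steps):
    """Every step is its own knowledge component."""
    return {(problem, name): f"{problem}:{name}" for problem, name in steps}

single = single_kc_model(steps)
unique = unique_step_model(steps)
print(len(set(single.values())), "KC vs", len(set(unique.values())), "KCs")
```

A hand-built or machine-generated model would sit between the two, for example tagging both `find-area` steps with one KC and the other two steps with their own.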

The learning curves are done on the KCs. There is another assumption – that each step tagged with the same KC is equal. So if you’re doing find circle-area in one problem step, and this other problem requires circle-area, the tool assumes those are exactly equal opportunities to apply that skill. If one is harder than another, that’ll bias the analysis.
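The equal-opportunity assumption is visible in the usual form of the additive factors model, where the log-odds of a correct answer are the student's proficiency plus, for each KC on the step, an easiness term and a learning-rate term multiplied by prior opportunities. A minimal sketch of that standard formulation; the parameter values are invented and nothing here is fitted to data:

```python
import math

def afm_probability(theta, kcs, beta, gamma, opportunities):
    """P(correct) under the additive factors model.

    theta: student proficiency
    kcs: the KCs tagged on this step
    beta[kc]: easiness of the KC; gamma[kc]: learning rate of the KC
    opportunities[kc]: prior opportunities this student has had on the KC
    """
    logit = theta + sum(
        beta[kc] + gamma[kc] * opportunities.get(kc, 0) for kc in kcs
    )
    return 1 / (1 + math.exp(-logit))

# Hypothetical parameters for one KC.
beta = {"circle-area": 0.0}
gamma = {"circle-area": 0.5}

p_first = afm_probability(0.0, ["circle-area"], beta, gamma, {"circle-area": 0})
p_third = afm_probability(0.0, ["circle-area"], beta, gamma, {"circle-area": 2})
print(p_first, p_third)
```

Every step tagged `circle-area` shares the same `beta` and `gamma`, so the model has no way to say that one of them is harder. If each step is its own KC and is met only once, the opportunity count is always zero and what is left is student proficiency plus step easiness, which is the IRT form.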

Q Could argue that it’s harder, tag as separate KCs.

Do have models that split that out – e.g. find circle circumference in context vs find circle circumference out of context.

Q It’s like adding design elements in a Q matrix

Because it’s adaptive (in most datasets), the problems turn up at different times for different students, in multiple places.

Q Domains where it works – e.g. project-based.

Different opportunities to apply skills, it’s harder to make those equal.

Q Your additive, logistic regression, only supports correct/incorrect answers. A limitation. Coding expts that are designed to be assessed on a binary.

Don’t know that that’s so big a problem.

Q Say I’m dealing with fluency, seeing how fast they attain that, speed of response, it’s on a continuum.

Yeah. Ordinal values, make some correctness assumptions and it does work.

Q If coding teamwork, or leadership, really very constrained. Math problem is either correct/incorrect.

Q In math problems, assuming all are equal difficulty, but that’s not true. In a math course I start with scaffolding, then set again with less.

These do exactly what you’re saying – have to code those steps at a discrete level.

Good to focus on what we can do with it, rather than what we can’t.

Spotting good learning curves

Generally, good learning curves slope down, starting from a highish level. Flat suggests they’re doing a lot of work but not learning anything – whether the error rate is high or low. (High = they’re not getting it, Low = they’ve got it but you’re making them do it over and over anyway)

Also, a U-shaped curve suggests the ones that remained didn’t get it, whereas a flat-at-the-abscissa curve suggests everyone got it.

Sample selector

You can create your own samples – add a name for it, whether to share to other users, what columns, selection by operator (e.g. column X equals Y, column Z greater than 0). But you may need edit access to do this.
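The selection-by-operator idea amounts to keeping the rows that satisfy every condition. A rough sketch of that logic, with hypothetical column names and operator labels rather than the ones in the DataShop interface:

```python
import operator

# Hypothetical operator labels mapped to comparison functions.
OPERATORS = {
    "equals": operator.eq,
    "greater than": operator.gt,
    "less than": operator.lt,
}

def select_sample(rows, conditions):
    """Keep rows where every (column, operator, value) condition holds."""
    return [
        row for row in rows
        if all(OPERATORS[op](row[column], value) for column, op, value in conditions)
    ]

rows = [
    {"student": "s1", "school": "A", "hints": 0},
    {"student": "s2", "school": "A", "hints": 3},
    {"student": "s3", "school": "B", "hints": 2},
]
sample = select_sample(rows, [("school", "equals", "A"), ("hints", "greater than", 0)])
print([row["student"] for row in sample])
```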

You can show error bars, also truncate the dataset. Sometimes worthwhile when there’s some possibly dodgy data at the end (e.g. very low numbers compared to others).
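One way to picture truncation is to cut the curve at the first opportunity where too few students remain. The cut-off rule and the numbers below are assumptions for illustration, not how the DataShop tool decides it.

```python
def truncate_curve(curve, min_students):
    """Keep opportunities up to the first one with fewer than min_students."""
    kept = {}
    for opportunity in sorted(curve):
        error_rate, n_students = curve[opportunity]
        if n_students < min_students:
            break
        kept[opportunity] = (error_rate, n_students)
    return kept

# Hypothetical curve: {opportunity: (error_rate, n_students)}.
curve = {
    1: (0.75, 40), 2: (0.50, 38), 3: (0.30, 30),
    4: (0.20, 12), 5: (0.60, 3), 6: (1.00, 1),
}
trimmed = truncate_curve(curve, min_students=10)
print(sorted(trimmed))
```

The last two points, based on 3 and 1 students, are dropped, leaving the part of the curve with reasonable numbers behind it.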

(jokey discussion about data exploration and data presentation – it’s a much better fit if you truncate the data to the bit where your model fits well!)

There’s a LAK2013 dataset that’s marked as private, with John Stamper as PI – request access so you can play in the afternoon session.

[I had to duck out at this point to deal with some other matters, not because it wasn’t a good workshop.]

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow
