Liveblog from Tuesday afternoon at #LASI2014 – the workshop sessions.
Microgenetic methods for learning analytics in the R language
Taylor Martin (Utah State University), Phillip Janisiewicz (Utah State University) and Ani Aghababyan (Utah State University)
Taylor – microgenetic methods started 30–40 years ago: not just pre/post, the aim is to work inside the black box in between. Did interviews, talked to kids solving problems without feedback. Interviewed them 10 times over 10 weeks, solving e.g. basic addition, mental math: 1+4, 3+22. Saw shifts of strategy develop over time. Simple things: kids just starting out, given 1+22 vs 22+1, count all the way up to 23. More sophisticated is count-on: start at 22, count 1 more. But if they start with 1+22, they count on much more – and count-on-from-larger is even smarter. They see curves where a strategy appears, then goes away, then maybe the best one appears and plateaus. Interested in the role of variability. Right before a change of strategy, you see a lot of variability in strategy: a child was previously doing the same thing, then shows lots of variability – you could predict they're going to change to a new thing. My original interest was fractions learning; now CS, lots of things, usually still STEM. Exploring methodologies to study this. Here today, sharing some of the microgenetic, over-time analyses they've developed in R. Then, if there's time at the end, you can use your own data, or there's a shared dataset. Everything participatory, but done step-by-step.
Hands-on workshop based on outline above, with downloadable data.
First, source libraries.R to load the packages they use.
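A minimal sketch of what a libraries.R like this might contain – the package list is my guess at what this sort of analysis needs, not the workshop's actual file:

```r
# Hypothetical libraries.R: load each package, installing it first if missing.
packages <- c("arulesSequences", "ggplot2")   # assumed package list
for (p in packages) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
```

You'd then just `source("libraries.R")` at the top of each analysis script.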
Git works nicely with Rstudio. [There’s an integration thing that I’m not seeing, an extra tab ‘Git’ after Environment and History in the top right. Later: Online instructions here. You need to use projects in Rstudio. Easiest way (for me) is just set up Git in your working directory, then create a new project in Rstudio there. Nice integration. Or you can create a new project by checking out from an online repository. Also useful practical points about running your own server as Git host (various available, or just use ssh), because you really don’t want to do this stuff routinely on GitHub because it’s too easy to accidentally release data that shouldn’t be released.]
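The "set up Git in your working directory, then create a project there" route sketched above looks roughly like this at the command line (paths and the data-directory name are illustrative):

```shell
# Initialise Git in an existing R working directory (path is an example)
cd ~/lasi-workshop
git init

# Keep sensitive data out of version control from the start,
# per the point above about accidental release
echo "data/" >> .gitignore
git add .gitignore
git commit -m "Initial commit with .gitignore"

# Then in RStudio: File > New Project > Existing Directory, pointing at
# this folder. The Git tab appears once the project is open.
```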
[A lot of these notes probably won’t make much sense unless you’re exploring the data yourself.]
Their dataset looks at affect. They observed 5th-graders playing a digital game – a spatial game that got more complicated over time – for 9 days, collecting affect and behaviour data. Behaviour – six possible codes: OT = on task, OfT = off task, OOC = other or conversation, RH = receiving help, GH = giving help. Affect: C = concentration, CF = confusion, D = delight, S = surprise, F = frustration, B = bored, E = eureka. ? = hard to code at the time, e.g. because the child noticed the observer, or left the room, or whatever. The classification is built on Ryan Baker's scheme, but drops Gaming since it didn't make sense here.
They're interested in the low-frequency stuff here, which is unusual for this sort of mining – the kids just aren't confused or frustrated very often! These are naturally low-occurrence states. Colour-coding helps visualise it: blue = concentration, red = confusion, yellow = delight, pinkish = frustration, salmon = bored, grey = missing.
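A small sketch of that colour-coding idea in base R – the exact colour values and the toy observation sequence are mine, approximating the palette described:

```r
# Map affect codes to colours (approximate choices, not the workshop's exact ones)
affect_cols <- c(C   = "blue",    # concentration
                 CF  = "red",     # confusion
                 D   = "yellow",  # delight
                 F   = "pink",    # frustration
                 B   = "salmon",  # bored
                 "?" = "grey")    # missing / uncodable

# Toy sequence of coded observations for one student
affect <- c("C", "C", "CF", "C", "B", "C", "D")

# One coloured bar per observation, in time order
barplot(rep(1, length(affect)), col = affect_cols[affect],
        border = NA, space = 0, axes = FALSE,
        main = "One student's affect over time")
```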
Plot entropy and see that it tends to converge – most useful as a check that you haven't cocked up your data.
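One way this check could look (my sketch, with simulated data): compute the Shannon entropy of the affect-code distribution over a growing window, and plot it – it should settle down as the sample grows.

```r
# Shannon entropy (in bits) of a vector of category labels
shannon <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Simulated observations: mostly concentration, rare confusion/delight/boredom
set.seed(1)
codes <- sample(c("C", "CF", "D", "B"), 200, replace = TRUE,
                prob = c(0.7, 0.1, 0.1, 0.1))

# Running entropy over the first i observations
running <- sapply(seq_along(codes), function(i) shannon(codes[1:i]))

plot(running, type = "l", xlab = "Observations so far",
     ylab = "Entropy (bits)")   # converges if the data are well-behaved
```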
Arules – association rules: if someone buys X, do they also buy Y? Hamburgers, diapers and beer; if you buy a computer, do you then buy a printer, and then ink? The main difference from ordinary association rules is that order matters – here, it's the order you purchased things in. Several algorithms would apply; generally SPADE, and this one is cSPADE. Set support and maxgap. Normally you're trying to find the most frequent stuff, but with e.g. a support of 0.7 we don't get any confusion or frustration – mostly they're concentrating and on task – so support has to be set low here. maxgap is how far apart consecutive items are allowed to be; normally more than 1, but if you set it large it's computationally too demanding and takes forever.
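A runnable sketch of a cSPADE call, using the toy `zaki` dataset that ships with arulesSequences rather than the workshop's data; the support and maxgap values are illustrative only (as noted above, rare states like confusion need a much lower support than you'd normally use):

```r
library(arulesSequences)

data(zaki)  # small example transaction data included with the package

# Mine frequent sequences: support and maxgap values chosen for the toy data
seqs <- cspade(zaki,
               parameter = list(support = 0.4, maxgap = 2),
               control   = list(verbose = FALSE))

# Frequent sequences and their support
as(seqs, "data.frame")
```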
Clusters of sequences – need to set ylim; could do it programmatically, but basically just use max(table(wardsN)).
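A sketch of that ylim point with toy data – `wardsN` here is my stand-in for the workshop's vector of Ward cluster assignments:

```r
# Toy data: 20 points, Ward clustering into 4 clusters
set.seed(2)
d <- dist(matrix(rnorm(40), ncol = 2))
wardsN <- cutree(hclust(d, method = "ward.D2"), k = 4)

# Size the y-axis from the largest cluster instead of hard-coding it
barplot(table(wardsN),
        ylim = c(0, max(table(wardsN))),
        xlab = "Cluster", ylab = "Sequences")
```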
New dataset! Refraction. (It's not actually about refraction – it's a beam-splitting game about fractions.)
Game is available here: http://centerforgamescience.org/portfolio/refraction/
The data, musical chairs version. Players do things, and some of those things change the state of the game. First, look at what's going on with the math. There are splitters and benders – you need the right fraction, in the right place. A 1/3 splitter with the beam bent all over the place is still the same state. Within any level, the highest number of states is around 70–80, the lowest about 20–30. It counts as a different state when they take the 1/3 splitter off and put a 1/2 on. Interested in node depth: do they make 1/3, or 1/3 of 1/3 of 1/3 (= deeper node depth)? Another thing, because of the specific problem: to power a 1/6 or a 1/9, if you don't start with a 1/3, you can't get there. So the data are concatenated the same way as before: for each board state, in order (and only one level of the game), node depth and whether they started with the 1/3 splitter. Some go right down. Modal node depth is 3.
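A hypothetical sketch of that encoding – one row per board state within a level, with node depth and whether the first splitter placed was the 1/3; every value and column name here is invented for illustration:

```r
# Invented example: five successive board states from one level
states <- data.frame(
  state      = 1:5,
  node_depth = c(1, 2, 3, 3, 3),   # e.g. 1/3 of 1/3 of 1/3 -> depth 3
  start_1_3  = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Modal node depth across states (3, matching the observation above)
as.integer(names(which.max(table(states$node_depth))))
```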
[Much mucking about trying to get R to do what it did on the last dataset on this dataset. And I got there! (Actually no I didn’t, ah well. But very interesting)]
This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.