Liveblog notes from parallel session 8B Predictive Modeling at LAK12.
Bjorn Gunnarsson and Richard Alterman
Predicting Failure: A Case Study in Co-Blogging
What do we want to do? Calvin from Calvin and Hobbes is the model student. He asks creative questions, and disengages if you're being boring. He'll help me along with the presentation; he'll take off his hat when he's not being a grader – it's his grading hat.
They want to monitor student activity, model it, and detect problems early, before the deadline. Learning is moving online – it's hard to watch students when the activity is online.
Context is a collaborative blogging environment.
Student gets assignment, logs on, reads and writes things down. Not sure what the teacher meant, goes to the course materials, which don't help; looks at other students' drafts – learning by observing peers. Turns that one in. Gets assigned tasks from the graders – go and critique posts from other students, make comments on their contributions. Incentive to create quality content. Once the course is 75% through the material, you start the final assignment, going back over previous work.
We built a basic co-blogging system: a list of posts, with previews on hover, so you get information about what a post contains before you click in to view the whole thing, comments and all. Notification when someone comments encourages more participation. Basic filtering. Collaborative learning environment – list of posts, task numbers as tags, tutorials. Want to track participation in the knowledge community. We know participation is a risk factor for failure; promoting participation helps student outcomes. Can we monitor participation while, and before, they submit their assignment? Small data.
Simple grading of posts – 0-3, comments 0-2. Multiple graders, random posts/comments.
Every time user clicks in to read one post, that’s one participation; each time you click through, another participation. Even count multiple views of the same thing. Don’t make assumptions about closeness of reading. Based on that, identify those in danger of failing. See who writes and reads what and when.
Asking the question – why read the blogosphere, vs reading course materials, vs content on the net?
Student perception, did a survey. Generally thought valuable; non-native speakers particularly keen.
What’s relevant? Could track writes, edits, reads. Some composed offline, then just cut-and-pasted in. Reading is reliable online; they don’t save it for offline reading. So that’s the activity that matters. [Is this treating what data you have as the right data?] Also tracked previewing.
Model is previous grades and current participation. Effect (predictive power?) of previous grades becomes higher as you get more of them.
Linear regression model – grade vs participation. R = 0.7, p very low. For each task, correlation positive, significant and consistent.
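As an illustration of the kind of grade-vs-participation regression reported, a minimal sketch – the data here are entirely made up; only the positive-correlation shape mirrors what the talk described:

```python
import numpy as np

# Illustrative data: participation counts (post reads) and task grades.
# These numbers are invented, not from the study.
participation = np.array([2, 5, 8, 12, 15, 20, 25, 30], dtype=float)
grades = np.array([0.5, 1.0, 1.2, 1.8, 2.0, 2.3, 2.6, 2.9])

# Least-squares fit: grade ~ slope * participation + intercept
slope, intercept = np.polyfit(participation, grades, 1)
# Pearson correlation between participation and grade
r = np.corrcoef(participation, grades)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.2f}")
```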
Creating/applying the model
Full model, grade prediction. 26% get an alarm; 10% of students get a false alarm (would have passed); miss rate 6% (failed but no alarm); actual miss rate 0.83% – the explanation being that the others missed components out-of-band, turned work in late, etc. [not sure about that]
Tracking low participation works while they're working on the project – not after; it's before the deadline.
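The alarm figures reduce to simple confusion-matrix arithmetic; a sketch, assuming a hypothetical class of 120 students (the actual class size was not reported):

```python
# Hypothetical class of 120 students, to illustrate the reported rates.
n = 120
alarms = round(0.26 * n)        # 26% of students get an alarm
false_alarms = round(0.10 * n)  # alarmed, but would have passed anyway
true_alarms = alarms - false_alarms  # alarmed and actually failed
missed = round(0.06 * n)        # failed but got no alarm

# Fraction of alarms that were justified
precision = true_alarms / alarms
print(f"alarms={alarms}, true={true_alarms}, missed={missed}, "
      f"precision={precision:.2f}")
```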
Someone 1: Based on your prediction, expect that if you told/encourage them to participate, they’d succeed – have you done that?
B: No. Early work.
Someone 2: Of the people who got the alerts, 90% failed?
B: No. 2 people wouldn’t have been notified and still failed.
Someone 2: Of those who got alarm, how many passed?
B: 10% of entire class, got alarm but didn’t fail. We didn’t give the alarm, that would’ve biased the grade. [!?]
Diederik M. Roijers, Johan Jeuring and Ad Feelders
Probability Estimation and a Competence Model for Rule-based e-Tutoring Systems
Diederik (at U Amsterdam) is new to being on stage; used to being looked down on in a teaching arena.
There’s a 130-page document with a lot more information than is in this paper. Will show the system, the model they built, how they applied it, then discussion.
The Ideas system
A live demo. It’s a logic tutor: it shows a logic expression, asks you to simplify it, and gives you feedback.
Generates a lot of data on student activity and correctness. Intelligent e-tutoring system. Gives hints, present/take a step, complete, step back – and tells you whether the exercise is solved.
Behind the system are a series of rewrite rules, the basic components. These rules are what they want students to learn [to be able to apply in practice].
The system combines rules into strategies using combinators – functional programming idea. Also want to teach these to students.
Ideas doesn’t provide information about competence, learning speed, experienced difficulty of the different rules, how the rules discriminate between students in terms of competency. Want to give that as feedback to teachers and students. Can select exercises appropriately.
Want to include difficulty and discriminativity of rules; want it to be a predictive model to help teaching strategy. Exercises shouldn’t be too difficult, or too easy, for the learners.
Model to be descriptive and predictive. Involved teachers and domain experts.
Possible models – Bayesian models without latent skills (Desmarais 2011), good at prediction but don’t operationalise the concepts; Item Response Theory, well-known in psychology, reasonably good at prediction, and does include 3 of the 4 concepts wanted – so they went for the latter: 2PL IRT.
A formula shown! The probability that a student will apply a rule correctly is a sigmoid function of the gap between student competence and item difficulty, with discriminativity as a parameter. If student competence equals difficulty, the probability of success is 0.5 – perhaps unexpected. Doesn’t include learning.
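In standard 2PL IRT notation, the probability described is P(correct) = 1 / (1 + exp(−a(θ − b))), where θ is the student’s competence, b the rule’s difficulty, and a its discriminativity. A minimal sketch:

```python
import math

def p_correct(theta, difficulty, discriminativity):
    """2PL IRT: probability a student with competence theta applies a
    rule of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-discriminativity * (theta - difficulty)))

# When competence equals difficulty, success probability is exactly 0.5
print(p_correct(theta=1.0, difficulty=1.0, discriminativity=2.0))  # 0.5
```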
Could introduce a start competence and a learning rate, and let them vary per rule for every student; or have start competence differ per rule but not the rate; or other combinations.
To select among these, they used simulation: assume the model, generate data from it, and ‘recover’ the parameters by maximum likelihood estimation – if the model is true, you can get the parameters back. The strongest variant, where all parameters depend on both student and rule, couldn’t be recovered at all. Letting start competence vary per rule was better, but with a lot of variance. So, for practical reasons, they had starting competence and learning rate depend on the student only – not varying per rule. That recovered very well. [I worry this is taking the empirical approach way ahead of what makes sense.]
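A toy version of the recovery check described – generate responses from an assumed model, then ‘recover’ the competence parameter by maximum-likelihood grid search. All numbers here are illustrative; the real study estimated many more parameters:

```python
import math
import random

random.seed(0)

def p_correct(theta, b, a=1.0):
    """2PL success probability for competence theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Simulate one student (true competence 0.8) on many rule instances
true_theta = 0.8
difficulties = [random.uniform(-2, 2) for _ in range(500)]
responses = [random.random() < p_correct(true_theta, b) for b in difficulties]

# 'Recover' theta by maximum likelihood over a grid of candidate values
def log_lik(theta):
    return sum(
        math.log(p_correct(theta, b)) if r else math.log(1 - p_correct(theta, b))
        for b, r in zip(difficulties, responses)
    )

est = max((t / 10 for t in range(-30, 31)), key=log_lik)
print(f"true={true_theta}, recovered={est}")
```

With plenty of data the recovered value lands close to the true one; shrinking the sample (as with the real 14-student dataset) makes the estimates far noisier, which is the problem the speaker described.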
Another problem – parameters can be recovered from very large datasets, but the real-life data was too small: 23 rules, 15 students (only 14 usable), and about 8 problem instances per rule. (The simulation used 100 students, 100 rules, 50 problem instances per rule.) Ran the simulation at that scale – discriminativity all over the place, learning rate very high variance; the other parameters were easier. 16 instances per rule would’ve been great, but they only had 8.
Application to IDEAS data
14 students at Utrecht. 23 rules, ranked for difficulty by domain experts. Discriminativity and learning speed too small to draw conclusions.
Start competence as expected – students they expected to have highest got highest score from model.
Difficulty fairly well aligned, though with some surprises. One rule (p ^ T => T) was found very unintuitive – unexpectedly harder than the experts thought (recovered from interviewing).
Is preliminary work only. His thesis has a lot more, and all the data are available as well.
Someone: Hints in the system – how does that affect the difficulty estimation?
D: If they asked for a hint, we assumed they didn’t know the rule, and treated it as applied incorrectly.
Rebecca Barber and Mike Sharkey
Course Correction: Using Analytics to Predict Course Success
Mike is Director of Academic Analytics at UPhoenix. Rebecca is now with Arizona State University. Also part of a WCET/Gates-funded project, presented last year at LAK11.
EDM folks think the LAK folks are too applied … I’m more applied than that. What are the actual impacts on students? We have a large body of students. I’m a data guy. Example xkcd.com/833.
Background to University of Phoenix: private institution, accredited. More akin to Community Colleges – adult learners, not traditional. 70% online, 30% spread across 40 states. 380k students! Centralised curriculum – economics course is designed out of Phoenix, but wherever you study, you have the same core; faculty can tweak. Have an adjunct faculty model. I teach once a year.
What is student success? Engagement (satisfaction), progression, learning. They are orthogonal. Learning is the most important one. Second is progression – do they make it through. Third, are they happy with the experience. Someone could learn, but have a horrible time. They could progress but not learn much. These are different things, with different impacts. A business finance person, an operational person, and an academic person will have different ideas about success.
Our backend data, we can pull most of it together. What do we do with it?
They want to predict:
- Learning – but don’t collect data at that level. Only have grades, not rubric-level data. Want to get there.
- Programme persistence – survival analysis. Here courses are taken serially, not on a semester basis. Often the cause is a family crisis or similar. There could be indicators, symptoms – but we don’t collect data about e.g. children’s health. Can’t do that either.
- Course completion. More confident, aiming there.
What will you do with it? Simplified process is take the data, convert to information, put that in front of someone who can do something about it – student, instructor, counsellor.
Question: can we predict if they will pass their current class? Built a model using demographics, schedule, course history, grades. Output on a scale of 1-10 – 10 most likely to pass, 1 most likely to fail. Feed the score to academic counselors who can intervene – by phoning at-risk students.
We don’t have the answer about the best intervention. Can tell model says it’s because of grades, attendance, whatever. That personal bridge will do better than nothing. We haven’t figured it out. We have these counsellors, let’s make them more efficient at their job, target resources.
Used a naive Bayes model: 400k students, 6 months, >1.5bn records. The first iteration used linear regression, which had a big grey area – the middle was very large – but naive Bayes did better. Had to split it up into different models: by program (Associates, Bachelors, Masters); online vs classroom; week 0 vs week 1+; new vs continuing students (new is first 3-5 courses; behaviour known to be different). Grades become more powerful as predictors over time (of course). Well over 80% accuracy – not great, but good.
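A toy categorical naive Bayes in the spirit described – the features and training data below are invented for illustration; the real model used demographics, schedule, course history and grades:

```python
from collections import defaultdict

# Invented training data: (features, passed?) with categorical features
train = [
    ({"grades": "high", "attendance": "good"}, True),
    ({"grades": "high", "attendance": "poor"}, True),
    ({"grades": "low",  "attendance": "good"}, False),
    ({"grades": "low",  "attendance": "poor"}, False),
    ({"grades": "high", "attendance": "good"}, True),
    ({"grades": "low",  "attendance": "poor"}, False),
]

# Count class priors and per-class feature-value frequencies
priors = defaultdict(int)
counts = defaultdict(int)
for feats, label in train:
    priors[label] += 1
    for k, v in feats.items():
        counts[(label, k, v)] += 1

def score(feats, label):
    """Unnormalised P(label) * prod P(feature=value | label), with
    add-one smoothing (denominator assumes two values per feature)."""
    p = priors[label] / len(train)
    for k, v in feats.items():
        p *= (counts[(label, k, v)] + 1) / (priors[label] + 2)
    return p

student = {"grades": "low", "attendance": "good"}
pred = max([True, False], key=lambda lbl: score(student, lbl))
print("predicted pass:", pred)  # False
```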
Good predictors – scores in current course, credits earned/attempted; delta between past & current scores, prior GPA, financial status. No contribution – date of first course activity (everyone posts on day 1, because required to), gender, military, ethnicity.
To operationalise, put it into the data flow. It’s a cycle! (Just like I said in my paper!)
At this point, have done the math; now need to pilot the model, and prove that it helped students. Working with counselors, asking for a controlled experiment – 8 counselors, 4 and 4, with the same sort of students. One group sticks with current policy, the other uses this methodology; see whether this methodology helps.
So can’t yet say it has had an effect on retention. Wants to talk to people.
Someone 1: What if I said students can get through without learning?
M: Do you agree, though?
Someone 1: Off the record? I don’t think I really can. Students learn different amounts while still completing. What you’re doing though, looking at completion, has little to do with identifying how much they’re learning.
M: If you got a degree, I assume you know something. It’s a bell curve. For some institutions, it’s true. For others, it’s certification that they spent time there. I love the Western Governors model – I’d rather see what you actually know than what you’re supposed to know. I like the competency model.
Someone 1: Researching here is not to do with competencies, it’s about completion.
M: Absolutely. If I had all the data, I’d predict learning. We have our own LMS, building in rubric-level scoring. When I have that, can get much closer. That’s where I’ll shift.
Someone 1: Not only refusing to claim completion guarantees learning, you’re saying it has no predictive capability at all.
M: Not totally independent, they are correlated. Can be.
Someone 2: [inaudible]
M: As you go through, in the class itself, the current information – the scores – starts to get louder than everything else. Credits earned vs credits attempted is a good predictor.
Someone 3 (Chris?): Quick step to side. Apollo Group runs U Phoenix, also have ITS. Talked about EDM and ITS; not seen much here. Some synergies here? Don’t know the relationship between the two.
M: For me, having the amazing work from Carnegie Learning is beneficial. Trying to adapt to writing and grammar. They get down to measuring learning. Pilot in 2010, they have 94 markers. Algebra, can say whether student gets it or not. Gets close to measuring learning, where we want to go. Happy to get their expertise, use that kind of model.
Someone 4: [inaudible]
M: We included discussion activity, not as full clickstream as I’d like. Discussion as an example – our policy is at a minimum, have to post 2 substantive things on at least 4 days, so don’t get much variation. New system will give much more detailed, so may get more dispersion there.
Matthew Pistilli and Kimberly Arnold
Course Signals at Purdue: Using Learning Analytics to Increase Student Success
How Course Signals works – not going in to the model. Three classes of components, weighted differently by predictability. Basically – take student characteristics/academic preparation, plus student effort (sessions, quizzing, discussions, etc), plus student performance (grade book data). Behaviourally-based model. Generates signals – red/yellow/green – which trigger messages and interventions. Displayed to student and faculty member. Interventions vary drastically.
Preparation + Performance + Effort -> Risk level
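The three-component combination above can be sketched as a weighted score mapped to a colour signal. The weights and cutoffs here are invented for illustration; the talk explicitly didn’t go into the actual model:

```python
def risk_signal(preparation, effort, performance, weights=(0.2, 0.3, 0.5)):
    """Combine three normalised scores (0 = worst, 1 = best) into a
    red/yellow/green signal. Weights and thresholds are illustrative
    guesses, not Purdue's actual Course Signals algorithm."""
    w_prep, w_eff, w_perf = weights
    score = w_prep * preparation + w_eff * effort + w_perf * performance
    if score >= 0.7:
        return "green"
    elif score >= 0.4:
        return "yellow"
    return "red"

print(risk_signal(preparation=0.8, effort=0.9, performance=0.85))  # green
print(risk_signal(preparation=0.5, effort=0.2, performance=0.3))   # red
```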
Results they’ve found – introduction of Signals to otherwise-largely-unchanged course boosts A/B scores by about 11 percentage points; and reduces D/F/W by about 6 percentage points. [Wow.]
Matt – grades are one way of telling whether we’re helpful. Another one is long-term impact on retention. This is not a direct causal relationship!
Effect on retention: no Signals courses – 70%; 1 or more Signals courses – 87% (exactly 1: 87%; 2 or more: 93%). Students in the 2-or-more category have a lower four-year graduation rate, but expect to see that flip when they reach the five-year mark from baseline. Students with lower SAT scores are retained more with Signals. Really impacting students coming in less prepared, taking larger gateway courses, and struggling.
Course Signals growth – now at 6.4k seats affected; aim 15k in Fall, 20k later. Mostly at incoming level at the moment.
Usage at Fall 2011: 115 instructors, 80 courses, 180 sections, 17k unique students. Goal to get 20k student seats impacted in one semester. Not yet ubiquitous at Purdue.
Kim switches back – do need to mention that we got a grant from the Gates Foundation; they’ve been very helpful.
Grades and retention important, but for adoption also need to know faculty perception, student perception. Aim at empowering them.
Surveys with students who’ve experienced Signals at each semester end, and interviews. Feedback broadly pretty positive. 61% said they thought they got a higher grade as a result. Do they think it’s too big-brotherish? 86% said benefits outweigh the drawbacks. No single student has raised privacy or security concerns, but a few faculty have.
Faculty feedback (interviews) – helps them keep students aware of status; anecdotally they say they can see a difference in failure, performance. Some negative things – mostly I want four colours, or to add another criterion. Mostly about the system itself, not what’s behind. Videos of faculty views online.
Also a Jetpack – search for Jetpack reader (www.purdue.edu/jetpack) – download LAK12 Learner Analytics.
Malcolm: When they get a red signal, do they get additional information? Any faculty pushback about increased workload?
K: Interventions delivered by faculty, they do the interventions. Students get signal at their home page, with reasons why, and here’s possible resources to help you. Emails, pulling people aside, is in instructors’ hands.
M: One faculty member said in a video: yeah, you do get more feedback. It’s a 15-minute orientation. We get more pushback on how the system works – they want more control at the course level.
Someone 1: How did you pick courses? Check not involved in easier degrees?
K: First of all, chose faculty who played nicely. In STEM disciplines, high DFW courses, not easy per se. Biology course, standard instructor set, needed 90% or above or in massive danger. Was for majors only, if not getting an A, wouldn’t survive major. Was instructor opt-in. Did recruit, talk to different disciplines.
Someone 2: Interested in your speculation. If Signals is successful and you retain more students while admitting to capacity, what’ll happen to the incoming student characteristics?
M: On a steady upward trend for last 15y. As an institution, concerted effort to increase quality of students coming in. 4y of math is required.
[had to stop to get to next session where I was on the panel – were more questions]
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.