Plenary: Making a market for learning analytics
Chairs: Taylor Martin
Speaker: Stephen Coller (Bill & Melinda Gates Foundation)
Wants to share perspective of someone who sits with the Gates Foundation and looks for areas for investment. Talking later about opportunities in the analytics field. Introduction to field came through Roy and the open/closed systems architecture needed. Closed as in feedback loop, open as in multiple contributions. Many conversations led by private companies with black-box solutions to the problem. Want to invest in this space and this community, see lots of opportunity. Web-scale learning will not occur unless we better understand students and their learning, and that requires analytics.
Not trying to be provocative and saying LA is all about for-profit. But markets-based thinking might help the Foundation contribute to the success of the community.
Essential assertion from a (paper/book in the 90s?) by Elmore ? – there are really only three ways to impact learning gains and the performance of an education system. Three areas, and they're interdependent/cyclical:
- change the rigour of the content
- increase the knowledge and skills of the instructors
- change relationship between student, teacher and content, level of engagement
Learning analytics is essential to improving this cycle.
We have not focused enough attention on what students are actually doing (in K12 mostly) – traditionally mostly instructor actions to instructor actions, but very little on the content rigor and student performance. Bias away from understanding the outcomes of the process. A mismatch between the ‘buyer’ of an educational experience. Opportunities for improvement in the content too.
The task has to match the challenge. LA has an opportunity to help. The fidelity and the rigor of the tasks for learners is important. Also instrumenting the task to lead to improvements in the task itself.
How might market forces contribute? Microeconomics diagram – relationship between students and institutions and the products (degrees, other sources of credentialling); resource markets (includes data). Demand for greater evidence of the return on investment.
Working with NYU, launching a centre for digital research. Asking what they should stand for. They’re looking to engage with commercial vendors. Universities taking high ground morally by taking the interests of the students. Learning gains, student experience are key.
Use case/scenario. Students saying 'I am paying a Sxxt-load. What are my gains? Where am I heading?'. Institutions need to make this case better and reflect it back to the audience. Product market – 'tasks predict performance' – a specific, aligned product to address satisfaction. Includes degrees. Looking at the jobs market and graduate challenges – degrees, preparation for entry into the market – it's going to become more diverse. Signs of a market developing.

The MOOC space – though we could debate whether they're a new mode or just better courseware; we'll find out. Minerva – credentialled degrees but a different premise. Cohort-based. For-profit, aimed at the high end of the market, for individual fee-payers. We'll put together design challenges, cross-disciplinary teams, select campuses around the world. A deluxe offering. We think that type of approach will trickle down into underserved segments, e.g. community college. Somewhat duplicative of 4y college. Opportunity for students to go through tasks co-developed with e.g. individual employers or employer groups. Not a bifurcation between general and special like in Germany, but more alignment between the university experience and the experience in the job market.

Institutions need to respond to the demand – 'How do I evaluate student outcomes? How do I assess and improve instruction?'. Fundamental questions they should've been asking all along, but the incentive wasn't there. But online learning offers the opportunity to at least equal the current experience, if not surpass it – because you're generating data that you can use to improve your instruction. Resource markets – e.g. 'What is good content? What is an effective task?'. An increasingly tight relationship between user experience and the mediating role of data in that process.
What role should foundations play? 2×2 matrix – economic incentive for suppliers (low/high) vs demand (nascent/established):
- Established demand, low economic incentive – invest directly in demand and supply.
- Nascent demand, high economic incentive – foundation does little.
- Established demand, high incentives – strengthen incentives for performance and innovation.
- Nascent demand, low incentives – increase connections between demand and supply.
Learning analytics is down in the nascent/emerging demand space with low economic incentives for suppliers. So the foundation is looking to increase connections between demand and supply – e.g. buyer consortia, content and application sharing tools and sites.
LAK at Leuven. Topics were really interesting. Not a lot of discussion around application of this tremendous work to particular problems in the classroom or the study hall. Interesting and puzzling. Opportunity for the community to take an applied mindset, show how work is contributing to retention, learning gain, etc, over time.
- Where are the quickest wins in the user experience? Where can LA demonstrate gains?
- Who is the principal customer for this service? States? Institutions? Households?
- Is this an embedded or stand-alone product? Could there be a market for LA and diagnostics, direct to a student who's struggling? Josh Jarret grant for the Ed Ready tool – simple, brings together scores for 4y colleges in an area, helps a student understand where they stand relative to the scores that'd qualify them for entry to that institution. Over time, the service might get smarter – not just suggesting scores, your probability of entry and the study time left, but also tasks you might take to reach expertise. Or is it embedded – LA as a contribution to a broader offering? MOOC activity could go two ways. May look like correspondence courses in a different guise. LA may just make the MOOC environment smarter. Will always be embedded in an experience.
- How might the Feds help? As a foundation, nervous about two things: One, are we causing harm (like medics)? Two, the government, who have same effect for better and worse. Perhaps hold system accountable for e.g. learning gains. First shoots in Tennessee. Funding formula tying evidence of outcomes for metrics (access for minorities, retention, employment) to portions of state funding.
- What are the barriers to innovation holding the market back?
- How can communities like LASI and platforms like Globus add up to a Commons? What steps remain before we have an enabling environment? Apply it back in to the learning experience.
Marie, SRI: About your trickle-down theory. Things aiming at the higher echelon of students somehow trickle down, be viable players in lower socio-economic area where there may be no market. How do we adapt those tools when maybe students at the lower level come in without the motivation or engagement to engage with those resources.
I didn’t mean to suggest it’d function as a trickle-down, though often markets work that way. Will see it at both ends. Minerva trickle-down. But activities like Tennessee see innovation at the low end. Disruption in community colleges. Couple of organisations seeking funding for alternative credentialling in community college space, combo of remediation and acceleration. Your point about engagement and motivation is key. The students are not bought in to the idea of the value of the degree. Secondly, institutions don’t account for metacognitive challenges. Early work in lower California, dealing with lower income groups, affinity building, spend 4-6 weeks on common task, take them through metacognitive and behavioural tasks. e.g. suspicious, defensive, inadequacies, perceive lack of skills. Take them through a storm and norm period, as a cohort, basic skills around persistence and engagement. Then specific academic subject matter. Would like to see growth in that space, universities and colleges aren’t doing it now. Want to see what others can inform the engagement experience. Set up a prelude to a traditional academic course.
Yvette: Interested in how you talked about data being distributed at the household level. South Korea, crazy about education, especially in southern Seoul. People move to certain neighbourhoods to get better education, have students from better households. Huge polarisation in terms of neighbourhoods, drives up real estate costs, large disparity between districts. An application being developed showing average scores in their neighbourhood – positive implications, but could it encourage a disparity.
Yes. We had a debate about this point with Bill. On the one hand, the audience is sold a bill of goods – you don't get much information. Engaging a child around how to finance education increases the chances they'll apply. Requires investment, but the correlation is there. The audience has been underserved. At the same time, the American Dream has shifted. There's a book out that says the middle is being hollowed out of the economy, which has profound implications for the reality of your prospects. Not all will become Silicon Valley billionaires. We indulge ourselves in this belief and underserve people who just need a leg up. What would be the impact of launching Ed Ready and showing how they relate to cut (?) scores – would it bum them out? It might. But the current service is doing them a disservice. The conversation has to become more data-based and honest. How it's delivered is a separate issue. There's a lot of work that non-trad colleges could do to tackle engagement and motivation before they get into the subject.

Finally, the nature of the neighbourhood. Mercer Island in Seattle, an outpost for Microsoft. Have a nice place for cheap rent. Everyone around lives a different lifestyle with different expectations. Higher tax base, local schools very good. Wonder about the experience children have – homogeneous, generic. Not convinced it prepares them for the complex problems they'll meet. Tramlines towards qualification for trad 4y private education. Not sure they're prepared any better. Look at the Waldorf Montessori system, a very different approach, could be comparable or superior. Spend equal time focused on development of the whole child, social skills, complex thinking. Curriculums much more project-based. There's a lot we might learn from the alternative school system. Should the data get out? Yes. How? Begin a conversation.
Phil ?, author: The principal consumer of the service. Last chapter of my book on the value exchanges around data, teachers, parents. Important to look at how people exchange data. Who is the principal consumer? There are multiple ones. How do we increase the perception of value for different stakeholders? Different values for teachers, parents. A way to move beyond the testing good/bad to differential values.
Yes. Particularly if you can tie it to funding. There has to be some economic incentive to act on the data presented to you. In our funding, link to performance. Linkage to actions on the data – doesn't exist today. Differential values is an important one. Immediate locus for improvement – are the resources in front of learners effective? How do you define it? Take Khan Academy – brute-force A/B testing to determine e.g. sequence. Over time you gain information about which resources are important at which points in the instruction. Are resources proving effective? What is an effective intervention? How do we define pre/post gains and the contribution to them? Much to do before we can instrument it. Skills model and progression. Very interesting work, a lot of logic to be written in that area. I'm very interested in how this community can stand up examples of improvements to the instructional experience. Interesting things will then happen. Students will be highly motivated. The current experience is pretty lousy. Instructors notice – they care about what they do (or their TAs do) – and will be excited about the contribution data can make to improvements in learning gain for students. Finally institutions, except progressive ones that are looking at how to profitably distinguish themselves from others. Platform ecosystem – win over one provider, the others topple. Adaptive tutoring will have a big impact, will become more data-dependent. Experiences increasingly rich and sophisticated. Looking to jump the gun and establish themselves as data-driven. Students really do need a better deal. The debate about student debt will be a big contributor.
Plenary: Data mining & intelligent tutors
Chairs and speakers: Ryan Baker & Ken Koedinger
John Behrens introduces the speakers.
Two quick talks. Thanks to many funders and colleagues. A big effort from the Pittsburgh Science of Learning Center.
Why Learning Analytics? Most of what we know and learn is outside our conscious awareness, thus ed design driven by intuition is potentially flawed. Data is the answer to that problem – we need data-driven models of learners. Cognitive models can be automatically discovered and improved using data mining. Use in A/B tests to improve learning. Close-the-loop experiments test and enhance.
You’ve had lots of experience with English, you know English. Do you know what you know? Our brains are capable of tacit pattern learning. Richard Clark – task analysis of experts, experts only able to describe <30% of what they know. You can’t design well for what you don’t know – iceberg.
Algebra cognitive tutor systems – early study, paper-based quiz, set algebra problems – one story problem ‘as a waiter, Ted gets $6 per hour …’ vs word problem ‘Starting with some number, if I multiply it by 6’ vs equation. Math educators say story or word is hardest. But it’s the equation that’s hardest for students. Expert blind spot – algebra teachers especially think equations are easy.
Used that to create cognitive tutors. Starts with authentic problems, e.g. cell phone plans. Fill in tables; the ITS follows along and gives them advice if they e.g. forget a constant. Asked challenging Qs like where do these plans intersect. Can get hints, feedback messages. Two forms of adaptive instruction – adaptive to the student's strategy. Having fine-grained interactions in the GUI makes for very rich data vs Q&A items. Students' progress on skills is tracked and used to adjust the learning process.
Widespread use – 2d/week typically, 600k students per year, evidence of enhancing student learning. Still too many decisions driven by intuition.
Ed tech plus wide use leads to basic research at scale. LearnLab – support social processes to help researchers get access to schools, design high quality studies. 370 ed tech data sets in DataShop. 280 in vivo experiments – A/B testing. Lots of good data-driven learner modelling. Not just better prediction, but understanding. Better assessment, models of cognition, metacognition, motivation. Better models of what’s hard to learn.
Close the loop – Design, Deploy, Data, Discover. Example – problem decomposition planning. Learning the math but also about learning to be better at how to do it.
Use the data to build a learning curve – opportunities to practice a task on the x axis vs error rate on the y axis. Doesn't work if the skill is as coarse as 'geometry', but at a finer level of skills – e.g. circle-area – a smoother learning curve suggests your model is better. Data is problem steps and what competence each links to. Logistic regression model with an extra slope component. Shapes of the curves are diagnostic – downward-sloping is good; low, long curves mean you can remove busy work; a high, rough curve means something is wrong with the model.
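The learning-curve diagnostic can be sketched in a few lines. This is a hedged illustration, not DataShop code: it just computes the empirical error rate per practice opportunity for one knowledge component (KC), the data shape on which the logistic regression model with per-KC slopes would then be fit. The log records are invented.

```python
from collections import defaultdict

def learning_curve(records, kc):
    """Empirical learning curve: error rate at each practice opportunity.

    records: iterable of (student, kc, opportunity, correct) tuples,
    where opportunity is the 1-based count of practice on that KC so far.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for student, k, opp, correct in records:
        if k == kc:
            totals[opp] += 1
            errors[opp] += 0 if correct else 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

# Toy log: students err early on circle-area, then improve with practice.
log = [
    ("s1", "circle-area", 1, False), ("s1", "circle-area", 2, False),
    ("s1", "circle-area", 3, True),  ("s1", "circle-area", 4, True),
    ("s2", "circle-area", 1, False), ("s2", "circle-area", 2, True),
    ("s2", "circle-area", 3, True),  ("s2", "circle-area", 4, True),
]

curve = learning_curve(log, "circle-area")
print(curve)  # {1: 1.0, 2: 0.5, 3: 0.0, 4: 0.0} – smoothly downward-sloping
```

A curve like this, plotted per KC, is what makes the shapes diagnostic: if 'circle-area' were lumped into a coarse 'geometry' KC, the mixture of easy and hard steps would produce the high, rough curve described above.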
Method that automatically discovers the models – the punchline was improved RMSE on a cross-validation set (presented at the last EDM).
Same method but different difficulty – different planning needs. Get a better fit to data with an improved model, which suggests ways to redesign the tutor, which suggests how to create new problems to work on knowledge components.
Compare different models in general use – more efficient (25% less time) and better outcomes. Paper at the AI-Ed conference. Learning analytics does make a difference. Intuitive design is not reliable. Need data-driven learning models to improve the science and practice of learning.
April: If we’re playing in the DataShop, and we see a high rough curve. Can we look at the problems and specify our own theoretical KCs and split apart?
Yes. You recode the data, upload it, and can see if it's better.
Judy: Curious whether these analyses differed across cohorts.
Yeah, great question. 400 datasets, domains, settings. You should try it out! There are differences.
Student engagement in an ITS, and how we used it to predict other stuff that matters.
Context is ASSISTments. Web-based math tutor for middle school math. Breaks down tasks with hints. 50k kids used 2012-2013, growing exponentially. Great evidence of efficacy – better than trad homework, classroom practice – large scale RCT ongoing.
Also engagement detection for other things – InqITS, EcoMUVE, SQL-Tutor, Aplusix, BlueJ, Cognitive Tutors for Math, Genetics.
Many perspectives on engagement – looking at affective and behavioural. Infer it solely from log files. Great sensors – from galvanic skin response to fMRI – coming soon to a classroom near you! But avoiding those makes for scalability. Fancy physiological sensors break, especially in schools.
Off-task behaviour – completely disengaging from the environment/learning. Gaming the system – intentionally misusing the software. Careless errors – making errors despite knowing the relevant skills. Affective states – engaged concentration (flow), boredom, frustration, confusion. Method – get human assessment, sync it to log files, use data mining to develop models that can do it automatically.
Human assessors. BROMP 1.0 protocol, with the Android app HART – reduces student disruption. Observe with peripheral vision, side glances, hovering over; a 20-second round-robin. Inter-rater reliability kappa about 0.8 for behaviour, 0.65 for affect. 43 coders trained.
Building automated detectors. Sync log data to field observations. Distill meaningful data features, develop classifier.
4 algorithms tried – assess using A’, Cohen’s kappa. Student-level cross-validation for generalizability. Also population-level validation.
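A minimal sketch of two of the evaluation ideas mentioned here: Cohen's kappa as the chance-corrected agreement metric, and student-level (rather than observation-level) fold assignment, so a detector is always tested on students it never saw in training. The labels and fold scheme are invented for illustration.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

def student_level_folds(student_ids, k):
    """Assign each distinct student to one of k folds, so no student's
    observations end up in both the training and the test set."""
    fold_of = {s: i % k for i, s in enumerate(sorted(set(student_ids)))}
    return [fold_of[s] for s in student_ids]

# Human coder's labels vs detector's labels for six observation clips.
human = ["bored", "bored", "flow", "flow", "flow", "bored"]
model = ["bored", "flow", "flow", "flow", "flow", "bored"]
print(round(cohens_kappa(human, model), 2))  # 0.67

# All of a student's rows land in the same fold.
folds = student_level_folds(["s1", "s1", "s2", "s3", "s3"], 2)
print(folds)  # [0, 0, 1, 0, 0]
```

The point of the student-level split is exactly the generalizability concern raised here: accuracy measured with a student's own data in both train and test overstates how well the detector works on new students.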
Detector creation data set – 505 students, 3621 observations, 6 schools. Diverse population – highly important. Detector may not work in other contexts. Have evidence showing if we built model on suburban students, it didn’t work on the urban ones, and vice versa.
Models are Ok – about 1/3 as good as humans. But can use it on everything.
Nice to be able to detect these things – but does it matter? Engagement and learning. Apply the detectors – affect correlates with state scores over a whole year. Interesting patterns – if they're bored when they get scaffolded, that's a sign of them being smart. Gaming the system is important. Also longer-term outcomes like who goes to college. New England middle-school students in 04-06. 500k problems, predict whether students attend college 6y later. A 9y longitudinal study in 3y! Off-task behaviour didn't matter. Gaming had a fairly strong effect. Attendance matters – if you don't go to school, you don't go to college. Could predict 69% of the time whether they'd go to college. It's not the best predictor ever, but it's actionable – e.g. if they're gaming the system, we know they can do that. Carelessness interacts with knowledge in an interesting way (smart careless people still get to college, but not-so-smart careless ones don't).
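The carelessness-by-knowledge interaction could look something like this hypothetical logistic model. Every coefficient below is invented for illustration (the talk gives no model details); the sketch only encodes the qualitative pattern described – gaming and absence lower the odds, and carelessness hurts mainly when knowledge is low.

```python
import math

def college_probability(gaming, absence, carelessness, knowledge):
    """Hypothetical logistic model (all weights invented): probability
    of attending college from detector-style per-student features in [0, 1]."""
    z = (1.5
         - 2.0 * gaming       # gaming the system: fairly strong negative effect
         - 1.0 * absence      # if you don't go to school, you don't go to college
         - 1.5 * carelessness * (1 - knowledge))  # interaction: carelessness
                                                  # matters more at low knowledge
    return 1 / (1 + math.exp(-z))

# Same carelessness, different knowledge levels.
smart_careless = college_probability(0.1, 0.1, 0.5, 0.9)
struggling_careless = college_probability(0.1, 0.1, 0.5, 0.2)
print(smart_careless > struggling_careless)  # True
```

The interaction term is the whole story here: multiplying carelessness by (1 - knowledge) makes the penalty nearly vanish for the knowledgeable student, matching the observation that smart careless students still get to college.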
Pilots of interventions now.
Can detect affect, without sensors, in real-time, moderately well. Models generalise to reasonably diverse populations. Models predict end-of-year tests and college attendance.
Cost is about 19 cents per hour of data coded, vs whatever it'd cost to do by humans.
Goal is to say, which constructs do we want to use to make a difference.
Sharon: Curious whether analysed temporal dimension of predictiveness – in what time, with what level of reliability. If do richer data collection with sensors, interested in rapid predictive power.
Great question. In some of our other work, have looked at how quickly can be used. Great step for future work.
Neil: Asking how long it takes for computer to make predictions?
Sharon: Yes. Whether, people looking at multi-stream sensors – can I predict 3-5 minutes. Implications for immediate in-class intervention.
To make an assessment of affect, it's a 20s clip. So we know boredom can be inferred in 20s, and we can give real-time feedback. Haven't looked at whether data from just the 1st week of class would be predictive. Being bored is bad, but we haven't looked at the timing.
Stephen, Gates: Work on affect, have you seen examples of how it influences modes of instruction and actions teachers take. Are you working on visualisations to make it more consumable?
Two great questions. Giving affect to teachers – we've done a brief pilot. Did visualisations of gaming the system for teachers – though there's a risk of punitiveness. Students do it when they don't get it, so it's an indicator they need help. Visualisation is a good point. Collaborating with SRI folks who know that area.
May?: Mentioned model trained on urban doesn’t apply to suburban. Thought of training separate models based on their profile – e.g. different models for urban/suburban, or other characteristics.
Figuring out best approach is a good question. Urban-trained is not better than the diverse-trained one. Special needs might be different, open empirical question.
John Behrens: Ken, how do you see the future of your work, DataShop, very interesting because it’s about moving the structure and tools of the field forward in a way that the Gates Foundation, has overlap with what Steve talked about. How do you see the need for enablement, CMU and other organisations might need to go to move the community forward.
Ken: They’re related. Engage more folks, on university and industry side. Barriers I see, get more involvement. Help to have conversations around standards – e.g. in DataShop, are they adequate. Even in industry, have great discoveries, but are they picked up on. ‘Not invented here’ syndrome. Product and research groups are separate. Overcoming that is a major thing. Don’t want research results just sitting in libraries.
Mike Tissenbaum: Agree the notion of intuitive design is not reliable. There are points where we don’t have data. Some designs are innovative, have to be intuitive.
Ken: I’m usually more careful to avoid either/or. We can’t throw intuitions out. The hare of intuitive design and the tortoise of data-driven.
Simon Knight: On behalf of Xavier in the eroom – how do you think sensors could be used in more open and messy environments when they’re not using ITS.
Ryan: We have implemented those – affect detection built in to e.g. EcoMUVE. A harder challenge. Cognitive tutor data is more structured, with easily decipherable semantics. Takes more work, but many successes inferring these things. Process of feature engineering. That's the secret sauce of this entire project. More thorough feature engineering, involving teachers and affective experts, almost invariably leads to a better model. Workshop tomorrow talking more about this.
Janet: Ken, around intuition, can train expert to be more intuitive. Lit shows expert is the wrong person to develop curriculum. How do we build intuition in the teacher in the classroom.
Ken: Data-driven cognitive task analysis to get insights. Next step is to communicate those. Not just designing more technology. Richard Clark’s demos redesigning med school courses are really powerful. Those aren’t tech implementations, changes in materials and f2f instruction. Data help there too.
This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.