Liveblog from the first full day LAK14 – Wednesday morning session.
Matt Pistilli welcomes everyone. We are international – 25 countries, all 6 inhabited continents. 140 institutions, 237 registrants – largest yet. 13 full papers, 25 short papers, 11 posters and demos, 9 doctoral consortium posters. Thanks to sponsors: blue canary, Purdue, Desire2Learn, Intellify Learning, SoLAR. Supporters: SoLAR, Wisconsin, Gardner Institute, Purdue. Thanks all those who have helped make it happen.
Abelardo Pardo takes over to talk about the submissions. 45 full papers submitted, 35 short. Expanding in posters, demos, and doctoral consortium posters.
Stephanie Teasley takes over. This is our fourth conference. Many – most – of the people in the room are first-time LAK attendees. We’re not from the same tribe, but what we can do together is bigger. Introduces Art Graesser, who is bridging boundaries already.
Keynote: Art Graesser
Scaling Up Learning by Communicating with AutoTutor, Trialogs, and Pedagogical Agents
Asks how many people are going to ITS in Hawaii – none. Educational Data Mining in London – a few more.
Thanks for inviting me. Works on discourse processes, building computer agents to hold conversations with learners in natural language. Apprenticeship learning happened between the master and the apprentice. The industrial revolution put people in the classroom; the information revolution lets them have conversations with computers – AutoTutor and similar systems.
Building systems where you can have agents that help students learn. They don’t have to be visible – could just be recommendations or suggestions in print – but I’m interested in the visible, embodied ones. How do you build these? AutoTutor has been in development for 17 years.
Challenge: design dialogs, trialogs and groups of agents to help students learn that are sensitive to SCAM – social, cognitive, affective and motivational mechanisms. Not just cognition, knowledge and skills.
Functions of agents – help when learners ask, but people aren’t good at self-regulated learning and don’t ask for it. Conditional prob of seek help given they need it is about 3-5%. Also used as a navigational guide – complicated interfaces, agents suggest and help with next step. Pairs of agents modelling action, thought and social interaction – people learn by observing. Also adaptive intelligent conversational dialog participants, that’s AutoTutor, understand what the students say and respond intelligently. They can be peers, tutor, mentor, and others – demagogue, adversarial.
Overview of talk. Examples of agent learning and assessment environments. AutoTutor and other Memphis agents developed in the Institute for Intelligent Systems. Learning gains and emotions. Finally, trialogs in Operation ARIES and ARA – games being commercialised by ?Pearson.
First example. Lewis Johnson, Tactical Language and Culture Training System – learn the language spoken in Iraq, also available for French. Have a scenario with many agents that helps you learn a language, so you know how to interact socially. Speech recognition, game environment. 30 scenarios where you interact, learning the culture as well as the language. This won the DARPA 2005 outstanding tech achievement award.
Betty’s Brain – Gautam Biswas and Dan Schwartz. Students, as they learn, create Betty’s Brain: a causal structure involving physiological systems. You build a concept map, reading in order to build it. You can ask the system, Betty, questions and see if she answers right; if not, modify the concept map. There’s another agent that can give you guidance if you’re lost. Then you take tests and quizzes – if perfect, you’ve constructed the brain right; if there are errors, you have to go back and modify it. Impressive learning gains; many layers of agents and activities.
Center for the Study of Adult Literacy study. In the US there are 70m adults not reading at the right level; they can’t get a decent job with a decent salary. Big center to help them learn better. How do you help them read? A portal where they can go and have training to help their reading, at a deeper level – targeting 3rd to 8th grade reading levels. Interface on a handheld iPad/Android, with agents around it. A tutor agent and another student agent, who interact with the person – talking heads trying to guide and interact and help them learn at a different level. Maybe they don’t even know how to use a keyboard. Also practical activities, like reading a drug label and asking them what the drug is used for. Examples.
Video: visual hover cues and drag and drop. Who is the protagonist in this story? Click the name. “John, do you think that’s right?” “I don’t think that’s right.” (Quite slow and stilted language, it’s text-to-speech stuff.) Another sample: adaptive interaction, a game – Jeopardy version. (All quite simple yes-no type stuff.) Uses relevant examples, so things like signs, drugs, job application. There’s about 150 of these.
Testing these out across the country. Ready in 15 months.
Also doing K12 work, working with David Shaffer on AutoMentor in Epistemic Games. Human mentors, want to substitute having a computer mentor instead. Land Science is a computer game, urban design planning firm. Virtual site visit. Use a mapping tool, change the zoning of parcels on a map. They create a plan, and make a formal proposal. Live mentor conversation in a chat tool, who model how professional planners work, and push students to reflect on their work. One point, if students interact by themselves they might have fun, but you need a mentor to get them to do more productive stuff and justify their reasoning. Automated mentors will allow this to scale up.
From learning environments to assessment environments. Educational Testing Service (ETS), making VR environments with the agents – “three amigos” environments: you the human plus two agents, interacting with each other conversationally. The question, in testing English language learners, is whether they can read signs and take actions reflecting the signs – e.g. carrying a water bottle past a sign saying no food or drink. There’s a doofus agent and a smartypants agent. You hold the conversation while moving through the world. This is how they’ll assess speaking, listening, writing and reading, all in a 20-30 min interaction in the virtual world.
The idea is that a low-ability person gives short, inaccurate or irrelevant responses and violates social norms; higher ability means lengthier turns that are accurate and socially appropriate.
Also doing it in science. Example of volcanoes, placing seismometers, looking at data, making decisions, responding to alerts that come up. That’s science assessment in the future. Interacting with agents as you wander through the world.
VCAEST is doing this in the medical field.
This is reaching PISA and PIAAC – the Programme for the International Assessment of Adult Competencies – which assesses countries by literacy. Some countries invest in education based on placement in this list; if they go down, they may put more money in.
PISA – the Programme for International Student Assessment – tests 15-year-olds. The US is low on maths literacy, but average for reading literacy.
Focus on problem solving in these – theoretical frameworks on how to assess problem solving. One on collaborative problem solving, to assess it, have two or more agents. It’s a process where two or more agents (human or not) solve a problem by sharing understanding and effort to come to a solution.
Case is: agents have permeated the world, starting with 1:1 conversation all the way to assessment at the international level.
There was skepticism – how can you have a computer simulate a person? Computers are different from people; people are different from people.
The first system they built helped students learn by holding a conversation in natural language. It has to comprehend what the student says and respond adaptively. Started out analysing what human tutors do: 10 years of research analysing actual human tutors – videotaping, analysing in excruciating detail. (Don’t want to do it again.) Looked at tutors in middle school and college, tutors in maths and science, published studies.
Some of the things they don’t do: fancy pedagogical strategies – e.g. Socratic tutoring, modelling-scaffolding-fading, reciprocal teaching, building on prerequisites, sophisticated motivational techniques, scaffolding self-regulated learning strategies. We just came up empty (looking for these).
Tutor communication illusions: tutors don’t have a deep understanding of what students know – it’s approximate. Tutors don’t ground-reference, and feedback accuracy is not good at all – typically positive feedback after major errors rather than negative, because it’s polite. Not aligned in discourse – mostly the tutor throws out information. The tutor thinks they’ve articulated something that’s understood, but often it isn’t. Not even very good tutors are effective at this.
So think we can do something with a computer tutor.
AutoTutor – a main question that requires 3-5 sentences to answer. There’s an agent, and the student inputs the language. Two versions – one speech recognition, one typing; it doesn’t matter which.
AutoTutor asks questions, presents problems, evaluates meaning, gives feedback, facial expressions, nods and gestures, hints – very difficult – prompts, adds info, corrects bugs/misconceptions, and answers some questions – but it’s not good at that; beyond computer capabilities as yet. But students don’t ask many questions anyway.
Example in physics, chosen in part because it’s hard to find physics teachers: 3700 teachers in Memphis, 0 who have majored in physics.
Question – e.g. does the earth pull equally on the sun? Explain why. Not just the answer, but the explanation. Have expectations – a set of sentences you hope they articulate – and also misconceptions you may see. It may take 50 to 100 turns back and forth to get there; while that happens, all the vague, scruffy, fragmented information from the student is collected, compared to the expectations, and responded to appropriately. They do have parsers, but they’re not very useful – over half the utterances are ungrammatical anyway. (!)
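(To make that concrete: a toy sketch of pooling the student’s fragments across turns and comparing them against an expectation sentence. AutoTutor actually uses LSA over a trained corpus; this plain bag-of-words cosine is just my stand-in, and every name here is invented, not from the talk.)

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def coverage(student_turns, expectations, threshold=0.5):
    """Pool the student's scruffy fragments across turns and check which
    expectation sentences they have covered so far."""
    pooled = " ".join(student_turns)
    return {e: cosine(pooled, e) >= threshold for e in expectations}
```

A real system would add stemming and a corpus-derived semantic space, but the shape – accumulate fragments, score against each expectation, trigger hints for the uncovered ones – is the same.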
Main question, answer, also pump; give hints when they’re missing something. Or prompt to get them to say one word. Students often don’t say much. AutoTutor is designed to get the student doing the talking, not the computer doing the telling. Maybe they voice misconceptions; correct them.
Need to classify speech acts – asking a question needs a different response to expressive evaluations, or ‘I don’t know’.
Managing one turn in a conversation is of interest. It follows a structure: typically respond with short feedback – yeah, right, okay, uh huh, no, not quite. Then advance the dialog, then end the turn with a signal that transfers the floor to the student (needed to keep it going): a question, a hand gesture, or something else.
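(That three-part turn structure – short feedback, dialog advancer, floor-transfer cue – can be sketched in a few lines. The feedback wordings are from the talk; the function shape is my guess at the mechanics, not AutoTutor’s actual implementation.)

```python
import random

# Short feedback options, keyed by the evaluation of the student's contribution.
SHORT_FEEDBACK = {
    "correct": ["Yeah.", "Right.", "Okay!"],
    "partial": ["Okay.", "Uh huh."],
    "wrong":   ["No.", "Not quite."],
}
# Cues that hand the floor back to the student, keeping the dialog going.
FLOOR_TRANSFER = ["What else?", "Can you say more about that?", "Go on."]

def tutor_turn(evaluation, advancer, rng=random):
    """One tutor turn: short feedback -> advance the dialog -> transfer the floor."""
    return " ".join([rng.choice(SHORT_FEEDBACK[evaluation]),
                     advancer,
                     rng.choice(FLOOR_TRANSFER)])
```

The `advancer` slot is where the hint, prompt, or misconception-correction from the previous paragraph would go.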
Demo of this – the fishics tutor – Big Mouth Billy Bass doing the talking to a human. Absolutely hilarious seeing the fish saying “What else can you say about the packet’s velocity at the point of release?”. Eliza for learning – easy to give the illusion of comprehension. (This is actually pretty good that way, I’ll totally buy that level of claim for it.)
iSTART as an example, Writing-Pal, iDRIVE – ask good questions (most questions are shallow), Philip K Dick android, HURA advisor on ethics, MetaTutor on self-regulated learning, Guru for biology, and DeepTutor for physics. Also including simulation.
The learning – we’ve assessed these systems in many studies. Meta-analysis: human tutors have an effect size of 0.42 even when unskilled, compared to ?school. AutoTutor 0.80; ITS claimed about 1.00 – more realistically about 0.8. Skilled human tutors: not enough studies – a meta-analysis of skilled human tutors covers fewer than 20.
Favourite study, in physics: random assignment to four conditions – read nothing, read textbook, AutoTutor, human tutor. Learning gains – human tutor and AutoTutor well in the lead, the human just edging it. Read nothing beat read textbook! (NSD) The test was the Force Concept Inventory, which requires deeper reasoning. You do learn shallow knowledge from the textbook, but not deep.
Replicated in other areas – computer literacy, critical thinking. Again, do nothing is the same as read textbook, if you assess for deeper learning.
AutoTutor: 0.80 sigma compared to pretest, nothing, or textbook; 0.22 compared to reading relevant textbook segments; 0.07 compared to a succinct answer to the question; 0.13 compared to delivering the speech acts in print; 0.08 compared to humans in CMC. No real difference between AutoTutor and a human tutor. -0.2 vs AutoTutor enhanced with 3D simulation (i.e. adding that is better).
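(Those sigma figures are effect sizes – Cohen’s d, the difference of group means divided by the pooled standard deviation. A quick reminder of the computation, with made-up numbers:)

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Effect size in sigma units: difference of means over the pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd
```

So a 0.80 means the AutoTutor group’s mean gain sits 0.8 pooled standard deviations above the comparison group’s.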
Tracking mastery of a core concept over time and tasks – see whether a student reliably holds e.g. Newton’s 3rd Law over 10h of training. Map the concept over time, see whether it’s correct or not. All-or-none learning is rare – going straight from not-learned to learned is a fiction; it rarely happens. It’s more variable, contextually influenced.
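(One way to picture that gradual, non-all-or-none tracking: a per-concept running estimate nudged by each observed success or failure. Real systems use richer models such as Bayesian knowledge tracing; this exponential moving average is a deliberately minimal stand-in, with all names mine.)

```python
from collections import defaultdict

class MasteryGrid:
    """Per-concept mastery estimates updated gradually over time and tasks."""
    def __init__(self, rate=0.3):
        self.rate = rate                      # how fast new evidence moves the estimate
        self.estimate = defaultdict(float)    # concept -> estimate in [0, 1]

    def observe(self, concept, correct):
        """Nudge the estimate toward 1 on success, toward 0 on failure."""
        target = 1.0 if correct else 0.0
        self.estimate[concept] += self.rate * (target - self.estimate[concept])
```

The estimate wobbles with each contextually influenced observation rather than flipping from 0 to 1 – which is the empirical pattern the talk describes.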
Emotions and learning
Track those; they’ve looked at a lot. Looked at a mixture of learning environments – Incredible Machine, others. The big six emotions: boredom, confusion, delight, flow, frustration, surprise. Not much happiness, sadness, rage, fear – these are learner-centred emotions. In assessment environments you also get anxiety. Track those, have AutoTutor be sensitive to them. Confusion is the best predictor of learning – that’s the point of a teachable moment.
Track these automatically – face, speech, posture, dialogue. Most of this can be done with just two channels – the discourse channel and the face – especially at the point of confusion; the dialogue history helps you identify the emotion. Most systems have cameras, so this is possible.
If you’re discriminating boredom vs flow vs neutral, the face is not where it’s at – the face is the same for all of those. Teachers think students are in the flow experience when the student is bored – that happens a lot. Teachers aren’t trained on this.
Half-life of emotions, dynamics – e.g. drone who’s always bored, hopelessly confused, emotional rollercoaster – have tutor respond.
One experiment compared an unemotional AutoTutor to a supportive, empathetic, polite AutoTutor, vs a rude shake-up AutoTutor – confrontational and face-threatening. Many votes for the shake-up tutor; adults like the shake-up one. Best to time it – go with the standard one until problems arise, then switch to the affect-sensitive one: after 30 minutes for low domain knowledge, shake-up after 1h for high domain knowledge (maybe).
Operation ARIES and ARA
Trialogs in learning – two agents plus a human. Three types: low ability, vicarious learning; medium ability, tutorial dialogue; high ability, teachable agent. Want to know which agent is right at what time.
Jerry: Curious about how you gather content to feed this system, the efforts of a real live teacher moving in the direction of training these systems?
Art: Where do you get the content? Build it aligned to standards – Common Core, a variety of others. In the practical world, teachers aren’t going to use this unless it deals with NCLB etc. The other answer: we have authoring tools, and right now we’re spending time perfecting them to where a human teacher can build this content easily. We’re almost there. They build it in English; they don’t have to be programmers. One step requires expertise – building regular expressions, symbolic computational linguistics. The vision is that people create these materials in English – questions and answers – then a computational linguist comes in to form the regular expressions.
Q: Efficacy of the 3D element, better than AutoTutor. Personal VR headsets – Oculus Rift bought by FB. Presence rather than gaming may be the killer app for VR headsets. With Facebook involved, could AutoTutors merge with humans – a hybrid, tapping the retired demographic combined with AutoTutors?
Art: Yeah. These are like Google Glass – they don’t interpret the environment, they just store it. A lot of what goes on in the real world is uninteresting. Videotaped a party in California with a stationary camera. It was a great party; watching it, it’s just a bunch of people at a table talking – could only watch 5 minutes because it’s boring. A lot of the environment is unexciting. Human tutors can help you interpret it – how a tutor would, as you watch a world, be commenting; you could automate that, with the computer recognising something and suggesting on that basis. Computer tutors can also help human tutors – if tutors can interact with a system and see it, that’s good for professional development. 900 tutors analysed; a lot of them – most of them – aren’t well trained. The computer can help with professional development.
Stephanie: Take in the answer, don’t watch the party, be the party!
1A. Discussion with Art Graesser
Q: What’s the end game with this? Closing 2 sigma gap? agent on in every remedial institution, solving the scale problem? Where’s it going?
A: Many directions. Mini Khan Academy thing, how many people used this thing. Khan clips with agents, each one might last 5 or 10 minutes. Out there for anyone to use. Another is more coordinated, a course like MOOC-based courses, or even commercial courses. I like the smaller chunks of stuff, dynamically used, adaptive. Could be free to anyone, or part of a commercial course.
Q: What’s the scale – 1000s of them? Physics, takes 100 units of work, American History, how much work is that? 99 units? Or what?
A: Working with the Army Research Lab on scaling these up: the Generalised Intelligent Framework for Tutoring, GIFT. The first book has appeared – gifttutoring.org. The next one is authoring tools, then assessment, then scaling up. Authoring tools are the key challenge: how to get people in any field to author these materials. Have a script, licensed by ETS and others, to get people who are not computer scientists but subject matter experts to create these materials in English. Two major challenges; they can do a lot otherwise. One is the regular expression problem, the other is rulesets. Example – you might want introductions. Suppose you want people to introduce each other in a group: the number of rules that guide that is about 40. With four people – say “what’s your name”, what if someone doesn’t volunteer their name, or gives no answer – it takes about 40 rules for just 4 people. A normal materials developer is not able to unravel those 40 rules. So now we have about 20 conversational modules where they can just drag one in, with all of those rules, and link them to referents. Maybe that’ll be easier; that’s what we’re developing now. Another example: asking a deep question and handling the answer. 20 conversational modules they can drag, drop and then tweak.
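(The regular-expression step he keeps mentioning looks roughly like this: one hand-built pattern covering the phrasings a student might use for a single expectation. The pattern and helper below are my illustration, not taken from GIFT or the ETS-licensed script.)

```python
import re

# Hypothetical authored pattern: several ways a student might state F = ma.
F_EQUALS_MA = re.compile(
    r"\b(force\s+(is|equals?)\s+mass\s+(times|multiplied\s+by)\s+acceleration"
    r"|f\s*=\s*m\s*\*?\s*a)\b",
    re.IGNORECASE,
)

def matches_expectation(utterance):
    """Did the (possibly ungrammatical) utterance hit the expectation?"""
    return bool(F_EQUALS_MA.search(utterance))
```

Writing such patterns well is exactly the symbolic-computational-linguistics expertise the subject matter expert lacks – hence the division of labour he describes.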
Maria: Human tutors aren’t as good as you thought they were. How do you get them to develop the rules that are better than what humans do?
A: If you do have tools, with good exemplars, they can learn from that. We build these systems to help students learn, fascinated by the possibility of teacher training, professional development. Can do this subtly – you’re a teacher, want you to critique our materials. But in the process, they actually learn new expertise. That’s a model we’re thinking about for professional development. Mark, dealt with 900 tutors in Memphis with no training. There’s evidence that some tutors do more harm than good. Question is how you get them trained. This is one way of thinking about it. You have the authoring tools, ask them to advise us how to improve them. They learn from that while they’re doing it.
Q2: One opportunity – talking heads reading from a script but no captioning. Was that a design decision? Make the experience more like a person, rather than accessibility?
A: What would go on the caption?
Q2: Dialog caption, like closed captioning. You could have the text the tutor is saying.
A: We do have histories and captions in some of them. Two answers. One, there is some research showing that having both printed and spoken versions can create a split-attention effect – they don’t work well with each other. For learning materials, a spoken voice-over is better than having the print up there at the same time; we’ve done experiments on that. It could turn out the opposite – two different modalities might be better than one, some like spoken, some like reading, or they reinforce each other – but some research indicates it’s not so good. It’s an option. We struggle with this with our adult learners: if it’s all spoken, they won’t learn to read. The worry is that sometimes we put the stuff in print and the agent says “take a look at this sentence” – that forces them to read it, but if they can’t, it’ll be articulated orally. Right now we’re struggling with that issue. It’s a big issue.
Q3: How many international students do you have? They have accents. Have to translate their own language. When I used voice machines at the bank, they don’t understand. Oral tutors, so …?
A: We’re doing work with Educational Testing Service, which worries a lot about that. It’s English language learning assessment, so they have to worry about it. They have all possible languages and dialects and have to accommodate that. One provocative finding in speech recognition: if you talk naturally, recognition is better than if you try to emphasise things – overcompensation can harm speech recognition systems. On a different project, we’re looking at speech recognition in classrooms between the teacher and students. We’ve tried many forms of speech recognition; Google’s is the best and most resilient to different dialects and languages. It is an important issue. Google has the best system, and ETS has to worry about it in a high-stakes way to make sure nobody is compromised.
Leah: Language, international issues. A question about slicing the ideas in a different way, thinking about cultural difference – cultural use of technology. I was aware as you gave examples, like the rules for introductions, which vary even within English-speaking cultures. Rules for introductions for NE US students probably wouldn’t work well in Scotland. Or styles of feedback. We both thought Billy the Bass said it was right when the student gave the wrong answer, because we misinterpreted the response. Done any work on this?
A: We have. Short feedback and politeness is a big part of it. There’s a tradeoff between accuracy of feedback and politeness. We’ve struggled a lot with this. We’ve found in tutoring that lots of tutors are polite, so when student says something wrong, they say “Okay”, not the verbal form, it’s the emotional reaction. “Yes!” vs “Yeeeaaaah” (drawn out), really that’s negative feedback. Students vary in how responsive they are to that. The intonation, we’ve looked at short feedback in excruciating detail. Cultural differences too. Japanese business transactions, in Japan, when they say yes, that’s being polite.
Leah: Depending how it’s said.
A: US people think they made the deal; the person from Japan was just being polite. These are all issues. In NY City you can tolerate a lot of negative feedback; in parts of the South you’d never say that. We’re approaching it, in the beginning stages: see how they respond to different types of feedback, and if their emotions change and they seem insulted, shift to more neutral or positive feedback and give more content. What if you had all neutral feedback, with conceptual feedback to guide them better? Do you really need the valenced pos/neg part – could it be just ‘Okay, I hear you’? But many students want the feedback: just tell me right or wrong! Different styles. We haven’t built one that’s perfect yet. Want to be sensitive in the early phases of the conversation to how their emotions respond, and adjust accordingly. Sometimes you want to push the envelope and be abrasive, face-threatening.
Caroline: Mastery of core concept over time. Integrate in to that about where they should be at 1st, 2nd stage; is there commonality?
A: Imagine in physics, 100 core concepts. During course of physics tutoring, we track those.
C: have some concepts before you go to the next stage
A: Yes, common concepts and misconceptions – we have a big list. As you give a new problem, it may lend itself to some core concepts and not others. If it’s on deck, maybe concepts 12, 42 and 98 are relevant; you see by their acts of articulation/actions whether they’re correct. Can keep that big map over time. Hopefully, statistically, they’re going to be honouring those over time. We have a big grid of correct concepts and misconceptions; over time, as they work on different problems, we track it. Operation ARA is on critical thinking in science. We have 21 core concepts – e.g. correlation does not mean causation, you have to operationally define your variables. Tracked over 20h of the game, there are 6700 measures. Funnel those into how much students get a concept accurately, apply it when it’s relevant, and how much they can articulate it. How much time it takes to instantiate the right concept. We look at time on task, discrimination, generation, and scaffolding required, over a 20h period.
Q4: A new set of data around students and their reaction to their learning experience. Facial recognition really exciting for me. Is that done automatically? Are there tools that would code these human emotions dynamically? Or is it human coders going back in?
A: Early on, we used human coders to train the automated classifier. It homes in only on confusion – that’s the most important one; we don’t have a general classifier. Looked at assessments of the automated one. There are two emotions worn on the face: confusion and surprise. Surprise you can’t do something about – maybe you want to keep them surprised. (laughter) But confusion – there’s a zone of optimal confusion. That may vary with their traits: some you might want to keep confused; others will get frustrated, then bored, and they tune out. Want to be dynamic and adaptive. Confusion we have nailed; we don’t worry about surprise. Frustration is usually hidden – the prefrontal cortex inhibits it; if you start throwing things around people don’t like that, so you hide frustration. There’s a smirkiness to frustration. Dialogue gets to frustration quicker, so you rely on the dialogue history and action history for that (e.g. hitting buttons quickly). Then there’s boredom and flow. We can detect flow by fluent activities – if your productivity is going, you’re on a roll, you can pick that up. Boredom, we’re trying to get a good detector for. Timing is relevant. We have a nice boredom detector based on the decoupling of the complexity of the material and the time you spend on it. If you’re reading, it takes a certain amount of time; if they’re real quick and it’s real complicated, they’re disengaged – you want to do something, ask a question to get them re-engaged. Each one is a special case. The one that needs the face is confusion.
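(The decoupling detector he describes might look something like this: estimate how long the material should take, and flag disengagement when the learner finishes far too fast. The thresholds and names are mine, purely illustrative.)

```python
def likely_disengaged(word_count, difficulty, seconds_spent,
                      base_wpm=200.0, cutoff=0.3):
    """Flag when time on material decouples from its complexity.

    Expected reading time scales with length and difficulty (a multiplier,
    1.0 = plain text); spending under `cutoff` of that suggests tune-out.
    """
    expected_seconds = (word_count / base_wpm) * 60.0 * difficulty
    return seconds_spent < cutoff * expected_seconds
```

A detector like this needs no camera at all – which is why boredom, unlike confusion, can be picked up from interaction logs alone.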
Q4: Our courses are async, online, don’t have students f2f, but that’d be another dimension we could just track with a camera.
A: Lots of systems have a camera, not quite ubiquitous. There are elements of the dialogue that you can detect confusion, but the performance is not so good.
Q5: Across cultures, found mapping between facial expressions and emotions to be consistent?
A: Ekman’s work on universal emotions – fear, happiness, surprise – though some say emotions are more culturally specific. I’m convinced that some of the emotions are universal. Confusion is one of them: when you’re confused, you wrinkle your brows and eyebrows, like you’re looking at things more precisely – like reading something in small print. Confusion in the face has vestiges of that; maybe there’s a biological history to it. Surprise is universal – you don’t have to train a child to look surprised: wider eyes. Things like how to deal with frustration are socially influenced. As before, people often hide frustration if they’re properly socialised; maybe in other places it doesn’t matter, perhaps in NYC. Boredom – people try to hide their boredom. Some people just flop and slump; they haven’t been socialised into hiding their boredom. Other people fidget. It’ll be culturally sensitive.
Q6: Zone of optimal confusion. I know you’ve done work on creating confusion. What about learners who don’t (thrive on that; I missed a bit here).
A: We’ve published on this. One is, it’s good to plant confusion. Contradictions between agents – that act can cause higher learning. There’s a role of confusion, whether it’s a mediator or direct cause is under debate. Creating cognitive disequilibrium is good. Let’s say they’re confused. Question is how long to leave them there. If they’re an academic risk taker, high self-efficacy, you want to prolong the confusion and hope they solve it on their own. For those who may be more impetuous, low self-efficacy, you may want to come in there sooner with a hint. Right now we don’t know. We don’t know the exact parameters and the timing of how to manage the confusion productively. We want to create circumstances for confusion. Impasses, problem solving, want them to self-regulate to handle any situation at any time. But there’s a long route to that point. That’s at the horizon of research right now. We know confusion helps, best predictor of deep learning, not shallow. Manipulating can help that process. Finding traits of their cognitive aptitudes, how that interacts, we’re trying to figure that out right now.
Q7: Can learners tell agents shut up, you’re confusing me?
A: Interesting. Some frozen expressions – like shut up, get out of my face – store those and know how to respond. If you think they like you, you can come back and say, you shut up and listen to me. But if not, you can back away. Need production rules for handling all that, to accommodate the situations. We haven’t had circumstances where the human has actually said that. We have had students who get up and yell at the agent. But they haven’t said shut up. But have said “you’re wrong, you’re wrong!”. If you can get to that level of animation, you’ve succeeded in some sense, but what you do about it is a question. There’s a role of a bantering, playing interaction with the agent. Cultivating that would be great. For engagement, a typical prude tutor, very matter of fact, very factual, just gives the right answer, affect blunt. That gets boring after a while. However, if you have a way of engaging, playing with them that can keep the dialogue alive. How to do that adaptive to their personality is a key part. Different styles of tutor, easy to change with short feedback. Have a prude tutor, a rude tutor – the adults like the rude tutor more. You could have crude tutor. You could have a nude tutor, learn by striptease. A lewd tutor. (?!) It’s not too hard to modify the system for these styles. Our IRB probably wouldn’t allow the nude and the lewd tutor. You can imagine that tailoring it to their style would be good for keeping the dialogue going.
Q7: Strip poker tutoring!
Q8: When tutor gives feedback, does it take in to account the mastery of concepts, and if it does, how does an educator define those rules when they’re building content?
A: Works fine if system already has expectations and misconceptions. Novel things, generated by the human that’s not on that list, it’s not handling. Someone has research human tutoring, human tutors don’t pick up on novel things from students either. Poses an interesting question – how does a tutor learn? I’m convinced they learn from experience. What they do, they get a case they haven’t figured out, take it home and puzzle it out, realise student had a misconception, takes time to deconstruct it, then they’ve learned from that. Always wanted to build an AutoTutor that learns from experience. There’s so many projects we haven’t been able to pursue it. I’d like it to learn. E.g. when it didn’t do a good job with a student. Find the features it’s missing, add it to the tutor. Or even do it automatically. Students may want to articulate Newton’s law, F=ma, can express it in many ways. We have LSA and other analysers to analyse the meaning. If high enough threshold, can store it as another exemplar, so over time get bigger corpus of how express F=ma. Then periodically reorganise its semantic space, so the AutoTutor is learning. If it’s new concept from whole cloth, not clear how it’ll get that. Human tutors don’t get it, takes a lot of reflection.
Q9: More on educational psychology. There’s a lot of work in the animal behaviour world about individual personality-trait differences that may be differently evolved survival strategies. Some people are more reflective than others. You talked about individual learner differences. I believe the more we understand neurological differences, the better job we’re going to do of handling them. Are you familiar with this? There’s very little overlap with this work; there really are people wired more as hunters than gatherers. Few are bringing this to the learning realm.
A: The military looked into learning styles, found they didn’t predict learning; it ended up being a dead end. Not clear they looked at the right styles – e.g. visual vs verbal, that didn’t predict much. There might be a style based on biological proclivities, whether you’re more an exploratory type – maybe that’s a hunter. Some people have higher situational awareness, a bigger span of attention; that’s different from people who are more focused. It does have a relationship to emotions. When in a positive mood, you have a broader span of attention; with negative affect it’s more narrow. Mood can predict whether you’re going to do well on a task. If you need precise analytical concentration, maybe a negative mood is better suited. Biological analysis of species similarity – I’ve listened to talks but not done things directly myself.
Stephanie: Excited about info about learners when they come into the class. There was an example where the appropriate style depended on higher- or lower-skill students. A future where tutors ingest data we know about students before the task – the tutor could know someone’s math SAT score, high school GPA, which courses they’ve already taken. A lot of data we could feed in to personalise the learning. At what level can the tutors handle this information?
A: Great question. Computers will do a lot, lot better than humans. It’s the combinatorial problem. Say 10 attributes, each high or low – that’s 2^10 options. Computers can track that, but humans can’t do 30 students times 2^10. They may end up perseverating on just a couple. Personality theory: people see people on only 3–7 dimensions.
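The combinatorial point is easy to make concrete (the numbers are the ones from the talk, used purely for illustration):

```python
# 10 binary learner attributes (high/low) give 2^10 possible profiles;
# a human tutor juggling 30 students would have to keep
# 30 x 1024 student/profile pairings straight.
attributes = 10
students = 30
profiles = 2 ** attributes        # 1024 distinct profiles
pairings = students * profiles    # 30720 pairings
print(profiles, pairings)
```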
Stephanie: Human tutors aren’t the gold standard. Humans aren’t going away. Does the increasing ability to input this data about the learner help the tutoring system to learn? To be more personalised?
A: I do know the army is interested in this, the Department of Defense. They want to keep a history of the learner, a personalised profile based on the past. They want to store it, use it to guide them in the GIFT tutor. Also promotion, career development – use that information to recommend the next step. Making use of this information would be tremendously useful. There hasn’t been enough research to see how that knowledge helps. We have seen what human tutors do. Digital teaching platforms: you have all the information there, you’d think the teacher could adapt, but they’re so bewildered they don’t have time to do it. That’s where the computer can help. The whole history, a digital learning portfolio – that’d help. It’s stuff to be mined and minded.
Q11: The combinatorial explosion only works if you can list the important factors. Humans are good at abductive reasoning, incorporating a new factor on the fly. How do you specify the right factors ahead of time? Either you’re going to have to invent abductive reasoning in a computer, or – what are the factors going to be?
A: I wanted to say, take humans analysing other humans’ personalities. Research finds any one person only uses 3–7 attributes to classify people, though you could use 100.
Q11: Humans are good at it!
A: Are they? They agree with each other. They have an implicit personality theory. Classic example: descriptions of others reflect more the describer than the target. The accuracy of these is suspect. When a teacher evaluates a student on attributes of the student, there’s a question of what they’re looking at, and what the accuracy is. I’m not convinced their accuracy is spectacular. It’s a small set, and the accuracy is suspect. A computer can do better. What we know from research on tutoring: students can express misconceptions, and human tutors miss them most of the time because they’re not on their radar. People think human tutors can pick up so much more because they can work with students individually. That’s not what the data say.
Q11: Human tutors can pick up things that are not there, outside the system.
A: My claim is human tutors do that every once in a while, but they don’t do it frequently. The fact that they do it at all puts them at an advantage. It takes a very vigilant human tutor to say, I realise now that student has this misconception, I’m going to pay it more attention in the future. I don’t know whether a computer can do that. Maybe, if you applied certain data mining procedures, it would deduce a new misconception it didn’t know about that maybe it should. Or maybe it’s not capable.
Q12: What limits do you put on agent tutoring, if any? E.g. helping students write better. Can the system help with those procedural tasks?
A: My colleague Danielle McNamara (?) worked on Writing Pal, which helps them with that. Also looking at emotions that help them write. How many people here really enjoy writing? (A few.) How many find it excruciatingly painful? (Many.) Especially under time pressure. Group activities, summarise in 10 minutes – that is hard. It’s a good place to study emotions. Writing Pal helps people. LSA essay graders are very promising. Unlike in the olden days, the current essay graders do it as well as expert graders. Anyone skeptical about whether computers can grade essays well – if you don’t know they do it very well, you’re not reading the right literature. ETS and ?Pearson have *very* good ones. Students spend 2.5x more time just with simple feedback. It analyses cohesion etc., not just mechanical spellchecks. It used to be that you’d turn in a paper and, by the time you got feedback, you’d forgotten it. But here it’s immediate, and you can also figure out whether people are gaining. That’s very motivating.
Caroline: Assessment would move on to being a dialogue, showing the best they can do rather than a one-off first attempt?
A: I hope so. In the future, you’ll never know you’re being assessed. It’s always learning and it’s adapting; you don’t have to go to places where you’re Being Tested. Whether it happens is an interesting question. Sometimes high stakes is good – people are at their best. But for most of it, there’s no reason not to track in an open environment, see how good they are, let them improve themselves. That’d be good. Are we there yet? As long as there are politicians, I don’t think so.
2A. Panel. Setting Learning Analytics in Context: Overcoming the Barriers to Large-Scale Adoption
Rebecca Ferguson, Doug Clow, Leah Macfadyen, Shane Dawson, Alfred Essa, Shirley Alexander
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.