Liveblog from the first full day of LAK14 – Wednesday afternoon session.
Session 3A: Alternative Analytics
Xavier welcomes everyone. A diverse set of presentations ahead.
Sleepers’ lag – study on motion and attention. Mirko Raca, Roland Tormey, Pierre Dillenbourg (Full Paper).
Goal of this research is to give people a better idea of the classroom – “with-me-ness”. Signals in the classroom, inferring attention from motion – if I move my hands, am I attentive? Not just immediate attention. Talking about how to exploit these signals.
Comes from eyetracking studies on MOOCs, looking at with-me-ness. Given just the information, some people are more or less able to identify where to look, but with teacher reference, many more do so. Eye tracking in the classroom: we know the position of the eyes, but it’s vague in the classroom, especially if students are just sitting and listening. What signals do we have here? Signal theory: a space where information is travelling between participants. People think of information flowing from teacher to students. But students are not sitting in a vacuum. With 50 people in the room, information travels between students. Distractions from outside. Also from a person to themselves.
Signal-oriented view of the classroom. Students made a ‘sleepy FL’ vs ‘EPFL’ page on Facebook – social media content showing a student sleeping in the classroom. It’s clearly a signal! Trying to formalise this into signal processing.
Started off with motion analysis as the basic form of analysis, before judging pose or gestures. A feature-tracking application gives a motion intensity measure of students over time. Not clear in and of itself, but testing. Have a huge amount of data – it’s overwhelming for a teacher. Use modern algorithms to get something more meaningful.
Annotated regions for each student; Lucas-Kanade feature tracking, grouped into motion tracks. Then, to associate each track with a person, fit 2D Gaussian probabilities. The person most likely to have generated the whole motion track is identified as the source. Every motion is associated with a single person, not distributed over multiple people – an issue given the overlap of seated students.
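That association step can be sketched roughly like this – a minimal, invented illustration in Python/NumPy (the coordinates, covariances, and function names are mine, not the authors’): score each motion track under each student’s 2D Gaussian, and assign the whole track to the single most likely person.

```python
import numpy as np

def gaussian_logpdf(points, mean, cov):
    """Log-density of a 2D Gaussian evaluated at each point (T x 2)."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    mahal = np.einsum('ti,ij,tj->t', diff, inv, diff)
    return -0.5 * (mahal + np.log(np.linalg.det(cov)) + 2 * np.log(2 * np.pi))

def assign_track(track, persons):
    """Assign a motion track (T x 2 image points) to the annotated person
    whose Gaussian most plausibly generated the *whole* track."""
    scores = [gaussian_logpdf(track, mean, cov).sum() for mean, cov in persons]
    return int(np.argmax(scores))

# Two students modelled as 2D Gaussians over image coordinates (invented numbers)
persons = [(np.array([100.0, 200.0]), 400 * np.eye(2)),   # student A
           (np.array([300.0, 200.0]), 400 * np.eye(2))]   # student B
track = np.array([[295.0, 198.0], [305.0, 205.0], [298.0, 210.0]])
assign_track(track, persons)  # → 1: the track belongs to student B
```

Assigning the whole track to a single argmax person is exactly what makes overlapping seated students a problem: a track straddling two dense regions still gets only one owner.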
Analysed two classes, N=38 and N=18. Four cameras, questionnaires, 10-min intervals, on attention etc. Student reports level of attention (nice normal distribution). Percentage of activities reported – six choices, three productive, three counter-productive. And on-task activities, listening/taking notes etc. Off-task activities are observed even at high levels of reported attention, but it does capture rise of on-task activity. (More attentive is correlated with more on-task activity.)
Teacher-centred positioning, amount of motion from students vs distance from teacher. With the distance from the teacher, the motion intensity decreased. Interesting! Mentally less active, and physically?
Then turned to an interpersonal view – student-centred positioning. Proxemics theory (E.T. Hall); personal perception of the classroom. Three categories: immediate neighbour, visible neighbourhood, other. Classify each pair in the room accordingly, and analyse behaviour.
Synchronisation – borrowed from eye-tracking. Didn’t want to be overly intrusive: complement, not change, how it’s taught. Stuck with cameras, not sensors stuck to people. Dyadic analysis of the classroom – whether two people act at the same time or not. As in eye tracking, two people acting at the same time show high levels of comprehension and agreement. And how that translates to motion: often one person’s motion happens with another person’s, only slightly shifted in time. Plot one signal against another and get a matrix. Look at the diagonal – it gives you the times of motion synchronisation. Try to model the exact and relative synchronisation – is there a lag?
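The lag idea can be sketched as follows – a toy version under my own assumptions (per-student motion intensity already extracted as a time series; the names and numbers are invented): normalise two students’ signals and find the shift at which one best follows the other.

```python
import numpy as np

def sync_lag(motion_a, motion_b, max_lag=5):
    """Return the non-negative lag (in samples) at which student B's
    motion intensity best correlates with student A's."""
    a = (motion_a - motion_a.mean()) / (motion_a.std() + 1e-9)
    b = (motion_b - motion_b.mean()) / (motion_b.std() + 1e-9)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        corr = np.mean(a * b) if lag == 0 else np.mean(a[:-lag] * b[lag:])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# B copies A's burst of motion two samples later (e.g. 4s at a 2s resolution)
a = np.zeros(40)
a[10:13] = 1.0
b = np.roll(a, 2)
sync_lag(a, b)  # → 2
```

The matrix the speakers describe is this product evaluated at every pair of times; the diagonal and near-diagonal bands are where exact and lagged synchronisation show up.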
Tested immediate neighbour influence. Saw significantly higher probability of synchronisation between immediate neighbours than between any other pair of students (p<0.05). Reasonable finding: no reason to coordinate, but just sitting by someone influences your actions. No significant difference in synchronisation between the visible and non-visible neighbourhoods.
Sleeper’s lag is the synchronisation idea. The dominant signal is from the teacher: says take notes, people take notes. But with an external signal that both people react to, they start taking notes, and some get it as a cue from their peers rather than the teacher. At high levels of attention, the average delay was smaller – they were quicker to react. Which makes sense. It’s marginally insignificant (!) and relatively flat, at a resolution of 2s. Maybe need even smaller resolution here. Similar trend in the other class but insignificant because of smaller numbers.
Signals as a view of the classroom – not going up to the socio-economic view, sticking just to the little box of the classroom. First results of the observational system are going in the right direction. Demonstrated an idea for non-obtrusive performance metrics.
Future work – re-test, semester-long experiment with video, questionnaires, interviews. Also focus on face tracking, face orientation, as the social indicator.
And they are looking for a post-doc! In education and machine learning. http://chili.epfl.ch
Xavier: Movements and lag. How sure are you that the delays are due to attention, and not because maybe I started to take notes before?
We start with the idea that they are telling us the truth in the questionnaires. We’ll see with the large sample. There are different things that can be interpreted – the motion is very vague. Do we want to capture hand position, posture as defensive, etc.? Get as much as we can from this feature and then move on. We’ll see.
Q: What camera technology do you use? Price, obtrusive?
As low end as it can be and not get crappy. Aiming to produce something applicable anywhere. Not high resolutions. Just moved on to HD webcam. Low end section. Tried using more exotic fields, emotional bracelets, we might use in capturing and assessing, but at the end it’ll be webcams.
Q2: Any theories about interaction with the academic content and how that affects their motion?
Started doing pre/post tests. The questionnaire looks complicated, but it’s just measuring four parameters: attention, classroom perception, teacher perception, and material importance. Correlate between all of them. If there is a correlation with material importance, we’ll pick it up. Not sure in the long run with just the camera.
Malcolm: Any plans to experiment with other classroom design types – this is benches in rows. New environments, any plans?
Again, as with the webcam tech, started with the typical layout. Not sure we can venture into other types – we don’t have many at EPFL. More broadly, on the standard composition of the classroom: the typical layout seems to be more productive, others seem to inhibit discussion – may have a reference.
Malcolm: Would love to see that.
Clustering of Design Decisions in Classroom Visual Displays. Ma. Victoria Almeda, Peter Scupelli, Ryan Baker, Mimi Weber, Anna Fisher (Short Paper)
Why classroom displays? Teachers spend a lot of time thinking about what to put on wall displays. They’re available for hours to the students. Hence important to ask whether they’re visually engaging or distracting.
There are large amounts of sensory displays, but no evidence they help learning. We do know there are links between off-task behaviour and visual design – visual bombardment, distraction.
Literature tends to focus on the visual aspect, and on features in isolation. So want to examine the semantic content of visual displays. And go beyond that to investigate teachers’ systematic design choices, to find the patterns that best support learning.
30 elementary K-4 classrooms in NE USA. Fall semester 2012.
Coding scheme – photos of walls using Classroom Wall Coding System. First code each flat object, then classify as Academic/behaviour (e.g. star charts)/non-academic (school policy, decorative).
Labels – English words, descriptive. Also content-specific stuff related to academic topics. Procedures as well – hierarchies of steps. Decorations included e.g. a welcome board, or the Cat in the Hat. Student art. Finally, other non-academic – calendars, fire escape plans, stuff solely for the teacher, e.g. their own personal calendar or a picture of their diploma (!).
Analytical method – K-means clustering. Systematically tried different numbers of clusters. Used k=4, more just had more outliers. One n=1 cluster present even with k=2! So four clusters.
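As a sketch of that clustering step – my own toy version, with invented per-classroom feature vectors and a deliberately naive initialisation, not the paper’s data or implementation:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal k-means; naive init from the first k rows (a real run would
    use random restarts and systematically compare several k, as the paper did)."""
    centroids = X[:k].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Invented counts per classroom: [decorations, labels, content-specific, student art]
X = np.array([[9.0, 8, 2, 1], [8, 9, 3, 0],   # decoration/label-heavy rooms
              [1.0, 1, 2, 0], [0, 2, 1, 1],   # sparse rooms
              [7.0, 3, 9, 8], [8, 2, 8, 9]])  # content-rich rooms with student art
labels, _ = kmeans(X, 3)
# Rooms with similar design choices end up sharing a cluster label
```

The per-cluster means (and sds) against the overall average are then what identifies each cluster’s distinguishing features.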
Then determine the distinguishing features for each one: mean and sd vs the overall average of each visual feature. Decorations and content-specific explained most of the variance. Cluster 1: decorations, labels, other non-academic – help navigate. Cluster 2 similar, but in the opposite direction – low decorations, low labels; primarily private schools. Cluster 3: high content and decorations, the only group with lots of student art. Teachers use student art to motivate learning; most likely they regarded visual displays as tools for learning. Cluster 4, the singleton, is interesting – high content-specific, procedures, other non-academic. Probably a teacher who viewed displays as a tool for recalling information.
Monte Carlo analysis to see the relationship between cluster and type of school. Charter schools overrepresented in Cluster 1, private schools in Cluster 2 – they may think visual displays are distracting. The high amount of decorations came with charter schools, whose curriculum emphasised literacy development, so they promoted a print-rich environment.
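The Monte Carlo check can be sketched as a permutation test – a hypothetical reconstruction with invented data and my own choice of statistic: shuffle the school-type labels many times and ask how often a cluster captures as many schools of one type as actually observed.

```python
import random

def overrep_p(clusters, schools, cluster, school_type, n=10000, seed=1):
    """Permutation p-value for over-representation of `school_type`
    in `cluster`, by shuffling school labels across classrooms."""
    count_hits = lambda s: sum(c == cluster and t == school_type
                               for c, t in zip(clusters, s))
    observed = count_hits(schools)
    rng = random.Random(seed)
    shuffled, extreme = schools[:], 0
    for _ in range(n):
        rng.shuffle(shuffled)
        extreme += count_hits(shuffled) >= observed
    return extreme / n

# Toy data: all five charter schools landed in cluster 1
clusters = ['c1'] * 5 + ['c2'] * 5
schools = ['charter'] * 5 + ['private'] * 5
p = overrep_p(clusters, schools, 'c1', 'charter')  # well below 0.05
```

The p-value is just the fraction of shuffles at least as extreme as the real assignment – no distributional assumptions needed, which suits a sample of 30 classrooms.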
There is systematicity in how teachers decorate walls. First clustering outcomes on this – better than features in isolation. Private and charter school teachers decorate walls differently – does that impact engagement and learning?
So future work to look at student achievement, off-task behaviour.
[Limited liveblogging as I get ready to present.]
Data Wranglers: Human interpreters to help close the feedback loop. Doug Clow (Short Paper).
[My paper! So no liveblogging.]
Ace question at the end about what’s happening to get stuff in to the hands of students. I said yes, absolutely, that’s where the best benefit is likely to be, and we’re just starting to get working on that.
Toward Unobtrusive Measurement of Reading Comprehension Using Low-Cost EEG. Yueran Yuan, Kai-Min Chang, Jessica Nelson Taylor, Jack Mostow (Short Paper)
Traditional assessment of reading comprehension asks you questions about what you’ve been reading. Age-old technique, very important. But problematic: the questions have to be written, they take students time, and they have limited diagnostic ability, especially if just scored correct/incorrect – we don’t learn what went wrong.
So looking to build better. Automated generation of questions, detection of comprehension by reading speed or other unobtrusive work, and diagnostic multiple choice tests. This is something new.
Electroencephalography, EEG, brain signal detection, put electrodes on people’s heads, record brain signal activity which is indicative of mental states. Commonly used to study neurophysiology of reading comprehension (where in the brain is activated when you read), and also can detect semantic and syntactic violations (! wow).
But there’s a problem: lab EEG looks useful, but it’s expensive, expert-operated, requires ~30 channels, and takes time and gel and faff to put on. Not suitable for classrooms.
New innovation, inexpensive EEG, a hundred dollars or so. Operated by just about anyone, single channel, easy to put on. But tradeoff for signal quality. What can we do with these devices? Is the important data from lab devices lost?
Research labs are looking into this – an early study at AIED13 looked at using these EEG devices in a Reading Tutor, to see what we can and can’t detect. The primary success was text difficulty, significantly above chance. But couldn’t tell comprehension (whether they were getting questions right). Why not? Small dataset – only 9 subjects were children (adults don’t tend to get the questions wrong), with 108 questions. And the devices tend to be noisy.
So improvements: methodological – improved stimuli (better questions) and a bigger dataset, to 300 questions. And some algorithmic changes to the pipeline – new features.
Cloze questions. Story given, fill-in-the-blank sentences. Multiple choice: right answer, plausible (but wrong) distractor, then implausible distractor (grammatical, but silly), lastly, ungrammatical distractor. “The hat would be easy to ___.” clean / win / eat / car.
Deployed ~900 questions, 300 remain after filtering of poor signals – aggressive about that to maximise performance. 78.7% correct, 13% plausible, 6.1% implausible, 2.2% ungrammatical.
Tried to predict from EEG signals whether they got something correct or not. Second analysis, looking at time reading each of the answer choices, trying to predict which type of violation was there. Alas, no significant distinction.
Segments analysed: originally just the context, then the context plus cloze question, then the choices, then the lot; then segmented versions, then 4-second segments.
Machine learning pipeline – generate EEG features (alpha mean, beta variance); cross-validated experiments: balancing (undersampling), feature selection, evaluation. Two schemes: one trained the classifier on all but one trial and tested on the remaining trial; the other trained the classifier on all but one subject and tested on the held-out subject.
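A toy version of that pipeline, under heavy assumptions of my own – synthetic single-channel “EEG”, just the two features named in the talk, and a nearest-class-centroid rule standing in for their actual classifier:

```python
import numpy as np

def band_features(trial, fs=512):
    """The two features mentioned: mean alpha (8-12 Hz) power and
    variance of beta (13-30 Hz) power, from one channel's samples."""
    freqs = np.fft.rfftfreq(len(trial), 1 / fs)
    power = np.abs(np.fft.rfft(trial)) ** 2
    alpha = power[(freqs >= 8) & (freqs < 13)]
    beta = power[(freqs >= 13) & (freqs < 30)]
    return np.array([alpha.mean(), beta.var()])

def leave_one_trial_out(X, y):
    """Within-subject scheme: train on all trials but one, test on the
    held-out trial (nearest class centroid stands in for the classifier)."""
    hits = 0
    for i in range(len(X)):
        train = np.delete(np.arange(len(X)), i)
        cents = {c: X[train][y[train] == c].mean(axis=0)
                 for c in np.unique(y[train])}
        pred = min(cents, key=lambda c: np.linalg.norm(X[i] - cents[c]))
        hits += pred == y[i]
    return hits / len(X)

# Synthetic trials: one toy class dominated by alpha, one by beta
fs = 512
t = np.arange(fs) / fs
alpha_heavy = 2 * np.sin(2 * np.pi * 10 * t)
beta_heavy = 2 * np.sin(2 * np.pi * 25 * t)
X = np.array([band_features(tr) for tr in [alpha_heavy] * 3 + [beta_heavy] * 3])
y = np.array([1, 1, 1, 0, 0, 0])
leave_one_trial_out(X, y)  # → 1.0 on this noiseless toy data
```

Real single-channel consumer EEG is nothing like this clean, which is exactly the ~60% accuracy problem the talk describes.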
Significant results on the segments where they were reading, but no significant difference over answering time. Better performance from the between-subject classifiers than the within-subject ones – probably because there isn’t enough data per subject for the within case. Only 60% – significant, but not good.
Why poor performance? Not ready to replace assessment. Maybe not enough data, over-fitting/noise. Maybe the low-end devices aren’t sensitive enough, or the pipeline isn’t making good use of the data. Working on all three of these possibilities.
Conclusion: can significantly identify correct/incorrect above chance – but accuracy not great. And can’t predict the type of answer choice they were reading.
Workshop on EEG in education at ITS 2014. Toolkit and dataset available!
Xavier: I know you don’t have data, but on instinct, which of the three are the reasons? Devices, data, pipeline?
Honestly, I think it’s a combination of all three. More data would be helpful. And we believe some of it is down to the device: we find a lot of reading-relevant EEG information comes from sensors that are not at the front of the head – so different sensor locations, or multiple sensor locations, or bilateral differences.
Q: What’s the use case for this. I have the fear for students. (laughter)
We invested in a lot of taser companies (laughter). We’re investigating how to improve the accuracy here. Whether we can actually grade the students – that’s not what we directly want to replace. But we could aid assessment: it’s likely the student needs help right now. So if we get a 60% indication they need help here, we might deploy a question and catch this rather than miss it. Something a bit more friendly.
Q: Maybe more formative assessment.
Related work by colleagues too. Looking at how to use this not as a binary distinction but to help other distinctions.
Q2: Why would you not have the option that the EEG data won’t be that accurate? How would you know you could get better than 60%?
We don’t. We haven’t done the same setup with expensive EEGs. It’s infeasible to collect this amount of data with kids in an environment anything like a classroom with all that crazy scifi stuff going on. Other distinctions are out there. Brain/computer interfaces are making strides, we hope this is lack of development right now. This is picking up among labs – hopefully we’ll have more exciting findings.
Q3: Students metacognitive abilities. Here, predicting accuracy of answer. But how confident are the students in their answers? That could be another outcome measure to predict.
Yeah, that’d be good if we had data for that. Looking in to predicting other tasks, other behavioural tasks and labels beyond correctness/incorrectness, but we don’t have the information right now.
Firehose session: Posters & demos
Chaired by Zach Pardos and Anna de Liddo.
Zach introducing. Intellectual entertainment! Stealing this idea from a neighbouring conference. Hard to get a sense of all the posters – exciting new work, established work – easier to convey with hands-on demos or poster presentations. All presenters have one minute to tell you about their work, then go to the posters and demos you want. 18 presenters: demos, posters, doctoral consortium. Everyone has a poster. There’ll be an award for a poster, and one for a demo. You’ll see slips of paper on the social tables for the reception – a pink slip and a neon green slip. Green to vote for a demo, pink to vote for anything other than a demo.
#7 Ed Tech Approach toward LA: 2nd generation learning analytics. First gen: focus on predictions, how soon we can provide an intervention. 2nd gen: more focus on progress and how we can help students improve learning achievement. If a state-of-the-art medical examination says you will die in one month, what would you do? It’s not enough to improve your health outcome. Want to see what kind of variables we need to look at.
Patterns of Persistence – John Whitmer. First gen research, in to MOOCs, deconstructing what achievement is, beyond pass/fail. Remedial or basic skills English writing MOOC, seeing if constructs apply. MOOC interaction data, 48k students, entry surveys, exit surveys, see how those inform level of participation. Used methodology of today: cluster analysis for patterns. Looked overall at correlations, use, testing, exam, participation.
Effects of image-based and text-based activities – Anne Greenberg. Poster on an investigation of potential differential effects of text and image activities in a functional anatomy course. Text, image, control – participation in image activities gave better performance on outcomes. Image questions less mentally taxing.
eGraph tool – Rebecca Cerezo. There’s a graph – nodes and edges – representing a student interacting with Moodle during a week. Tool developed, eGraph, #1, can demo on your computer. Really easy and intuitive.
Peer Evaluation with Tournaments – E2Coach, U Michigan. Engaging students in creating content and evaluating content. Peer evaluations. Poster covers studies where students submitted a game to a tournament, videos, solutions to problems they got wrong on the exam. Do they do a good job? Do they learn? Can we leverage a lot of students to generate good content?
National Differences in an international classroom – Jennifer DeBoer. Looking at student data using MLM, with students nested within countries, in an EdX MOOC on ccts. Results: significant differences in performance between different groups of countries.
Visualising semantic space of online discourse – Bodong Chen, Toronto. Online discussion systems. Using topic modelling techniques and time series analysis to provide alternative representations of online discussions, to help make sense of discussions better. For example, see how topics evolve over time. (From his slide, he’s an R ggplot2 user!)
Chris Brooks, demo on analysing student notes and questions, for Perry Samson. LectureTools. Backchannel for the classroom, clicker, voting tool, tons of analytics. Lots of research; opportunities to use this in your classroom.
HanZi handwriting acquisition with automatic feedback. Learning Chinese is hard for students and teachers. Tool, HanZi, gives feedback, captures sequence and direction errors. Windows version, being developed into Android.
OAAI Early Alert System Overview – open-source academic early-warning system. SAKAI; release the model in standard ?XML format. Data from the student information system, LMS data. Data integration processes, then a predictive model identifies students at risk; deploy interventions.
Learning analytics for academic writing – Duygu Simsek, OU. Write to communicate in the academy, deliver our thinking to the research community. All aim to write in academic way. Challenging! Especially for undergraduates. A tool, dashboard to help us improve our writing. Have your paper analysed by a machine like this! (Xerox Incremental Parser)
Xin Chen, Purdue, Social Media Analytics framework for understanding students’ learning experiences. Using social media in the classroom: engagement, collaboration. Studying in the wild how students are talking, emotional wellbeing etc. Are you using social media? Vote for my poster!
Wanli Xing, Interpretable student performance model in CSCL environment. New rule-based method, machine learning. Theory too. Next gen learning analytics!
Data and Design in online education. Michael Marcinkowski. COIL at Penn State. Use of data by designers and instructors in MOOCs. Dialogue between instructors and designers and students. How they use the data from students to design their courses. Qual research, interviews, trace data. MOOCs!
From Texts To Actions – Maren Scheffel, OUNL. Application of methodologies from linguistics to patterns of actions in usage data. Study she’s doing: bit.ly/LAK14 – participate in my study on quality indicators for LA!
Usage based item similarities to enhance recommender systems – Katja Niemann. Usage data in VLEs to enhance recommendations.
Identifying at risk students in a first year engineering course. Farshid Marbouti, Purdue. Predictive modelling, performance data. See which technique is more accurate. Also how accuracy changes during the semesters, how early can we have good accuracy in predicting grades, so they can do something about it.
Ani Aghababayan – Utah State University. Student affect in ITSs; digital games for learning are widespread, but with very little in the way of support systems – this study aims to inform the development of such systems. Frustration and confusion. LA and data mining as the means. No results yet, but time-based sequences of frustration and confusion as indicators.
Sponsors! Marketing manager for their analytics platform – D2L and its research foundation. They have free moose from Canada!
Shady Shehada. Excited to attend.
Crossroads, intersection of learning analytics research, theory and practice. Bridge ML and education. Some people so excited, they rented a car in Canada and drove here to Indiana.
John Baker, CEO, feels privileged to play a role in recognising the LA research community. Foundation to drive important decisions.
Are committed to supporting the community, bridging the gap between research and vendor worlds – through support of the research community, and delivery of research-backed solutions to the market. Also to being a good vendor role model for LA research. And to supporting research in a cross-section of education areas, to transform teaching and learning.
Interested in ML, DM, text mining, NLP, social computing, visualisation. BI, analytics, usability, systems architecture, QA, product dev. All integrated.
Research program – Science Behind Success. Two programs.
Research & Field Study Program – kitchen where we bake research until ready to apply.
Early Adopter Program – is where it goes out first. Three products: Insights Student Success System, Degree Compass, LeaP.
The ISS system predicts grades, as early as possible, so you can intervene. Two years ago we talked about the predictive engine. How about we build it in house, give the model to the client, make it generic and predictive? We worried that assumption is not valid – you can’t have one model that covers all courses for one client. So we ended up providing the chef who cooks a plate for each course: a customised model for each course in each organisation. Challenges and effort in doing this, especially with the data – cleaning it for the model. Outstanding clients helped us bridge this gap. Degree Compass – recommendation engine, personalised curriculum for students, based on interests and strengths. Also LeaP, learning path – recommends a learning path and makes sure it’s adaptive by recommending efficient routes to reach their goals. One looks at the whole curriculum, one at each course, one within the course. Trying to combine them.
Products based on research, more research going on. ML library to generate analytics, provide feedback about the product and how it’s used. Accessibility – not just standards compliant but also intuitive and easy to use. Research in content analysis, discussion, course content, using text mining/ML. Link this, with social graph, to the learning objectives.
Looking forward to LAK2015!
Zach takes over, shows the cards for voting. Tempting food or drink, until 4.30!
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.