LAK14 Fri am (9): Keynote and other things

Liveblog from the second full day LAK14 – Friday morning session.

Keynote: Scott Klemmer  – Design at Large


Abelardo introduces Scott.

Scott thanks everyone for inviting him. Interested in learning analytics and online learning, is a johnny-come-lately here. Has been following my liveblog! (Like all the best people.) His field is design. Really exciting time. Most powerful dynamic is the large number of people excited about making stuff – Arduinos, robots, programming – so many people interested in design.

History of design. Short primer on C20th industrial design. Came from within a few miles of here. Henry Dreyfuss – locomotive design, John Deere tractor, Honeywell thermostat, Bell telephone. When inventing the future, prototyping is key.

A contemporary, worked with Boeing, on a mockup prototype of the interior of an airplane. Had passengers board, with real luggage, experience it for the duration of the flight, staff coming through, and so on. That let him debug a bunch of stuff cheaply before they went live.

Dreyfuss’ book, Designing for People. Classic cycle – envision, make, evaluate – iterate the cycle. Any tinkering process is like this. Lots of recent advances in design thinking are about better envisioning, prototyping tools. A lot of the work at LAK, analytics, is on the evaluation side, improving our ability to learn from the prototypes we make. [Again nice clean slides with big photos.]

This led Dreyfuss to anthropometrics: the physical size of the dial, handle, steering wheel is dictated by human ergonomics. So can predict to a degree. With telephone, could be important for kids to be able to use; tractor usable with gloves. Helps us make better first guesses.

With online education, opportunity for this community to impact, in a lot of engineering research we invent the future. We are pioneers. Research is like a trip report from the future. Figure out what’s good there, good restaurants, how you live, and you send a postcard back to people who don’t live there yet. Research papers are classic postcards from the future; but also videos, prototypes. Analytics gives you real power when you move from the lab to the wild.

Moved to San Diego, started swimming. Like moving from swimming in a pool, where things are regulated, to swimming in the ocean. Thrill, wonder of swimming in the ocean is similar to opportunities in analytics. Some examples of moving to the wild. Then broader implications of this shift generally. There’ll be a second session afterwards for a more in-depth conversation.

Looking to the future, draw ideas from the past. First lesson, physical space. Co-located cluttered studios are hallmarks of design education. Introduced in 1819 in Paris, endured for 200 years. First is the power of shoulder-to-shoulder learning. As a CS undergrad at Brown, had one lab where all the computers were. All the programming done on Unix workstations. Got a huge cohort effect from people being colocated. Friend, when they lost a computing cluster, had a huge effect on the cohort experience.

At Stanford, 2005, first class, brought studio model to design class. It’s been great on the whole. One notable aspect: course evaluations strong on the whole, one element not. How fair is the grading – was in the 13th percentile. Traditional arts school background, great artist who gives you the grade they think you deserve, you take it. The engineering students were accustomed to exams with right answers, where you could compare and agree the grade is fair. Really helpful in learning. Led me over the years to a long experiment with self-assessment and then peer assessment.

Two punchlines: 1. Peer learning approaches work when they’re integrated well. Baking pedagogy into software is powerful. We do this when we move online; when you hand it to someone else, they get the pedagogical strategies you put in there, so good strategy for technology transfer. 2. Scaffolding structure is critical and overlooked, for both teachers and students. If you ask do these work well, answer is nearly always, they work great when you do this. Innovative things flop because we forget about the scaffolding.

Other universities using these materials. In Fall 2011, taught a large online class. How are we going to scale projects and peer critique into online learning? Multi-year project. Lead student Chinmay Kulkarni. System used in 100+ MOOC classes. Guinea pig class was my design class. Others have taken it up – Python, philosophy, management, recipes, arguments, music.

Calibrated peer review process. Step 1, practice assessment on a training assignment, get feedback on that. Brings expectations in to alignment with the class, increases inter-rater reliability. Step 2, assess 5 peers, one of which is staff graded, gives ground truth. Then at the end, Step 3, students self-assess.

Extremely powerful to use open-ended assignments, pedagogically valuable. Embrace real world with many paths to success, important to teach. But really challenging, takes huge amount of grading time. Can’t scale. Machine grading advances are good, but can’t do a lot of things you want to, yet.

When administrators or press think of peer assessment, they see it as a tool for scaling grading and making it more efficient. It’s also important for peer learning. Care more about that than the number-generating equation.

An intrinsic paradox of peer processes: asking novices to do things that definitionally require expertise. They are there to learn things they don’t know about yet. This is where scaffolding comes in. After the assignment, they have micro-expertise on that area, can do a great job.

Peer grades correlate well with staff grade. Peers, with help, can provide great constructive criticism. Does this scale to global online classes?

Stanford HCI class on Coursera. Learner composition, has a rubric, with scores, and open feedback. Students make it their own, sharing cool things, interfaces. Amazon curated lists, Twitter chat, LinkedIn group for alumni with certificates. Often students come up with better ways of explaining stuff than I could. One benefit of online, see students doing projects far outside the walls. Example of a classroom learning system adopted by the UN. Really cool stuff.

Staff agree with each other within a range of 6.7%. About 36% are within 5%, 59% are within 10%, then a broad tail. This is adequate for a pass/fail class. There are errors on individual assignments, but they tend to cancel out. Simulations: did students get certificates, and should they have? A small number got lucky, and 24/1000 didn’t get a cert by grading error. On the numerical side, that’s the big problem, the improvement that really matters.

Online classes, students report how much seeing how different people tackled the assignments gives them ideas for their own work. Like in a design studio. Realise there are lots of ways of doing things, not just what you thought.

Assessing yourself at the end – when we do it ourselves, we’re in a maker mindset. Evaluating others is a different mindset. When we submit a paper, we think we’re geniuses; when we put on our reviewer hat, we’re total curmudgeons. That is probably a good asymmetry. But good to bring that perspective to your own work. Closing that loop teaches you to be critical of your own work, and to be more forgiving of the work of others.

Found, consistent with the literature – adding more peers gives diminishing returns quickly. Wisdom of crowds works when errors are randomly distributed. [Actually particular random distributions – some distributions don’t converge.] If you want to do better, need to improve the feedback loop.
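The diminishing-returns point can be illustrated with a toy simulation (my sketch, not the speaker’s: it assumes independent Gaussian grading noise, which is exactly the “randomly distributed errors” condition where wisdom-of-crowds averaging works):

```python
import random
import statistics

def grading_error(n_peers, noise_sd=1.0, trials=2000, seed=42):
    """Average absolute error of the mean of n_peers noisy grades
    around a true grade of 0 (arbitrary units)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        grades = [rng.gauss(0.0, noise_sd) for _ in range(n_peers)]
        total += abs(statistics.fmean(grades))
    return total / trials

# Error shrinks roughly as 1/sqrt(n): going from 1 to 4 peers halves it,
# but you need 4 -> 16 peers to halve it again -- diminishing returns.
for n in (1, 2, 4, 8, 16):
    print(n, round(grading_error(n), 3))
```

Hence the talk’s conclusion: past a handful of raters, improving the feedback loop beats adding more peers.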

So you can let people know how good their grading is. Adding this feedback reduces this error in subsequent assessments, but they tend to overshoot. So now we give numerical score, not just too high/too low.

More important to focus on the qualitative feedback. Challenging, with diversity of language background. Some feedback is minimal or superficial – ‘great idea’, ‘I can’t read the words in the pics clearly’, ‘solution requirement is vague here but I’m excited to see where you take this in the storyboards’ (??).

Return of novices-as-experts paradox. Going to scaffold that, using fortune cookies. Broadly applicable useful advice. We know from HCI, recognition works better than recall. Reason why graphical interface works, all the things you can do are visible. Show nouns and verbs, you pick what you want. This is the same strategy. So we list a bunch of common errors – like you get as an expert teacher of a class. Common failings, or success patterns. Encode these as fortune cookies. This gets you much more detailed and actionable feedback.

Improving assessment – take a common rubric, but it’s rare to systematically think about whether it’s working. Improvement is ad hoc. We plot the variance of the rubric items. Tends to be the case that items yielding high variance in grading are not well written. And ‘element 5c is wonky’ is a simple finding to act on.
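That variance check is simple enough to sketch (my reconstruction; the data shape — per-item score lists pooled across graders — is an assumption, not the actual pipeline):

```python
from statistics import pvariance

def wonky_rubric_items(scores_by_item, top_n=1):
    """Rank rubric items by grading variance; high-variance items are
    candidates for rewriting. Input shape is hypothetical:
    {item_id: [all grader scores for that item]}."""
    ranked = sorted(scores_by_item.items(),
                    key=lambda kv: pvariance(kv[1]),
                    reverse=True)
    return [item for item, _ in ranked[:top_n]]

# An item graders disagree on wildly surfaces first:
print(wonky_rubric_items({"5a": [3, 3, 4, 3], "5c": [0, 5, 1, 4]}))
```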

Separating orthogonal questions improves agreement. Often orthogonal attributes are combined – e.g. did they upload interesting photos – better to separate into did they upload, were they interesting. Parallelising helps reduce misalignment – low performance ‘hard to follow’, high performance ‘easy to follow’. Also improved inter-rater reliability. Inch up the quality. The Gaussian overall tightens when you add rubric revisions.

7 habits of highly successful peer assessment.  Assignment-specific rubrics; Iterate before release (pre and during); Assignment-specific training; Self assessment at the end; Staff grades as ground truth (find how well the system is working); Show a single, adaptively-aggregated grade (not all peer grades are good, don’t show all grades – if they see they have one wingnut grader, that reduces confidence); Provide feedback to graders.

Peer learning strategies can give deep feedback, improve motivation and learning. Biggest challenge is the labour requirement of doing it. In the blogosphere about peer assessment in MOOCs, two genres of critique. One is, I took a class and it didn’t work. It’ll be because one of those 7 principles was violated. Even if you follow all 7, the labour involved is a lot. Part of me thinks, welcome to life as a staff member, you have to do the work. Gardening takes time. I do think there are big opportunities to make this more efficient. Want to talk about algorithms to do the busy-work.

Facebook group for class, didn’t know it existed for a year. One community TA follows it, content is awesome. At the end shared their design work, portfolios. One did a poll on how long it took to complete the assessment.

Machine grading – NYT article about machine grading of SAT essays. You can game a common machine grading algorithm. (Perelman 2013.) Grammar good, but not true. Combine machine and peers for accurate assessment?

Short answer questions are versatile open-ended assessment. Recognition better than recall. However, often in learning we want that deeper, recall-based learning, for real life situations where there aren’t four options hanging in front of you. Also more immune to test-taking strategies.

Machine grading. Auto grading of short answers, using etcML. Not specifically made for us, general framework. Trained with 500 submissions per question. Deep-learning based strategy, fair amount of training data.

Pilot: replace one peer with a machine. Previously took a median of the peers, now include the machine grade. Peers were lenient, 14% higher. Also swayed by eloquence over accuracy. Saw that a lot at Stanford. Unless you know the content, you would think it was a good answer.
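A minimal sketch of that aggregation step (my guess at the mechanics — the talk only says one peer is replaced and a median taken; function and behaviour are hypothetical):

```python
import statistics

def aggregate_grade(peer_grades, machine_grade=None):
    """Median over graders; in the pilot described, one peer slot is
    replaced by the machine grade (hypothetical implementation)."""
    grades = list(peer_grades)
    if machine_grade is not None and grades:
        grades = grades[:-1] + [machine_grade]  # swap one peer for the machine
    return statistics.median(grades)

# Lenient peers alone, vs. with a stricter machine grade in the mix:
print(aggregate_grade([8, 9, 9]))      # peers only
print(aggregate_grade([8, 9, 9], 6))   # machine replaces one peer
```

The median (rather than mean) keeps a single wingnut grade — human or machine — from dragging the result.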

Peers assess ambiguous answers better. If the grammar or type of solution isn’t in your training sample, you’re out of luck for machine learning.

Confidence score – in high confidence cases, machine score predicted staff agreement well. So use machines to allocate peer effort, interfaces guide it. Crowd wisdom only works for independent errors, so use this to mitigate. Predict, then ask students not to score assignments, but label attributes. More likely to agree about attributes than evaluation. [Not totally following this]

If the system is 90%+ confident, assign to 1 rater. If 75–90% confident, to 2 raters, and if <75%, assign it to 3.
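The allocation rule as stated is simple enough to write down directly (thresholds from the talk; the function name and interface are my own):

```python
def raters_needed(machine_confidence):
    """Peer raters to assign, given the machine grader's confidence.
    Thresholds follow the talk: >=90% -> 1 rater, 75-90% -> 2, <75% -> 3."""
    if machine_confidence >= 0.90:
        return 1
    if machine_confidence >= 0.75:
        return 2
    return 3

# Confident machine grades get a spot-check; uncertain ones get more eyes.
print([raters_needed(c) for c in (0.95, 0.80, 0.60)])
```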

First ask, what are the correct or incorrect attributes. [I need to see the example longer to grok this, flashing by far too fast to read]

At the end, students see grades, and what they missed, from feedback from the peers. This more structured job helps create micro-expertise through the interface. I really like project-based learning, authentic projects, real challenges. At the same time, a danger is that a bad course has the attribute ‘do stuff and show us in 12 weeks’ – students don’t build the particular skills they need for a project. Good to combine openness with educational muscle training. Students do what they’re interested in, but tutor helps them work on particular skills. Less available in the online world. With labelling approach, bringing some of that back in.

How does this compare to the other system? Peer-scoring UI, give a score 0-10. Accuracy vs peer effort in seconds. Machine only 28%, but identify-verify up to 45%, but at a big cost of time. [Still missing a lot here – also slides hard to read because dim projector and low-contrast colours]

How many people do you need? Verification more useful with more peers.

Does a good job of overcoming biases, especially peer lenience. Identify-verify pattern does better, more targeted. Marginal increase in quality of the grades.

Blueprint for peer learning, using machines to amplify the benefits. After a while, looking at your peers’ work becomes drill-and-kill.

Photo, design school at Stanford, everyone co-located. But online, alone together? Real studio, just know what people are up to. Lose that online. A lot in the news about retention in online classes. A lot of it is silly. Don’t know what the denominator ought to be. It’s like if Amazon made all books free all of a sudden, I’d download 100 books, I’d probably never open half of them, and I’d get 10 words into another half, and get a curve like the retention curve you get in online classes. Don’t think we should assume that free books is a disaster as a learning medium because the completion rate is 6%. Better question is how much is someone reading, or learning. So look not at the fraction of books completed; analysis not by course but by individual. Media gets this wrong over and over. [I’m not sure I’ve seen any research or MOOC puff pieces doing it that way either, to be fair.]

At a residential college, social fabric designed to help you flow along. You get up on Tuesday morning, people are going to class, in the evening people are doing homework, in week 10 everyone’s stressed about exams. But online, the exam time is uncorrelated with anything anyone else is doing. You may have a final, but online, people just have their life, so when that collides, you not only have absence of positive reinforcement, you have negative reinforcement of everything else. Use studio as a muse not to replicate those features, but to create stuff that plays a similar role. Can also ask, what’s difficult in a studio that’s easier or more powerful online. Teaching to 200 20-year-olds, they do great work, they’re living similar lives. Every year I see a dozen apps about how my dorm mates and I get groceries, find parking, or where’s the cool party on Saturday night. That’s 40% of student design projects. Online, you get things like, project picked up by the UN. You have this diversity, huge opportunity.

Harnessing diversity at scale. Biodiversity, not monoculture. When harness this, algorithms have ideology baked in. Use it to leverage diversity, not stamp it out.

One example, Talkabout. Simple system. A front end to Google Hangouts. Real easy to batch up groups of 4-6 students, encourage them to talk about the class online. Used in 5 classes, results vary extremely widely. Developed it for social psych class, had materials for small group discussions, cultivated over years. So amplify that, shared with the world, worked awesomely. But class on HCI where they do these as an afterthought, not deeply integrated, they were essentially unused. That was me! I should’ve known better.

Re-showing the slide – peer learning requires integration and scaffolding to work well. The quality of that is the biggest predictor of quality.

Where it worked well, it worked really well. In e.g. Iran, Hangouts banned. Said you can also do it in person, and people did – 2000 in person, 2000 online. Went for more than an hour, in more than 100 countries. It was like a mini-UN. Different from the in-person comment – ‘lack of tension and active opposite person’. Worries – echo chamber online – small town America people go to the same town hall, online we all find our niche. But here we’re getting the opposite: more diversity online, not less.

More diverse groups do mildly better on final performance. I’m not convinced [nor am I on that data – also looks like single-country is best].

Looked at Dreyfuss, human-centred design. Marriage of prototyping, to try out things that are hard to predict, with theory, to make a good first guess and learn better and more efficiently.

Don Norman and Scott Klemmer – need more theory, analytics techniques, to expand design beyond chance success. See this in spades in online learning. At Learning @Scale, energy great, smart folks, working on cool stuff. But appalled at the amount of completely post-hoc analysis. That’s a fine starting point, so long as we’re more serious and rigorous next time. A great benefit of online education led by CS: they are good at building stuff. Unfortunately, computer scientists have no training in social sciences, and so lack theory to make good design decisions. That’s where we can come in. This community, with its expertise in using analytics to drive theory, has a big teaching opportunity for online education.

Let’s create and share practical theory – Stokes, Pasteur’s Quadrant. 2×2 – seeking fundamental understanding or not, vs inspired by use or not. Huge opportunity to marry social science theory with analytics and the huge society-changing opportunities we see in the learning area.

Kurt Lewin – nothing is as practical as a good theory, best way to understand something is to try to change it.


Alyssa: Was worth getting up at 8am again. The diminishing returns for peer raters, in terms of accuracy of rating. Other side, value of doing the rating, getting a lot out of seeing peer work. Where’s the diminishing returns, how many ratings do I need to do?

Yeah, great question. When we designed the system used by Coursera, made conscious choice to randomly assign peers to assignments, to get a baseline in how well it works. Both from a score-generating and student-learning perspective, random is not the way to go. If goal is get a better score, use crowdsourcing algorithms to intelligently assign raters. But if goal is experiencing studio wall, interesting solution, my hunch is there are two things to share. One is, diversity is good. Don’t want to randomly give 10 of the same thing, probably. Some exceptions. Second, not worth showing people crummy assignments. Probably not fair to allocate a lot of grading resources to that. Some experimental work, if you see your assignment, show something 1 JND better. My design, worth seeing Jony Ive’s design, but also to see what a similar student did that’s a little bit better, that’s useful. Some evidence that’s helpful. Peer Studio.

Josh: Traditional distance education, Marist College, a few MOOCs. Seeing people taking that peer grading into more traditional, async distance education, applying it there?

The Open University has done a great job on this for many years. Dan Russell and I ran an event at the HICSS conference. They’re kinda grumpy, what’s new in this besides hype and the scale? What’s new is the hype and the scale! Scale enables all sorts of things you can’t do otherwise. The motivator for the system we used came from a friend at UCSD for years. That was the role model that inspired the online version. This year, took the newest version of Peer Studio, used that class as the guinea pig for a newer version. It’s not distance education, we’re really excited about preliminary feedback. Doing flipped classroom stuff. Can have software to mediate preliminary discussion.

Q: Intrigued by graph of benefit of diversity. Wondering, in institutional research, struggle to illustrate benefit of diverse environment in college, how it impacts folks. Using LA to help us with that?

I do think, clearly most modern admissions offices see benefits in all kinds of diversity. That was what led us to run that analysis. The other thing was, there’s a technique called the jigsaw classroom, half a century old. Assign one student to do early years, some middle, some later, each teach their peers about their domain of expertise. Closest to a magic bullet in pedagogy. Everyone does better. Inspired by that rare success, looking at this in TalkAbout to leverage that online. Some of it depends on content. Social psychology is a great fit for diverse discussion group, e.g. Dubai and Indiana is interesting. Benefits will accrue for e.g. a linear algebra class. Kumbaya experience is intrinsically valuable, but content benefit is even greater.

9B. Discourse and Argumentation (Grand 3)

Statistical Discourse Analysis of Online Discussions: Informal Cognition, Social Metacognition and Knowledge Creation. Ming Chiu, Nobuko Fujita (Full Paper)

Nobuko speaking. Ming Chiu is at Buffalo; she’s at U Windsor, and also runs a small business called Problemshift.


We have data from online courses, online forums. Analyses tend to be summary stats, not time, order or sequences. So using Statistical Discourse Analysis, which Ming Chiu invented. Not individual note level, but sequence of notes. Group attributes, recency effects.

Also informal cognition leading to formal cognition over time. Informal opinions are easier to access, intuitions rather than abstract theoretical knowledge. Fostering that development over time. Also how experiences ignite theoretical knowledge.

Knowledge-building theory: how students work with new information and make it into good explanations or theories. Linking ideas from anecdotes into theories, and elaborating to theorise more.

Corollary is social metacognition. Group members monitoring and control of one another’s knowledge and actions. Most individuals are bad at metacognition, so social is good to take control at group level. Questions indicate knowledge gaps. Disagreement always provokes higher, critical thinking. (?!)

Interested in new information or facts, and how we theorise about them, pushing process of students working with explanations, not just information. And express different opinion, more substantially.

Hypotheses – explanatory variables – cognition, social metacognition vs several other bits.

Data description

Design-based research. Online grad education course using Knowledge Forum (KF). Designed to support knowledge building – radically constructivist approach. Creation and continual improvement of ideas of value to a community; a group-level thing.

KF designed to support this. Focus on idea improvement, not just knowledge management or accumulation. Client and web versions, years old (80s) and now sophisticated. Lot of analytics features.

Demo. Students log in, screen with icons to the left. Biography view, establish welcoming community before they do work. Set of folders, or Views, one for each week of the course. Week 1 or 2, instructor spends time facilitating discussion and moderating it. (Looks like standard threaded web forum things.) Model things like writing a summary note, with hyperlinks to other notes and contributions. And you can see a citation list of it. Can see who read which note how many times, how many times edited.

N=17, grad students, 20-40yo,  most working, PT, in Ed program. Survey course, different topics each week. After 2w, students took over moderation. Particular theme set, emphasising not just sharing but questioning, knowledge-building discourse. Readings, discourse for inquiry cards, and KF scaffolds.

Cards are on e.g. problem solving, set of prompts aligned to commitments to progressive discourse. Notes contain KF scaffolds, tells you what the writer was intending the readers to get.

1330 notes. 1012 notes in weekly discussion (not procedural). 907 by students, 105 by instructor and researcher.

Method – Statistical Discourse Analysis

Some hypothesis, some dataset. 4 types of analytics difficulties – dataset, time, outcomes, explanatory variables.

Data difficulties – missing data, tree structure, robustness. To deal with these: Markov multiple imputation, and storing the preceding message to capture the tree structure. Online discussion is asynchronous, get nested structure. SDA deals with that. For robustness, run separate outcome models on imputed data. (?)

Multilevel analysis (MLn, Hierarchical Linear Modeling), test with I2 index of Q-statistics, model with lag outcomes.

Outcomes – discrete outcomes (e.g. does this have a justification y/n), also multiple outcomes (detail being skimmed over here).

Model different people, e.g. men vs women, add another level to multilevel analysis, three level analysis. (She’s talking to the presenter view rather than what we’re seeing? Really hard to follow what she’s talking about here. Possibly because it’s not her area of expertise.)


Look at sequence of messages. Asking about use of an idea precedes new information. Informal opinions leads to new information too. Male participants theorised more. Anecdote, 3 messages before, ends in theorising, as did asking about use, opinion, different opinion.

Looking at informal cognition moving to formal cognition. Opinion sharing led to new information as a reply. Also opinion led to theorising. Anecdotes, got a lot of those, they were practising teachers and they talk about that, also led to theories. As did elaboration.

Social metacognition moving to cognition. Ask about use led to new information. Ask about use led to theorise, and so did different opinion and ‘ask for explanation’.

Educational Implications

Want to encourage students to use one another’s informal ideas to create formal knowledge. Also want to encourage students to create subgoals with questions, wonderment; they take on more of the cognitive responsibility you’d expect teachers to hold. They motivate themselves and build knowledge over time. Handing over collective cognitive responsibility. Consistent with Design mode teaching. (All Bereiter & Scardamalia stuff.) Doing it via prompts to aid discussions.


Participants coded their own messages themselves – we didn’t need to do content analysis. Scale that up, might be applicable to a larger dataset like a MOOC. Talking about extracting that data from e.g. Moodle and Sakai.


(Phil Long says he’s doing the Oprah Winfrey thing with the microphone.)

Q: Interesting. I’m responsible for making research into applied tools at Purdue. What artefacts does your system generate that could be useful for students? We have an early warning system, looking to move to more of a v2.0, next-generation system that isn’t just early warning but guidance. How could this apply in that domain?

Signals for EWS. This is more at the processes, at higher level courses, guide the students further along rather than just don’t fail out. This data came from Knowledge Forum. Takes a few seconds to extract that, into Excel for Statistical Discourse Analysis. Many posts had that coding applied by the students themselves. We can extract data out of Moodle, and Sakai. If we identify something we want to look at, we can run different kinds of analysis. Intensive analysis on this dataset, including Nancy Law too, and UK researchers. SNA, LSA, all sorts. Extract in a format we can analyse.

Q2: Analytical tour de force. 2 part question. 1, sample size at each of the three levels, how much variance to be explained? Use an imputation level at the first level, building in structure there?

Terrific question, only Ming can answer. (laughter) I’m not a statistician. I know this dataset really well. Gives me confidence this analysis works. For SDA only need a small dataset, like 91 messages.

Phil Winne: Imagine a 2×2 table, rows are presence/absence of messages that lead to something else, columns are the presence/absence of the thing they may lead to. Statistical test tells you there’s a relationship between those. [This seems a lot simpler – and more robust – than what they’ve done. But I haven’t been able to understand it from the talk – need to read the actual paper!] Looked at relationship between other cells?

I’m sure Ming has a good complicated response. I was most interested in how students work with new information. Looked at the self-coding; can’t say caused, so much as preceded.

Uncovering what matters: Analyzing sequential relations among contribution types in knowledge-building discourse. Bodong Chen, Monica Resendes (Short Paper)

Bodong talking, from U Toronto.

First, talking about cooking. If you cook, and buy cookbooks, you have to buy good ingredients, cook for the right time, and put them in the right order. Temporality is important in cooking, and in learning and teaching.

Neil Mercer – temporal aspects of T&L are extremely important. Also in LA research. Irregularity, critical moments, also in presentations at this LAK14, lots about temporality.

However, Barbera et al (2014) – time is found playing almost no role as a variable in ed research; Dan Suthers’ critique of falling into coding-and-counting. So a challenge in taking temporality into account. Learning theories tend not to take it into consideration. Little guidance, and few tools.

Knowledge building – main context. Also suffer from this challenge. Scardamalia & Bereiter again. Continual idea improvement, emergent communal discourse, community responsibility.

Knowledge Forum again, but different version – posts in 2D space so students can see the relation between them. Used in primary schools. Metadiscourse in knowledge building. Engage young students to take a meta perspective, metacognition of their own work. Two aspects: first developing tools, a scaffold tracker, extracts log information about who used scaffolds, and present a bar chart to serve as a medium of discussion. And design pedagogical interventions, here for grade 2 students, what’s the problem for their discussion, to engage them – e.g. where are we stuck, how can we move forward.

What do good dialogues look like? Focus on ways of contributing to explanation-building dialogues. Thousands of discussions, grounded theory approach, Six different categories. [Is this like Neil Mercer’s stuff?]

To make sense of lots of data, lay it out in a temporal, linear form: how different kinds of contribution unfold over time. Compared effective dialogues with improvable dialogues where they didn’t make much progress.

Can we go further than coding and counting? What really matters in productive discourse?

Lag-sequential analysis. ID behavioural contingencies (Sackett 1980!). Tracks lagged effects too. Many tools: Multiple Episode Protocol Analysis (MEPA), GSEQ analysis of interaction sequences, and old tools in Fortran, Basic, SAS, SPSS. A challenge for a researcher to do this.

So wrote some code in R to do Lag-sequential Analysis. Easy to do, and is one line of code to run. (Is it available as an R package?)

Participants and data – Grade 1-6 students, 1 science unit, 1101 KF notes in total, about 200 for each grade.

Primary data analysis, coded as contribution types, inquiry threads, and productivity threads. About 10 threads in each dataset, some productive, some improvable – fewer improvable. (We’re N=2-9 here.)

Secondary data analysis – compare basic contribution measures. And lag-sequential analysis to look at transitional ones.

NSD (no significant difference) in basic contribution measures between productive and improvable dialogues.

LsA. Simple data table (in R). Feed into the R program, which computes what’s happening. Takes one thread, computes the transitional matrix for that thread – e.g. if 1 happens, what’s the frequency of e.g. 5 happening next. There’s a base-rate problem, though; deal with it via adjusted residuals, or Yule’s Q, which gives a good measurement, like a correlation score. “The code calculates that, which is just … magical.”

Merge into one big table, 50×38. Simple t-test between types of dialogue and whether they differ in each transition. Run over all data.
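The talk’s implementation is in R (and not shown here); as a rough Python sketch of the core step – lag-1 transition counts and Yule’s Q for one transition – with a toy coding scheme and example thread of my own, not from the paper:

```python
from collections import Counter

def transition_counts(codes):
    """Count lag-1 transitions (adjacent pairs) in a coded thread."""
    return Counter(zip(codes, codes[1:]))

def yules_q(codes, source, target):
    """Yule's Q for the transition source -> target, from the 2x2 table
    of lagged pairs:
        a = source then target    b = source then other
        c = other  then target    d = other  then other
    Q = (ad - bc) / (ad + bc); ranges -1..1, like a correlation,
    which sidesteps the base-rate problem of raw counts."""
    pairs = list(zip(codes, codes[1:]))
    a = sum(1 for s, t in pairs if s == source and t == target)
    b = sum(1 for s, t in pairs if s == source and t != target)
    c = sum(1 for s, t in pairs if s != source and t == target)
    d = sum(1 for s, t in pairs if s != source and t != target)
    num, den = a * d - b * c, a * d + b * c
    return num / den if den else 0.0

# Hypothetical thread: 1 = questioning, 2 = theorising, 5 = obtaining evidence
thread = [1, 5, 2, 5, 1, 5, 2, 2, 5]
print(transition_counts(thread)[(1, 5)])  # → 2
print(yules_q(thread, 1, 5))              # → 1.0
```

Each thread’s per-transition Q scores would then be merged into the big table and compared across dialogue types with t-tests.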

Found – in effective dialogues, after questioning and theorising, tend to get lots of obtaining evidence. Also when working with evidence, they tend to propose new theories. For not very effective dialogues, students respond by simply giving opinions.


Temporality matters. Temporal patterns distinguish productive from less productive dialogues.

Focus on community level, not individual or group level. Also, an R implementation of LsA, addressing the practice gap. Contact him to get it.

Limitations – LsA overestimates significant results, misses the individual level. The data manipulation converted threads into a linear format. Other actions, like reading, are not considered.

So for future, design tools to engage students in better discourse. Connect levels with richer action, and refine the algorithm to be more effective.


Q: Agree that methods matter in LA. Useful to see these two presentations, employing different methods. Statistical discourse analysis is new. What would a comparison look like? They both hit on sequential analysis. Would be great, come from same lab – considered a head-to-head methodological treatment? (laughter)

Ming Chiu’s work is more advanced. A lot of the work in SDA is different. Big difference here: I compare two kinds of dialogues; they don’t distinguish between effective and less effective ones.

Nobuko: Focus on productive discussions, not non-productive. We looked at everything, but focused on things that led to provision of information and theories. But for you productive ones lead to theories.

I’m not trying to advance the methodology, I want to design tools for students. I’m trying to use a tool to explore possibilities.

Q: LsA is done, understood quite well, useful baseline for understanding new creature, SDA, it’s complicated. Before we can put faith in it, have to have some understanding.

Q2 (?Phil Winne): SDA can address the kind of question you did, like do discussions vary, like an upper level in a multilevel model.


10B. Who we are and who we want to be (Grand 3)

Current State and Future Trends: a Citation Network Analysis of the Learning Analytics Field. Shane Dawson, Dragan Gasevic, George Siemens, Srecko Joksimovic (Full Paper, Best paper award candidate)

(While Shane was out of the room, George stuck a photo of a dog into Shane’s presentation.)

Shane talking. Thanks everyone for stamina. Thanks co-authors, except George. (I contributed! Says George.) I had the lowest impact, so I am up here.

The slide comes up, and Shane looked straight at George. Yes, you did contribute. (Manages to recover quickly.)

Goal – citation analysis and structural mapping to gain insight into influence and impact. Through LAK conferences and special issues – but missing a broad scope of literature.

Context – much potential and excitement: LA has served to identify a condition, but not yet dealt with more nuanced and integrated challenges.

Aim – to ID influential trends and hierarchies, a commencement point in Leah’s term. To bring in other voices, foundation for future work.

LA has emerged as a field. (Strong claim!) Often mis-represented and poorly understood, confused with others.

Using bibliometrics – Garfield (1955), Ding (2011). Dataset: LAK11, 12, 13, ETS, JALN, ABS special issues. Straight citation count, author/citation network analysis, contribution type, author disciplinary background (shaky data).

Many criticisms – buddy networks, self-citations, rich-get-richer. Gives us some great references (i.e. theirs). Real impact factor – cartoon from PhDcomics. But broadly accepted.

Highly cited papers are predominantly conceptual and opinion papers, esp Educause papers. Methods – Wasserman and Faust’s SNA book. There were empirical studies mentioned, but few.

Citation/author networks. Count any link only once, not multiple times. Lovely SNA-type pictures. A few big ones. Moderate clustering – 0 is no connections, 1 is all connected – got about 0.4/0.5. Some papers were strong connection points, but degrees surprisingly low. We’re drawing on diverse literature sets. Degrees were increasing from LAK11 to LAK13.
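As a sketch of what that 0.4/0.5 clustering figure measures – the fraction of each node’s neighbour pairs that are themselves linked, averaged over the network – in Python (the toy graph and function names are mine, not from the paper):

```python
def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked:
    0 = none of the neighbours connect to each other, 1 = they all do."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering over every node in the (undirected) graph."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical undirected citation-style graph as an adjacency dict
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(average_clustering(adj))  # about 0.58 – "moderate" clustering
```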

Author networks – a few strong nodes, but generally similar disciplines clustering. Small cliques, few highly connected; not a problem really. For an interdisciplinary field, still largely clustered by discipline.

Paper classification – schema from info systems, 6 categories, added a 7th. Evaluation research, validation research, solution proposal, conceptual proposal, opinion, experience, panel/workshop. Lots of solution proposals. A lot of evaluation research in the journals – the empirical studies are more there. LAK dominated by CS. More educational researchers in the journals – they prefer publishing in journals rather than conferences, but CS will do conferences. Largely conceptual/opinion. Methods – “other” was by far the most common. Quant not far behind.

Field early, but maturing. Lots of opinion and definitional work. Need to grow empirical studies, more validation research, and critiques of studies. Would be great to see more arguments. Computer scientists dominate LAK proceedings; education research dominates journals.

By understanding our field, we can better advance it. Or do all fields go through this process? Working at other techniques too.


Matt: We’ve noted this for a while, it’s maturing. Is there another field that we can look at, in its first 5-10 y, to see how our research compares.

That was a comment from reviewers, can we normalise this. I’m not sure. How do you do that?

George: One field we could look at, the EDM community, there is some overlap. Talked about that, talking to Ryan. Point at the end: the location of a citation is more important than its existence.

Shane: Love to do the EDM work. Still argue it’s not as interdisciplinary as LA, so direct comparison very difficult.

Adam: Irony, that for an analytics topic, not much quant. Looking at history, from the early days of ed tech, could we use that as a benchmark?

Shane: Yes. Look at where authors have come from, go out multiple steps. Largely they’re from ed tech, that brings in other literature.

Q: How can we use this insight to develop? Look at what communities are being cited but not well represented at the conference, approach for next LAK.

Shane: Great idea, thanks.

Hendrik: LAK data challenge, visualised the dataset of EDM and LAK, with ACM full proceedings. 12 submissions analysing differences between LAK and EDM. How could we team up for that for next LAK. Focused track, with focused tasks, where people have specific questions, compare how questions work on the datasets.

Shane: We did, Dragan chatted to Chris Brooks about the data challenge, would be great to get involved more.

Bodong: Analysing tweets since 2012, this is my first LAK but have been tracking it for a long time. And attendees who did not have papers. So Twitter could augment this. Data challenge next year, include tweets? Another idea.

Shane: Really interested in that, who’s tweeting, what’s being picked up and what their backgrounds are. The comment about the group we are missing – that’d be another area of interested people who aren’t publishing.

Q: Not just conference, but alternative mappings, where published work is mentioned and by whom. Lots of different communities, educators, administrators. Social media may reveal some of those trends.

Maria: Discussing on Twitter the idea of an Edupedia. We can do things, a LAKopedia, brief summaries on how research builds on itself. Every article gets a summary, bottom line, strengths and weaknesses, leads to things that build on them. Have a taxonomy mapped out – the field is new so don’t have to go back a long way. I’m not volunteering!

Establishing an Ethical Literacy for Learning Analytics. Jenni Swenson (Short Paper)

Dean of Business and Industry at Lake Superior College, MN. I’m not a data scientist, I’m a soil scientist. Might be the only discipline nerdier than data science. Then to dept of rhetoric, looking at ethical literacies. Interested in 2008, via Al Essa. And Matt Pistilli at Purdue.

Rhetorical theory, concerned with ethics. Crisis of credibility, focus on ethics really helped. So have a lot of ethical frameworks. Our focus is to teach ethical communication in the classroom, raise awareness, to give skills to make ethical choices. Best message possible, analyse artifacts – recurrent pieces, written, spoken, visual. Always outcomes of a group’s work. Better understand consequences, intended and otherwise.

So taken these frameworks, three so far, and applied to LA. Looking for questions around purpose from an ethical framework.

Data visualisations – Laura Gurak: speed/reach/anonymity/interactivity. Early internet stuff. With the ease of posting online, the ethical issues for visualisation artefacts become apparent. We have software that allows anyone – NodeXL and SNAPP are intended to be simple to use. Realtime analysis can be posted by people with no expertise, can be put out without any context. When viewed as a fact, we get questions such as: is there a lack of context, no history? Who’s accountable for predictions, accuracy, completeness? Target snafu – private data can become public quickly. What if inadvertently made public? E.g. through implied relationships. Interactivity and feedback – there isn’t as much with people who might be part of the vis.

Dashboards – Paul Dombrowski. Three characteristics of ethical technology – design, use and shape of technology. Think about first impressions. Things like Signals. We all judge people in 8–9 seconds; it sticks with you. Visual elements could be expressive. Meaning created beyond the dashboard label of at-risk, and how the student responds without the context. Finally, how does it function as an ethical document? Many questions! Is there a way to elevate the student over data, rather than viewing the student as data?

Then Effects, of the process. Stuart Selber’s framework, institutional regularisation – required use leading to social inequity. We have an economic reality, Pell-eligible students, no access to computer, transportation, have jobs. Different from 4y schools. Need to be sure we’re not doing harm. At any point, can have 5% of our student population homeless (!). Crazy what these students are doing to get the 2y degree. Ethics of these projects, could be different between two schools. Transparency about benefits. In rhetoric, if you uncover an ulterior motive, your message has failed and you’ve lost credibility. So transparency needed that the school will benefit. Do intervention strategies favour on-campus, vs online? We want to have available to all. Bernadette Longo, power marginalising voices, often unintentional. Who makes decisions about LA model and data elements? Legitimising some knowledge and not others. If we do give them a voice, are they all given that consideration? Bowker and Leigh Star – most concerning. We are really trained to accept classification systems as facts. We know there are problems, and we’re stereotyping to fit into categories. There could be existing categories we’re covering up. Making sure that conversation is there, again transparency. Real kicker – at risk, “people will bend and twist reality to fit a more desirable category”. But people will also socialise to their category, so dangers that it may feel like defeat.

How does institutional assignment of at-risk match the student? Do they reinforce status? Can LA be harmful if categories wrong? We know it is, we have to have the conversation.

Return to Selber. Three literacies – functional, critical and rhetorical literacy. They are non-practitioners put to this test. Understand, analyse, and produce knowledge. To reach ethical literacy. Under rhetorical side, four areas – persuasion, deliberation, reflection, social action. Who has control and power of the conversation, who doesn’t, and why? Are we following a code of ethics?

Central question: Who generates these artifacts, how, and for what purpose, and are these artifacts produced and presented ethically?

We could get a big step up; it took tech comm 10–15 years. These questions are a jumping-off point.


Q: Thought about fact that many of us get funding from agencies, foundations, and how this compromises ethical principles, about empirical findings with another agenda in the background.

Any time you go after money, there’s ethical implications. For me, in rhetoric, as long as you’re having the conversation, being open and transparent, that’s all you can do.

James: In ethical models with Aristotelian ethics, utilitarianism – possibility of a new ethical model? Because dealing with technology?

I do think there is time. There is a lot of different codes of ethics, different models. This was just one discipline. There might be parallels or best fits. I’m hoping for that. People have papers in the hopper on ethics of analytics. Broader conversation, the Wikipedia idea was great, discuss what this code of ethics is.

Q2: Real barrier to LA is the ‘creepy factor’, people don’t realise you’re doing it to them. Could a more mature ethical future overcome that affective feeling?

It is creepy! (laughter) I think young people don’t care about how much information is being collected about them, and older people have no idea. Everything is behind the scenes and we aren’t being told. We have to have trust about collection and use. Thinking more of industry. There isn’t a transparency. We feel betrayed. There’s no transparency right now and that gives us the creepy feeling. The opt in/out thing contributes to the creepy feeling.

Teaching the Unteachable: On The Compatibility of Learning Analytics and Humane Education. Timothy D. Harfield (Short paper)

Cicero – eloquence as copious speech. Will try to be as ineloquent as possible.

Timothy is at Emory U. A little bit challenging, involves unfamiliar language. Paper has philosophical language. Best way to approach it is with stories, motivation for thinking about what is teachable and what isn’t.


Driving concerns – getting faculty buy-in, especially from humanities. STEM, it’s an easy sell. In humanities, unfortunate responses. Profhacker blog, LA space is interesting, but is a little icky. Generate a reason to engage in this space. Secondly, understanding student success. Problem at Emory, we already have high retention and student success. How do you improve? And opportunity – think about what success looks like as a learner. So changed conceptions of student success. Retention and grade performance is easy; others more challenging. And thirdly, understanding learning as a knowledge domain. More often than not, learning is not defined or if it is, it’s as basically experience, changes your behaviour or pattern of thought [what else could it be?]. Have concerns about language of optimisation as constraining what learning might mean. Any time you see ‘optimisation’, think of a technique. Some types of learning don’t lend themselves to enhancement through techniques.

Teachable and unteachable

Going back to Aristotle. Five forms of knowledge; the fifth – wisdom – is a combination. Episteme (what is the case), nous, techne (how do I do it), phronesis (what should I do). Episteme and techne are training; phronesis is education – practical wisdom. We cannot teach this (!), includes moral behaviour. Cannot teach them the right way to act with the right motivation, only acquired through experience (?!). Training vs education from another philosopher.

Humane education (Renaissance humanism)

Approaches we see – three features. Ingenium, eloquence, self-knowledge. Ingenium, essence of human knowledge – deliberation, disparate situations drawn together. Not technical competence, imaginative, creative, can’t be machine-learned. (!). Eloquence – copious speech. Educators mediate between students and knowledge. Students come from different places, values. In LA, not all students respond the same way. Example at Emory, dashboards like Signals, expect everyone will respond the same, aim to perform at a higher level. For some it’s effective, others not. Already high-performing students end up pursuing proxies. So wrt eloquence, meet students where they are. Self-knowledge, cultivating ingenious behaviours, making them responsible for learning, developing capacity.

Humane analytics

Real concerns. Potential for analytics to not recognise the responsibility of the learner, undermine their capacity.

Really fruitful if reflective approaches, like in Alyssa Wise’s work, Dragan, Dawson. At Georgia State, predictive models, part of the funding for this project is not just the model, but also purchasing a second monitor for every student adviser. At others, it’s an algorithm where students are automatically jettisoned and have to choose a different path. Here, confronted with their data, asked to reflect on it in light of their personal goals, troubles, where they want to be. These reflective opportunities are powerful: encourage responsibility, and also have the ability to make prudential decisions about how they are going to act in this context.

Another space, fruitful: social network analysis. Humanists will be frustrated by online education because they miss the f2f, with each individual student, to eloquently meet each student where they’re at. End up as sage on the stage; humanists end up that way too but like to think of themselves otherwise. SNA has the possibility to get the face of the student back. Can ID nodes of interest, opportunities to intervene, to lead and guide in a particular direction.


Thinking carefully about what learning is, and the types of learning we want to pursue. Some of our techniques may contribute to some types and distract from others. Need to be sensitive to needs of all students, and not all learning is teachable.


Q: Slide referring to higher performers, seeing their dashboard, undesired outcome?

We’ve had dashboards piloted in two classes. Told students to stop looking, because they’re seeing decrease in performance. The competitive nature is coming out to the detriment of learning outcomes. Really frustrated, students really anxious. So instructors told them to stop.

Alyssa: It’s not always necessarily positive. How students act on analytics. Goal not to just predict, but to positively impact and change that. Students optimising to the metrics at the expense of the broader goal. How do we support students in the action taken based on the analytics?

We’re learning you can’t just throw a tool at an instructor and students. Responsibility in the use of these tools. If instructors are not careful in framing the dashboard, not engaging in discussion in an ongoing way, whether it is an accurate reflection. Levels of engagement measured in clicks or minutes is not correlated with performance at all. Bring that up in class. When you do ID students at risk, or in a class determine these metrics are not useful – say the best use of analytics here is to not use them. Or not use as measures, but as a text itself, and an opportunity to reflect on the nature of their data and performance – digital citizenship, etc. Cultivate practical wisdom in the digital age.

Maria: We have to be careful. Just because we have dashboards, not the first time they got negative feedback, or negative actions. It’s a bit more transparent now. One thing is cautionary note, this stuff has been happening all the time, this is just a new way for it to happen.

Hadn’t thought about it that way. Discussions about responsibility might end up leading to larger conversation about instructor responsibility in their pedagogy.

Maria: Tends to be the case in tech: it creates new problems, and we create tech to solve those. Phones where we text, people text when they drive, now cars that drive themselves. We’ve created dashboards, new unintended effects, need to iterate to find ways to have intended effects.

Situate in a sound pedagogy. A talk like this is really preaching to the choir. Wonderful to enter conversation about these issues.

Caroline: Combining this with earlier talk, apply it to our development processes within the community. Technology is a theory to be tested. Ideas about reflection applied to development of the tools so don’t get in to those issues.

Q2: Interested in effectiveness of high-level things vs just showing the raw activities that make up the score.

Significant chunk of the paper discusses relationship between what’s teachable and measurable. If it’s teachable, optimisable, performance compared to a standard: skill mastery. These things are teachable and measurable. But things like creativity, imagination, even engagement, they don’t have a standard for comparison, we’re forced to produce an artificial one, which is maybe to lose what it is in the first place. Is more effort required to make distinctions about the types of learning? LA applied differently.

Closing Plenary

Kim gathers people together. Glad you joined us here, looking forward to connections.

Few final things.

  • Best paper (Xavier Ochoa), poster (Cindy Chen), demo (Duygu Simsek).
  • If you presented, get it up at – or view it there!
  • Twitter archive will be there too. Conference evaluation to be sent next week, please complete.

Hands the torch over to Josh Baron, from Marist, for 2015. A bit frustrated, you’ve set the bar too high. (laughter) Had an awesome time. (applause) Advance welcome from Dr Dennis Murray (sp?) who’s looking forward to it. Not everyone knows where Marist College is; it’s in Poughkeepsie, NY, an hour up the Hudson from NY City. Focused on technology. $3m cloud computing and analytics center. Many faculty excited.

Safe travels. See you next year!

This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.


Author: dougclow

Academic in the Institute of Educational Technology, the Open University, UK. Interested in technology-enhanced learning and learning analytics.