Computational approaches to connecting levels of analysis in networked learning communities (pm)
Ulrich Hoppe: Workbench for chat analysis
Click to go to the Workbench. Log in using ‘guest’ and ‘guest’.
Standard version, mainly for network analysis. Can get data from a repository, list to select. Fun example of voting in Eurovision. ABBA data, who voted for whom. (!) Can access stored workflows. It’s all network-based; pretty much everything is server-side, although some of the visualisations are client-side. Load, transform into the internal preferred SISOB (EU project) graph format. Then Clique Percolation Method to find groups. Then need something to visualise it – Force Directed Clustering. Then click Execute. Highlights which bit it’s working on as it does it. Then click on Result link.
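Roughly what the Clique Percolation step does, as a pure-Python sketch on toy data (an assumption on my part; the workbench runs its own server-side implementation):

```python
# Sketch of Clique Percolation (k = 3): find all triangles, then merge
# triangles that share an edge (k-1 = 2 nodes) into one community.
def clique_percolation_3(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # all triangles (3-cliques), deduplicated
    tris = list({frozenset((u, v, w))
                 for u, v in edges
                 for w in adj[u] & adj[v]})
    # BFS over the "triangle graph": adjacent = sharing 2 nodes
    communities, seen = [], set()
    for i in range(len(tris)):
        if i in seen:
            continue
        seen.add(i)
        comp, queue = set(tris[i]), [i]
        while queue:
            j = queue.pop()
            for k in range(len(tris)):
                if k not in seen and len(tris[j] & tris[k]) == 2:
                    seen.add(k)
                    comp |= tris[k]
                    queue.append(k)
        communities.append(comp)
    return communities

edges = [("A", "B"), ("B", "C"), ("A", "C"),   # triangle 1
         ("C", "D"), ("D", "E"), ("C", "E"),   # triangle 2, touches at C only
         ("F", "G")]                           # no 3-clique
communities = clique_percolation_3(edges)
print(communities)   # two communities: {A,B,C} and {C,D,E}
```

Note the overlap property: the two triangles share C but not an edge, so they stay separate communities, with C in both – this is what lets CPM show links between clusters.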
Tried another workflow. Publication network, as Pajek data, nanotechnology data. Publication network – author-publication. Fold it to one mode, have relations between artefacts or between authors. Then author centrality, then clique percolation, then force-directed clustering. Visualisation is client-side. Results are stored, so have a menu for those, can re-use them.
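The folding step – collapsing the two-mode author–publication network to a one-mode co-authorship network – might look like this (toy data, not the nanotechnology set):

```python
# Fold a two-mode (author-publication) network to one mode: two authors
# are linked, with a weight, whenever they share a publication.
from itertools import combinations
from collections import Counter

authorship = {            # publication -> authors (toy data)
    "p1": ["ann", "bob"],
    "p2": ["ann", "bob", "eve"],
    "p3": ["eve"],
}

coauthor = Counter()
for authors in authorship.values():
    for a, b in combinations(sorted(authors), 2):
        coauthor[(a, b)] += 1      # edge weight = number of shared publications

print(dict(coauthor))
# {('ann', 'bob'): 2, ('ann', 'eve'): 1, ('bob', 'eve'): 1}
```

The same folding in the other direction (publications linked by shared authors) gives the artefact–artefact network.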
Chrome better than Firefox. They have students using this in the course.
Can export the viz, but only static – usually as a picture. Use D3 for display. Clique percolation gives you overlapping subgroups, can see links between clusters. Not really a tool for exploration – that’s Gephi. This is to encode standard procedures. Extract the routine patterns, and make them sharable.
It’s a nice level to talk about the processes and procedures. Enough detail to talk about, but not beyond grasp.
Now, chats. Tapped In data workflow. Several ‘Chat-xxxx’ widgets. Extended Contingency Graph builder, then Main Path Analysis (MPA), then PageRank, then viz. Main Path Analysis: a chat log is a Directed Acyclic Graph (DAG), with sources and sinks. Start with one source and one sink (placed before and after your dataset), enumerate all the possible paths, and count how often a certain edge is used; the path with the highest edge weights is the most important path (originally from citation networks, but also applicable elsewhere). Then here, have Result Downloader, so can pick up and use elsewhere. Another workflow on same data, gives reports. Can see what proportion of an individual’s contributions are on the main path. IMPORTANT: Main path analysis not validated in this context against human judgements. But have checked that number of posts on the main path correlates with PageRank. PageRank here is like weighted in-degree: highly connected, the incoming links are like references, or citations.
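The Main Path Analysis idea as described – enumerate source-to-sink paths, weight edges by traversal count, keep the heaviest path – can be sketched like this (toy contingency graph; the real implementation presumably uses a more efficient traversal-count algorithm):

```python
# Main Path Analysis sketch: chat log as a DAG with an artificial
# source "s" and sink "t" placed before and after the data.
from collections import defaultdict

def all_paths(dag, node, sink):
    if node == sink:
        return [[sink]]
    return [[node] + rest
            for nxt in dag.get(node, [])
            for rest in all_paths(dag, nxt, sink)]

def main_path(dag, source, sink):
    paths = all_paths(dag, source, sink)
    weight = defaultdict(int)                  # edge traversal counts
    for p in paths:
        for e in zip(p, p[1:]):
            weight[e] += 1
    # main path = the path with the largest summed edge weights
    return max(paths, key=lambda p: sum(weight[e] for e in zip(p, p[1:])))

dag = {"s": ["a", "b"], "a": ["c"], "b": ["c", "t"], "c": ["t"]}
print(main_path(dag, "s", "t"))
```

Here edge (s, b) is on two paths and (c, t) on two, so the main path is s → b → c → t.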
General network analysis is much more flexible, can do much more stuff.
Currently working, have – networks, content, action sequences (temporal things). Some networks – e.g. citations networks – have implicit idea of timing.
Bits written in R, or Python and ltk. TupleSpace has client adaptors, agent adaptors, for all sorts – R, or Prolog, Java, C#, whatever. The API for workbench plugins is a bit complex, but the TupleSpace one is open and specified. See Wikipedia on ‘Tuple Space’; SQLSpaces is theirs.
Agathe Merceron: LeMo tool for analyzing activities on a learning platform.
Is a web application. Showing us fake data. Mostly the teacher’s view. Many diagrams, shows you little view. Click through. Often popular: Activity Time – number of accesses and how many users. (On same scale?) Can magnify sections. Below it shows you all the resources/elements in the course/LMS, shows total number of requests, and number of users who’ve hit it. Can select the time span you want, begin and end. Or can see what happened with forums, see when they were accessed.
Much concern with privacy in Germany. Also when working with kids. All the data in the system are anonymised. Not a warning system, like e.g. Signals, or Moodle tool that gives you more analysis. Some people say don’t want to track students more than what Moodle does already. But with this, only see the trends, the collective view, not more individual analysis.
Multiple diagrams of the same data, see which they prefer. Diagrams are interactive – mouseover, gives you more detail.
Can’t yet see e.g. when I post something to the forum, what was the effect. Can see how many posts and accesses there were to a thread – but that’s anonymised.
Partner in project, had an encyclopedia, knew people worked with key words. Designed it so can go through a whole chapter, with things introduced stepwise. Wanted to know whether people followed through a whole chapter. So looked at sequences – what do students do one after another. Can see e.g. 40% of students looked at this resource, then that resource, not what route or whether anything in between. Interesting finding – 60% of students followed same sequence of looking at assignment, then forum, then assignment, etc. Two vizes – Frequent Path BIDE, and Frequent Path Viger. Latter is about ‘just after’, can also look at within a time period. So e.g. Wiki x, chat x, wiki x, wiki x. Can’t yet say here’s this sequence, does it occur, but that’s interesting to explore. For researchers, this is a bit little; for a teacher, you need to keep it simple.
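The "what do students do one after another" idea, reduced to its simplest form – counting which resource immediately follows which, per student (the actual tool uses BIDE/Viger-style frequent sequence mining, which also handles gaps and time windows; this only counts adjacent pairs):

```python
# Count, per student, which resource immediately follows which, then
# report what share of students exhibits a given "X then Y" pair.
from collections import Counter

logs = {   # student -> ordered list of accessed resources (toy data)
    "s1": ["assignment", "forum", "assignment"],
    "s2": ["assignment", "forum", "wiki"],
    "s3": ["wiki", "assignment", "forum"],
}

pair_students = Counter()
for events in logs.values():
    for pair in set(zip(events, events[1:])):   # count each student once
        pair_students[pair] += 1

n = len(logs)
a, b = "assignment", "forum"
print(f"{a} -> {b}: {pair_students[(a, b)] / n:.0%} of students")
```

This is the shape of findings like "60% of students followed assignment, then forum, then assignment" – a share of students exhibiting a sequence, not a route through everything in between.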
Another one: the assignments. Performance tab. All of the assignments – quizzes, assignments, SCORM, rescale to 100%. Then show how many students scored at each percentile, can mouse over. Can also see the average of each one too. Wanted to see what is useful. Also boxplots. And student’s view – boxplot of how you did on all your assignments – and can see other users’ too, next to yours, for comparison.
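The rescaling step is simple but worth pinning down – heterogeneous assessments get mapped onto one 0–100% scale before binning (a sketch; the actual LeMo binning rules are not specified here):

```python
# Put quizzes, assignments and SCORM results on one 0-100% scale,
# then bin into 10-point score bands for the histogram view.
def rescale(score, max_score):
    return 100.0 * score / max_score

raw = [("quiz", 8, 10), ("assignment", 45, 60), ("scorm", 3, 5)]
percent = [rescale(s, m) for _, s, m in raw]
bands = [int(p // 10) * 10 for p in percent]
print(percent, bands)   # e.g. [80.0, 75.0, 60.0] -> bands [80, 70, 60]
```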
Another – area-based viz of how much each component was used/accessed.
Another – heatmap – can compare multiple instances of a course. See if the pattern vs time is the same.
The handbook is in German. ‘LeMo Handbuch’.
Discussion: Tools for reproducible data analysis
Not just sharing your outcomes, but also your processes; these tools we have seen could support that.
Ugochi Acholonu: initial statement
Ugochi is from DePaul University.
Open source approach to research, open up your work – give dataset, analyses, workflows. Lower barriers to replicating, adapting and building upon the data analyses and visualisation. Key is sharing more than methods, sharing actual analysis scripts and workflows. All the data manipulations, cleanings, etc.
Why? Scientific method perspective – replication is key to verification. Encourage research innovation through collaborations. From LA perspective – collaboration, supports fast turnaround time to support interventions.
Examples from statistics, computational research areas. Supplementary materials – dataset, plot functions, manuscript plots, Java applet – lets you do the whole reanalysis. All through the journal.
New open source tools – R, Python, LaTeX; Network Workbench nwb.cns.iu.edu, Collide/SISOB Workbench workbench.collide.info.
R code often not pretty, not great to share. But workbenches are good.
Why interested? Trying to close participation gap in STEM – big gap in bachelors degrees in Comp Sci and Engineering, despite more women than men in HE. Support from Mayor in Chicago – ‘Chicago City of Learning’. http://explorechi.org Through Digital Badges. Workshops/self-paced, game design challenge, missions, a whole range of activities. Diversity of contexts, so logging lots of things.
So want quick turnaround for quality data analysis. Don’t get much credit for reproducible research. But for us, interested in how to support great learning, time is important, dealing with lots of kids, parents, organisations. Have to get it out, use it, improve our site to support learning. Expanding to different cities – LA, Philadelphia, Dallas, others.
What standards are needed? Is it even possible? How can we support new models for publications and authorship? Citations are the current metric. Peer review, ownership. Is it even feasible? Privacy, IRB, community support. [My concern: deanonymisation!!] Chose R as the tool because could find books, communities, forum posts where they could get support.
Supported by MacArthur and others.
S1: We need speciality in each part of research. In LA, would be good to have standards for the content. Same for ML, or network analysis. We work with institutions, universities; hard to get their datasets and put them in public. Could analyse it, but still lots of issues. What kind of dataset can we give to everyone – it’s a challenge.
Cindy: NSF in the US is saying we have to find ways to make datasets available. Also really helpful to have some idea, is there a way of organising it so if folks have really nice tools, I may be able to use, what’s a good way to set it up, or what’s a good sort of format.
Ugochi: We’re dealing with kids. People are hypersensitive. We can’t release this stuff, chance of revealing identity of students. So when we release it, it’s at a very high level.
C: What level can you model the context?
Dan: Issue of levels. Permissions, format, interoperability. But semantics of, inter-rater reliability, communicate in the coding approach. Contextual information too. That’s just sharing the data, and sharing the analyses – the process you went through isn’t enough, what’s the objective, the meanings of the terms.
Ulrich: Privacy issues. Two or three weeks ago we had a seminar in German comp sci, on MOOCs, new research challenges. French group, US groups too. EPFL Pierre Dillenbourg etc. Raw material you need are the user traces, if they’re rich enough you can reconstruct a lot. Actor/artefact traces, can reconstruct a lot. One French colleague said why shouldn’t we give the traces to the users, you have them on the server, but are encrypted and can only get them with a token that is yours. Have to trust your provider, of course. Given trust, would be very nice. If you want to do aggregated analysis, have to get all the users to agree – given e.g. a certain anonymisation going on.
Ugochi: With kids, U13, parents have to give permission. Don’t know if we have the trust. Need more computing in schools. Not trust, like medical. People are hypersensitive. We have opt-in, opt-out, that’s all we can do; we might not share it to the world.
Ulrich: All the MOOC data should be made available for research. All whales are only killed for research.
Sean: In many discussions about sharing open online community data. Many issues the same. Privacy issue is one thing. The other issues are: we look at a lot of soc sci papers, read methods, can’t tell what data, how processed, how got to their results. So sparse you can’t even abstractly reconstruct it. A lot of soc sci, more than 5y, can’t find the data or the scripts for it. The author could likely not reproduce their own results. Possibly true even of our own work. I’ve used it as a ‘data factory’. What is the data we’re gathering, how was it structured, and a sense of how to reuse the methods others are developing around tech use. With stats, read textbook, set of inputs that go in to e.g. factor analysis. Similar for network analysis, more or less standardised. But not good at expressing what a node is, or an edge, or the weights. That’s a big hole for reproducibility. I have 4-5 papers on being specific about operationalising this data.
Ugochi: Partly responsibility of the journal. Prioritise different things.
Sean: My theory is journal editors are taken by surprise one article at a time. See much they don’t know what questions to ask, nor reviewers. Get, looks like good science, don’t need to understand the methodological bit in the middle. But some journal boards, nobody understands the bit in the middle, all taking it on trust.
Cindy: Editors try to find at least one reviewer with expertise. But limited journal space. In biology, articles are 5pp.
Dan: Electronic supplement of 40-50pp!
Cindy: Make additional data available. An ANOVA, pretty simple, but factor analysis many parameters, lots of decision.
Sean: Some canonical answers around more traditional methods, fewer canonical answers around newer methods. Reviewer can only say so much here.
Dan: Another layer. Incentive, research culture. Experimental psychology, ask a masters student to reproduce a study. Doesn’t happen in education. At faculty level hard to get a reproduced study published.
Cindy: Some people who think their stuff can’t or shouldn’t be replicated.
Dan: Ethnomethodological approaches!
Ugochi: Should we even try?
Dan: Can you get tenure on solving a whole field’s problem?!
Ulrich: If your project allows you to spend time on general value work, not clear on benefit to your user community. If you have the degree of freedom, it’s worth doing it more. Your question on reproducible, shareable research results. Data analyses. Embedded learning analytics, put in platform?
Ugochi: Yes, it’s real time, we want to support them. We have other Cities of Learning, interesting to see how our scripts have to be augmented or changed to look at different contexts.
Sean: If you have, scripts and data, students learn and apply and steal by structuring their data like mine. If you share the structure of your data, creating a system to reproduce your results without disclosing the personal data.
U: Where’s the home?
Sean: Github! Open civic data, school district data, etc. It wasn’t de-identified. But could build their capacity. Share the structure. All repositories on github. City and school district have repos. In DC, Data One, do data hackathons, open civic stuff
S2: Also Boston.
Sean: Using github. Data is trickier to share. Very much roll your own. Keep structure open. Each district has populations that can be input.
U: Who’s the community? Want it to not just be researchers.
Dan: Do people other than hackers know about github?
Sean: For general public, need a separate site. But hackers use github.
Ulrich: With tool we showed you, can publish your workflow without sharing any data. The point is how could you get a community to use these tools, and share them. It’s on the web, can share it, have to organise it. It’s not so easy. Access to remote labs in STEM project. Similar problem, have workflows, like instruments in experiments, how do you organise around the space. Github is the hacker solution. I would like to go beyond that. I’m not the one who can move an empirical researcher community. We also garnish what we do with empirical research as a proof of concept, but it’s not my main concern.
Cindy: Broker these connections, in a shared understanding.
Dan: Our Productive Multivocality project (will be a book soon), was looking initially for specifying tools for sharing, but found we were too different, so turned in to what is interesting about each other’s stuff.
Wanli Xing and Sean Goggins: Automated CSCL Group Assessment: Activity Theory-based Clustering Method
Sean talking. All focused on digital trace data. Here, asynchronous groups. Even github! Specific case is synchronous group on math. Work with Wanli, his student. Person leading project is ethnomethodologist, doesn’t really talk about reproducibility. Have operationalised some of what we’ve done in to things we think can be measured.
What can we infer? Validity, reliability issues. James Howison put together detailed work on analysis of trace data, and common errors. Many about using the tool without understanding – e.g. give social scientist network toolkit and they go out there and … do things with it. Temporal mismatch, data completeness, and so on. Try to think about all of these.
Striving for systematic methodology and ontology. ‘Group informatics’ – most interesting behaviours happen in subgroups, cliques, communities. And massive network vizes are not any good. Focus on social science questions. Being specific about the words used, about people – not about networks, clusters or computing phenomena. Is it knowledge construction, learning. What am I trying to say something about. Be specific before analysis.
Group informatics ontology – every interaction, is it person-person, or person-artifact. Social connection between two people.
Ulrich: Start from actor-artifact network. Then can get actor-actor networks with links established through mediating artifacts. And vice versa, can get artifact-artifact links.
Discussion forum data, natural interaction is between two people.
U: But in asynch situations, much more, the first-order thing is people-artifacts.
In terms of analysis of the sites I’ve looked at, forums are communication acts between people on a platform. cf software stuff, where no social interaction between the coders, so the primary connection is between the person and the artifact.
U: The artifact is the utterance. It’s not necessarily the tool. The contribution is the artifact.
Cindy: Wouldn’t have thought of that, would think artifacts are resources outside a discussion.
U: Can be. Emerging learning objects. In constructivist learning environment, participants create learning objects, interact around them. Those are artifacts.
C: Shared whiteboard is an artifact. Making it clear – this is a good example.
[much animated discussion, hard to follow!]
U: Actors and events, but events can be artifacts, publications.
Sean: Those are fun conversations. It’s the theory we’re using. An online learning situation, learning theories are social theories. So I see interactions more between people than between people and artifacts. Can take opposite stance. Neither is right or wrong, just have to be explicit and clear.
You make the choice whether it’s a person-person network, or artifact-artifact one.
Contextualising interaction through theory – Activity Theory. Rich, multivalent data from VMTwG (Virtual Math Teams with GeoGebra). Teaches geometry. Book ‘Translating Euclid’, new ideas about how to teach geometry, interactive experiences, to learn math the hard way, by manipulating objects, goal being deeper learning.
Have a geometer sketchpad, wiki, chat window, view of all interactions so far, and a replayer to scroll back and see what’s been done. Teacher has 30 students in 10 different rooms, can’t pay attention to all of them or even with watching replays.
So created a model, to push trace data through Activity Theoretic model – Rules, communication, tools, learning tasks (object), division of labour. All operationalised. The workshop paper is a shortened version of my next paper.
Carolyn: Have you tested to see if people can appreciate it?
We’re starting that.
Carolyn: I have trouble getting my head around Activity Theory (AT). May just be because I’m a computational linguist. Are teachers going to be able to get their heads around it?
Dan: Do they have to buy AT to use this?
Cindy: It may be less troubling than other stuff.
Dan: They’re not contaminated with computational linguistics.
Wanli: They will know which aspect they are incompetent with.
Twelve groups over five modules. Shows you these different things – but with their labels on it, not the Activity Theory ones, that was a big feedback.
Do more advanced things too. Visualisation. The Compass tool is a significant event – when they start talking about it, or using it, they’re about to solve the problem. Took high level concepts from ethnomethodological analysis, turned them in to metrics.
Underlying is a clustering algorithm to ID similar groups.
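The clustering idea in miniature – represent each group as a feature vector over the operationalised Activity Theory dimensions, then cluster similar groups. The feature names and the hand-rolled k-means below are my illustration, not the paper's actual algorithm or features:

```python
# Minimal k-means over group feature vectors (e.g. communication rate,
# tool use) to identify similar CSCL groups. Naive init, fixed iterations.
import math

def kmeans(points, k, iters=10):
    centroids = points[:k]                     # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Toy feature vectors, one per group: (communication, tool use)
groups = [(1, 1), (1.2, 0.9), (5, 5), (5.1, 4.8)]
clusters = kmeans(groups, 2)
print(clusters)   # two low-activity groups vs two high-activity groups
```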
U: Look at velocity of interactions?
We can. Not just clustering around when people interact, but how they interact, key words from qualitative works. Spider diagram showing individual group vs cross-group average. Not prescriptive, with dashboard about performance, but to help decide which ones to spend more time with. More a teaching analytic tool than a learning analytic tool.
Q: How do you know this text belongs to that group?
All contextualised by ‘room’, the data we get includes who are the students in the room. They don’t cross over. This wouldn’t work if they did. Partly accessible and functional since designed around that constraint.
U: Example of the Compass tool, it’s important where they start to use it. A time issue. In original histograms, you just aggregate over time, so don’t see if recent or not. Thought of e.g. heat metaphor to give you recency in the events? The other diagram, if you want to use that as a teacher, it may be too detailed.
Good idea, we haven’t used them.
U: Colour as indicator for recency, here you just have overall aggregation.
Shady: Categories like ‘try’ questions, how do you know the text belongs to the categories?
Dan: Longer version of paper shows you how, based on counting up. I have an issue with your understanding of AT division of labour: you think it’s better if all participate equally, but it’s about role specialisation.
Our colleague has issues with how we interpreted AT as well.
Dan: A contrast. Sean focused on group activity, here it’s individuals in the context of a group.
Cindy Hmelo-Silver, Carolyn Rosé, Jeff Levy: Fostering a Learning Community in MOOCs.
Carolyn: Another contrast, in Sean’s work (here), the groups were assigned. Here we’re trying to form groups. Sean presented work he did. Here it’s work we haven’t done together.
Cindy: Putting together an NSF proposal. I’ve done work on supporting collaboration in project-based learning. Carolyn comp linguistics, academically productive talk. Accountable Talk – which is trademarked. Lots of common ground. Call from someone on a startup in MOOCs (Jeff Levy, OfficeHours), work together to create learning communities. Motivations about supporting productive collaboration on a large scale. In CSCL good at a small scale, we have to be able to go beyond that to have impact. From CSCL and PBL research, depending on goals, particular strategies may be more or less useful. What it means to be an individual in a group, and how we form groups. Connecting students together, supporting TAs. That’s what OfficeHours is doing, providing TAs for the MOOCs.
Some work Carolyn did, models on how discussions came to successful close. Replies, experts in the group – many indicators that are directly observable, look at latent factors about whether threads came to a successful conclusion. There are five – Friendliness in the thread Fen, Influence of the starter Sin, Expert participation Epr, Content matching Con, Thread popularity Pop.
Carolyn: Supporting people without much social capital to get more.
Another thing, in relation to that social network: how it affects dropout. It’s well known that only a small percentage of students on MOOCs participate; those who do are less likely to drop out. But based on the network we see, can see how resilient. Connections – more strong connections are less likely to drop out of the course. When one of their strong relationships drops out, they are more likely to drop out next week. (!) Again highlights importance of community. MOOCs not so good at creating community; supports the idea we need to create more community, maintain it more.
U: For these strong ties, this is usually around you. If community is larger, doesn’t have impact on your strong ties. These strong ties are important for resilience. We know, the strength of weak ties, is about if want to do new things, be stimulated, that’s how.
Carolyn: These MOOCs last only 7w, they are all weak ties anyway. Not arguing having more of a community will be beneficial to people who have ties, but makes more people tied in.
Cindy: To use different LAs, come up with actionable indicators – social, semantic, sequential patterns, others? Carolyn’s work on semantic analysis and language technologies. Work today on sequential patterns. Facilitation to help build communities, make them more relevant to course content. May have patterns that warrant attention by TAs – natural variation that some groups do fine and don’t need much, others nothing’s happening at all, others where you might be able to do something useful by e.g. connecting to a human supporter or a software agent. JIT support for the TAs, who may have content knowledge but not about pedagogy or collaborative learning.
‘Locomotive mentoring model’ where TAs train students who may be the next generation of TAs. People who are central in social networks as candidates for mentoring. Helping TAs mentor students who may in turn mentor others, with researchers mentoring TAs at the top.
Carolyn: There’s an assignment problem. Research team, set of TAs. They have to pick or be assigned a team. They then have to find their group. Waves of new students coming in, dropping out. How can we manage to get this organised? Use some social recommendation software. Two approaches. Both modelling used for link prediction tasks in the past. Start with a network, for a person, suggest another opportunity they may be interested in that occurs later on. Usually after that time has passed so can check you’re right. Can use that to rank algorithms for accuracy of prediction, have a way to predict in real life. We know the ingredients of a successful thread. Only so many people/experts to go around, load balancing issue. Tech to do the matching. Works in terms of individual load, and everyone getting attention. Locomotive mentoring, get buy-in from the front, likely to take suggestions about where to go. Also want to experiment with a more naturalistic version, subtly encourage interaction between people who might interact well. Build a larger sense of community, involve more people in the discussion. Find soft partitions in network, kind of like groups, but maybe we can influence how they form.
Shady: To evaluate predictive link, this happens without showing them the predictions?
Carolyn: With these experiments, it’s data already collected, can partition network by time. Predict next time point. Sometimes take whole graph, remove some links, see if possible link is one we removed or not. In the future, when we run it, we’ll suggest connections. Can’t evaluate it the same way, but will be able to do it on a MOOC by MOOC control – was better in one with it rather than without.
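The held-out-link evaluation Carolyn describes, in schematic form. The common-neighbours scorer here is a stand-in of my own (the actual system uses the matrix factorisation / LDA-style models discussed later):

```python
# Link prediction evaluation sketch: remove a known link from the
# observed network, then check the predictor scores it above a pair
# that was never linked. Scorer: plain common neighbours.
def common_neighbours(adj, u, v):
    return len(adj.get(u, set()) & adj.get(v, set()))

adj = {                        # observed graph, with edge (a, d) held out
    "a": {"b", "c", "e"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"a"},
}

held_out = common_neighbours(adj, "a", "d")   # true link we removed
never    = common_neighbours(adj, "e", "d")   # pair that was never linked
print(held_out, never)   # the held-out link should score higher
```

Ranking many candidate pairs this way, against many held-out links, is what lets you compare algorithms for predictive accuracy before deploying one to make live suggestions.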
Cindy: Challenge in real context, figuring out, we’ll have several things going on at once, trying to pull out effect of one thing. Maybe enough where we can do some experimental stuff.
Carolyn: Some small group experimental stuff. True that when, whole MOOC, hard to isolate the factors, a lot of qualitative impressions.
Modeling – two paradigms. One is Matrix Factorisation. Decompose a network, maybe user/user or user/thread, almost like a topic analysis, find reduced representation, by decomposing matrix (Users/threads) in to submatrices, map them on to K factors. Can match users to threads, can match users to each other. Probabilistic graphical models – Latent Dirichlet Allocation. Those K factors we come to with matrix factorisation are kind of like topic distributions out of an LDA model. Applied to chunks of text, have documents, with words in them, each word is ‘generated’ by a topic. General distribution of topics, average across documents, but individual document has its own distribution. And the general distribution comes from a prior one. Through inference algorithm, we learn the topics. For every topic, have distribution reflected by the words in that topic. Many flavours of LDA. More detail in ML workshop tomorrow! Start with same prior. Have people interacting in threads, have some connection with each other. Conversational role as much as a word distribution, role reflected through how they talk. Not just learn from text, but topics influenced by connections, who talks to whom, and what is a document – which is everything one person said on one thread. Intention is to ID topics that represent the roles that come in sets on a thread.
Interestingly, looks very different to matrix factorisation. But people have an average across these topics, the roles they take, which is like the K factors from matrix factorisation. Can average to get across the thread. What’s different here, because it’s probabilistic model, can build intuition in to the structure of the model, idea that roles come from the discussion and orienting to the roles other people take. If build that in to the model, will find the roles people tend to take on. The roles come in pairs, I’m interacting with each person. Same sort of information, but build more intuition in. What is it we want in the matches? We could change this model if we have a different idea about what’s important about each conversation. Find a representation that works. Idea of different levels of representation: community average, person average, person-in-a-thread.
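The matrix factorisation side can be sketched very simply: decompose a user × thread participation matrix into two K-factor matrices and score unseen pairs by the dot product of their factors. Plain unregularised SGD on toy data – the real models are much richer:

```python
# Toy matrix factorisation: R (users x threads) ~= U . V^T with K factors.
import random

R = [                    # rows: users, cols: threads; 1 = participated
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
K, steps, lr = 2, 2000, 0.05
random.seed(0)
U = [[random.random() for _ in range(K)] for _ in R]
V = [[random.random() for _ in range(K)] for _ in R[0]]

for _ in range(steps):
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            err = r - sum(U[i][k] * V[j][k] for k in range(K))
            for k in range(K):
                U[i][k], V[j][k] = (U[i][k] + lr * err * V[j][k],
                                    V[j][k] + lr * err * U[i][k])

def score(i, j):
    """Predicted affinity of user i for thread j."""
    return sum(U[i][k] * V[j][k] for k in range(K))

# user 0 should look much more like threads 0/1 than thread 2
print(round(score(0, 0), 2), round(score(0, 2), 2))
```

A user's row of U is their position across the K factors – the analogue of the per-person topic/role distribution in the LDA formulation.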
Dan: Thread a setting, or a group?
Good question. Here, it’s one context. Not sure that’s good. Threads persist over time, may not read the whole thread each time you come. This is simplistic, but also standard. Took threads as a basic unit. Could decide to do it differently. Think about what makes sense.
Ulrich: Do not want more detailed explanation of your approach. Matrix factorisation is like block modelling in social networks. Clearly gives you roles – positional analysis. LDA, I see the topics. I want answer on higher aggregation level. I cannot follow your point that the roles and the topics fall together. For me, you get roles, and then you get topics. What makes you think they fall together?
Good question. We’re tying two kinds of models – LDA model, and a soft graph model – mixed [gah, missed the name]
U: On a pedagogical level. Group formation. Have finished a proposal on group formation in MOOCs. Had workshop, it’s in the air currently. Boils down to application, you want to assign roles, to have a certain ability of the people to assume certain roles, and to distribute the knowledge in a certain way.
The roles are reflected in what people say.
U: That’s the two level. The knowledge, and the social interaction patterns.
Not sure I believe if you just do a plain LDA you just get plain content without role. They’re intertwined.
U: I agree.
Bounce back and forth, putting pressure on organisation, not just co-occurrence, themes, but because same people use same things in same contexts. It’s just an adjustment. People play different roles, mentioning different things, the organisation will look different. Might not be trivial to interpret it. We’re still struggling with this model. There are matrix factorisation approaches that let you bring them together that’s more like that. MOOC is not really that massive to support the complexity of this model. Maybe if you ran 20 MOOCs.
Sean: Looked at emergent groups, on github lately. Not used this approach specifically, like it. ID the emergent groups. Most group analysis, the groups are a priori. In software engineering, they’re emergent. I’m excited about what you’re headed towards.
Carolyn: Next year’s workshop! Code for this is already nearly shareable. I’d be happy to work with you on github, see if it’s the same. I’d understand my model better if I see other people’s.
Sean: That’d be fun.
Cindy: Another thing interesting for both of us, Carolyn focused on tech details, I’m on what the TAs can do with it. Sean’s work is trying, how do we get this in a form that’s actionable, helps people make sense of a display that they can use in real time, rather than long after. Most of my analysis is long after the course is done.
Dan: After the NSF report is in you do the really interesting work!
Cindy: As a social scientist, important to do studies of how teachers or TAs understand these displays. I imagine it won’t be the way we do.
Sean: Has to be a design-based research iteration of all these.
Ulrich: General discussion. Clusters of topics, that I’ve tried to extract from the submissions we had – without LDA! First something for Sean. You said you’re interested in the social relations, actor-actor. Here’s a picture I used in a presentation at the last LAK – relations around epistemic artefacts. We have actors, and artefacts – a learning object, a knowledge artefact, something created by the actors, and the knowledge artefacts may have direct connections e.g. hyperlinks. We also have inferred links, that come through the folding – e.g. these three people are co-authors, see they form a triangle of the authors. To say one or other is more important, but this is the data that we have, in its complexity. Doesn’t make sense to focus on one part when we do our observation. To mask the data we have given the target, doesn’t make sense to me. This is at least the complexity we have to deal with.
Sean: Not arguing that. The data is just as messy as you show it to be. In async discussion, the pedagogy is to develop feedback to each other – the relations. There’s a choice you can make: the theories of learning you use direct you towards human communication, where there isn’t a clear knowledge artefact, but knowledge emerges from the conversation. It can be thought of as a social network; the messiness isn’t really there in the discussion-based forums, or Facebook, or Twitter – they are human communications.
U: Reification of ideas into knowledge artefacts – this exists, we can study it.
Sean: Agree absolutely.
U: The themes from today. LA and CSCL. Tools and methods in large online courses, MOOCs etc. Reuse of analysis workflows. Embedded LAs. Where would you bet the next hotspots will be, from your point of view? That’s the point of the special issue papers. Where would you like to see future work? What would you suggest?
Wanli: Do we have publicly available MOOC data?
Maybe in the Fall.
Sean: Scaling, theory. A lot of privileged access to data. Some people have access to MOOC data, most don’t. Some have Facebook data, most don’t.
Carolyn: At Stanford, the guy in charge of their MOOC data is working on a deal to make their data more open. Berkeley is saying if they do, they might too. Then the idea is MIT and Harvard will cave. Stanford is working hard to overcome the issues. Huge discussion. They feel they’re close to coming to some agreement. Also a course Dan, Dragan, George and I are working on will provide some data.
Carolyn: Maybe EdX.
U: John Mitchell from Stanford. Many MOOCs – once you hand them over to Coursera, forget your adaptive analysis of data. These platforms will not give us the means to intervene directly.
Carolyn: True of Coursera, though some colleagues say they would like to do more; they’re a bit overwhelmed, but in principle willing. They don’t have things set up to give you data in real time. You don’t get it until after the course is done. You can scrape the discussions as they go – it’s scrapeable, but that doesn’t give you the click data. In open EdX, you can do what you like, get data in real time; we’ll have way more control. A platform at Stanford, NovoEd, has more affordances for synchronous collaboration, worth looking at.
Shady: We set up MOOCs, we would be happy to have collaboration on that.
U: All the workshops have been asked to nominate ideas for special issue. For full-day, we can nominate 3. Have to use the papers that are there. Can’t nominate things we haven’t written yet. We’ve done link prediction between authors and topics, but the accuracy is low.
[Not clear on the plan]
Dan: I am interested in whether or how people want to see this continue. What to focus on for future workshops.
Ugochi: Trying to form the right question. Not a theme, but the tension points I’ve been thinking about across all the different presentations. Theory-based analysis of data – Sean’s work, how do you talk about what’s an artefact and what’s not. Also about sharing data, e.g. with these workbenches – how do you form a community? Or ubiquitous learning, which relates to e.g. MOOCs. There are particular people who do MOOCs, mostly researchers and people with higher educational levels. So people are building community, but it could be translated to other places. Those questions are key, versus thematic ones – shareability, privacy, and so on.
Ulrich: This is where the synergy is.
Agathe: How you encourage people to participate is interesting.
Ulrich: I like to ask everyone, one point for next year.
- Opening up the data, how to compare different analyses. Some standard data, maybe not completely fake, but where we can compare our different analysis methods, see if they can be integrated and combined to make more sense.
- How can we share data and also our analyses? On one hand, you read papers and want to understand, but can’t follow the analysis method because it’s too short, and don’t see enough of the data to tell if the method is really applicable. So the idea of a standard dataset as a benchmark is a good idea. On the other hand, ways to make analysis more explicit and shareable.
- Relationship between LA and psychological issues. My project tried to predict self-regulated learning skills by analysing the learning log. Generating a plan, vs dropping out. Try to predict learning style/role, but it’s difficult to predict without psychological factors like self-regulation, motivation and so on.
- Would like to see how MOOC research is going to evolve, working with Dragan, we have some data. For interaction types, we should analyse actor-actor interaction, but there are users who benefit from emergent objects, so maybe active and passive.
- SNA and pedagogical practice.
- Liked all the talks, really inspiring. Want to see evaluation, to tell that they really work, and the assumptions made explicit. Don’t know whether, if we change the dataset, it will still work.
- MOOCs, and the transferability. Models we are using can transfer? (Carolyn: Look at MIT work on MOOC TV.)
- Standardised datasets. We all struggle with getting access to data, even within one context. So we can validate what we’re doing. Emergent groups, detecting them as they form.
- Me: Want more data sharing, replicability in this field. Like real science (should be). I think the privacy issue around data sharing is really, really hard. Deanonymisation is scarily easy. Fundamentally, I’m interested in using all this to make an actual difference to actual learners. And getting better evidence than we have so far that it does! It’s hard, but if we can’t do it, no researchers in education can.
Dan: Would like to see some progress made! So not same workshop again. Combine methods to look at new things. And do what Doug wants!
Ulrich: This has been different. If you want the sharing, you have to fix either the dataset, or you have to fix the tool. If you leave both open, it doesn’t work well: the empirical researchers are always in love with their own datasets, and the computer scientists are always in love with their tools. You don’t have to fix both, but fix just one.
Cindy: Maybe a place in the middle. Compromise on both sides. Synergy, coming up with something better. It’s the collaboration between both. I agree the question is: can we do anything to improve teaching and learning?
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.