LAK11 – Monday afternoon – Doug Clow's Imaginatively-Titled Blog

Liveblog notes from the afternoon session on Monday 28 February, the first full day of the Learning Analytics and Knowledge ’11 (LAK11) conference in Banff, Canada.

(Previously: The Learning Analytics Cycle, liveblog notes from Pre-Conference Workshop morning and afternoon, and from Monday morning.)

Introduction to Xavier Ochoa. Principal Professor at ESPOL, Ecuador. Works on learning objects, inventor of ‘learnometrics’.

Keynote: Xavier Ochoa – Learnometrics

Slides available on Slideshare.

Ecuador. Joke about banana republic, and doing research on bananas. University is outside the city so as not to distort the city.

Too many different ideas about what a learning object is – little consensus about what it is. Lots of research about LOs, but not answers to questions about numbers, quantity.

Simplistic model of learning object lifecycle, learn about the processes, may give us a better idea of what LO might be. Start with the Offer. Broad view of what a repository is – not just MERLOT etc, but LMSes, institutional repositories.

What is the size of repositories? He has counted; Wide range. A medium sized LMS – including his – has more objects than MERLOT, which has been around for 12, 15 years. LMS use less than 5 years. Distribution of LOs – there are peak, small ones. 20% of repositories have 70% of the LOs. There is a long tail of resources – lots of repositories with small number of LOs. How can we find those objects?
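(A rough sketch, not from the talk: one way to get a concentration figure like "20% of repositories have 70% of the LOs" from a list of repository sizes. The sizes below are made up for illustration, and `top_share` is a hypothetical helper name.)

```python
def top_share(sizes, fraction=0.2):
    """Share of all objects held by the largest `fraction` of repositories."""
    ordered = sorted(sizes, reverse=True)
    top_n = max(1, round(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)

# Made-up object counts for ten repositories, not data from the talk.
sizes = [5000, 2000, 600, 500, 500, 400, 400, 300, 200, 100]
print(top_share(sizes))  # 0.7
```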

Growth of repositories – has a slow start, then takes off. But the growth is linear. YouTube, Wikipedia – user-generated content – the growth is exponential. LO repositories as we know them are not working.
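(A rough sketch, not from the talk: a crude way to tell linear from exponential growth is to fit a straight line to the raw counts and to the log of the counts, and see which fits better, since exponential growth is a straight line in log space. The yearly totals are invented, and comparing R² values like this is only a first-pass heuristic, not a proper model comparison.)

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def growth_shape(counts):
    """Label a series of cumulative object counts as 'linear' or 'exponential'."""
    times = list(range(len(counts)))
    linear_fit = r_squared(times, counts)
    exponential_fit = r_squared(times, [math.log(c) for c in counts])
    return "exponential" if exponential_fit > linear_fit else "linear"

# Made-up yearly totals, not figures from the talk.
steady_repository = [100, 210, 290, 405, 500, 610]
user_generated_site = [100, 200, 400, 800, 1600, 3200]
print(growth_shape(steady_repository))    # linear
print(growth_shape(user_generated_site))  # exponential
```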

There is hope – a new breed of repositories, on how to do it. Connexions growth in contributors is exponential. Looked at connections between contributors. Established, old guys, have links to newer ones – accept new contributors and collaborate with them. The community is why Connexions is working. Quantitative analysis tells you something is there, doesn’t tell you why; he suspects it’s social interactions.

Objects produced per teacher or contributor. A very strange distribution – there’s nothing like an average teacher. May or may not be a power law – but is very fat-tailed (log-normal). Average has no meaning here. In traditional repositories, a few people produce a lot of contributions – those repositories depend on super-users. LMSes and MIT OCW – not long tail, but fat belly (Weibull) – most resources contributed by the ‘average’ teachers – between 10-50 objects. Institutional repositories – 99% of the users contribute just one object: their thesis. Most content is produced by the single super-user. A ‘light-tail’.

(Everyone looked at me when he mentioned the distributions and whether it was a power law or not.)

The key is the engagement; there has to be something in return to continue to engage. Traditional repositories, exponential decay of interest (Ariadne). SIDWeb, grows over time, then fades (Weibull). LogNormal decay of institutional repositories – you contribute once then disappear.

If our LMSs could be our repositories, could make them all shareable, that will solve the problem. Open Educational Resources (OER) are the key. Reuse is the reason-to-be of learning objects. But we know very little about reuse – there is no data. Within LMSs, we can see.

Reuse paradox, proposed by David (Wiley?). We have small objects – photo, table – is very easy to use, but very little educational context. At the other end, richer, bigger objects, with more context, are harder to reuse.

Tried to measure reusability. Split in to three sizes – Small (components in slides, images in Wikipedia); Medium (modules in Connexions, libraries in software); Large (courses in curricula, WebAPI in mashups). Found that reuse is about 20%, regardless of size. Could be higher.

Need to take this with a grain of salt – very little data here, have to scrape the barrel to get this. Reuse is happening with or without us – people are doing it anyway. We should be humble: we make tools to help the reuse that’s already happening. Reuse is natural.

We need to re-examine our understanding of reuse; do more studies. He doesn’t think David’s analogy that the data wag the theory dog is right, data isn’t the tail. It’s more like left/right side of the brain, need both. Some things are right from the data – e.g. expansion of the universe, was empirically-driven. Data isn’t the whole picture, doesn’t tell you why. Have to go hand in hand, and prove the theories. Don’t want ‘this works but we don’t know why’, also don’t want string theory which doesn’t tell you anything. Need both.

Distribution of reuse, looks like a log-normal distribution – is like a popularity contest.

If we want a science of this, we need both theory and data.

Call this line of research learnometrics. Study of empirical regularities of the data, develop mathematical models; then understand the influence and impact, and close the cycle with useful metrics that are applied. Need to use the knowledge to improve actual learning. But now doesn’t like the name – parallel with bibliometrics, scientometrics – but those stop at the metrics. Don’t want ‘educational data mining’ either, it’s not the same. Learning Analytics, educational data mining, or educational research – is it the same or different?

Took papers from a data mining conference, the terms in the papers – educational data mining and learning analytics tend to overlap. So ‘data’ is shared. But there are some outliers. EDM has cognitive, intelligent tutors. LAK has useful, experience. Need to understand how we fit with educational research, long established field. We have some things to learn about identifying significant findings.

The questions these three ask are the same, but present different kinds of answer. Educational research might have small experiments with control groups, but it’s not transferable. We try to come up with tools to analyse data, large scale observational studies, come up with patterns with empirical similarities between results. Good answers to some questions. We need their models.

Questions

Someone: Distinction between educational research and us, liked. With reuse, are you considering cultural reality of faculty, and institutional incentive. Historically set up, faculty don’t like to reuse, I’m the expert. We all reinvent curriculum that has been reinvented 180 times. How can we make it easier, more convenient – engagement is a component that works outside the institution, in an informal environment. When you’re inside, does your model include accounting for cultural and incentive barriers?

Cultural difference, very big. Latin America, when they ask if people reuse, say yes. My colleagues reuse, it’s the normal way of doing things. Don’t have ingrained fear of copyright infringement – if it’s educational, we can use whatever we want. I know it’s different here.

Someone: We might say out loud it’s a copyright issue, but historically teaching very private enterprise. See this when you do team teaching, god forbid my colleagues see what I do in class – that’s the barrier.

I think there’s a lot of fear. My numbers could not say that.

Someone: Were you able to find anything about the way in which reuse happens. Following from cultural thing, people are happier to adapt than reuse as a chunk. Is there evidence?

The 20% is reuse as-is. I didn’t look at with modification. This is a baseline. There is more reuse in the sense of inspiration. I have no numbers for these other kinds of reuse. Much of the data I collected, little was the easy way, most was scraping the site without permission. Only way you can get some reuse data.

Someone: Do you have data on the reuse by complexity, by how complex it is to edit the object? People might say they want to edit a Flash file to do something different.

The distribution of reuse is lognormal, appears when there’s lots of steps in the way. Would be interesting to look at this.

Daniel Suthers & Devan Rosen – Unified framework for multi-level analysis of distributed learning

Dan presenting. Works primarily in CSCL. This work came out of analysis of interaction in small groups, f2f and online, sync and async, multiple media. Also supporting online community of teachers, how to bridge two levels: small groups, larger communities. A way of representing the data and the interaction that was developed for manual analysis, but can scale up – NSF VOSS grant to do this.

Many theories about how learning happens in social settings: social as stimulus, social entity as learning agent; and lots of others. All of these involve uptake – basic unit of connections or associations between people. It’s someone taking something someone else has done as relevant to their current activity.

Key question about interplay between individual and collective agency – need multi-level analysis. Also, activity may be across multiple logs: people do things in different media, and coordinate those actions to make their interaction with each other. Logs record in the wrong ontology for analysis – media-level events – this framework gets you from log data to ties, and back again.

Big model. Has logs – a process trace. Events drive this. That builds up within an entity-relations framework (domain model). Event model as an abstract transcript. Contingency graphs – how can we see interaction? Adjacency is one way of spotting it. Identify empirical relationships between events that collectively evidence uptake; they are contingencies. One example: threading relations – to reply to a message, it must first be written. Other contingencies around temporal proximity, and lexical or semantic overlap. The graph can get pretty complex, have to be selective. Deal with the complexity using technology. Visual representations are just for demonstration – they are a data structure which can be manipulated computationally.
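(A rough sketch, not the authors' implementation: the three kinds of contingency mentioned, threading, temporal proximity and lexical overlap, computed over a tiny invented log. The `Event` fields, the 120-second window and the two-shared-words threshold are all assumptions for illustration; a real version would at least drop stop words.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    event_id: str
    actor: str
    time: int                      # seconds since the start of the log
    text: str
    reply_to: Optional[str] = None

def contingencies(events, window=120, min_shared_words=2):
    """Return (later_id, earlier_id, kind) edges that may evidence uptake."""
    ordered = sorted(events, key=lambda e: e.time)
    edges = []
    for i, later in enumerate(ordered):
        later_words = set(later.text.lower().split())
        for earlier in ordered[:i]:
            # Threading: a reply is contingent on the message it replies to.
            if later.reply_to == earlier.event_id:
                edges.append((later.event_id, earlier.event_id, "threading"))
            if later.actor == earlier.actor:
                continue
            # Temporal proximity between different actors.
            if later.time - earlier.time <= window:
                edges.append((later.event_id, earlier.event_id, "temporal"))
            # Lexical overlap between different actors.
            shared = later_words & set(earlier.text.lower().split())
            if len(shared) >= min_shared_words:
                edges.append((later.event_id, earlier.event_id, "lexical"))
    return edges

# Invented log: e3 is not threaded to e1 but still overlaps with it lexically.
log = [
    Event("e1", "ann", 0, "fractions need common denominators"),
    Event("e2", "bob", 60, "why common denominators though", reply_to="e1"),
    Event("e3", "cat", 900, "common denominators let you add parts of the same size"),
]
for edge in contingencies(log):
    print(edge)
```

In this toy log the e3 to e1 edge is the kind of uptake that the threading structure alone would miss.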

Finally, have an uptake graph. Can find uptake not manifest in the threading structure. Can find things like integrative discussions computationally, not just by hand. Things really important for learning.

From there, the contingency graph, find affiliations between people through media to generate representations called associograms (like sociograms). A directed affiliation network of actors and artifacts – a mediation model of how actors’ associations are mediated. Produces a mediation model. This lets us factor out time. Can find interaction patterns – a cycle in the graph is a round trip. Interesting for learning. Forums with more of these could be places where there’s more learning happening.

Can recover temporal information.

Associograms to figure out relationships between people. Two approaches – one is a grammar approach, look for predefined patterns – e.g. dialogue pattern, or a producer/consumer pattern, others. Real life is messier. (Not sure what the other was.)

This was looking at a single medium; also look at multi-media associations between actors, see which paths go through discussions, wikis, and so on in which direction. Look for patterns between them.

A final step – transitive closure, collapse the associograms in to ties, then use standard Social Network Analysis methods.
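(A rough sketch, not the authors' code: collapsing actor–artifact–actor paths in to actor-to-actor ties, and spotting round trips as ties that run in both directions. It assumes each artifact has a single author and that "reads" stands in for any kind of uptake; the names and data are invented.)

```python
from collections import Counter

def collapse_to_ties(writes, reads):
    """Collapse actor-artifact-actor paths into weighted actor-to-actor ties.

    writes: (actor, artifact) pairs, the actor produced the artifact
    reads:  (actor, artifact) pairs, the actor took up the artifact
    Returns a Counter mapping (reader, writer) to the number of read events.
    """
    author_of = {artifact: actor for actor, artifact in writes}
    ties = Counter()
    for reader, artifact in reads:
        writer = author_of.get(artifact)
        if writer is not None and writer != reader:
            ties[(reader, writer)] += 1
    return ties

def round_trips(ties):
    """Pairs of actors with ties in both directions."""
    return sorted({tuple(sorted(pair)) for pair in ties if (pair[1], pair[0]) in ties})

# Invented affiliation data across a forum and a wiki.
writes = [("ann", "post1"), ("bob", "post2"), ("cat", "wiki1")]
reads = [("bob", "post1"), ("ann", "post2"), ("ann", "wiki1"), ("bob", "post1")]
ties = collapse_to_ties(writes, reads)
print(dict(ties))
print(round_trips(ties))  # [('ann', 'bob')]
```

The resulting weighted ties are the sort of input standard SNA tools expect; keeping the artifact names alongside would let you go back down to the interactions behind each tie.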

This nearly automates going from log data, to ties suitable for SNA. Also, if you track where things came from, can go back down to look at the types of interaction responsible for what’s going on.

Currently applying it to Tapped In environment – very long-standing. Lots of data – 8 years, focusing on 2 years peak. Was 20k educators. Many different media – chats, threaded discussions, wikis, etc. Looking to construct contingencies for that, find discussions that are interesting, where they’re doing more than just sharing resources. Also following the actors – if they meet in chat, do they take them to another, or ideas.

Advantages – integration of distributed data. Common format for algorithms to work. Multi-level multi-theoretical analysis is possible, and can also change ontologies – map between interaction, mediated affiliation and tie level of analysis.

Workshop coming up at CSCL – connecting levels of learning in networked communities. Still open for participation.

Question: How do you know a single message has been read?

In our environment, when you open a forum, it just displays the titles, when you click on it, it’s RESTful, records in the server it was opened. We’re assuming they’ve opened it! But is more than you get sometimes. This can be adjusted to the quality of the data available – if you don’t have reads you can still work with that.

Aneesha Bakharia & Shane Dawson – SNAPP

Aneesha presenting, from U Queensland. Shane is from UBC.

Aim for what they’re trying to do is to build a diagnostic tool, so academic staff can in real time evaluate student behavioural patterns, review, and make interventions in a timely way. Focus today on what we’ve done so far, discuss SNAPP version 2, release imminent, and future directions.

SNA is useful for educators. Social interactions are harder to analyse as your class size grows. Hard to understand them when working out how to scaffold the learning content (Brooks et al 2009). So want to harness power of real-time social network analysis.

Example – two forums, same number of messages and participants. The threaded view or temporal activity doesn’t show differences. But look from a relationship point of view, can see one forum has no interaction really, but the other has participant interaction not through the tutor. No need for further analytics as such, but do need multiple representations to help people facilitating these forums make better decisions. Want to embed these sociogram representations within the forum itself.

Integration within LMS (Blackboard, Moodle, Desire2Learn). Tool renders a sociogram as an alternate representation.

Had a lot of barriers – no access to run queries, APIs not available – inherent limitations on how extensions or plugins work. Could build a totally new tool/version of discussion system. But can’t add on functionality. This was also when lots of browser-based extensions were appearing; scripts to change e.g. GMail appearance. So built it as a bookmarklet. Allows targeting of multiple LMSs – empowers users to use those analytics.

At the moment, shows aggregate of all interaction. New version will look at timestamps, view over time, filter by date. Can export as Gephi, NetDraw – powerful SNA visualisation/analysis tools. Don’t replace visualisation tools, do simple visualisations, common interactions, but let people take it out so they can do more with it. Also new version allows annotations.

Runs completely client-side, runs in user context; nothing goes back to a server. Can be embedded inside the forum.

Taking safe option of screengrabs rather than live demo. (Wise, wifi is flaky with so many users).

Installation is dragging it to your toolbar, visit a forum, click the bookmark, and you’ve got it. It gives you tab for visualisations, stats, annotations. Shows activity over time, and SNA connection graph. Can filter by date. Export as image, NetDraw, Gephi incl dynamic. SNA metrics shown, can filter. Annotations feature useful for academics moderating, marking off key points – shows as a dot on the timeline of activity.

Key patterns they see:

Learner isolation – nodes with no connections.
Facilitator-centric – tutor is really very central, no student-student interaction. Is most common pattern we see.
Non-interacting groups – different cohorts put together but keep own group connections
Facilitator interaction with high-performing students – lower grade students not included in the discussion
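
(A rough sketch, not how SNAPP itself works: given a list of who replied to whom, two of these patterns can be flagged with very little code. `forum_patterns` and the data are invented for illustration.)

```python
from collections import defaultdict

def forum_patterns(participants, replies, facilitator):
    """Flag isolated learners and how facilitator-centric the replies are.

    participants: all enrolled names; replies: (author, replied_to) pairs.
    """
    neighbours = defaultdict(set)
    for author, target in replies:
        if author != target:
            neighbours[author].add(target)
            neighbours[target].add(author)
    # Learner isolation: participants with no connections at all.
    isolated = sorted(p for p in participants if not neighbours[p])
    # Facilitator-centric: fraction of replies that involve the facilitator.
    involving = sum(1 for a, t in replies if facilitator in (a, t))
    share = involving / len(replies) if replies else 0.0
    return {"isolated": isolated, "facilitator_share": share}

# Invented forum: every reply goes through the tutor, and dev never posts.
participants = ["tutor", "ann", "bob", "cat", "dev"]
replies = [("ann", "tutor"), ("tutor", "ann"), ("bob", "tutor"),
           ("tutor", "bob"), ("cat", "tutor")]
print(forum_patterns(participants, replies, facilitator="tutor"))
```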

What’s next? Two techniques: social network analysis (what SNAPP does), and content analysis. Get asked – you can see the interactions, but what are they talking about? And how do you know the quality? So content analysis – two methods: behavioural coding, and topic modeling.

Behavioural coding – specific categories you map messages to – classification problem. Research background – Support Vector Machine (supervised machine learning). Have the algorithm, but don’t have data can use for training. Want to build models, then share them.

Or topic modeling, finding the themes. Used to be called text clustering. Non-negative matrix factorisation, latent Dirichlet allocation. Can fit themes to multiple categories.

Gives you a tripartite graph – how to present to users is ongoing research. Want to include in SNAPP, so anyone can use them without having to export data or understand the algorithm.

Example – groups supposedly discussing same topic, but this could show you they were different, could see who was talking directly about the subject, and more related to the topic. Quite promising.

Finally – Apache Mahout – very low-level implementation of map/reduce. Framework for developing, testing and deploying large-scale algorithms. Very useful tool within the learning analytics community.

Ravi Vatrapu, Chris Teplovs, and Nobuko Fujita – Visual analytics for teachers’ dynamic diagnostic pedagogical decision-making

Ravi presenting. A vision paper – colleagues at Copenhagen Business school and Univ of Birmingham.

NEXT-TELL EU project, just started. Is the context. Vision about teaching analytics, providing support to teachers in real time, and in real classrooms. Model – Triadic Model of Teaching Analytics (TMTA) – and an implementation proposal.

One vision of C21st classrooms is high-density – rich, personal learning environments. Nice to relay all this back to the teacher, problem of information overload. Information needs to be nuanced, visualised in meaningful, actionable way. Web 2.0 has widened the learning ecology. Students not bound by curriculum (in most parts of the world). But problem of learning fragmentation – across multiple media and contexts. We have fantastic learning technologies for learners, but forgot teachers as professional practitioners. Schools are going to stay, like it or not – good way of keeping children off the streets, if nothing else.

New demands placed on teachers – not only to develop subject matter, but C21st competencies – personalise learning, teach adaptively, provide evidence-based accounts for activities and assessment. And to be accountable to many stakeholders.

NEXT-TELL’s core vision. (Peter Reimann). Providing an innovation platform, not an ICT platform. Providing pedagogical methods and computational tools. Focus on formative assessment because is hard problem – often too late. Very costly to do it. Platform, is about second-order change – not changing a school, tools for a teacher to implement changes within that context.

How you do things in a classroom is more important than what the room looks like. Making it high density – communicative, temporal, content – leads to cognitive density. Move towards teaching analytics – is an amalgam. From learning sciences (e.g. interactional pathways to learning outcomes – some better supported by ICT than others), learning analytics, visual analytics (HCI, etc).

Basic process to support is dynamic diagnostic pedagogical decision-making. Building a classroom information system.

Triadic Model for Teaching Analytics (TMTA) – three people: design based research expert (explicit facilitation), visual analytics expert (support to better understand, and hypothesise, test and predict), teaching expert (inspiration, understanding the teaching and learning processes).

Open-Learner Models, from ITS/AI in education. Domain model, learner model, provide teaching strategies. Open ones are available for inspection by the learner.

Teacher uses planner to specify activities, rolled out to students, recorded in to analytics engine, visualisation on task process – goes back to teacher.

Four years of studies – initially led by researchers, then teachers in years 3 and 4.

Griff: What do you think will be the greatest challenge in this implementation.

One is that it’s an EU-wide project, many countries. Danish and Norwegian systems are closest, strong involvement of unions in collaboration with parents. Large systemic differences with other countries to deal with. Buy in of top management. One challenge was, before, the IT systems, the system administrators, think their systems are UNESCO Heritage sites. But now everything is a web service, if can plug in to the internet, you’re good to go.

Someone: Can talk about the Computational social science laboratory, eye-tracking, physiological data collection.

Have to invent representations, but want to go beyond that. Eye tracking to get at good designs – A vs B vs C comparisons, what do teachers attend to. Cannot make strong cognitive attention arguments based on this. Want to put this in the classroom, see what happens. Physiological data is because we really like instrumentation. (!)

Phil Ice – Multilevel institutional application of analytics

Associate VP in the American Public University System. How they use analytics across their system.

D3M culture – data-driven decision making. Two cycles, interlocking, with many points where they measure information. Continuous loop of improvement. The analyses and stakeholders are interrelated.

Multiple levels of analysis, a range of approaches. Descriptive statistics, inferential statistics, exploratory statistics – span all of the enterprise.

Examples – student count. Descriptive is appropriate here. Student demographics – pie chart. Grades by core courses – simple descriptive.

More complex when evaluating faculty – end of course survey, not just descriptives. Compare individual instructor against program, and whole university, regression across all the indicators, and factor analysis.

Retrospective analysis – historical retention, effectiveness etc – do regressions, factor analysis, decision trees.

Gets fun when you get to predictive modelling. Use federated data, demographic and transactional data, remap them using variety of reduction methods. Feed a neural network analysis that predicts retention – who is most likely to disenroll in the next five days. By individual student, program, etc. Takes all 85,000 students, ranks them in likelihood to disenroll. 87 different variables, with subcomponents – can apply back to remediation, to create interventions to affect the likelihood. Dashboard is cool, but not good for executive team.

So use good UI principles to make an executive dashboard – show number of high risk, medium risk, low risk.
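(A rough sketch, not APUS's system: once a model has produced a disenrolment probability per student, the ranking and the high/medium/low counts for an executive view are straightforward. The scores and the 0.7 / 0.4 thresholds are invented; the talk did not say where the tiers are cut.)

```python
def risk_summary(scores, high=0.7, medium=0.4):
    """Rank students by predicted risk and count them per tier."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    counts = {"high": 0, "medium": 0, "low": 0}
    for student in ranked:
        if scores[student] >= high:
            counts["high"] += 1
        elif scores[student] >= medium:
            counts["medium"] += 1
        else:
            counts["low"] += 1
    return ranked, counts

# Invented probabilities of disenrolling, standing in for model output.
scores = {"s1": 0.91, "s2": 0.15, "s3": 0.55, "s4": 0.72, "s5": 0.05}
ranked, counts = risk_summary(scores)
print(ranked)  # ['s1', 's4', 's3', 's2', 's5']
print(counts)  # {'high': 2, 'medium': 1, 'low': 2}
```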

Then semantic analysis – federation, disaggregation, relational mapping, ontological ordering. Using LSA, prototyped in 2009. Were up for accreditation with a professional body. One problem – School of Education is a big nightmare. Want to see matching every single piece of content with a long list of goals and objectives. People are psychotic about this – if you miss one, you’re in trouble. Room full of notebooks, put things in to ring binders, mass of paper to examine. So took an ingestion engine – on Sourceforge, Java – ingest any type of content, and do internal flex processing to strip out the content, turn it in to XML, disaggregate the content – e.g. disaggregate JPGs from text, will pick up metadata, or if text, do natural language processing, map to goals in an ontology. Strip all content down to a granular level. Then can search to see which course has its goals and objectives met, gap analysis to see how many unfulfilled objectives are there. Then gap remediation – look across the whole program, show things unfulfilled.

How accurate is it? Compared to human coders, tested – had 93.4% accuracy. Was a 92.7% saving of time to do it – $83k savings with this.

Now thinking about round-tripping: take student work product, match with LSA to goals and objectives, see if they’re meeting them. Actual evidence of learning outcomes. Government is coming back with seat time, because we can’t show learning effectiveness. This could get to it.

Convergence – quantification of semantics, not just retention. Want to do multi-institutional comparisons. Programmatic globalisation.

George: Does anybody know where this PDF stripping code is?

Not exposed outside, have to feed it in to common library and it does it internally – is essentially doing OCR.

Someone: Predictive analytics, how do you validate if you put things in motion to stop people leaving – how do you know they were at risk?

Do a test-train process. Use data retrospectively – six month lag. Train on historical view. Test comes looking at how effective it was against what actually happened. This is not a static model – very complex, not a fixed algorithm. Is a dynamic state. Every time you have an intervention, changes what’s happening. Run it two minutes later, has changed dramatically.
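(A rough sketch of the retrospective idea only: train on records before a cutoff date, then check predictions against what actually happened afterwards. The records, the cutoff and the one-rule `stand_in_model` are all invented; the real system is a neural network over many variables and, as he says, is not static.)

```python
from datetime import date

def split_by_cutoff(records, cutoff):
    """Split (snapshot_date, features, disenrolled) records at a cutoff date."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

def accuracy(predict, held_out):
    """Fraction of held-out records where the prediction matched the outcome."""
    hits = sum(1 for _, features, left in held_out if predict(features) == left)
    return hits / len(held_out)

# Invented records: (snapshot date, features, did the student disenrol?)
records = [
    (date(2010, 3, 1), {"days_since_login": 30}, True),
    (date(2010, 4, 1), {"days_since_login": 2}, False),
    (date(2010, 9, 1), {"days_since_login": 21}, True),
    (date(2010, 10, 1), {"days_since_login": 1}, False),
    (date(2010, 11, 1), {"days_since_login": 20}, False),
]
train, test = split_by_cutoff(records, cutoff=date(2010, 8, 1))

def stand_in_model(features):
    """Hypothetical placeholder for a model fitted on `train`."""
    return features["days_since_login"] > 14

print(len(train), len(test))
print(accuracy(stand_in_model, test))
```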

Someone: When you use this for accreditation, did they know you were doing this?

We did it, and we didn’t tell them we were doing it. We showed it to them, and they thought it was great. Took a huge chance, but did a few random checks and it met their criteria.

Someone: What tool do you use for LSA?

Built inside common library – on Sourceforge, has tool built in to it. Takes a while to learn to train it. With one person, will take them three months to work it out. It’s not easy. Need about a three-pass run, all content should be at least 200 words to get good ontologic order out of it. Below that have problem with NLP.

Steve: When map learning resource to predefined aspect of ontology – assigning a probability? Or is it some threshold?

That’s what you do with the passing – give it a probability assignment. Start with 50%, then do exclusions, then update to 75%, then on last 90%.

Steve: What about conflicts? Pick the highest one?

That’s where the human factor comes in. Have human eyes on it at some level – but if you cut out 92% of the human effort, that’s something.

Christopher Brooks, Carrie Demmans Epp, Greg Logan, and Jim Greer – Who, what, when where and why of lecture capture

Greg talking.

System is home-built, with idea of distance education, but not how it is being used – mostly on-campus classes. Could be streamed, but isn’t at the moment. Encourages participation, before and after class. Is a giant logging platform.

Students see big white area which is what’s on the projector – Powerpoint, videos, web browsing. Also smaller frame with NTSC cam pointing in to the lecture theatre, half on the podium. Run through the video with software, automatically pick up where the slide transitions are – including when jump ahead three steps. Can then jump ahead to each slide by clicking on it. Logging of activity of use – navigation, use. Can reconstitute user session from the beginning.

Have had multiple versions of the UI. Added a note-taking component, mapped to slides, and also global notes showing everyone else’s notes. Wasn’t used heavily.

Chris (?) talking

Wanted to make tools better, but understand how students learn. Want to validate educational theory with the analytics – e.g. social constructivism. Is a data wagging the dog sort of guy.

Ran it in an integrated system, a dozen classes, 1000-1500 students. Focused on one class, second year STEM. Could be prototypical, also interesting. Highly motivated students in the class. Students not required to use this, only consider students who used the tool. Takeup between classes varies – from e.g. 89% usage, 31h of video to 60% / 3h video.

Hypotheses – a group of minimal activity learners. A group of high activity learners, watch every lecture, use it in their day to day learning. Disillusioned learners – keen to start, then stop using it because didn’t help, like going to gym in January. Deferred learners – the last-minute assessment-is-coming students – expected to be big.

Used a clustering algorithm – k-means clustering – found those groups and more, though size of cohorts begins to get small. 44% don’t use it in regimented fashion, can’t group it, idiosyncratic. Also use at beginning then tail off. Artifact in week 8 – the mid-term. Two clusters there – those who used it that week (38%), and those who used it that week and the week before. Very small use it every week (4%). The pattern in the last two weeks didn’t predict how you used it for the final exam, so separate analysis during the term and working for the final.
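(A rough sketch, not their analysis: plain k-means over per-week viewing minutes, to show how usage profiles like "high activity", "deferred" and "disillusioned" can fall out as clusters. The six students and their minutes are invented, and the starting centroids are hand-picked rather than random so the result is repeatable.)

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, [nearest(p, centroids) for p in points]

# Invented minutes of video watched in four weeks; week 3 has the mid-term.
usage = [
    (60, 55, 65, 60),  # watches every week
    (50, 60, 70, 55),  # watches every week
    (0, 0, 90, 0),     # only before the mid-term
    (5, 0, 80, 0),     # only before the mid-term
    (40, 10, 0, 0),    # keen start, then stops
    (45, 5, 0, 0),     # keen start, then stops
]
centroids, labels = kmeans(usage, centroids=[usage[0], usage[2], usage[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```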

How does this help at-risk students, how correlates with achievement. How do f2f, distance and online students use it? Disciplines/domains. Teachers too.

Want lots more data from students, hope you can help. Have joined with other universities – half American, half European. Opencast Matterhorn project – built for HE in HE, include analytics tracking in it. Is deployed at U Saskatchewan. A common platform, and a community of inquiry.

Question: May be round 2 of research, but correlate usage with other signs of success?

Didn’t. Grades are looked at in current study. Just clustered in to cohorts, now want to see if cohorts are indicative of academic achievement. Need ethics permissions for thousands of grades.

George: Difficulty in physical classrooms, need digital data. Looked at capture of learning activity in class?

Interesting. We can control multiple cameras, in some rooms have cameras back on the students. Use machine vision to look at different patterns – e.g. looking up or not, laptop or not. Doable in small classrooms – 50 or smaller – with HD cams. Just need face recognition – this many eyes looking to the front. Haven’t done a lot yet. Students making notes in the classroom live, would be interesting, determine levels of engagement.

Someone: An in-class web-oriented student-used facility that captures information students are entering, relates to presentation. Lecture Tools, in Minnesota.

[Next section is about innovations, future ideas with potential but not much evidence]

Christopher Teplovs, Nobuko Fujita, Ravi Vatrapu – Generating predictive models of learner community dynamics

Ravi presenting, for Chris and Nobuko. Focus on combining LSA and SNA. Chris focus in dissertation.

Social Network Analysis – focus on relationships and patterns.

Latent Semantic Analysis – model of human knowledge acquisition – high-dimensional representations of associations between words in documents. Interpretability an issue – with 100+ dimensions, not easily interpretable. Principal component analysis, eigenvalue analysis … is NOT the case that first few dimensions are adequate. Visualisation may be a help.

Knowledge Space Visualizer (KSV) – focus on artifacts created by learners. Relationships between documents. The nodes are documents, and different sorts of links shown. But also visualise implicit links using LSA – cosine between LSA vectors is indicator of document similarity. Very flexible – two axes, can define them as any variable; can use chronology and authorship to see when things happen. Can overlay additional information, showing relations between documents.
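(A rough sketch, not the KSV code: the cosine between two document vectors, and implicit links drawn between documents whose cosine passes a threshold. The three-dimensional vectors and the 0.7 threshold are invented; real LSA vectors would have 100+ dimensions.)

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def implicit_links(vectors, threshold=0.7):
    """Pairs of documents whose vectors are at least `threshold` similar."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]

# Invented low-dimensional stand-ins for LSA document vectors.
vectors = {
    "note1": (0.9, 0.1, 0.0),
    "note2": (0.8, 0.2, 0.1),
    "note3": (0.0, 0.1, 0.9),
}
print(implicit_links(vectors))  # [('note1', 'note2')]
```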

Scalability issue – get ‘hairballs’, ‘furballs’. What do we learn about the learners?

Now in to work with French researchers, modelling learner communities with SNA. Threshold idea important – come to CSCL to find out more.

Content-based learner models are time-consuming to create – two years or more. Variety of ITS-based approaches. Ours is simpler! Uses LSA to create model of the learner based on what they’ve contributed – then look at relationships between user models.

Look at hypothesis about where we might see productive interactions – e.g. Vygotsky’s Zone of Proximal Development – a more capable peer might be very helpful.

Combine these LSA models with interaction-based user models – productive. Compare potential and actual actions you see.

Focus is on understanding community dynamics. Want to understand the actual mechanics, from semantic and social network perspective. Can we use Game Theory to understand it? Behavioural economics parallel. Understand payoffs and strategies.

Interested – contact Chris Teplovs to take this further.

Question: Use of LSA and scrutability in previous presentation – a problem?

Very good question. Don’t know the answer. Have been discussing. Even mathematical interpretation is problematic. Hopefully someone will solve the problem for us.

Ravi Vatrapu – Cultural considerations in learning analytics

Another presentation!

Works in the intercultural communication/collaboration community. Work done for PhD.

Culture is a messy concept. Social psychologists have just picked it up again. Still best definition is Kroeber and Kluckhohn in 1952. Strong empirical findings that culture influences behaviour: communication, cognition. Cultural influences in computers – we relate to computers as social beings. Empirical findings again demonstrating cultural influences.

More data-driven bottom-up empirical findings, from enterprise systems, interactions across countries – same kind of results. What people do differs, but outcomes don’t change – process differences but not output.

Culture in CSCL. Two processes – how people appropriate the socio-technical affordances. Explored Chinese and American – no difference in learning outcome, but significant cultural differences in how they got there.

That’s really interesting from a learning analytics perspective. Even if no outcome differences, may have path differences.

Have been borrowing from other disciplines – what would a native theory look like? Is ambitious. TEL environments are socio-technical systems. Perception and appropriation of affordances. And structures and technological intersubjectivity.

Long history of affordances – Norman to HCI, debate thereafter. Enactive approach, tight link between perception and action. They are meaning-making opportunities, and action-taking possibilities; related to what actor can do, what they know, and what system can do.

Strong argument says cultural differences are incommensurable. Weak says it’s really socio-cultural. Could be either; inclines to weak argument.

Cognition isn’t something one has, but something one uses. In some Asian classrooms, it’s not OK to ask a hard question; save it and ask later, face-saving. Affordances are used in a culture-sensitive, context-dependent and tool-specific way.

Intersubjectivity: problem of other minds. Now can time and place shift. With produce/physical stuff, enables feudal lords and all that. Technological intersubjectivity – is culture specific. So e.g. PhD advisor/student relationship very different. Relationships function – good evidence to show some cultures require barging in; others it’s very rude.

Formal definition of social-technical affordance – technology, self, and other. Opens up a design space.

(Seems very close to Activity Theory to me.)

Keen on instrumentation. Want to understand actors, add screen recordings to software logs.

Want to trace back from learning outcomes to learning pathways – can you construct the interaction pathway and see if some are more effective than others.

Someone: How do you think your talk about culture will influence analytics? Have been talking about very superficial levels of interaction.

Systemic concern with structural variables – retention and so on. One bridge is the role of epistemic agency – where is learner’s agency within learning analytics? Or look at how it looks at organisational level.

Mike Sharkey – Academic analytics landscape at University of Phoenix

Similar to Signals project. Big problem is the data and how to get to it.

Background on UPhoenix. Large, private, 75% online. Central design of courses – same content, facilitators teach it in different cities. Associate degrees to doctoral, generally serve nontraditional students. Age profile shows this.

Systems are proprietary, built themselves. There were no packages to do it – courses taken in serial, etc, didn’t fit.

Data access problem – data person can get data, use their tool to munge it around. But then you want to switch it around, will take another three weeks. Doesn’t work for us. Want to get all the information – SIS, LMS, other sources including CRM and tech support – in to integrated data repository. Normalised, rationalised – isn’t clean. e.g. ZIP code collected multiple times.

Data from 30+ source databases. 430 tables and growing, 1.5TB and growing 100GB/month. It’s all effective-dated data – if a value has changed, expire the old one, have the history. Current and historical.
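A small sketch of the "expire the old one, keep the history" pattern, assuming each record carries a validity interval. The field names and the ZIP code example are hypothetical and say nothing about how UPhoenix actually stores this.

```python
from datetime import date


def apply_change(history, key, value, effective):
    """Expire the current record for key if its value changed, then append the new one."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["value"] == value:
                return history  # unchanged, so nothing to expire
            row["valid_to"] = effective
    history.append(
        {"key": key, "value": value, "valid_from": effective, "valid_to": None}
    )
    return history


def value_as_of(history, key, when):
    """Return the value that was in effect for key on the given date, or None."""
    for row in history:
        started = row["valid_from"] <= when
        not_expired = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and started and not_expired:
            return row["value"]
    return None


history = []
apply_change(history, "student42.zip", "85001", date(2010, 1, 1))
apply_change(history, "student42.zip", "85281", date(2011, 2, 1))
print(value_as_of(history, "student42.zip", date(2010, 6, 1)))  # the earlier ZIP
print(value_as_of(history, "student42.zip", date(2011, 3, 1)))  # the current ZIP
```

Nothing is ever overwritten, so both current and historical questions can be answered from the same table.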

Problem of joining the data. Think about lifecycle, from the lead, enrollment, program (academic analytics, when they’re in the system), then as an alumnus.

Tableau – for data visualisation, is phenomenal but has some problems. Work with instructional designers. Pull together things like gender pass rates, assignment level, look for outliers. Student satisfaction, look across dimensions and compare to norm. Basic reporting, business intelligence, very powerful.

In flight now – like Signals and APUS – predictive analytics. Predicting student success. What is success? Grades aren’t a good measure, it’s learning outcomes. Did they really master these outcomes? Some do it well, we don’t, want to do it better. Don’t have good measures of that, so looking simply at e.g. attendance. If still attending, better than dropping out. Use data to predict attendance/completion – e.g. given they complete week t, probability they complete week t+1. Or survival analysis, probability they will drop out. Intervention – what you do about it – feed it to academic counselor (could have a handful to 100s of students) – give richer information than just ‘call all your students’. Having some metric, and what the issue is, is very rich. Other institutions are doing this.
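The week-to-week idea can be sketched as an empirical conditional probability. This assumes attendance is available as one boolean per student per week; the function name and the toy data are hypothetical, and a real model would condition on much more than the previous week.

```python
def continuation_rates(attendance):
    """Estimate P(complete week t+1 | completed week t) for each week t.

    attendance maps a student id to a list of booleans, one per week.
    Entry t of the result is None when nobody completed week t.
    """
    if not attendance:
        return []
    n_weeks = max(len(weeks) for weeks in attendance.values())
    rates = []
    for t in range(n_weeks - 1):
        # Students who completed week t and have a record for week t+1.
        at_risk = [w for w in attendance.values() if len(w) > t + 1 and w[t]]
        if not at_risk:
            rates.append(None)
            continue
        continued = sum(1 for w in at_risk if w[t + 1])
        rates.append(continued / len(at_risk))
    return rates


# Hypothetical three-week course with four students.
attendance = {
    "a": [True, True, True],
    "b": [True, True, False],
    "c": [True, False, False],
    "d": [True, True, True],
}
print(continuation_rates(attendance))
```

A sharp dip in one week's rate points a counselor at where students are being lost, which is more useful than a flat list of names to call.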

Abelardo Pardo and Carlos Delgado Kloos – Towards analytics outside the LMS

Abelardo talking. Slides here.

From the macro level to the micro. Analysing what happens in a course, to make actuators on a course. Everyday operations. Micro level doesn’t mean it’s simple.

Specific scenario – second year engineering. Paradigm is ‘you work, I work’. Four objectives, made very clear: learn to program in C, use industrial tools, in teams, increase self-learning. Significant out-of-class activity.

LMS has forum, but intense activity outside the LMS. Not enough to analyse just LMS data. Can we record interaction outside? How? Can we do it for 250 students?

Think as a student! Offer fully configured environment for a course. Are asked to share documents, so offer a system. Attract course activity to that environment. Ongoing work, first results have done capture, moving to report. Have to be transparent with the student.

Solution based on virtual machines – a guest computer in an 8GB file. Give students that virtual machine. Runs as a regular application in their host PC. When full window, almost like another operating system. Easy to access files in the host machine, your own files – but this has its own files too. Available on the first day. Linux.

Give machine with page telling them about it. Students take 5 or 6 courses – we’ll give you something so when you work on it, you have everything you need.

Instrument the tools they’re supposed to use, record when used, how long open, relay to server when updating shared documents. Transparent to user – regulation forces this – user confirms terms of use.

Advantages – limited, can get PC back to normal, but collected data. Observe some elusive behaviour, can use modern engineering tools.

Collected first dataset – 248 students. 85% used the machine; 15% disabled sampling mechanism. Most of them already had a Linux machine. Used Contextual Attention Metadata (CAM) – capture where attention is going.

Used a tool called TimeFlow, shows events over time, and levels of intensity at particular times. Can also see activity before class.

Browsing outside the LMS – 48K events, 15.5k were URLs. 8,669 pages, 29% point to LMS. 3 out of 4 pages are outside the LMS. Looked at what they’re searching – you have notes on topic X, they go to Google, look elsewhere.
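A sketch of how the inside/outside split could be computed from logged URLs, assuming a page counts as inside the LMS when its hostname matches the LMS host. The hostname `lms.example.edu` and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def share_inside_lms(urls, lms_host):
    """Fraction of distinct pages whose hostname is the LMS host."""
    unique_pages = set(urls)
    if not unique_pages:
        return 0.0
    inside = sum(1 for url in unique_pages if urlparse(url).hostname == lms_host)
    return inside / len(unique_pages)


# Hypothetical browsing log; the first page was visited twice.
visited = [
    "https://lms.example.edu/course/1",
    "https://lms.example.edu/course/1",
    "https://www.google.com/search?q=c+pointers",
    "https://en.wikipedia.org/wiki/C_(programming_language)",
    "https://forum.example.org/thread/42",
]
print(share_inside_lms(visited, "lms.example.edu"))
```

Counting distinct pages rather than raw events matters here: the talk reports events and pages separately, and the two give different proportions.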

Virtual Machine is good trade-off between LMS and more loosely coupled/controlled. Want to make this available for instructors – they come up with the actuators.

Question: Interested in [can’t hear] how they write the code, might be related

Question was – if we took measures of type of code they’re writing, how they use compiler. Yes – every time they invoke the compiler, count errors they get. Now have a hunch, capture feelings like frustration – if compile 15 times with same errors, could be frustrated. When you sit with them in the office, could intervene earlier. Students try to go shortest path to get the topic done. Many times you have to write tools to get things done. Are you using the right tools? Can pick up quickly.
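The frustration hunch can be sketched as a run-length check over compiler invocations. This assumes each invocation is logged as the collection of error messages it produced; the function names are hypothetical, and the default of 15 just echoes the number mentioned in the answer.

```python
def longest_repeat_run(compile_events):
    """Length of the longest run of consecutive compiles with identical, non-empty errors."""
    longest = 0
    current = 0
    previous = None
    for errors in compile_events:
        errors = frozenset(errors)
        if errors and errors == previous:
            current += 1  # same errors as the previous compile
        elif errors:
            current = 1   # a new set of errors starts a new run
        else:
            current = 0   # a clean compile breaks any run
        previous = errors
        longest = max(longest, current)
    return longest


def looks_frustrated(compile_events, threshold=15):
    """Flag a student who recompiled with the same errors at least threshold times in a row."""
    return longest_repeat_run(compile_events) >= threshold


# Hypothetical log: three failures with E1, a clean compile, two failures with E2.
log = [["E1"], ["E1"], ["E1"], [], ["E2"], ["E2"]]
print(longest_repeat_run(log))
print(looks_frustrated(log))
```

This is only a proxy: a long run could also mean a student is calmly testing something unrelated, so a flag is a prompt for a conversation rather than a diagnosis.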

–

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.