LASI14 Monday (2): Workshop on Learning Analytics in the Social Media Age

Liveblog from Monday afternoon at LASI14.

Workshop: Project on Learning Analytics in the Social Media Age #pLASMA

Anatoliy Gruzd, Caroline Haythornthwaite, Rafa Absar, Drew Paulin.

Caroline welcomes everyone. Talking, then an exercise, break, few more short breaks, then another exercise. Introduces the team.

What can you do with social media? Blogging, Twitter, discussion. Learning about the learning networks, who talks to whom about what.

Please participate in our online survey if you use social media.

Relational perspective. Focus on what the ties are. They learn – but what? What are the interactions that go on? What makes up a learning tie? We’re going to explore that.

Part I: Defining ties.

When we ask people what they learn from others, what do they say? It’s a directed tie. Asked science teachers who they learned with, what they learned most. Content analysis – most often was teaching techniques. Content much lower down. Then classroom management further below, external matters. Multiple relations – teaching techniques, but perhaps also science content, emotional support. So look for who has the stronger ties, versus weaker ties. Where does innovation come from?

Looked also at interdisciplinary teams in science and social science. Mostly not about the content in other fields, but about process, method. Also found PIs talked at the fact/field level, but method was talked by methodologists.

Strong and weak ties – strong ones have more relations, more frequent interaction, include intimacy and self-disclosure, use more media, higher reciprocity. When you set up an environment, that will connect the weak ties, set the base means of communication – that’s a latent tie structure. If you take the room away, the strongly tied people will talk to each other anyway, but weak tie people won’t.

Part II: Relations describe networks.

Cut points in SNA – where two networks would fall apart but for one tie. Find out how resilient a network is to change.
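To make the cut-point idea concrete, here is a minimal sketch, assuming an undirected list of ties between made-up people. It finds bridges (single ties whose removal splits the network) and cut points (people whose removal does the same) by brute force: remove each one in turn and recount the connected components. The names `NODES`, `EDGES`, `bridges` and `cut_points` are illustrative, not from the workshop.

```python
from collections import defaultdict

def components(nodes, edges):
    """Count connected components of an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count

def bridges(nodes, edges):
    """Ties whose removal increases the number of components."""
    base = components(nodes, edges)
    return [e for e in edges
            if components(nodes, [x for x in edges if x != e]) > base]

def cut_points(nodes, edges):
    """People whose removal increases the number of components."""
    base = components(nodes, edges)
    found = []
    for n in nodes:
        rest_nodes = [m for m in nodes if m != n]
        rest_edges = [(a, b) for a, b in edges if n not in (a, b)]
        if components(rest_nodes, rest_edges) > base:
            found.append(n)
    return found

# Toy network (made up): two triangles joined by the single tie C-D.
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]

print(bridges(NODES, EDGES))     # the tie holding the two groups together
print(cut_points(NODES, EDGES))  # the people at either end of that tie
```

Brute force is fine for classroom-sized networks; for large ones a linear-time depth-first search method would be the usual choice.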

Exercises for today: first part is discovering learning relations using Netlytic to help code text into categories. Data from MOOC Twitter chat. Then later discovering network interactions.

Exercise brief

Anatoliy takes over.

Mostly social media data. How do you approach social big data? Sensemaking. Visualisations as first step. Then understanding.

Automated text analysis and visualisation. Can we automate the process of creating an ‘effective representation’ of the texts?

Techniques. Example from paper with Caroline in 2007.

1. Most frequently used words. Profession-related words, but also learning-related words.

2. Important topics over time. Was 4y study. Frequency of use of words over time – e.g. databases popular at start, not Google, but changed over time.

3. Community style. Commonly-used phrases. In this group, uncertainty – don’t think, don’t know, don’t have.

4. Community support. Frequency of messages agreeing, disagreeing, or saying thanks – up to 17% thanks messages by the end.
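Three of the four techniques above (frequent words, common phrases, and the share of 'thanks' messages) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from the 2007 paper: the messages are made up, phrases are approximated as adjacent word pairs, and the cue-word list is hypothetical.

```python
import re
from collections import Counter

# Made-up messages, only to exercise the functions below.
MESSAGES = [
    "I don't know which database to use",
    "Thanks, I agree the database is slow",
    "I don't think Google indexes it",
    "Thanks for the pointer",
]

def tokens(text):
    """Lower-case a message and split it into words (keeping apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(messages, n=3, stopwords=frozenset()):
    """Most frequently used words, ignoring any stopwords supplied."""
    counts = Counter(w for m in messages for w in tokens(m)
                     if w not in stopwords)
    return counts.most_common(n)

def top_phrases(messages, n=3):
    """Most common two-word phrases (adjacent word pairs within a message)."""
    counts = Counter()
    for m in messages:
        words = tokens(m)
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)

def share_matching(messages, cue_words):
    """Fraction of messages containing at least one cue word."""
    cues = set(cue_words)
    hits = sum(1 for m in messages if cues & set(tokens(m)))
    return hits / len(messages) if messages else 0.0

print(top_words(MESSAGES, stopwords={"i", "the", "to", "is", "it", "for"}))
print(top_phrases(MESSAGES, n=1))
print(share_matching(MESSAGES, {"thanks", "thank"}))
```

Tracking topics over time (technique 2) would be the same word count run separately on each time slice of the messages.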

Practical part

Instructions here: bit.ly/lasi14plasma

From Open MOOC dataset, from CCK11, run by George Siemens [and others!]. Twitter, online forums, blogs. For this exercise, only the Twitter dataset. Cleaned up already – removed non-English messages.

Drew – about CCK 2011 MOOC, run from Athabasca University. Only a month of data from entire dataset. Topics included connected knowledge, rhizomatic knowledge, concept mapping, how connections produce patterns of knowledge. Connectivism vs others, e.g. activity theory, constructivism. Groups, networks, collectives. Personal learning environments, networks, knowledge – PLENK.

Netlytic – only get 3 datasets by default, but email them for more.

Interactive exercise following the instructions above.

Categories – learning, connecting, innovation.

Contribution from floor: LIWC has similar stuff about word choice/analysis.

[Making these categories is hard. Doing top-down from your ideas is fine, except you often don’t find it in your dataset. Bottom-up is easier, since you’re working with the data that is there, rather than what you hope is there. Or perhaps using a hard database of linguistic relationships, like WordNet.]

Suggestion for UI: click edit, select multiple and do the same thing. Also, teach, teacher, has taught – be great to type t*** without close/open.
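A rough sketch of what dictionary-based category coding with wildcards could look like, for anyone wanting to try the idea outside the tool. This is not how Netlytic works internally. The three category names come from the exercise, but the keyword patterns are hypothetical and the matching uses Python's standard `fnmatch` shell-style wildcards.

```python
import fnmatch
import re

# Category names are from the exercise; the patterns are made-up examples.
CATEGORIES = {
    "learning": ["learn*", "teach*", "taught"],
    "connecting": ["connect*", "network*"],
    "innovation": ["innovat*", "new"],
}

def code_message(text, categories=CATEGORIES):
    """Return the categories whose patterns match any word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    matched = []
    for name, patterns in categories.items():
        if any(fnmatch.fnmatchcase(w, p) for w in words for p in patterns):
            matched.append(name)
    return matched

print(code_message("Teachers learn by connecting networks"))
print(code_message("She has taught a new course"))
```

A single pattern such as `teach*` covers teach, teacher and teaching, but irregular forms like taught still need their own entry.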

[coffee break]

Data

Rafa talks, about where their data comes from, and the issues and challenges.

Data sources – MOOCs – CCK11, Change11, PLENK10. Not just one platform.

Start with daily newsletter, pull out blog posts, discussion threads – and from them the comments. And Twitter.

CCK11 812 blogs, Change11 2486, PLENK10 719.

Challenging data, it’s all over the place. Some are mostly images/videos or are live seminars. Some links unreachable, domains gone, login required. Comments partly on the site, partly on blog pages hosted elsewhere.

Participant issues too – worldwide, language not always English, and live sessions at inconvenient times. Also hard to identify single identities across platform, and disambiguate people with the same alias. Develop mapping matrix, for each person.

Social network analysis

Drew talking. Another project with a student at UBC. Twitter use at conferences – whether and how social learning is facilitated. Looked at Twitter network around #lak14. Interested in experts and leaders in the field, and whether their positions in the network facilitate learning.

Why this approach? Learning theories – social learning, modelling, ZPD, connectivism. Twitter provides access to and interaction with experts. Experts can be hubs of info exchange, and models.

Looked at h-index, bibliometric measures. Who takes leadership roles associated with LAK. Characteristic for each user.

Positions in network – centrality, betweenness (freq on geodesic path between 2 other nodes), eigenvector (=PageRank), degree (in-degree = prestige, out-degree = influence).
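Two of these measures are easy to compute by hand on a toy network. The sketch below assumes a directed retweet network where a tie runs from the retweeter to the person retweeted, so in-degree counts how often someone is retweeted. The users and ties are made up, and betweenness is computed with Brandes' algorithm for unweighted graphs, left unnormalised. Eigenvector centrality is left out to keep the example short.

```python
from collections import defaultdict, deque

def degrees(edges):
    """In-degree and out-degree for each node of a directed tie list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)

def betweenness(nodes, edges):
    """Unnormalised betweenness via Brandes' algorithm (directed, unweighted)."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        order = []                        # nodes in order of discovery
        preds = {n: [] for n in nodes}    # predecessors on shortest paths
        sigma = {n: 0 for n in nodes}     # number of shortest paths from s
        dist = {n: -1 for n in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):         # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Made-up retweet ties: (retweeter, person retweeted).
NODES = ["ann", "bea", "cat", "dan"]
EDGES = [("ann", "bea"), ("bea", "cat"), ("dan", "bea")]

indeg, outdeg = degrees(EDGES)
print(indeg)                      # who gets retweeted
print(outdeg)                     # who does the retweeting
print(betweenness(NODES, EDGES))  # who sits on paths between others
```

In this toy network bea is the only node lying on a shortest path between two others, which is the 'info conduit' position.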

Eigenvector centrality of retweet network. Experts are quite central, well connected. Many large nodes, network is very well connected. Betweenness centrality, two big ones – info conduits. Can come from high prestige, but also from brute force approach, having lots of tweets out there. One well known expert, another a well known tweeter.

Mention network – who mentions whom – tightly knit network. (I was quite high in #lak14 – though Matt Pistilli was higher.)

Experts’ positions in the network. Experts more likely to have high centrality than non-experts, retweeted more, mentioned ~40 times more than non-experts. Ran in UCINET using t-test.

Implications – highly central roles for experts. Facilitate connection-making.

Network viz for understanding social networks

Back to Anatoliy.

Basic concepts: nodes are people, edges/ties = relations – retweeting, replying, mentioning.

SNA, reduces complexity of network. Can see most connected actors. Understand what is going on. Can look at a group of nodes, why are they connected.

Traditionally, for social networks, use surveys or interviews, and they’re time consuming, sensitive, incomplete. So want to go for automated networks discovery from online social networks.

With emails it’s easy, can see who emails whom based on the header. But in social media – many-to-many – problematic. Post a message in the forum, everyone can see, no clear recipient. Chatrooms, listservs, comments much the same. Sometimes threaded, sometimes not.

To do this, two basic approaches. One is chain network, reply-to network. Assume connection based on previous poster in the thread – but misses info. Second approach, name network – look for mentions. Connect sender to people mentioned in the message, and connect people whose names co-occur in the same message. With a Twitter dataset it’s usually usernames.
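The two approaches can be sketched side by side on a made-up thread. This is a minimal illustration, assuming names appear as `@username` mentions (as in a Twitter dataset); finding personal names in free text would need a proper name-recognition step that is not shown here. The thread and function names are hypothetical.

```python
import re
from itertools import combinations

# Made-up thread: (poster, text) in posting order.
THREAD = [
    ("ann", "Anyone tried concept mapping?"),
    ("bob", "Yes, @cat and @dan showed me how"),
    ("cat", "Happy to help @ann"),
]

def chain_network(thread):
    """Chain network: tie from each poster to the previous poster."""
    ties = []
    for (prev, _), (cur, _) in zip(thread, thread[1:]):
        if cur != prev:               # ignore people following themselves
            ties.append((cur, prev))
    return ties

def name_network(thread):
    """Name network: sender-to-mention ties plus co-mention ties."""
    ties = []
    for sender, text in thread:
        names = sorted(set(re.findall(r"@(\w+)", text)) - {sender})
        ties.extend((sender, name) for name in names)  # sender -> mentioned
        ties.extend(combinations(names, 2))            # co-occurring names
    return ties

print(chain_network(THREAD))
print(name_network(THREAD))
```

On this thread the chain network links cat to bob only because bob posted just before, while the name network picks up that cat was actually addressing ann.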

Chain networks tend to be long and thin with few connections – ‘noodle networks’; name networks have more connections, but more isolated clusters. (Examples based on YouTube comments.)

The function of names is to get attention, indicate who you’re talking to. Maintains sense of community.

Example of Twitter networks. Each node is a user, ties are who retweeted, replied to or mentioned whom.

Three example networks. First example, theatre production at Stratford in Ontario. Their Twitter account is connected to lots of them, but no communal discussion: one central node, plain dissemination mode. Unlikely learning happening. [!] Second example, people talking about 2012 Olympics – many different clusters. Multiple conversations, interested in different things. Group work, but nothing shared across group. Third example, #tarsand Twitter community – clusters over groups, densely connected, but they do have connections across to each other. Part of my student’s thesis work, the clusters are often based on geography.

Not going to interview people for this example. (Though you can talk to some of them!)

[demo/interactive exercise using Netlytic to do network analysis]

Takes about a minute per 1,000 nodes to do a new network layout as a viz.

Fun to explore the clusters, but hard.

Within tweeting environments, there is a role of an active tweeter. Bodong Chen has a paper on the roles – popstar, high in-degree; engine participant is more out-degree.

[Alas! I lost a few paragraphs here of the final wrap-up discussion to a laptop crash.]

–
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow

Data scientist, tutor, project leader, researcher, analyst, teacher, developer, educational technologist, online learning expert, and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management. After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in a wide range of contexts and industries.