LAK12 Keynote: Visual Analytics in Support of Education

Liveblog notes from the opening keynote on Tuesday at LAK12.

Shane Dawson welcomes everyone and outlines practical arrangements.

George Siemens talks about SoLAR. Last year, had a post-conference meeting. SoLAR is trying not to be prescriptive. SoLAR Flare – regional conferences, in different parts of the world, October 1-3rd is first at Purdue. Events planned in Australia and the UK. More of a practitioner emphasis. SoLAR Storm is a distributed research lab. There’s an advisory board. Then next layer is research leads. Third layer is doctoral students, postdocs, etc. Monthly meetings online. Connect up to an international network.

Simon Buckingham Shum updates on the Learning Analytics journal. ‘Learning Analytics: The International Journal of the Society for Learning Analytics Research’. Editors Simon, Phil Long, Dragan Gasevic. Triad of education, computation, and sense-making. Peer-reviewed, indexed, open access. Journal-conference synergy.

Dragan – in computation have lifecycle focused around conferences. Not the same in other communities. Not used to high bars to conference papers. Aim to innovate there, encourage educators to contribute to the journal, and those to be invited to the conference. To ensure balance, and avoid double-publishing.

Phil Long – new journal, want it to be credible and useful. Needs an impact factor. Also interested in open scholarship, want feedback and ideas. Journal will be open access. Interested in open publishing, sharing manuscripts and papers in draft, and community comment and discussion around that.

Simon – now thinking about inaugural issue, to be drawn from invited selection of papers at LAK11/12 plus other invited contributions.

Simon introduces the keynote speaker: Katy Börner, Prof at Indiana University. Puts energy into outreach efforts as well as excellent research; a model of transition from research to practice, and across disciplines.

Katy Börner: Visual Analytics in Support of Education

[Katy’s slides available here (PDF)]

Three parts to the talk: motivational overview, theory (formulas, algorithms, technicalities), practice (some tools you can use on your own data).

First example: monitoring and evaluating 3D worlds.

10y ago, there wasn’t extensive tracking in the real world. In Active Worlds and Second Life, could track in detail. Try to understand the navigation, and support it and people in their foraging activity. Aim to design learning environments that really work, lead to empowering learning.

Only a few in the audience have used Active Worlds or Second Life, but lots of people have GPS devices with them right now – which gives you the tracking in real life that was previously only possible in 3D worlds.

Imagine being tracked at a conference (VLearn 3D Conference, Dec 2002) – position, tweets, teleport (!), map who’s near to you and can overhear. Conference structured as a series of five separate worlds, and then a final closing ceremony with virtual fireworks. Shows a colour-coded map of people moving through those worlds. Reason for virtual conference was finance – teachers couldn’t get funding to travel.

Each world was rectangular; mapped people moving through that space, and through time. Quest Atlantis worlds. Deployed with children, was very motivating.

(Börner and Penumarthy (2003) “Social Diffusion Patterns in Three-Dimensional Virtual Worlds”, Information Visualization 2(3), 182-198)

Interested in group dynamics, patterns of learning. Another work – looking at spatial proximity and semantic coherence. Knowledge landscape depicted – wanted to know how users explore these information spaces. Showed different papers with linkages in a 3D space; could click on ‘citation sticks’ and get further information. Very specialised. Can track what people are doing, what they’re talking about, and how their conversation relates to the space they’re in. Can map chat sequences to the physical space.

2nd Example: Teaching children a holistic understanding of science.

Many classrooms have a map of the world on the wall. Many children won’t see them all, but useful to have an understanding of how you can take data and overlay it on the base map of the world. Wanted to teach them the base map of science.

Mapping Science Exhibit – 10 iterations in 10 years. Terabytes of papers, patents, grants.

An illuminated diagram. Combination of hi-res printing glued on to backlit projectors. Shows map of 554 sub-disciplines, and highlights who is researching each one and where in the world. Also can see the impact of Nobel laureates.

It also travels in trains! Trains can bring a lot of science to a lot of people. But nothing can be too interesting or people don’t move along. 62 cities in 7 months.

Took the real maps and data, and overlaid watercolour paintings to make it more tangible for children. Had puzzle pieces, had to try to identify the right spots for 18 inventors and inventions. Science books in Europe, get many European inventions/scientists. Similarly for US. Did the same for the science maps. Travelled to many libraries and museums. Took in to classroom settings too. Children think it’s quite normal to see the two maps and use them the same way. Discussion about where to put someone with many contributions in different areas – e.g. Einstein. Asked children about what they might do in science – one 7yo said she wanted to be a nuclear physicist. Also puzzles, colouring activities.

Call for Maps for next iteration. If interested in having the exhibit, they’re always looking for new places.

3rd Example: Introducing data analysis and visualisation to classrooms and government agencies

Not bringing tools to another ivory tower, but people who might use e.g. PowerPoint or Excel, but don’t do network analysis or burst detection. How do people make sense of data? Can they read network layouts? Do they use visual analytics tools and services?

Goal is to empower people to have fun with big data and tools. And identify patterns, trends; and make more effective decisions.

They used Wordle, Gapminder, and Google Trends as tools. Even children in classrooms can do this – e.g. pizza vs spaghetti.

Theory

Approach: start with stakeholder group, do a needs analysis – what drives them, makes them most productive, what are the priorities. Next, understand what data they have. Or set up processes to get data. Then set up workflow to help them visualise it. Preprocess data, clean and interlink, geocode, extract networks (80% of entire project time on preprocessing!). Next data mining – could be simple, extracting backbone for the network, simple trend detection. Then lay out plot, geomap, whatever.
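
The workflow stages above (preprocess, mine, lay out) can be sketched as a chain of small functions. This is only an illustrative sketch, not anything shown in the talk: the function names, the (year, topic) record shape and the sample rows are all made up, and the "data mining" step is just a count per year.

```python
from collections import Counter

def preprocess(raw_rows):
    """Clean: drop rows with missing fields, normalise whitespace and case."""
    cleaned = []
    for year, topic in raw_rows:
        if year is None or not topic or not topic.strip():
            continue
        cleaned.append((int(year), topic.strip().lower()))
    return cleaned

def analyse(rows):
    """Very simple trend detection: count records per year."""
    return Counter(year for year, _ in rows)

def layout(counts):
    """Render the counts as a text timeline, one bar per year."""
    return [f"{year} {'#' * counts[year]}" for year in sorted(counts)]

def run_workflow(raw_rows):
    """Chain the stages: preprocess -> analyse -> layout."""
    return layout(analyse(preprocess(raw_rows)))

# Hypothetical messy input: mixed types, stray whitespace, missing values.
raw = [("2001", " Networks "), (2001, "maps"), (None, "maps"),
       (2002, ""), (2002, "Networks")]
timeline = run_workflow(raw)
for line in timeline:
    print(line)
```

Even in a toy like this, most of the code is in the cleaning step, which fits the 80% remark.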

This is very serial. Inherently parallel process of designing meaningful visualisation. What can they read? Timelines, geomaps are good. Networks are tricky, not learned in school; circular too. Projections and distortions – e.g. globe to 2D. In scatter plots, may need logarithmic scale. Then overlay raw data on the plot, will show patterns – e.g. missing data! Next layer is graphic design – colour, shading; ideally an artist. If data large, need to aggregate and cluster – colour coding, boundaries, etc. Different views of the data. Interaction, ways of moving on e.g. taking you to the original/source. A legend to give indication of what it is and who created it – so they can get back to you. Only put out things that matter to you and your stakeholders so you’re proud to sign it. Then deployment. Big difference between handheld devices, or large display walls. Print on paper – can pack more in to a square metre than on a monitor.

(More details in her book, Atlas of Science (2010).)

The different algorithms required – sometimes written in C, Fortran, Java, whatever. Often output of one doesn’t fit the input for the next; very challenging. Created tools that help.

Different types of analysis: profiling, temporal, geospatial, topical, network – vs level: micro/individual, meso/local, macro/global.

One study: Mapping Indiana’s Intellectual Space. Geomap showing academic, industry funding data in biomedical domain; showed links too. Shows pockets of innovation, pathways from ideas to products, and the interplay of industry and academia. Flash animation to show changes over time, show the proposals, sort by PI, dollar amount, and so on. Could do this for students and their progress; research; industry-academic partnerships.

Another: animate networks of info vis researchers. As time goes on, see nodes growing as they publish together, colour-coded by citation. Shows many networks unconnected. Quite normal – researchers work with their own students. Typically universities have a single main researcher; but places like PARC have multiple ones who work together.

Another: understanding impact of internet on the importance of space. Do you still need to be at MIT/Harvard to do the most impactful work? 20y dataset of PNAS publications (biomedical). Citation sticks for each institution. Plot of log of geographic distance vs log of institutions citing each other. Expected curve to flatten – more citations over longer distances – but actually it gets steeper as time passes. A similar phenomenon elsewhere – as we’re bombarded by information, we use our scholarly networks to decide what should be included in the list of references – also taking in to account those we will see physically in the future.
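
One way to put a number on "flatter" or "steeper" in a log-log plot like this is the slope of a straight-line fit to the logged values. The sketch below does an ordinary least-squares fit in plain Python. The fitting method is an assumption (the talk did not say how the curves were compared), and the distances and counts are invented purely to show the calculation.

```python
import math

def loglog_slope(distances_km, citation_counts):
    """Least-squares slope of log10(citations) against log10(distance)."""
    xs = [math.log10(d) for d in distances_km]
    ys = [math.log10(c) for c in citation_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented numbers, NOT the PNAS data: citations fall off with distance.
distances = [10, 100, 1000]
early_period = [1000, 100, 10]   # made-up counts for an earlier period
late_period = [10000, 100, 1]    # made-up counts for a later period

early_slope = loglog_slope(distances, early_period)
late_slope = loglog_slope(distances, late_period)
print(early_slope, late_slope)
```

A more negative slope in the later period means citations drop off faster with distance, which is the "steeper" pattern described.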

Recent work with funding agency – NIH – looking at ROI, funding and publication data. Interlink funding data to publications that acknowledge the funding. The networks from centre funding look very different from other funding.

Mapping topic bursts – topical trends. Same 20y PNAS dataset. Identify 50 highly frequent and bursty words in the top 10% most highly cited papers. Nodes, circle size is the burst weight, colour is the year of burst onset, ring colour for maximum word count. Normal science can be predicted to a certain degree; but other stuff is less so.
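
To make "bursty" concrete: a word bursts when it is used much more in some years than its long-run rate would suggest. The sketch below is a deliberately simple stand-in, not the burst detection algorithm used in the work described. It flags years where a word's share of all words is at least `threshold` times its overall share, and reports the first such year as the onset and the largest ratio as a weight. The function name, threshold and counts are all hypothetical.

```python
def find_burst(word_counts, total_counts, threshold=1.5):
    """Return (onset_year, weight) for a word, or None if it never bursts.

    word_counts:  {year: occurrences of the word}
    total_counts: {year: occurrences of all words}, assumed non-zero
    """
    years = sorted(total_counts)
    overall = (sum(word_counts.get(y, 0) for y in years)
               / sum(total_counts[y] for y in years))
    if overall == 0:
        return None
    onset, weight = None, 0.0
    for y in years:
        ratio = (word_counts.get(y, 0) / total_counts[y]) / overall
        if ratio >= threshold:
            if onset is None:
                onset = y          # first year above the threshold
            weight = max(weight, ratio)
    return (onset, weight) if onset is not None else None

# Made-up counts: 100 words per year in total.
totals = {2000: 100, 2001: 100, 2002: 100, 2003: 100}
bursty = find_burst({2000: 1, 2001: 1, 2002: 6, 2003: 4}, totals)
steady = find_burst({2000: 3, 2001: 3, 2002: 3, 2003: 3}, totals)
print(bursty, steady)
```

In a map like the one described, the onset year would drive the node colour and the weight the circle size.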

List of references, special issues and so on – in her publications.

Practice

You can map your own data!

Börner (2011), Plug-and-Play Macroscopes, CACM 54(3), 60-69.

It works like a Chinese buffet – go to a tray, pick the tools that read in your data. Could be a database connection.

Next you move over to the deliciously prepared data cleaners. 80% of time will be spent here. Pick these unification algorithms – e.g. geocode from address string, extract networks, etc.

Next tray – network analysis, temporal analysis/burst detection.

Next – communicate results, grab a few visualisation tools in the right-hand corner.

Final tray for data writers to get the data out.

Pick your tools, they are all in your menu system. Every morning, get new datasets, algorithms, workflows.

Galileo ground his own lenses, nobody else could make them; Katy develops her own tools. A matter of taking what exists, extending, inventing what you need to make your dream tool.

Microscopes and telescopes are very static instruments. Now have very flexible macroscopes, help you make sense of processes that are too great for the human eye to comprehend – and the tools evolve on a daily basis. Have to embrace new technology, data – become a data junkie, try out new toys, tools, and have fun with them. Macroscopes are continuously changing bundles of software plug-ins. Can share the plug-ins; becomes very powerful – could become as easy as sharing on YouTube/Flickr. Assembling a new tool is as fast as making a new music library.

Another food metaphor – people sit around the table, from different backgrounds and different views. Matrix views, node/connection, or just edges/single characteristic. Have people who typically cannot share their tools. Want to create an infrastructure where you can plug and play algorithms. OSGi – Open Services Gateway Initiative – modularising software. Has cost more than NIH and NSF together could afford. Small piece of infrastructure – CIShell – sockets in to which you can plug datasets and algorithms. Developers have algorithms in whatever language; then CIShell wizards to turn them in to plugins. Then at the other end, users create their tools by building workflows, using e.g. Sci2 (Science of Science) or NWB (Network Workbench) tools. Need computing people to set it up, but not to operate/share. Tutorials on CIShell – don’t have to share if you don’t want to. Can get citation counts for sharing.
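
The plug-and-play idea can be illustrated with a tiny registry: algorithms are registered under a name, and a workflow is just a list of names run in order. This is a toy in Python to show the shape of the idea. It is not CIShell's API (CIShell is built on OSGi), and `AlgorithmRegistry`, `register` and `run_workflow` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

class AlgorithmRegistry:
    """Hypothetical 'socket' that algorithms plug in to by name."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str):
        """Decorator that plugs a function in under the given name."""
        def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self._algorithms[name] = func
            return func
        return decorator

    def run_workflow(self, step_names: List[str], data: Any) -> Any:
        """Feed the output of each named step in to the next."""
        for name in step_names:
            data = self._algorithms[name](data)
        return data

registry = AlgorithmRegistry()

@registry.register("dedupe")
def dedupe(records):
    # Data-cleaning plug-in: remove duplicate records.
    return sorted(set(records))

@registry.register("count")
def count(records):
    # Analysis plug-in: count what is left.
    return len(records)

result = registry.run_workflow(["dedupe", "count"], ["a", "b", "a"])
print(result)
```

The person building the workflow only needs the step names, not the code behind them, which is the split between developers and users described above.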

Network Workbench tool. 170 algorithms in it. Used in computational proteomics, computational social science (Wikipedia). Compared to Gephi, has a lot of data analysis algorithms. Computational epidemics.

Sci2 tool – funded by NSF. Sci maps, network vis, horizontal time graphs. Can look at e.g. profile of a conference. Simple geomaps, hierarchical network maps. Extensive documentation.

Demo with educational funding. Looked for NSF funding from ‘DRL’ – 769 awards, 891 PIs, $1bn funding. Includes funding back to 2002, and ones just started that will continue for another 5y. Have title, program, start/end date, organisation, amount awarded, abstract (could do topic analysis), PI and Co-PIs.

Map awards over time – size by award amount. Can spot big bars, small fat ones, ones still going. Could colour code by e.g. gender of lead PI. Or career, equipment awards.

Co-PI network in a network vis. Bimodal network PI-institution to Co-PI network – shows lead PI institutions connected to PIs on the whole award. Can visualise e.g. in Gephi or Cytoscape.
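
Extracting these two networks from award records is mostly bookkeeping. A minimal sketch, assuming each award is a record with an institution, a lead PI and a list of Co-PIs: the field names and the sample awards are invented, and this is not how Sci2 does it internally.

```python
from collections import Counter
from itertools import combinations

# Invented sample records, not real NSF awards.
awards = [
    {"institution": "Univ A", "pi": "Ada", "co_pis": ["Ben", "Cy"]},
    {"institution": "Univ B", "pi": "Ben", "co_pis": ["Ada"]},
]

def co_pi_edges(awards):
    """Weighted edges between every pair of investigators on the same award."""
    edges = Counter()
    for award in awards:
        people = sorted({award["pi"], *award["co_pis"]})
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges

def institution_edges(awards):
    """Bimodal edges: lead PI's institution -> each investigator on the award."""
    edges = Counter()
    for award in awards:
        for person in {award["pi"], *award["co_pis"]}:
            edges[(award["institution"], person)] += 1
    return edges

co_edges = co_pi_edges(awards)
inst_edges = institution_edges(awards)
print(co_edges)
print(inst_edges)
```

Edge lists like these can be written out as a CSV file and loaded in to a network tool for layout.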

New tool funded by NIH.

(Slides will be up on the LAK website; she also has some handouts.)

–
This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Author: dougclow
