Liveblog notes from the afternoon of the pre-conference workshop (Sunday 27 Feb 2011) of the Learning Analytics and Knowledge (LAK11) conference.
There was an excellent discussion over lunch about the power of simple measures to infer great – and privacy-threatening – amounts of information. So, for instance, you can guess well at what appliances someone has in their house simply by looking at the usage of electricity over a 24-hour period, using a smart meter. And then that raised the possibility of identifying an appliance that was about to break down simply from a change in its use of electricity over time. Which would be gold dust for advertisers, since they could potentially target you with an advert for a new washing machine before you even know that you’re going to need to buy one in the next few weeks, because they know yours is about to break down.
Linda Baer – Systemic Adoption of Learning Analytics
Linda is Senior Program Officer at the Bill & Melinda Gates Foundation, but joined in July. Previously with the State of Minnesota for many years.
It’s important to do the at-the-moment learning analytics – what is learned through to outcome. But has to link to leaders knowing why that’s important.
This is not a cycle, it’s a reset – the financial situation, things are changing fundamentally. What we do now builds what will happen on the other side of the tough times. Also, very much a moving target for education. Obama aspiration that US will have highest proportion of college graduates again by 2020. Big focus on college education, and on poor completion, and education. Even with high unemployment, employers are struggling to find qualified workers. Education critical, but graduation levels in crisis.
Action analytics – takes academic analytics (large data sets, statistical techniques, predictive modelling) and also adds in the whole enterprise being engaged, accessing, using data for accurate problem-solving. Optimising institutional performance as well as individual success.
Connecting the dots is crucial – being more concrete about measures, targets, assurances and practices.
View of competitive advantage coming from strategic intelligence. Higher levels of analysis than standard, ad hoc reports, and queries/alerts: so statistical analysis, forecasting, predictive modelling – leading to optimisation.
Analytics DELTA – Data, Enterprise orientation, analytical Leadership, strategic Targets, and the Analyst. Students understanding their own information too.
Examples: Gates Foundation, Lumina Foundation, many others.
US State Governors are meeting right now, and talking about metrics, with Google – Complete College America have linked with the National Governors Association – on a Complete To Compete.
Gates Foundation are interested in this area. (Also in many others, including eradication of polio, malaria reduction, and so on.) Buffett donation to Foundation led to push to other important areas – what would make a big difference? If they could double the number of low-income adults who earn postsecondary degrees or credentials by age 26 by 2025, would make a big difference. Break generational cycles of poverty. So major effort in to the completion agenda – the NGLC grants. Major resources. Hybrid courses, web 2.0, tools to make big difference, learning analytics. Had 625 proposals on that strand. How many in learning analytics? 90. Wonderful proposals – a bird’s eye view of where learning analytics was going. Interest because they want to assess points where students are lost, where they are motivated, and so on.
Modelling the pathway to success. If you let the students in and take their money, you owe them more of a success rate. Connection (interest to application); Entry (enrollment to completion of gatekeeper course); Progress (enrollment to completion of gatekeeper courses); Completions (complete course of study to credential with labour market value). Also transition to the workforce is key.
Have theory of change, built in to a four-part mechanism: robust, transparent solutions in the marketplace. Increased capacity – 1.5 institutional researchers per campus, for community colleges – very low to have revolutionary change in completion. Continue to improve performance. Build knowledge and awareness.
Lumina mainly focused on Higher Education – have big goal – 60% with degrees by 2025. Target 16 million more graduates than at current rates. Data elements – outcomes in developmental courses, outcomes in gatekeeper courses, persistence and graduation.
Complete College America – about raising expectations. Late registration a big problem, having people in for headcount date, but are very unlikely to complete.
Challenge of Federal Government carrying seat time – the Carnegie unit. Problem with online and hybrid education. Pressure to guarantee seat time!
Also, making things optional – like Orientation. Students don’t then do them. They don’t come understanding what to do to be successful.
Washington Student Achievement Initiative Metrics – First year retention (15, then 30 credits); completing college-level math; completions.
What if students do the work towards an associate degree but have not made it all the way to a four-year degree? Partner up with a community college, and issue them the associate degree from the community college if they make it that far.
Minnesota State Colleges and Universities.
Combination of many different institutions, cooperate and collaborate. Want to make transfer work between them.
Developed a dashboard – red, blue, green – publicly available. Rang institutions before they were about to show up in the red to alert to possible negative media interest. Access and opportunity, quality programs and services, meeting economic needs, and so on. Performance thresholds were very complex; worked on how to validate each one.
Move to action behind analytics – know what works, then customise interventions on that basis.
Calls on people to go forth and partner. Can’t do this alone.
Minnesota Analytics Partnership – everybody had a nice interesting piece. Dashboard, embedded faculty-calibrated learning analytics, improved business intelligence. Focus on student success and advising.
Hierarchy of student success through action analytics – Gates Foundation – raising the analytics IQ. Stages of analytics in companies … now doing it for higher ed.
Griff: Wonderful stuff. But playing with cherry on top. Problems are in the K12 system. What mechanism is in place to transfer this down?
Address in multiple levels. Look at Wisconsin! And Ohio. Teachers are the focus. The issue is connecting your data points. Open access is one big problem. Gates has a college-ready unit. Math is a particular issue.
Someone: The goals are interesting. Interesting – an institution can meet those goals by reducing requirements for the degree programs. Might change what the degree means. Issue with curve-grading versus criterion referenced marking. They’re gaming for in-seat metrics now; they would optimise for completion.
Profiling – you won’t let people in who you think won’t complete. Where will quality go? Also, we don’t know where the economy will go. Everyone talking about STEM. Presidents are smart at following the money. Is a real question.
Simon Buckingham Shum & Anna De Liddo – Theory-based Learning Analytics
Simon and Anna are from the OU, and the Knowledge Media Institute.
Most of the analytics just now – web analytics, data mining – weren’t developed by people who knew anything about learning or sense-making. We may need to invent new kinds of analytics. It’s not just mining the existing data. We may invent new kinds of learning theory enabled by learning analytics. Like big science is doing with new infrastructure.
Theory and Analytics
Every analytic has a theory – at least implicit – of learning. So theory includes assumptions, evidence-based findings, models, methods, academic ‘theories’.
The question is whether a measure has integrity as a meaningful indicator – and who or what acts on that proxy measure of the real world.
Need to help guide how to meaningfully and ethically present analytics.
A mature theory might then drive analytics that can couple to adaptive system behaviours.
RAISEonline – learning analytics in English schools. There are mandatory tests. Theory: Reasoning goes that core subjects can be measured in controlled exams for the whole country. Analytics: Can generate from test scores, accounting for contexts, national league tables. So get analytics from the Government – Ofsted – can see how you relate to the mean.
OU’s analytics – have been gathering since we were founded. Years of data reveal significant patterns between history, demographics, and outcomes – formalise as statistical models, compare prediction and outcomes. Effective interventions – many validated interventions and understanding of how they improve retention/completion rates. Modelling of patterns of success – have a definition of what an at-risk student is. Previous OU study data is best predictor. Empirically-based findings.
Sensemaking – Karl Weick. Cognition breaks down in certain ways faced with complexity and overload. Tools could be designed to minimise breakdowns in sensemaking. So have mapping of risks in sensemaking in complexity to a focus for analytics and recommendation engines. Presented in Buckingham Shum & De Liddo, OpenEd 2010.
Learning-to-learn analytics. ‘Shift Happens’ quote. Generic, transferable, C21st skills. Learning power work from Bristol University – generic set of skills that characterise effective learners, can make those explicit, develop, learn better. Analytics generated in many ways. Have seven dimensions (from expert interview and factor analysis, then validated empirically). Resilience, critical curiosity, strategic awareness, creativity, meaning making, learning relationships, changing and learning. They’re not discipline-specific: trying to develop in people for situations we can’t yet anticipate. ELLI – Effective Lifelong Learning Inventory – self-rating on seven dimensions. Conversation with someone trained in the approach, suggest interventions. Embed in online platforms – e.g. ELLI spider. What would e.g. signal resilience? or asking good questions? Could pick this up automatically, in future.
Discourse analytics. Talk about in more detail on Tuesday. Addresses quality problem. Theory says learning conversations have typical characteristics – so analytics to analyse stronger or weaker contributions. Socio-cultural discourse analysis: disputational talk, cumulative talk, exploratory talk. In the latter, knowledge is made more publicly accountable and reasoning is more visible in the talk. (Neil Mercer). Once something is visible, it’s potentially computable.
So, can pick up within Elluminate – would be useful if we could identify where real learning was happening, rather than social interaction. Could we take Mercer’s framework, spot canned phrases which might signal each category and thus identify where learning might be happening? Suggests that you can begin to do that.
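A minimal sketch of the phrase-spotting idea, to make it concrete. The three category names come from Mercer's framework as noted above, but the cue phrases, the `CUE_PHRASES` table and both function names are invented for illustration; they are not the phrases or the method used in the work presented.

```python
# Hypothetical cue phrases per talk category. These lists are made up for
# illustration and are NOT taken from Mercer or from the presenters.
CUE_PHRASES = {
    "exploratory": ["because", "what if", "i think", "why do you think"],
    "cumulative": ["i agree", "good point", "same here"],
    "disputational": ["you are wrong", "no it is not", "rubbish"],
}


def count_cues(utterance):
    """Count how many cue phrases from each category occur in the utterance."""
    text = utterance.lower()
    return {
        category: sum(text.count(phrase) for phrase in phrases)
        for category, phrases in CUE_PHRASES.items()
    }


def label_utterance(utterance):
    """Return the category with the most cue hits, or None if there are none.

    Ties go to whichever category is listed first in CUE_PHRASES.
    """
    counts = count_cues(utterance)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    chat = [
        "I think it sinks because it is heavier. What if we try a bigger one?",
        "Good point, I agree.",
        "See you at lunch",
    ]
    for line in chat:
        print(label_utterance(line), "|", line)
```

Plain substring matching like this is crude (it cannot tell a quoted "because" from a reasoned one), which is presumably part of why the more sophisticated parsing work matters.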
More sophisticated with Xerox Incremental Parser (XIP) – detects levels of knowledge-level claim – Agnes Sandor. Machine can analyse a corpus of documents in seconds, and classify elements. Then compare human and machine annotation of literature. Exploring where the humans add more value, and where they overlap. Just emerging from the lab at the moment.
Traditional threaded discussion forum doesn’t show what’s happening in argumentation terms. So have Cohere – discourse-centric analytics. Can see if a learner compares own ideas to peers’, or is a broker connecting peers’ ideas.
New learning theories may emerge based on the vast data, and in turn generate new analytics.
Terri Lynn-Brown & Alfred Essa: Analytics Research – Community & Collaboration
From Desire2Learn. Analytics baked in to tool design.
Value around simplicity, elegance and beauty. Example – really cool application – iPhone/iPad, Android – A2ZEconomy. Visualisation of US economy data. Corporate profit dataset – straight graph, can show presidency. Corporate profits are at an all time high again. Same story with GDP. With a simple interface, can pull up a lot of data, with a very elegant and clean interface.
The tyranny of choice. Often believe giving people more choice is better. But much data in e.g. behavioural economics, that as creatures we don’t like lots of choices, and our brains shut down if presented with too many choices. Worth thinking about when creating analytics interfaces. In propaganda, good way to shut people down is to present too many choices!
Chess playing – Kasparov has written about machine intelligence and people intelligence. In the chess-playing world, know that the best chess machines can beat the best humans. But also finding certain combinations are a surprise. A relatively weak human with a strong computer with good processes will beat a very strong computer or a very strong human. The combination of strong machines, reasonably competent humans – and good processes (key) – can solve problems no other combination can.
From a vendor perspective, huge market opportunity. Digital education, global, growing in leaps and bounds. Best of times! D2L was 125 people a couple of years ago, now 250, with 60 open positions. Just in North America.
Learning environments are still in their infancy. They are inadequate, we need to evolve LMS to learning environment. Uncharted area – we don’t know how that’s going to go. Parallel with Hilbert’s challenge in mathematics: What are the major ‘unsolved’ problems in learning and knowledge analytics. Potentially fertile stimulus. In maths, one problem was around algorithms and mathematical logic, led to Gödel, Turing and computers. Or Donald Rumsfeld’s unknown unknowns – we do not even know we don’t know them!
From reporting to analytics – from data access, reporting, forecasting, predictive modeling, then optimisation.
Hard problem: Infrastructure. On enterprise infrastructure, things are not as easy as is made out. Servers, logs, architecture, run in the enterprise – is expensive, unsolved, difficult. Mainstream organisations don’t have infrastructure that’s needed to do predictive modeling. Everyone from Napoleon to Hitler thought to invade Russia.
Hard problem: Focus. Capacity in organisations – trying to do many things: competitive benchmarking, organisational effectiveness, academic success, student lifecycle.
Hard problem: Foundations. State of data management is often abysmal. Data accuracy is low. People are not attending to data quality. Can build wonderful systems, but data may be crap.
Predictive modeling – look at student pathway. Like George’s “Who’s going to come to the hospital?”. Identify students, understand the cause, apply the intervention, track the success. Across the journey – apply, matriculate, remain in class, re-register, pass, have high GPA.
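One simple way to look at the journey is as a funnel, working out what share of students survive each step. The sketch below does only that descriptive part, not the predictive modelling itself. The stage names follow the list above (leaving out "have high GPA", which is not a head count), while the counts and the function names are invented for illustration.

```python
# Stages of the student journey, as listed in the notes above.
STAGES = ["apply", "matriculate", "remain in class", "re-register", "pass"]


def stage_retention(counts):
    """Return the share of students carried over at each transition.

    counts holds one head count per stage, in STAGES order.
    """
    rates = {}
    for i in range(1, len(STAGES)):
        previous, current = counts[i - 1], counts[i]
        label = f"{STAGES[i - 1]} -> {STAGES[i]}"
        # Leave the rate undefined if nobody reached the previous stage.
        rates[label] = current / previous if previous else None
    return rates


def weakest_transition(counts):
    """Name the transition that loses the largest share of students."""
    rates = {k: v for k, v in stage_retention(counts).items() if v is not None}
    return min(rates, key=rates.get)


if __name__ == "__main__":
    # Invented head counts, purely for illustration.
    counts = [1000, 800, 600, 540, 432]
    for label, rate in stage_retention(counts).items():
        print(f"{label}: {rate:.0%}")
    print("Biggest loss at:", weakest_transition(counts))
```

Finding the weakest transition only says where to look; understanding the cause and choosing the intervention are the harder steps described above.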
An art and a science. Science requires data. Art – what’s the minimum intervention I can apply, for the maximum effect.
Capella – primarily online, for-profit. Problem statement: Student takes first online course, two weeks passed, is there data (leading indicator) that tells us whether he’s likely to drop out, and can we design an intervention to help? Simple measure – ratio of this student’s time in forums, over the average rate for whole class = relative discussion time. Low levels are highly indicative of dropout.
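The relative discussion time measure is simple enough to sketch in a few lines. This is a minimal illustration of the ratio as described in the talk, not Capella's implementation: the function names, the 0.5 cut-off and the sample minutes are all assumptions made up for the example.

```python
from statistics import mean


def relative_discussion_time(forum_minutes):
    """Map each student to their forum time divided by the class average."""
    class_average = mean(forum_minutes.values())
    if class_average == 0:
        # Nobody has used the forums yet, so the ratio is undefined.
        return {student: None for student in forum_minutes}
    return {
        student: minutes / class_average
        for student, minutes in forum_minutes.items()
    }


def flag_low_engagement(forum_minutes, threshold=0.5):
    """List students whose ratio is below a cut-off.

    The default threshold of 0.5 is a hypothetical value, not one from the talk.
    """
    ratios = relative_discussion_time(forum_minutes)
    return sorted(
        student
        for student, ratio in ratios.items()
        if ratio is not None and ratio < threshold
    )


if __name__ == "__main__":
    # Invented minutes of forum time over the first two weeks.
    minutes = {"ana": 120, "ben": 60, "cy": 15, "dee": 45}
    print(relative_discussion_time(minutes))
    print(flag_low_engagement(minutes))
```

Dividing by the class average is what makes the measure comparable between a chatty course and a quiet one; where to set the cut-off would have to come from the institution's own dropout data.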
Rio Salado – problem – 2LC score – number of logins in first two weeks, divided by number of courses. High risk = small logins, large number of classes. Statistically very significant. Doesn’t work in other large courses beyond biology – didn’t work in chemistry!
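As a rough sketch, the score as described is just a division, with the lowest values being the highest risk. The function names and the sample students below are invented, and this is not Rio Salado's actual code.

```python
def two_lc_score(logins_first_two_weeks, courses_enrolled):
    """Logins in the first two weeks divided by number of courses taken."""
    if courses_enrolled <= 0:
        raise ValueError("student must be enrolled in at least one course")
    return logins_first_two_weeks / courses_enrolled


def rank_by_risk(students):
    """Order student names from highest risk (lowest score) to lowest risk.

    students maps a name to a (logins, courses) pair.
    """
    return sorted(students, key=lambda name: two_lc_score(*students[name]))


if __name__ == "__main__":
    # Invented (logins, courses) pairs, purely for illustration.
    students = {"ana": (20, 2), "ben": (4, 4), "cy": (9, 1)}
    for name in rank_by_risk(students):
        print(name, two_lc_score(*students[name]))
```

Given the note above that the measure did not carry over to chemistry, any ranking like this would need re-validating for each course before anyone acted on it.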
A hard unsolved problem: validating model’s applicability in a particular context. And even harder is to validate it in a general case.
Another hard problem: Enterprise architecture. What analytics do we implement where? What’s the architecture of these tools? Does it all go in the LMS? D2L has an analytics product, so tend to shove everything in there. But has architectural implications.
Hard problem: Designing faculty and student-centric tools. Pushing faculty on proper design. From a course design perspective, what the tool can tell you. Not just increased participation – is that what should be happening on the course? Generally need a lot of data.
Hard problem: security and privacy. Connecting enterprise data, social data, public data … as CIO, would think that was crazy. How can we prevent everyone from having access to everything about me? Has enterprise data, private data – some want available to e.g. the Registrar but not others. Controls only exist in transactional systems – if transfer to a data warehouse, lose that control.
Possible areas of collaboration
Can’t solve these on our own.
Blackboard strategy has parallels with Microsoft – plan to be the operating system, define the ecosystem, and be everything. They are acquiring everything related, they want to be the vendor for everything related, including administrative systems.
Desire2Learn want to collaborate with universities, foundations, other vendors.
Possible collaboration areas:
Sharing and reviewing predictive models, open-source style. Problematic if one vendor owns the predictive model.
Architecturally – building scalable, affordable infrastructure, and working out what elements belong where.
Exploring what are the critical data elements for learning and knowledge analytics.
Someone1: Signals being acquired by SunGard. Much talk around models, about traditional inferential algorithms. But this is more about processes; a process can be considered intellectual property.
Desire2Learn were sued for a patent violation by Blackboard. The concept/software design claim was bogus. But Blackboard initially got a patent, and had it upheld in the courts. It took years for that patent issue to be resolved. Big troubling issue on validity of software patents. Fear that some software companies will say, we’ve developed this model, and we’re going to patent it, which will lock out others.
Someone2: Process can’t be IP. But if we have a large set of students, devise a model.
Someone3: Can see what’s going on inside it. But could be problematic if licence agreement says can’t decompile/reverse engineer – don’t want to use a black box.
Quality of data is a key concern. Would be nice to get people to agree on e.g. an ontology, or structure of data. One example – clean data is better than more data, and more data is better than bad data. How do we get data from one place to match up with stuff that doesn’t follow the structure or format. This is one of the hardest problems.
(I think that this is actually going to be the key skill of a learning analytics engineer in the future – a facility for joining up mismatched and messy data. Getting all the data Right is just not going to happen.)
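To give a flavour of what joining up mismatched data can look like in practice, here is a small sketch that matches records from two systems whose student identifiers are formatted differently. Everything in it is hypothetical: the field names, the identifier formats, the sample rows and the normalisation rule are invented for the example.

```python
import re


def normalise_id(raw):
    """Reduce an identifier to uppercase letters and digits only."""
    return re.sub(r"[^A-Z0-9]", "", str(raw).upper())


def join_records(registry_rows, lms_rows):
    """Join two lists of dicts on a normalised 'student_id'.

    Returns (matched, unmatched_ids), so that records which fail to match
    are reported rather than silently dropped.
    """
    registry = {normalise_id(row["student_id"]): row for row in registry_rows}
    matched, unmatched = [], []
    for row in lms_rows:
        key = normalise_id(row["student_id"])
        if key in registry:
            # Merge both records and keep the cleaned-up identifier.
            matched.append({**registry[key], **row, "student_id": key})
        else:
            unmatched.append(key)
    return matched, unmatched


if __name__ == "__main__":
    # Invented rows from two hypothetical systems.
    registry_rows = [
        {"student_id": "ab-1234", "programme": "Biology"},
        {"student_id": "CD 5678", "programme": "Chemistry"},
    ]
    lms_rows = [
        {"student_id": " AB1234 ", "logins": 14},
        {"student_id": "cd5678", "logins": 3},
        {"student_id": "ZZ0000", "logins": 7},
    ]
    matched, unmatched = join_records(registry_rows, lms_rows)
    print(matched)
    print("No registry record for:", unmatched)
```

Keeping the unmatched list is a deliberate choice: with messy data the leftovers are often where the interesting problems are.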
How do we take unstructured, messy data and make sense of it? Some tools available.
What are critical literacies for educators? You may as well learn Python. Hang out with R a lot. Big structured and distributed data formats. Analytics tools: Gephi, R, others. Won’t all have skill in all of these.
We need datasets, and need access to them. At Athabasca, if anonymise their data, could be really valuable for researchers to draw trends and inferences. NEED MOAR DATA.
Surface understanding versus actual depth is a key problem. It’s nice to know who chatted to whom. But from there to learner success is a wide step, and not one we can make yet. Time on task may be too simple and glib.
(I also think that a key principle in learning analytics is the good-enough metric: it doesn’t have to be totally right to be extremely useful. So predictive models aren’t 100% predictive – but still add huge value if e.g. can trigger interventions that reduce dropout. And analytics that work most of the time (but not all) can still be very useful.)
Dashboards. Need to be learner-facing. Everything that an institution gathers about a learner, the learner needs to have access to as well. The analytics have to face the learner as well as the institution.
Algorithms – Google is facing this with content farms. If people know your metrics, they will game them. How do we update the algorithm? How can you have an algorithm that accounts for continual changes in learner behaviour and contexts.
(I think it’s going to be a question of continual activity, not inventing once-and-for-all analytics. We’ll have to keep at this the whole time. A cycle of investigate, develop, test, evaluate and investigate again)
Philosophy behind LAK11 – multi-disciplinary; full-spectrum of learning; systemic development. Connecting disciplines increases value of each. Tricky issues in worldview between e.g. computer scientists and more social-view educationalists.
What if we had agents that tracked everything we did, at night it links against background, then fills gaps and agenda for tomorrow? Would be awesome … and freaky.
What does a data science team look like? Two approaches: traditional one is have an individual who knows Python, creative, knows R, can run hadoop cluster or purchase Amazon services, and ask intelligent questions of data. Only a handful of people have this all. Alternative, have a team. Stakeholder (defines the question), Data Scientist (the lead, who understands how systems relate to each other, clean data), Programmer, Statistician, Visualizer.
Someone: Also need a bridger, link, to communicate between them.
Someone: Call stakeholder the client. Need broader stakeholders too, involve as part of the team.
Someone: Need a bridge to rest of institution – e.g. registrar area, they may not know about the details of how a course is taught
Shane: An interpreter – take the visualiser and turn it in to action. Explore what happens because of this.
(missed a bit here)
Someone: Need a champion within the organisation.
Phil: Project manager. We’re talking about software development.
Someone: These distinctions make a lot of sense a lot of time. But there will be folks who do do a subset of these. You need a bit of all those to do a lot of tasks. Want to have training programs that make people diverse. Also need to know about education. But in machine learning, need to know much of these.
If going to deploy analytics at an institutional level, will need these skillsets – whether they’re separate individuals is secondary.
Someone: Internal communication expert to communicate the information to rest of the organisation, to ensure it reaches the people who have to have it.
Someone: Also, want it to be interdisciplinary. If it’s at a policy, or institutional/managerial level, or learning science level – represent different academic communities. Pre-test/post-test paradigm is going.
There is an integrated degree being released, resources available.
Dragan: Making same point. Talking about how to get students to graduate – question about quality. That requires educational scientists.
Someone: Student success coordinator. This is going to need a whole other team to implement.
Insight from analysis has to feed back to the instructional designers, similar to Shane’s point.
Simon: Taking that formidable skillset, and Al Essa on what it takes to build an infrastructure – you’ll probably outsource this to whoever is able to offer this as a service. If you can do this, you’re made.
Before we leave the conference, incorporate a company!
Linda Baer: The learning innovation, when they make this concrete, things will have happened – may get entrenched. Need to keep fresh.
Griff: Need someone to pull this together. Pipeline to stakeholder communities, not just generating stats for their own sake. And a good fundraiser.
Anna: Could be data scientist, but someone should take care of controlling the quality of the data, the provenance, the intrinsic quality. Related to some law and legal evaluations.
Chris: Talked a lot about privacy and ethics, but not represented here, gets more important. This team is very expensive now – several millions of dollars. Need to know when you need who on the team. Different questions may need different skillsets. Team has to be flexible, grow and shrink with different questions.
Someone: This is more like what a data science project will involve, the people deployed will depend on the project. More a concept map useful to people to see what’s involved in a project through from conception to completion.
Alfred: Big missing piece – analog of the Russian winter – IT and Legal may say No. Have to get their buy-in.
Someone: I’ll support that – we can’t use Dropbox as a collaboration tool for student data because it’s
Someone: There’s no faculty listed here! Possibly as stakeholder – but have to have faculty input to what would be useful, not just individually but on the program level. Another thing – stakeholders exist in different types. Champion to coordinate with multiple stakeholders. See the bigger picture, and prioritise different project efforts. Human resources as well as finance.
Someone: Faculty – the data scientist understanding the data; could be a data expert who could be the faculty. Need to know the backstory, the trend, why it’s collected the way it is.
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.