Liveblog from second session on Monday 1 July 2013, at LASI13.
Panel: Ethics, privacy, access
Chair: Grace Lynch
Panelists: Abelardo Pardo, Patricia Hammar
Doing things a bit different – personal introductions.
Legislation and technology are coming together. Key guidelines around reducing that distance. From our specific points of view, look at how organisations are considering privacy, ethics, data access, ownership and other different issues and how they’re starting to be addressed.
Trish. Theoretical mathematician turned lawyer. (!) I do a lot of work automating policy compliance into technology. Looking through laws and making Booleans (?) out of them. Have done that in other fields. Working in LA through Roy. Recently finished paper on IRB issues in legal analytics, also working with Nat Acad Sci on the rewrite of the Common Rule, which guides IRB decisions in the US.
Computer scientist. Approach from comp sci perspective: found myself teaching, saw possibility of a lot of data. Used tricky/sophisticated methods. When I realised I could collect all this data, I had no idea what I could do – potential to get in to serious trouble. Needed guidance. Also an electrical engineer. I tell students – we’re focused on solutions, the shortest path to the solution. With privacy I apply that approach – I need an answer, and in 3h. I know there are a lot of issues … but I need a solution. What are the major blocking things I need to be aware of? Other experts will figure out the details, but I need to know the basic issues. Also very keen on finding helpful analogies – e.g. it’s as if you were … Can do this with privacy.
I started as a mathematics teacher at school level, then physics at community college, then education and ed tech around maths ed, how we know what we’re doing improves learning. Now evolved, I’ve managed centralised learning & teaching centers, asking how do we know how we teach makes a difference and how are we grabbing that data? We have a new data warehouse, LMS, early alert programme – 600k student records. Do students know what we’re doing? As we develop predictive models – ethically, if we predict say 80% chance of failure, do we take their money? What supports do we give them? What if they find out we predicted failure and they sue us? Real cases live now.
Trish works with federal US government and Gates Foundation. Expert on these privacy issues and on drafting legislation.
In case you’ve not heard this from a lawyer before: Yes. Second thing – I will go to Australia, I’m available (!). IRBs. I believe in compliance, I really do like IRBs, I want them to be efficient and effective – many are neither. Aim not to get rid of them. I’ve learned if I give it to the lawyers it takes longer so I don’t give it to them – that’s long.
Adam: What’s an IRB?
Institutional/Independent Review Board, reviews research for the safety of the participants. They come from the Common Rule, a Health and Human Services rule, 45 CFR 46 – that rule guides all medical research and all research involving human participants.
In this area, people think IRBs take too long. Even things that are exempt take a while to be confirmed as exempt. They’re sometimes not very tech-savvy – a huge issue. Accommodations to them sometimes make data reuse hard – e.g. only you can see this data ever. Hard to create institutionalised capability. Don’t want data that is too strongly limited. Also multi-site multi-discipline studies become close to impossible – you collect IRBs as you collect sites/disciplines/universities, need multiple people to agree on a solution and there’s little standardisation. When you get big data, very difficult. Aggregating data sources – people are learning this first hand.
Big question: What is the risk to the participants? Student population. What are the real risks? The IRB is trying to see what risk the research itself adds. Often in education, there’s not a lot of risk to the activity of the test – e.g. not giving a new drug. People aren’t going to die. Usually it’s not even a good or a bad thing which side of the experiment they fall. Some of it is, but some of it is not so much. A lot of it is we need demographic data, and that has privacy concerns. So the IRBs, who are supposed to look at the research, are ending up as privacy and computer security experts instead. Another issue: consent! When do you need it, and how much? That is in the IRB’s purview. Great opportunities now – the Common Rule is being rewritten. Nat Acad Sciences are doing a study looking at how this impacts on behavioural research. That research will feed into the rewrite – so e.g. medical approach rules aren’t transferred so much. There’s a push to apply HIPAA. Healthcare privacy rule, quite extreme, would’ve been difficult to meet. Trying to back them off from that. Dealing with all of those pieces.
How should they change? Trying to get more guided elements on what research is and how they’re evaluating it. There’s two disciplines. It is cross-disciplinary, don’t want to become experts in other things. Privacy protection is a big discipline. When IRBs do it they tend to be too conservative – don’t realise how much is out in the world and how to protect it. Want to pull out, what data are you collecting from a research perspective, what are you adding from a privacy perspective that is in other places already. Make the questions distinct, can answer them more clearly. Have privacy levels for various data, say it’s in this level, therefore needs this type of protection, then this model of computer security. Helps with the expertise problem. Also want them not to limit reuse. Content minimisation – e.g. take two fields off. Doesn’t make it any better. Consent forms. Biggest thing is the process. Templated analysis of types of research. Four areas. Consent – when can passive consent be used, does active consent skew participants. There are people who can respond to active consent requirements, and some who can’t – we believe there’s a skewing of the participant base. So you’re hurting one element of ethics to get to another. Collection. Can we say what’s OK to collect. Is someone hurt by the fact that they learn algebra differently? We have to pull out the elements that impact on people. Storage and transmission. Most of it is something IRBs shouldn’t have to care about. It’s an infosec problem, someone who knows about that should know what to do. Some things they should – large aggregated datasets, how you can combine data – those are real research issues. Reuse. You get all the rules about how you can use the data, but when you want to share, things change. If we want an educational discipline, have to do that. Better for participants to participate once and have their data used twice than vice versa.
We’re going to come up with categories, matrices of research types. I’ll start digging in your stuff. One is – what is your research. What age/populations are you researching. What research data are you (actively) collecting. What demographic data are you collecting. Is there anything unique in those forms. Any other unique factors. Biggest is the research data – if instrumenting brainwaves it’s different to a clickstream.
John Mitchell, Stanford: The MOOC providers collect a great deal of data – clickstream, MCQ answers, discussions, demographics, other data. Providers return some portion of that to universities, some have been resistant, partly for cited privacy reasons – TOS and other complications. It’d be a great service to have some clarity about legal and ethical issues. Guidance on what could be returned to us.
Current rules allow everyone to decide their own stuff. People don’t like making decisions they can’t understand, so if we can give some standards, say you can understand this in this way, IRBs will tend to be enthusiastic to adopt. There are some clear standards about how certain elements should be dealt with.
Elizabeth Rowe (?) from TERC: COPPA rules – especially kids under 13.
Nothing will help you tomorrow. Goal to work across a lot of laws to come up with standard understanding of what those laws are. Across the US Government, a privacy dictionary so use terms consistently across agencies – used UML, say what accept as proof in a system. Not quick, is consensus-building. There are 352 laws on information handling on federal data. More on others. You get tired. Computers are good at doing things over and over. Congress reuses words, so can optimise to a set of rules in the systems we work on. Goal is we push the policy people towards objective measures where we can, clarify vagaries.
Paul Cholmsky: In healthcare they have this issue too. Some companies, privacy analytics, tools for de-identification. Also do e.g. risk assessment based on the data. Are they applicable?
Yes. HIPAA has created a whole world of privacy experts, vendors, capabilities – anonymisation, prove statistically you can’t re-link it. There’s guidance on this. Vendors do it too. That might not be what we want. There’s uses for anonymised data, and there’s times we want that data. When your research needs that information, how do we protect it? Not one solution for everything.
Someone who has to deploy an LA application in the next 3 days. Working on flipped classroom with Phil. Suppose you have a magic lamp with ethical genie, you get one question, what would it be? Audience chip in suggestions:
- What’s the least I can do and not get in trouble?
- What’s diff between collecting data for teaching and research?
- Is it too late?
- What about third parties, startups – how do they participate in it so they don’t get in trouble?
- What if there’s underage kids I don’t know about?
- How can I share third party data?
- Who actually owns the data?
- What does it mean to own the data?
- Is anyone actually getting harmed?
Immense difference in responses there. At the beginning, I followed examples to help me navigate. The users own the data, don’t think it’s my data, you borrow it. Experience with students, you have to offer something valuable in exchange – would you let me track your clicks etc, what are you offering me? Internet considers privacy like money – people pay you with their data, have to offer something in exchange – has to be something they perceive as valuable. Otherwise why would they exchange? It’s too expensive for privacy e.g. demanding HIV status – that’s too much for something not worth the price. Notion that we all work in different contexts. In a learning experience, helps to think you’re collecting data for a specific problem in a specific context. If tell students you’re making things available in public, that’s too much. If you take data out of context the context loses integrity. Safe and valuable.
Two elements to context integrity. Who do you distribute the data to? How accurate is it?
These won’t get you out of trouble but at least point you in the right direction. You’re borrowing the data. Core European data idea – right to be forgotten. Case recently, someone in financial trouble, got out of it, wanted information removed, court said no, freedom of information. But legislation is headed in the direction of the right to be forgotten. Someone might say I want all my data removed, and you might have to comply.
Caroline Haythornthwaite, UBC: Group ownership. As individual, contribute to a group discussion. Who owns the discussion?
It’s made out of individual pieces that are put together. Not clear who owns the data. Not clear what happens if one participant says they want to remove theirs. Don’t have an easy answer. Discussion is one step of transformation. After the sixth step, who owns that? Originally it’s individual, but finally, have at least to acknowledge the source.
Chris Brooks, Saskatchewan: Shaking my head a lot. Don’t think it’s the student’s data. My tool, my teaching, my interaction created that data. What are students’ responsibilities – do they have one to share data to help us build a better system? Or is it solely them as consumers of education systems, buying an education from us?
Abelardo: Don’t agree with students as consumers. They have some privacy rights, as human beings. Have to be extremely respectful. For sake of your education, will ask you to provide us with data. If they say no, I think you have to accommodate those requests.
Phil Winne, SFU: What happens in the context where the data co-evolve between student and teacher, so the teacher can’t take the next step without depending on the data, and they’re also intended for research purposes. E.g. step 6 in an ITS and step 7 requires the data to be present. The student can’t profit from the teaching without the data.
Abelardo: Student has the right to say no tracking. So default decision, one way would be, you might have a sophisticated ITS, add one more case that is what happens if no data. Accommodate their right to not be a part of it.
Al Essa, Desire2Learn: 1. A lot of the data in this environment is derived data. Not just clickstreams, it’s things we do with it. There’s a combinatorial explosion. 2. A lot is probabilistic in nature – e.g. if someone clicks on a particular page, what’s the probability they’ll click on something else. 3. Probabilistic attributes about me are just as much facts about me as my height – I like to think I’m 6′ within a degree – this is as much data about me as other data. Controversial statement. These policy discussions make my head hurt. If we accept there’s a lot of derived data, let’s say the entity that owns that derived data is the one that produced it – if we don’t then some of it is completely intractable and impractical. E.g. if a student says give me all the derived data, that’s just not practical.
Ilya Goldin, Pearson: The extension of what Al said is the notion of student ownership, and the analogy to ‘it’s mine and I can take it back’ – can’t take it back, it’s not an object, it’s a digital artefact. No you can’t take it back. There’s kind of a blanking right that we endow to students – right to erase what sits in your database about me, possibly at just the first iteration. Just because something about me individually has been erased doesn’t mean you have to erase the average performance of the class in which I was enrolled, the duration, etc.
Abelardo: What I said was more like blanking. Already have to do that to comply, for lots of databases.
Ilya: There is in fact a setting in which we can take the data back – e.g. go to a doctor, get test results, I can say give me all that data. We’re not in that kind of setting.
Grace: We’ve talked about privacy and ethics. I’m going to talk about access. I’m going to say we’re here, we’re in institutions, the business of education, and we are using the promise of LA to make more money – to keep students, retain them, make more money. Caroline’s question – difference between teaching, research and other practice? We really want to make our product better, we want people to buy it again, we want access to data to do that. We have corporate intelligence, BI units, huge data warehouses – integrating all types of data. George talked about LA cutting across admissions, recruitment, delivery, progression – institutionally it has all that. Organisations are drawing all that. Organisations are doing all that. Separating the hype from the hope – what is it we want to know, and why? Comes back to the questions Phil asked – I started as a teacher. I want to know if how I’m teaching, is it helping students learn? As an administrator, do I want to spend money in e.g. building a new tool to get what return? That determines the access, and what happens. The privacy and ethics – it is the student’s data, we’re borrowing it and using it. If we have access to student data, and models say high risk not successful, what should we be doing to support them, alert them. Ethically as an institution, what should we be doing?
Eamonn Kelly (?): Profit question. The data may be generated by the students, but the insights are being turned into a tool, making a product for other people. Example of FoldIt, big pharma use that to create billions of dollars – what is the IRB question in terms of the profit you’re contributing to but not sharing? Particularly where not well off people are being mined, in the worst sense of the word.
Trish: Big question, we’re struggling with it, in IRB. At the moment large companies aren’t struggling with this. After Colorado movie shooting – profiling, why wouldn’t the federal government/educators know stuff that e.g. Amazon does? That’s changing. People are starting to expect services based on the data, and government/educators have a block on that.
George: The priorities being played out. John Campbell has talked about the obligation of knowing. Very challenging – the ethical dimensions. Privacy is a function of negotiating priorities. NSA is a terrific case study. If we can mine this stuff, we can keep a safer country, we can keep our allies safer. That is a valid argument. But it comes up against the little point of individual rights and freedoms. Institution can say we can give you this data and we can do so much for you. Important to not talk about it as a broad concept, but contextually.
Stephen Aguilar, Michigan: Data is information. If we move from there, LA has been happening throughout history. Problem is a lack of fidelity, teacher doesn’t capture it all. If we go from there, we’re recording, reifying the data, we can only think about ownership because it’s out of a mind. If it stays in a learning environment, there is no ethical concern. The data is a co-construction, both L & T have obligation to make it better. In a position to do that very well now. For me, ethical concern is now that data can live outside the learning environment. What do we do with data sharing – data ownership, privacy are wrong questions within the environment. Once outside it, that’s when we need to ask those.
Grace: It’s about who’s actually seeing it. While it’s in the LMS right now, lots of people can see it. Traditionally, it’s within a room. Now online, the interaction much wider, broader, many more people look at it, extract it, manipulate it. Have to be very careful. Yes, we are trying to improve the learning. Have to be careful that no harm is done. Now have access to data we never had before.
John Whitmer, Cal State U: The data in a learning context brings ethical implications. Specifically, what if a student doesn’t want people to know they’re Pell eligible, from a low socio-economic status, or doesn’t want their teacher to know their current GPA? Ethical obligation – what is it of not knowing – we have this data, looking at students who are e.g. low SES, under-represented minorities – we have it, do we have a responsibility? Found a big gap in achievement in an online course, thought there was a big issue. But other places are not looking at it.
Grace: Now we know we have an obligation to do something.
Trish: Question earlier, about probabilistic data. Most data is. If we know there’s a likelihood you’ll act in a particular way, what’s the difference between a probabilistic outcome and a prediction? The fact that 80% like you by demographics act this way doesn’t mean you’ll act this way. In the US, rules about not acting on those predictions based on the fact they’re there. The producer owns it, the probabilistic data is yours, whole issue, the integrity of context. What is knowing vs acting.
Alyssa: Agree with Ilya’s comment that ownership is the wrong paradigm – e.g. I own it you don’t. It’s about what you do with it. Instead of worrying about privacy of the data, more important to think about the inferences. Do you have the right not to know? E.g. probability of disease, or probability of failing a course. The data in raw form might not be problematic, but I don’t want inferences about me as an individual. Issues about transparency. You can do a lot of stuff as long as it’s clear. Better than focus on the data as an object.
This work by Doug Clow is copyright but licensed under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.