Data Privacy: Share your data and hide your secrets
23.05.2022

The Department of Computer Science came out in full force at this year’s DEDS Winter School in Athens, 4-8 April 2022. Professor Katja Hose, Assistant Professor Daniele Dell'Aglio, and PhD student Antheas Kapenekakis from the DKW research group (Data, Knowledge, and Web Engineering), together with Professor (MSO) Kristian Torp and PhD students Abduvoris Abduvakhobov, Rodrigo Sasse David, and Christos-Charalampos Papadopoulos from the DESS research group (Data Engineering, Science and Systems), attended the school, whose overall topic this year was Ethical and Legal Aspects of Data.
The composition of the group reflects the interdisciplinary nature of DEDS ("Data Engineering for Data Science"), a European joint doctorate designed to develop education, research, and innovation at the intersection of Data Science and Data Engineering.
All PhD students presented their projects at the so-called “student talks”, which, according to Daniele Dell'Aglio, were a very good opportunity for them to experience “the life of a researcher” first-hand.
Daniele himself led a seminar titled “Share your data and hide your secrets! A brief introduction to data privacy”.
At the seminar, Daniele used several well-known privacy breaches as a backdrop for a discussion of the challenges of ensuring data privacy and the technical solutions that can address them.
- We have to conclude that the techniques we have traditionally relied on for the last 20 years, such as anonymization and de-identification, do not really work. For example, it has proven far too easy to uncover sensitive data about a person by joining anonymized datasets containing sensitive information with “harmless”, open datasets containing individual identifiers such as name and address, he says.
One example is the Netflix Prize, which ran from 2006 to 2009 as an open competition for the best collaborative filtering algorithm. Netflix published a training dataset of 100,480,507 ratings that 480,189 users gave to 17,770 movies. The dataset was anonymized, but two researchers from The University of Texas at Austin, Arvind Narayanan and Vitaly Shmatikov, were able to identify individual users by matching it against film ratings on the Internet Movie Database (IMDb).
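To make the attack concrete, here is a minimal sketch of such a linkage attack in Python with pandas. The column names and the toy data are illustrative assumptions, not the actual Netflix or IMDb schemas; the point is only that a join on shared quasi-identifiers can undo anonymization.

```python
import pandas as pd

# "Anonymized" ratings release: direct identifiers removed,
# but quasi-identifiers (movie, rating, date) remain. Toy data.
anonymized = pd.DataFrame({
    "user_id": [101, 101, 102],          # pseudonymous ID
    "movie":   ["Movie A", "Movie B", "Movie A"],
    "rating":  [5, 2, 3],
    "date":    ["2006-01-03", "2006-01-05", "2006-02-11"],
})

# Public, "harmless" dataset where people rate movies under
# their real names (analogous to public IMDb reviews). Toy data.
public = pd.DataFrame({
    "name":   ["Alice", "Alice"],
    "movie":  ["Movie A", "Movie B"],
    "rating": [5, 2],
    "date":   ["2006-01-03", "2006-01-05"],
})

# Joining on the quasi-identifiers links the pseudonymous ID
# back to a real name, even though no direct identifier was shared.
linked = anonymized.merge(public, on=["movie", "rating", "date"])
print(linked[["user_id", "name"]].drop_duplicates())
# -> pseudonymous user 101 is re-identified as Alice
```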
Four Netflix users filed a lawsuit against Netflix, alleging that the company had violated U.S. fair trade laws and the Video Privacy Protection Act by releasing the datasets. A settlement was reached, and Netflix announced that it would not pursue further Prize competitions as a consequence of the lawsuit and of privacy concerns raised by the Federal Trade Commission.
- The Netflix case and other cases show that we need to develop better techniques to ensure data privacy. If we do not, there will be public resistance to sharing data for research purposes, and this will be detrimental to the quality of data-driven research in any domain, says Daniele.
A possible solution is differential privacy, which Daniele delved into at the seminar. Differential privacy is a technique originally developed by cryptographers that enables researchers and database analysts to obtain useful information from databases containing personal information without revealing the identity of the individuals involved.
- As a very simple definition, you can say that differential privacy protects the individual by injecting “noise”, a controlled distortion, into the datasets. You aim for a sweet spot where you inject enough noise to ensure the privacy of the individual, but at the same time maintain the possibility of discovering useful insights from the data. Or to put it another way, what you gain is that any person will be able to deny being part of the data, Daniele explains.
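As a rough illustration of this noise-injection idea, here is a minimal sketch of the classic Laplace mechanism applied to a count query in Python. The epsilon value, the toy dataset, and the query are assumptions chosen for illustration, not a production implementation.

```python
import numpy as np

def dp_count(data, predicate, epsilon=0.5):
    """Differentially private count via the Laplace mechanism.

    A count query has sensitivity 1: adding or removing one
    person changes the true result by at most 1. Releasing the
    count plus Laplace noise with scale sensitivity/epsilon
    makes the answer epsilon-differentially private, so any
    individual can plausibly deny being part of the data.
    """
    true_count = sum(1 for row in data if predicate(row))
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Toy dataset: ages of individuals (illustrative only).
ages = [23, 35, 41, 52, 29, 63, 47]

# How many people are over 40? The released answer is noisy,
# but still useful in aggregate.
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))
```

Smaller values of epsilon inject more noise, giving stronger privacy at the cost of accuracy; the “sweet spot” Daniele describes corresponds to choosing epsilon for this trade-off.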
Differential privacy has been adopted by a number of big tech companies: Apple uses it to collect anonymous usage insights from devices such as iPhone, iPad, and Mac; Amazon uses it to learn users’ shopping preferences while concealing information about their past purchases; and Google uses it when collecting browsing patterns in Chrome.
- From a purely theoretical point of view, differential privacy comes with strong mathematical guarantees, but we still need to discover how it works in practice when deployed, and how it affects our current data science and analytics processes. Whether the overall impact is positive or negative is the question we need to answer, says Daniele.
Daniele Dell’Aglio
Assistant Professor
Department of Computer Science,
Aalborg University
Mail: dade@cs.aau.dk
Phone: +45 9940 7830
Stig Andersen
Communications Officer
Department of Computer Science,
Aalborg University
Mail: stan@cs.aau.dk
Phone: +45 4019 7682