Performance 2021 - Politecnico di Milano, Italy - November 8, 2021
Big data have opened new opportunities to collect, store, process and, most of all, monetize data. This has created tension with privacy, especially when it comes to information about individuals. Researchers have proposed anonymization approaches for publishing personal information, i.e., generalizing or removing the most sensitive fields. Privacy-preserving data processing allows companies, researchers and practitioners to share valuable datasets while preserving the privacy of users. Several techniques have been proposed over the years, the most important being k-anonymity and differential privacy. In this tutorial, we will present the basic concepts of privacy-preserving data processing, introducing the main motivations, challenges and opportunities around anonymization techniques. We will focus on k-anonymity (and its variants) as the classical approach to anonymizing tabular data, and we will provide an overview of differential privacy as a flexible technique for running privacy-preserving queries. During the tutorial, students will work hands-on with a real case study, using Python code and state-of-the-art libraries to learn how to anonymize a dataset before publishing or querying it.
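As a rough illustration of the two techniques named above, the following minimal sketch uses only the Python standard library. It is not the tutorial's own material: the toy records, the field names (`age`, `zip`, `diagnosis`) and the helper functions are all hypothetical. It checks k-anonymity over a set of quasi-identifiers before and after a simple generalization, and answers a counting query with Laplace noise, a common mechanism for differential privacy.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def generalize(row, age_width=10, zip_prefix=3):
    """Coarsen age into a range and keep only a prefix of the ZIP code (hypothetical scheme)."""
    low = (row["age"] // age_width) * age_width
    masked_zip = row["zip"][:zip_prefix] + "*" * (len(row["zip"]) - zip_prefix)
    return {**row, "age": f"{low}-{low + age_width - 1}", "zip": masked_zip}


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(rows, predicate, epsilon, rng=random):
    """Noisy count; a counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy records, invented for illustration only.
records = [
    {"age": 34, "zip": "10129", "diagnosis": "flu"},
    {"age": 36, "zip": "10129", "diagnosis": "asthma"},
    {"age": 31, "zip": "10125", "diagnosis": "flu"},
    {"age": 52, "zip": "10141", "diagnosis": "diabetes"},
    {"age": 58, "zip": "10141", "diagnosis": "flu"},
    {"age": 55, "zip": "10146", "diagnosis": "asthma"},
]
generalized = [generalize(r) for r in records]

print(is_k_anonymous(records, ["age", "zip"], k=2))      # raw data: every row is unique
print(is_k_anonymous(generalized, ["age", "zip"], k=3))  # two groups of three rows each
print(dp_count(records, lambda r: r["diagnosis"] == "flu", epsilon=1.0))
```

In this sketch, a smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy, while a larger `k` forces coarser generalization. Dedicated libraries handle details that this example ignores, such as choosing an optimal generalization or tracking a privacy budget across queries.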
Martino Trevisan – PIMCity member. He received his PhD in 2019 from Politecnico di Torino, Italy. He is currently an Assistant Professor in the Department of Electronics and Telecommunications at the same university.
Luca Vassio – PIMCity member. He is an Assistant Professor at Politecnico di Torino and a member of the SmartData@Polito research center for Big Data technologies.