Not one but many Tradeoffs: Privacy Vs. Utility in Differentially Private Machine Learning

Contributors: Benjamin Zi Hao Zhao, Mohamed Ali Kaafar, Nicolas Kourtellis
Year: 2020
Venue:
Abstract:
Data holders are increasingly seeking to protect their users’ privacy while still maximizing their ability to produce machine learning (ML) models with high-quality predictions. In this work, we empirically evaluate various implementations of differential privacy (DP) and measure their ability to fend off real-world privacy attacks, in addition to measuring their core goal of providing accurate classifications. We establish an evaluation framework to ensure that each of these implementations is fairly evaluated. Our selected DP implementations add DP noise at different positions within the framework: at the point of data collection/release, during updates while training the model, or after training by perturbing learned model parameters. We evaluate each implementation across a range of privacy budgets and datasets, with each implementation providing the same mathematical privacy guarantees. By measuring the models’ resistance to real-world membership and attribute inference attacks, together with their classification accuracy, we determine which implementations provide the most desirable tradeoff between privacy and utility. We found that the number of classes of a given dataset is unlikely to influence where the privacy and utility tradeoff occurs, a counter-intuitive finding given the known relationship of increased privacy vulnerability in datasets with more classes. Additionally, when high privacy constraints are required, perturbing input training data before applying ML modeling does not trade off as much utility as adding noise later in the ML process.
Repository link: https://arxiv.org/pdf/2008.08807.pdf
Download: PDF file
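
Illustrative sketch (not from the paper): the abstract distinguishes three positions at which DP noise can be added, namely on the input data, on the updates during training, and on the learned parameters after training. The Python snippet below only shows where the noise enters a toy logistic-regression pipeline. The function names, the synthetic data, and the single `noise_scale` value are all hypothetical, and the noise is not calibrated to any sensitivity or privacy budget, so the code provides no actual DP guarantee and does not reproduce the authors' implementations or results.

```python
import numpy as np


def train_logreg(X, y, steps=200, lr=0.1, grad_noise_scale=0.0, rng=None):
    """Gradient-descent logistic regression (toy, hypothetical helper).

    If grad_noise_scale > 0, Gaussian noise is added to every gradient,
    which marks the "during training" noise position.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # predicted probabilities
        grad = X.T @ (p - y) / len(y)           # mean log-loss gradient
        if grad_noise_scale > 0:
            grad = grad + rng.normal(0.0, grad_noise_scale, size=grad.shape)
        w -= lr * grad
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches y."""
    return float(np.mean(((X @ w) > 0) == (y == 1)))


# Synthetic, linearly separable data (an assumption for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w > 0).astype(float)

noise_scale = 0.5  # hypothetical value, NOT derived from a privacy budget

# Position 1: perturb the input data before any training happens.
X_noisy = X + rng.laplace(0.0, noise_scale, size=X.shape)
w_input = train_logreg(X_noisy, y)

# Position 2: perturb the gradient updates while training.
w_grad = train_logreg(X, y, grad_noise_scale=noise_scale, rng=rng)

# Position 3: train normally, then perturb the learned parameters.
w_clean = train_logreg(X, y)
w_output = w_clean + rng.laplace(0.0, noise_scale, size=w_clean.shape)

# Accuracy is always measured on the clean data.
results = {
    "no noise": accuracy(w_clean, X, y),
    "input perturbation": accuracy(w_input, X, y),
    "gradient perturbation": accuracy(w_grad, X, y),
    "output perturbation": accuracy(w_output, X, y),
}
for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

In a real comparison, each noise position would need its own sensitivity analysis so that all variants satisfy the same privacy budget, which is the condition the abstract imposes before comparing utility.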

This project has received funding from the European Union Horizon 2020 Research and Innovation programme under the ICT theme: ICT-13-2018-2019 - Supporting the emergence of data markets and the data economy.
Grant Agreement No. 871370

Project

Start Date: 01/12/2019
End Date: 30/08/2022
Cost: € 6.208.953,75
EU Funding: € 5.240.830,13
Project Identifier: H2020-871370
Estimated Effort: 723 PM

Project Coordinator

Marco Mellia
Politecnico di Torino - Department of Electronics and Telecommunications (DET)
Email: marco.mellia@polito.it