User profiling by network observers

Contributors: Roberto Gonzalez; Claudio Soriente; Juan Miguel Carrascosa; Alberto Garcia-Duran; Costas Iordanou; Mathias Niepert
Year: 2021
Venue:
Abstract:
Targeted online advertising is a multi-billion-dollar business built on the ability to profile a wide range of users and deliver targeted ads to them. Because of the privacy erosion associated with this business, researchers are trying to understand how profiling works, and anti-tracking applications are becoming popular among users. Both research and privacy-enhancing apps, however, target ad-networks or over-the-top providers that have unrestricted access to users’ online activity. There seems to be little interest in potential profiling activities by “network observers” such as ISPs or VPN providers. On the one hand, this may be explained by the pervasiveness of TLS, which secures connections end-to-end. On the other hand, TLS does leak some information, and it is not clear what an eavesdropper can learn about a user even though her traffic is encrypted.

In this paper, we show that a network observer can build accurate user profiles despite the limited visibility imposed by TLS. In particular, we introduce a technique based on representation learning algorithms that builds profiles using only the hostnames of the URLs requested by users. To evaluate the accuracy of the profiles built with our technique, we set up an experiment in which we serve personalized ads to more than one thousand real users over a period of one month. We compare the click-through rate of ads served by our system with that of ads served by ad-networks. We empirically show that the quality of profiles that a network observer could build is comparable to the quality of profiles available to ad-networks and over-the-top providers. This is particularly worrisome because current anti-tracking mechanisms cannot counter profiling by network observers, whereas effective mechanisms such as Tor incur a performance and usability penalty.
Repository link: https://dl.acm.org/doi/10.1145/3485983.3494859

This project has received funding from the European Union's Horizon 2020 Research and Innovation programme under the ICT theme: ICT-13-2018-2019 - Supporting the emergence of data markets and the data economy.
Grant Agreement No. 871370

Project

Start Date: 01/12/2019
End Date: 30/08/2022
Cost: € 6.208.953,75
EU Funding: € 5.240.830,13
Project Identifier: H2020-871370
Estimated Effort: 723 PM

Project Coordinator

Marco Mellia
Politecnico di Torino - Department of Electronics and Telecommunications (DET)
Email: marco.mellia@polito.it