Machine Learning for automatic assessment of the risk related to web tracking
- Contributors: Marzia Maffei
- Year: 2020
- Venue: Master Thesis
This work aims to understand today’s tracking ecosystem and to use machine learning tools to automatically assess the risk connected to web trackers and assign a risk indicator score to websites. The web is a highly dynamic ecosystem, and each user browses dozens of websites every day, encountering a large number of trackers. Trackers serve different purposes: some help improve a user’s experience on a website, while others can be more or less malicious, collecting different kinds and amounts of data in order to build user profiles. Users are often unaware of their presence. Assigning a risk indicator to websites would make users more aware of the web ecosystem as a whole and would improve their experience, as a first step toward better protection of their data.
The classification performs well enough to show that machine learning algorithms are a viable option for detecting trackers on the web. Estimating the tracking risk associated with a first-party website is a first step toward more detailed labelling that should help users become more aware of tracking practices and of how heavily they are used on the websites they wish to visit.
The results of this work, from both the classification and the risk indicator score assignment, also give a picture of the web itself and of its tracking ecosystem, showing how prevalent trackers are even though they often go unnoticed by users in everyday activities.
- Repository link: https://webthesis.biblio.polito.it/15992/