A Non-Compromising Approach to Privacy-Preserving Personalized Services

Could one even achieve both high privacy and high utility? Yes, and we explain how.

By Ghazaleh Beigi and Huan Liu, Arizona State University

Web browsing history, which is a list of web pages (i.e., URLs) a user has visited in the past browsing sessions, is one of the most significant portions of

Figure 1. From this browsing history, we can easily infer that the user is female, lives in San Francisco, and is married.

The privacy issues get even worse with the overturning of the Internet Privacy Rules by the Federal Communications Commission (FCC) in late March 2017 that allows Internet Service Providers (ISPs) to collect, share and sell their customers' Web browsing

Figure 2. It is challenging to solve the dilemma between privacy and utility.

A recent paper accepted to the 12th ACM International Conference on Web Search and Data Mining (WSDM’19) proposes a novel solution to the utility-privacy challenge, namely PBooster. Achieving a solution that both preserves privacy and retains high utility is challenging for two reasons. First, it is not straightforward to quantify the privacy of users and the utility of their personalized services in the context of web browsing history data. Second, it is hard to determine how many links, and which ones, should be added to a user’s browsing history to boost privacy while retaining high utility. PBooster is a non-compromising and effective approach to achieving both high user privacy and high utility.

To quantify privacy and utility in the context of web browsing history data, PBooster uses two metrics, both based on the topic probability distribution of a given user’s browsing history. With these metrics in hand, the goal is to find a set of new links to add to the browsing history such that privacy is maximized and utility loss is minimized. PBooster tackles this problem by dividing it into two subproblems:

1) topic selection and

2) link selection.

In the topic selection phase, PBooster searches for a subset of topics and calculates how many links should be added to each of them in order to maximize privacy and minimize utility loss for the manipulated history. Then, in the link selection phase, PBooster exploits publicly available information in social media networks as an auxiliary source to select a set of links that matches the topics and per-topic counts found in the previous step. These links are then added to the user’s browsing history.
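To make the two phases concrete, here is a minimal Python sketch. It is not the authors’ implementation, and this article does not spell out the two metrics, so everything in it is an assumption made for illustration: privacy is approximated by the Shannon entropy of the topic distribution, utility loss by one minus the cosine similarity between the original and manipulated distributions, topic selection by a simple greedy loop, and link selection by taking the first few links per topic from a hypothetical candidate pool. The names `select_topics`, `select_links`, `budget`, and `lam` are likewise invented here.

```python
import math
from collections import Counter


def topic_distribution(counts, topics):
    """Normalize per-topic link counts (assumed non-empty) into probabilities."""
    total = sum(counts.get(t, 0) for t in topics)
    return [counts.get(t, 0) / total for t in topics]


def privacy(p):
    """Assumed privacy proxy: Shannon entropy of the topic distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def utility_loss(p, q):
    """Assumed utility-loss proxy: 1 - cosine similarity of two distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def select_topics(counts, topics, budget, lam=1.0):
    """Greedily decide how many links to add per topic (hypothetical heuristic).

    Each step adds one link to the topic that maximizes
    privacy - lam * utility_loss for the resulting history.
    """
    original = topic_distribution(counts, topics)
    current = {t: counts.get(t, 0) for t in topics}
    added = Counter()
    for _ in range(budget):
        def score(t):
            trial = dict(current)
            trial[t] += 1
            q = topic_distribution(trial, topics)
            return privacy(q) - lam * utility_loss(original, q)

        best = max(topics, key=score)
        current[best] += 1
        added[best] += 1
    return added


def select_links(added, pool):
    """Take the requested number of links per topic from a candidate pool."""
    return [url for t, n in added.items() for url in pool.get(t, [])[:n]]


# Toy history: the user mostly reads sports, a little news, no travel.
topics = ["sports", "news", "travel"]
history = {"sports": 8, "news": 2}
added = select_topics(history, topics, budget=4)
manipulated = {t: history.get(t, 0) + added[t] for t in topics}
p = topic_distribution(history, topics)
q = topic_distribution(manipulated, topics)
print(dict(added), round(privacy(p), 3), round(privacy(q), 3),
      round(utility_loss(p, q), 3))
```

In this toy run the added links go to under-represented topics, which flattens the topic distribution (higher entropy) while keeping it close to the original (small utility loss). The weight `lam` is a made-up knob for trading one against the other; the paper’s actual objective and selection procedure may differ.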

For more details about PBooster and how it works, interested readers can refer to the abovementioned WSDM’19 paper which is available at https://arxiv.org/abs/1811.09340 .

Bios: Ghazaleh Beigi is a PhD candidate in Computer Science and Engineering at Arizona State University. Her research interests include privacy protection, trust prediction, recommendation systems, sentiment analysis, and machine learning. Huan Liu is a professor of Computer Science and Engineering at Arizona State University.
