Personalized Social Recommendations: Accurate or Private?

Personalized Social Recommendations: Accurate or Private? Presented by: Lurye Jenny. Paper by: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma.

Outline: Introduction; Motivation; The model; General lower bounds; Privacy-preserving algorithms; Experiments.

Personalized recommendations: advertisements, products, people, content.

Recommendation Algorithms

Traditional: based on generic recommendations and history. "Other users who bought this book also bought ..."

Social-aware: based on active friends. "Your friend already bought this book!"

$G = (V, E)$: Facebook's Open Graph API and Google's Social Graph API.

Sensitive information: friends, location, professional info, hobbies, sexual orientation, relationship status, contact info, date of birth, traveling info.

Motivation: increase the user's degree of engagement; avoid privacy breaches.

Can a recommendation algorithm be accurate without breaching a user's privacy?

Tradeoff: use of sensitive information vs. better recommendations.

The Model. The network graph $G = (V, E)$. $V$: entities (people, products). $E$: connections.

The utility function $u_i^{G,r}$: the utility of recommending node $i$ to node $r$. Examples: number of common neighbors, number of weighted paths, PageRank.

The utility by number of common neighbors. [Figure: example graph whose candidate nodes have utilities 0, 1, 2, 1.]
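A minimal sketch of the common-neighbors utility on a toy graph. The adjacency-set representation, the node names, and the choice to exclude r's existing neighbors from the candidate list are illustrative assumptions, not taken from the slides.

```python
def common_neighbor_utilities(adj, r):
    """Utility of recommending candidate i to target r = number of common neighbors."""
    neighbors_r = adj[r]
    return {
        i: len(neighbors_r & adj[i])
        for i in adj
        # Assumption: only nodes not yet connected to r are candidates.
        if i != r and i not in neighbors_r
    }

# Hypothetical undirected toy graph stored as adjacency sets.
adj = {
    "r": {"a", "b"},
    "a": {"r", "x", "y"},
    "b": {"r", "x", "z"},
    "x": {"a", "b"},
    "y": {"a"},
    "z": {"b"},
    "w": set(),
}

utilities = common_neighbor_utilities(adj, "r")
print(utilities)
```

In this toy graph, node x shares two neighbors with r, y and z share one each, and the isolated node w shares none.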

The Social Recommendation Algorithm (R). R is a probability vector on all nodes: $p_i^{G,r}(R)$ is the probability of R recommending node $i$ to node $r$.

The Social Recommendation Algorithm (R). $R : (p^{G,\text{Blue Guy}}_{\text{Dog Book}},\ p^{G,\text{Blue Guy}}_{\text{Camera}},\ p^{G,\text{Blue Guy}}_{\text{MacBook}},\ p^{G,\text{Blue Guy}}_{\text{Shoe}})$. $R_1 : (0.5, 0.2, 0.2, 0.1)$. $R_2 : (\tfrac14, \tfrac14, \tfrac14, \tfrac14)$. [Figure: example graph with candidate utilities 0, 1, 2, 1.]

Why do we need R to be probabilistic?

The Social Recommendation Algorithm (R). Example, $R_{best}$: $p^{G,\text{blue guy}}_{\text{MacBook}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{camera}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{red shoes}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{Dog book}}(R_{best}) = 1$.

OK, so why not recommend randomly?

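Expected utility of $R_1$: $2 \cdot 0.5 + 1 \cdot 0.2 \cdot 2 + 0 \cdot 0.1 = 1.4$. Expected utility of $R_2$: $2 \cdot \tfrac14 + 1 \cdot \tfrac14 + 1 \cdot \tfrac14 + 0 \cdot \tfrac14 = 1$. [Figure: example graph with candidate utilities 0, 1, 2, 1.]

Maximizing the expected utility: $\max_R \sum_i u_i^{G,r}\, p_i^{G,r}(R)$.

Simplifying the notation. $G$ and $r$ are constant, so we write $u_i$ for $u_i^{G,r}$ and $p_i$ for $p_i^{G,r}$.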

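R's accuracy. R is $(1-\delta)$-accurate if for any $r$: $\frac{\sum_i u_i\, p_i(R)}{u_{max}} \ge 1 - \delta$.

Example, with $u_{max} = 2$. $R_1$: $\frac{2 \cdot 0.5 + 1 \cdot 0.2 \cdot 2 + 0 \cdot 0.1}{2} = 0.7$. $R_2$: $\frac{2 \cdot \frac14 + 1 \cdot \frac14 + 1 \cdot \frac14 + 0 \cdot \frac14}{2} = 0.5$.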
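The expected utility and accuracy of the example algorithms can be checked numerically. A small sketch, assuming the candidate utilities are ordered as (2, 1, 1, 0) to match the sums on the slide; the function names are hypothetical.

```python
def expected_utility(utilities, probs):
    """Sum of u_i * p_i over all candidate nodes."""
    return sum(u * p for u, p in zip(utilities, probs))

def accuracy(utilities, probs):
    """Expected utility divided by the best achievable utility u_max."""
    return expected_utility(utilities, probs) / max(utilities)

u = [2, 1, 1, 0]                    # utilities, ordered as in the slide's sums
r1 = [0.5, 0.2, 0.2, 0.1]           # R_1
r2 = [0.25, 0.25, 0.25, 0.25]       # R_2 (uniform)
r_best = [1.0, 0.0, 0.0, 0.0]       # always recommends the top-utility node

for name, probs in [("R1", r1), ("R2", r2), ("R_best", r_best)]:
    print(name, round(expected_utility(u, probs), 3), round(accuracy(u, probs), 3))
```

The deterministic R_best reaches accuracy 1, which is why it serves as the reference for the private algorithms.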

Privacy definition: differential privacy. An algorithm preserves the privacy of an entity if the algorithm's output is not sensitive to the presence or absence of the entity's information in the input data set.

Differential privacy, Definition 1: a recommendation algorithm R satisfies ε-differential privacy if for any pair of graphs $G$ and $G'$ that differ in one edge (i.e., $G = G' + \{e\}$ or vice versa) and every set of possible recommendations $S$: $\Pr[R(G) \in S] \le e^{\epsilon} \Pr[R(G') \in S]$.

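Differential privacy example, using $\Pr[R(G) \in S] \le e^{\epsilon} \Pr[R(G') \in S]$ and $R : (p_{\text{Dog Book}}, p_{\text{Camera}}, p_{\text{MacBook}}, p_{\text{Shoe}})$. On $G$: $R_1 : (0.5, 0.2, 0.2, 0.1)$, $R_2 : (\tfrac14, \tfrac14, \tfrac14, \tfrac14)$. On $G'$: $R_1 : (0.286, 0.286, 0.286, 0.143)$, $R_2 : (\tfrac14, \tfrac14, \tfrac14, \tfrac14)$. Resulting privacy parameters: $\epsilon_1 = 0.56$, $\epsilon_2 = 0$. [Figure: graphs G and G' with their candidate utilities.]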
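For one pair of neighboring graphs, the smallest ε consistent with two recommendation distributions is the largest absolute log-ratio of the per-node probabilities. The sketch below (function name hypothetical) applies this to the numbers on the slide, assuming the first pair of vectors belongs to G and the second to G'. It checks only this one pair; the definition requires the bound for every pair of graphs that differ in one edge.

```python
import math

def min_epsilon(p, q):
    """Smallest eps with p_i <= e^eps * q_i and q_i <= e^eps * p_i for all i.

    Assumes every probability is strictly positive.
    """
    return max(abs(math.log(pi / qi)) for pi, qi in zip(p, q))

r1_on_g = [0.5, 0.2, 0.2, 0.1]
r1_on_g_prime = [0.286, 0.286, 0.286, 0.143]
r2_on_g = [0.25, 0.25, 0.25, 0.25]
r2_on_g_prime = [0.25, 0.25, 0.25, 0.25]

eps1 = min_epsilon(r1_on_g, r1_on_g_prime)
eps2 = min_epsilon(r2_on_g, r2_on_g_prime)
print(round(eps1, 2), eps2)
```

The uniform algorithm ignores the graph entirely, so its output distribution does not change between G and G' and its ε is zero.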

Problem statement. Given $u$, determine a recommendation algorithm that: (a) satisfies the ε-differential privacy constraints; (b) maximizes the accuracy of recommendations.

Generic privacy lower bounds. We focus on theoretically determining the bounds on the maximum accuracy $(1-\delta)$ achievable by any algorithm that satisfies ε-differential privacy.

Exchangeability. Let $G$ be a graph and let $h$ be an isomorphism on the nodes giving graph $G_h$, such that for target node $r$, $h(r) = r$. Then $\forall i:\ u_i^{G,r} = u_{h(i)}^{G_h,r}$.

[Figure: exchangeability example; candidate utilities 0, 1, 1, 2.]

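Monotonicity. R is monotonic if $\forall i, j:\ u_i \ge u_j \Rightarrow p_i \ge p_j$.

Monotonicity example (candidate utilities 0, 1, 2, 1). $R_1$: $2 \cdot 0.5 + 1 \cdot 0.2 \cdot 2 + 0 \cdot 0.1 = 1.4$. $R_2$: $2 \cdot \tfrac14 + 1 \cdot \tfrac14 + 1 \cdot \tfrac14 + 0 \cdot \tfrac14 = 1$.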
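A sketch of a monotonicity check, assuming the non-strict form of the definition (a node with utility at least as high must receive probability at least as high); the helper name and the counter-example vector are hypothetical.

```python
def is_monotonic(utilities, probs):
    """True if u_i >= u_j implies p_i >= p_j for every pair (i, j)."""
    n = len(utilities)
    return all(
        probs[i] >= probs[j]
        for i in range(n)
        for j in range(n)
        if utilities[i] >= utilities[j]
    )

u = [2, 1, 1, 0]
print(is_monotonic(u, [0.5, 0.2, 0.2, 0.1]))      # R_1
print(is_monotonic(u, [0.25, 0.25, 0.25, 0.25]))  # R_2 (uniform)
print(is_monotonic(u, [0.1, 0.2, 0.2, 0.5]))      # hypothetical reversed vector
```

Under this reading, nodes with equal utility must receive equal probability, since the implication applies in both directions.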

General Lower Bounds. A monotonic recommendation algorithm that (a) achieves a constant accuracy $(1-\delta)$ and (b) is based on a utility function that satisfies exchangeability has a lower bound on its differential privacy parameter ε, where $\Pr[R(G) \in S] \le e^{\epsilon} \Pr[R(G') \in S]$.

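General Lower Bounds. Split $V$ ($n$ nodes) into two groups by utility, for $c \in (0, 1)$: $V_{high} = \{i \in V : u_i > (1-c)\,u_{max}\}$ with $k$ nodes, and $V_{low} = \{i \in V : u_i \le (1-c)\,u_{max}\}$ with $n-k$ nodes. [Figure: example graph with candidate utilities 0, 1, 2, 1.]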

General Lower Bounds. How much probability should we give each group to achieve $(1-\delta)$ accuracy? Let $p_{high}$ be the total probability of $V_{high}$ and $p_{low}$ the total probability of $V_{low}$, with $p_{high} + p_{low} = 1$. Accuracy requires $(1-\delta)\,u_{max} \le \sum_i u_i p_i \le p_{high}\,u_{max} + p_{low}\,(1-c)\,u_{max}$, which gives $p_{high} \ge \frac{c-\delta}{c}$ and $p_{low} \le \frac{\delta}{c}$.

General Lower Bounds. Example: for $1-c = 0.2$ and $1-\delta = 0.9$ (so $c = 0.8$, $\delta = 0.1$): $p_{high} \ge \frac{0.8-0.1}{0.8} = 0.875$ and $p_{low} \le \frac{0.1}{0.8} = 0.125$.
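The two probability-mass bounds follow from the accuracy requirement and can be evaluated directly. A small sketch; the function name is hypothetical and the inputs are the values from the slide's example.

```python
def mass_bounds(c, delta):
    """Return (minimum total probability of V_high, maximum total probability of V_low)."""
    p_high_min = (c - delta) / c
    p_low_max = delta / c
    return p_high_min, p_low_max

p_high_min, p_low_max = mass_bounds(c=0.8, delta=0.1)
print(round(p_high_min, 3), round(p_low_max, 3))
```

The two values always sum to 1, so tightening the accuracy target (smaller δ) moves mass from the low-utility group to the high-utility group.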

General Lower Bounds. $t$: the number of edges that need to be added to turn the node with the smallest probability of being recommended in the low-utility group $V_{low}$ into the node of maximum utility in the modified graph $G'$. [Figure: modified graph G' obtained by adding t edges.]

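General Lower Bounds. In $G$: $V_{high}^{G}$ has $k$ nodes and $V_{low}^{G}$ has $n-k$ nodes. In $G'$, where node $x$ has been promoted: $V_{high}^{G'}$ has $k+1$ nodes and $V_{low}^{G'}$ holds the remaining nodes. Since $p_{high} \ge \frac{c-\delta}{c}$ and $p_{low} \le \frac{\delta}{c}$, there is a node $x \in V_{low}^{G}$ with $p_x^{G} \le \frac{\delta}{c\,(n-k)}$, and in $G'$: $p_x^{G'} \ge \frac{c-\delta}{c\,(k+1)}$. [Figure: graphs G and G' with node x marked.]

General Lower Bounds. From differential privacy, adding the $t$ edges one at a time ($G \to G_1 \to G_2 \to \dots \to G'$): $p_x^{G_1} \le e^{\epsilon}\, p_x^{G}$, $p_x^{G_2} \le e^{\epsilon}\, p_x^{G_1} \le e^{2\epsilon}\, p_x^{G}$, $p_x^{G_3} \le e^{3\epsilon}\, p_x^{G}$, and finally $p_x^{G'} \le e^{\epsilon t}\, p_x^{G}$.

General Lower Bounds. Let's put it all together! $\frac{c-\delta}{c\,(k+1)} \le p_x^{G'} \le e^{\epsilon t}\, p_x^{G} \le e^{\epsilon t}\, \frac{\delta}{c\,(n-k)}$, so $e^{\epsilon t} \ge \frac{(c-\delta)(n-k)}{\delta\,(k+1)}$ and therefore $\epsilon \ge \frac{1}{t}\left(\ln\frac{c-\delta}{\delta} + \ln\frac{n-k}{k+1}\right)$. Lower bound!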
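The lower bound on ε, namely (1/t) · (ln((c − δ)/δ) + ln((n − k)/(k + 1))), can be evaluated for concrete parameters. A sketch with a hypothetical function name; the inputs below are illustrative values (a 400-million-node network and a target accuracy of 0.46, i.e. δ = 0.54), chosen as assumptions for the demonstration.

```python
import math

def epsilon_lower_bound(n, k, t, c, delta):
    """Smallest eps permitted by the bound for a (1 - delta)-accurate algorithm.

    Assumes 0 < delta < c and 0 <= k < n.
    """
    return (math.log((c - delta) / delta) + math.log((n - k) / (k + 1))) / t

eps_min = epsilon_lower_bound(n=4e8, k=100, t=150, c=0.99, delta=0.54)
print(round(eps_min, 3))
```

Because the bound grows only logarithmically in n but shrinks as 1/t, nodes that need many added edges to become top-ranked are the ones for which a small ε is compatible with good accuracy.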

General Lower Bounds. Important result: $1-\delta \le 1 - \frac{c\,(n-k)}{n-k+(k+1)\,e^{\epsilon t}}$. Upper bound on accuracy!

General Lower Bounds. Example: a social network with 400 million nodes, $n = 4 \cdot 10^8$. Let's assume that for $c = 0.99$ we have $k = 100$, and consider $t = 150$ (which is about the average degree in some social networks). Suppose we want to guarantee 0.1-differential privacy; then the bound on the accuracy is: $1-\delta \le 1 - \frac{0.99\,(4 \cdot 10^8 - 100)}{4 \cdot 10^8 - 100 + (100+1)\,e^{0.1 \cdot 150}} \approx 0.46$. This suggests that for a differential privacy guarantee of 0.1, no algorithm can guarantee an accuracy better than 0.46.
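The arithmetic of this example can be reproduced in a few lines. A sketch of the accuracy bound 1 − c(n − k) / (n − k + (k + 1)·e^(εt)); the function name is hypothetical, and the parameter values are the ones stated in the example.

```python
import math

def accuracy_upper_bound(n, k, t, c, eps):
    """Upper bound on (1 - delta) for an eps-differentially private algorithm."""
    return 1 - c * (n - k) / ((n - k) + (k + 1) * math.exp(eps * t))

bound = accuracy_upper_bound(n=4e8, k=100, t=150, c=0.99, eps=0.1)
print(round(bound, 2))
```

Raising ε loosens the bound quickly, since ε enters through the exponential term in the denominator.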

Privacy-preserving algorithms. An algorithm that satisfies differential privacy must recommend every node, even the ones that have zero utility, with a non-zero probability.

Privacy-preserving algorithms. Let's add some noise! Two mechanisms: the exponential smoothing mechanism and the Laplace noise addition mechanism.

Exponential smoothing mechanism. Creates a smooth probability distribution from the utility vector and samples from it. Given the utility vector $(u_1, \dots, u_n)$, algorithm $A_E$ recommends node $i$ with probability $p_i = \frac{e^{\epsilon u_i / \Delta f}}{\sum_{j=1}^{n} e^{\epsilon u_j / \Delta f}}$. In particular, for $u_i = 0$: $p_i = \frac{1}{\sum_{j=1}^{n} e^{\epsilon u_j / \Delta f}}$. $\Delta f$ is the sensitivity of the utility function (the maximum utility difference caused by adding or removing one edge).
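A minimal sketch of the exponential smoothing mechanism, assuming node i is recommended with probability proportional to exp(ε·u_i/Δf). The function names, the default random source, and the sensitivity value of 1 used in the demo are illustrative assumptions.

```python
import math
import random

def exponential_mechanism_probs(utilities, eps, sensitivity):
    """Probability of recommending each node: proportional to exp(eps * u_i / sensitivity)."""
    scale = eps / sensitivity
    u_max = max(utilities)
    # Subtracting u_max leaves the distribution unchanged but avoids overflow.
    weights = [math.exp(scale * (u - u_max)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def recommend_exponential(utilities, eps, sensitivity, rng=random):
    """Sample the index of one node to recommend."""
    probs = exponential_mechanism_probs(utilities, eps, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]

probs = exponential_mechanism_probs([2, 1, 1, 0], eps=1.0, sensitivity=1.0)
print([round(p, 3) for p in probs])
print(recommend_exponential([2, 1, 1, 0], 1.0, 1.0, random.Random(0)))
```

Every node, including the zero-utility one, receives a strictly positive probability, which is what differential privacy requires here.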

Laplace noise addition mechanism. Unlike the Exponential mechanism, the Laplace mechanism more closely mimics the optimal mechanism $R_{best}$. Given nodes with utilities $u_1, \dots, u_n$, algorithm $A_L$ first computes a modified utility vector $(u'_1, \dots, u'_n)$ as follows: $u'_i = u_i + r_i$, where $r_i$ is a random variable chosen from the Laplace distribution independently at random for each $i$. Then $A_L$ recommends the node with the largest noisy utility, $\arg\max_i u'_i$.

Laplace noise addition mechanism. Laplace density function: $f(x) = \frac{1}{2b}\, e^{-|x-\mu|/b}$, with $b = \Delta f / \epsilon$ and $\mu = 0$.
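A minimal sketch of the Laplace noise addition mechanism, assuming zero-mean Laplace noise with scale b = Δf/ε. The noise is drawn as the difference of two independent exponential variables, which is one standard way to sample a Laplace variable; the function names and the demo parameters are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Zero-mean Laplace sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def recommend_laplace(utilities, eps, sensitivity, rng):
    """Add independent Laplace noise to each utility and return the argmax index."""
    scale = sensitivity / eps
    noisy = [u + laplace_noise(scale, rng) for u in utilities]
    return max(range(len(noisy)), key=noisy.__getitem__)

rng = random.Random(0)
picks = [recommend_laplace([2, 1, 1, 0], eps=1.0, sensitivity=1.0, rng=rng) for _ in range(10)]
print(picks)
```

With a very large ε the noise becomes negligible and the mechanism behaves like the deterministic best recommender; with a small ε the choice approaches uniform.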

Experiments. Our goal: compare the theoretical upper bound on accuracy to the accuracy achieved in practice.

Experiments: settings. Real network graphs: Wikipedia vote network $G_{WV}$, Twitter connections network $G_T$. Utility functions: common neighbors, number of paths. Privacy tools: exponential smoothing mechanism, Laplace noise addition mechanism.

Wikipedia vote network. Some Wikipedia users are administrators, who have access to additional technical features. Users are elected to be administrators via a public vote of other users and administrators. $G_{WV}$: $V$ = users (7,115 nodes), $E$ = votes (100,762 edges).

Twitter connections network. Users follow other users. $G_T$: $V$ = users (96,403 nodes), $E$ = follows (489,986 edges).

Experiments: steps. (1) Select target node $r$ uniformly at random. (2) Compute utility according to both functions. (3) Fix ε of differential privacy. (4) Compute expected accuracy: for exponential smoothing, directly; for Laplace noise addition, as the average utility of 1000 independent trials. (5) Compute the theoretical upper bound using $1-\delta \le 1 - \frac{c\,(n-k)}{n-k+(k+1)\,e^{\epsilon t}}$.
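Step (4) can be sketched as follows: the exponential mechanism's expected accuracy is computed in closed form, while the Laplace mechanism's accuracy is estimated by averaging over 1000 simulated trials. This is an illustrative sketch on a hand-written utility vector, not the experiment code; the function names, the seed, and the sensitivity of 1 are assumptions.

```python
import math
import random

def exponential_accuracy(utilities, eps, sensitivity=1.0):
    """Exact expected accuracy when node i is chosen with weight exp(eps*u_i/sensitivity)."""
    u_max = max(utilities)  # assumed to be positive
    weights = [math.exp(eps * (u - u_max) / sensitivity) for u in utilities]
    expected = sum(u * w for u, w in zip(utilities, weights)) / sum(weights)
    return expected / u_max

def laplace_accuracy(utilities, eps, sensitivity=1.0, trials=1000, seed=0):
    """Monte Carlo estimate of the Laplace mechanism's expected accuracy."""
    rng = random.Random(seed)
    scale = sensitivity / eps
    u_max = max(utilities)  # assumed to be positive
    total = 0.0
    for _ in range(trials):
        noisy = [
            u + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for u in utilities
        ]
        total += utilities[noisy.index(max(noisy))]
    return total / (trials * u_max)

u = [2, 1, 1, 0]
exp_acc = exponential_accuracy(u, eps=1.0)
lap_acc = laplace_accuracy(u, eps=1.0)
print(round(exp_acc, 3), round(lap_acc, 3))
```

On a real graph the utility vector would come from the chosen utility function for the sampled target node, and the two accuracies would be compared against the theoretical bound for the same ε.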

Results. Experiments show that the Laplace mechanism achieves nearly the same accuracy as the Exponential mechanism. However, the Exponential mechanism's accuracy can be computed more efficiently, so we compare our theoretical bound on accuracy only to its actual accuracy.

Results: number of common neighbors. [Figure: two panels, Wikipedia vote network and Twitter connections network. X-axis: accuracy $(1-\delta)$. Y-axis: % of nodes receiving recommendations with accuracy below $(1-\delta)$.]

Results: number of weighted paths. [Figure: two panels, Wikipedia vote network and Twitter connections network. X-axis: accuracy $(1-\delta)$. Y-axis: % of nodes receiving recommendations with accuracy below $(1-\delta)$.]

Results. The low-degree nodes are also the most vulnerable to receiving low-accuracy recommendations. [Figure: X-axis: the target node's degree. Y-axis: accuracy $(1-\delta)$.]