Personalized Social Recommendations: Accurate or Private?

Presented by: Lurye Jenny. Paper by: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma.

Outline: Introduction, Motivation, The Model, General Lower Bounds, Privacy-Preserving Algorithms, Experiments.

Personalized Recommendations: advertisements, products, people, content.

Recommendation Algorithms

Traditional: based on generic recommendations and purchase history. Example: "Other users who bought this book also bought..."

Social-aware: based on the activity of friends. Example: "Your friend already bought this book!"

Social graphs G = (V, E): Facebook's Open Graph API and Google's Social Graph API.

Sensitive information: friends, location, professional info, hobbies, sexual orientation, relationship status, contact info, date of birth, travel info.


Motivation: increase users' degree of engagement; avoid privacy breaches.

Can a recommendation algorithm be accurate without breaching one's privacy?

Tradeoff: use of sensitive information vs. better recommendations.


The Model. The network is a graph G = (V, E). V: entities (people, products). E: connections.

The Utility function $u_i^{G,r}$: the utility of recommending node i to node r. Examples: number of common neighbors, number of weighted paths, PageRank.
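As a minimal sketch of the common-neighbors utility, the snippet below counts shared neighbors between a target node r and every candidate node. The adjacency-set representation, the function name, the toy graph, and the choice to skip nodes already connected to r are all assumptions made for illustration; they are not taken from the slides.

```python
def common_neighbors_utility(adjacency, r):
    """Return {i: number of neighbors shared by r and i} for each candidate i."""
    neighbors_r = adjacency[r]
    utilities = {}
    for i, neighbors_i in adjacency.items():
        # Assumption: the target itself and its existing neighbors are not candidates.
        if i == r or i in neighbors_r:
            continue
        utilities[i] = len(neighbors_r & neighbors_i)
    return utilities


# Hypothetical toy graph (not the one drawn on the slides).
graph = {
    "r": {"a", "b"},
    "a": {"r", "x", "y"},
    "b": {"r", "x"},
    "x": {"a", "b"},
    "y": {"a"},
    "z": set(),
}
utilities = common_neighbors_utility(graph, "r")
print(utilities)
```

Here node x shares both of r's neighbors, y shares one, and the isolated node z shares none.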

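The Utility by # of common neighbors. [Figure: example graph; the candidate nodes have utilities 0, 1, 2, 1.]

The Social Recommendation Algorithm (R): R is a probability vector over all nodes. $p_i^{G,r}(R)$ is the probability of R recommending node i to node r.

The Social Recommendation Algorithm (R), example for graph G and target node "Blue Guy": $R = (p_{\text{Dog Book}}, p_{\text{Camera}}, p_{\text{MacBook}}, p_{\text{Shoe}})$. $R_1 = (0.5, 0.2, 0.2, 0.1)$; $R_2 = (\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4})$. [Figure: example graph; the candidate nodes have utilities 0, 1, 2, 1.]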

Why do we need R to be probabilistic?

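The Social Recommendation Algorithm (R), example: $R_{best}$ always recommends the node of maximum utility. $p^{G,\text{blue guy}}_{\text{MacBook}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{camera}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{red shoes}}(R_{best}) = 0$, $p^{G,\text{blue guy}}_{\text{Dog book}}(R_{best}) = 1$.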

OK, so why not recommend randomly?

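Expected utility on the example (candidate utilities 0, 1, 2, 1): $R_1$: $2 \cdot 0.5 + 1 \cdot 0.2 + 1 \cdot 0.2 + 0 \cdot 0.1 = 1.4$; $R_2$: $2 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{4} = 1$.

Maximizing the expected utility: $\max_R \sum_i u_i^{G,r} \, p_i^{G,r}(R)$.

Simplifying the notation: G and r are constant, so we write $u_i$ for $u_i^{G,r}$ and $p_i$ for $p_i^{G,r}$.

R's Accuracy: R is $(1-\delta)$-accurate if for any r: $\frac{\sum_i u_i \, p_i(R)}{u_{max}} \ge 1 - \delta$.

Example ($u_{max} = 2$): $R_1$: $\frac{2 \cdot 0.5 + 1 \cdot 0.2 + 1 \cdot 0.2 + 0 \cdot 0.1}{2} = 0.7$; $R_2$: $\frac{2 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} + 0 \cdot \frac{1}{4}}{2} = 0.5$.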
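The accuracy arithmetic in this example can be checked with a short sketch. The function names are illustrative, and the utility vector is ordered to line up with $R_1$ as in the slide's sum (2, 1, 1, 0); $R_{best}$ is included for comparison.

```python
def expected_utility(utilities, probs):
    """Sum of u_i * p_i over all candidate nodes."""
    return sum(u * p for u, p in zip(utilities, probs))


def accuracy(utilities, probs):
    """Expected utility normalized by the best achievable utility u_max."""
    return expected_utility(utilities, probs) / max(utilities)


utilities = [2, 1, 1, 0]
r1 = [0.5, 0.2, 0.2, 0.1]
r2 = [0.25, 0.25, 0.25, 0.25]
r_best = [1.0, 0.0, 0.0, 0.0]   # all mass on the maximum-utility node

acc_r1 = accuracy(utilities, r1)        # about 0.7
acc_r2 = accuracy(utilities, r2)        # 0.5
acc_best = accuracy(utilities, r_best)  # 1.0
print(acc_r1, acc_r2, acc_best)
```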

Privacy Definition: Differential Privacy. An algorithm preserves the privacy of an entity if the algorithm's output is not sensitive to the presence or absence of the entity's information in the input data set.

Differential Privacy, Definition 1: a recommendation algorithm R satisfies $\epsilon$-differential privacy if for any pair of graphs G and G' that differ in one edge (i.e., G = G' + {e} or vice versa) and every set of possible recommendations S: $\Pr[R(G) \in S] \le e^{\epsilon} \cdot \Pr[R(G') \in S]$.

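Differential Privacy example: $\Pr[R(G) \in S] \le e^{\epsilon} \cdot \Pr[R(G') \in S]$, with $R = (p_{\text{Dog Book}}, p_{\text{Camera}}, p_{\text{MacBook}}, p_{\text{Shoe}})$. On G: $R_1 = (0.5, 0.2, 0.2, 0.1)$, $R_2 = (\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4})$. On G': $R_1 = (0.286, 0.286, 0.286, 0.143)$, $R_2 = (\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4})$. For this pair of graphs $R_1$ needs $\epsilon_1 \ge 0.56$ (since $\ln(0.5 / 0.286) \approx 0.56$), while $R_2$ achieves $\epsilon_2 = 0$. [Figure: graphs G and G' with their node utilities.]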
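A small sketch of how the $\epsilon$ values in this example can be recovered: for one pair of neighboring graphs, the smallest $\epsilon$ that works is the largest absolute log-ratio of the two recommendation probabilities. Because each run outputs a single recommendation, bounding every single-node ratio also bounds every set S. The helper name is hypothetical, and checking one pair of graphs only gives a lower bound on the algorithm's true $\epsilon$.

```python
import math


def min_epsilon(probs_g, probs_g_prime):
    """Smallest eps with p <= e^eps * q and q <= e^eps * p for every outcome."""
    eps = 0.0
    for p, q in zip(probs_g, probs_g_prime):
        if p == 0.0 and q == 0.0:
            continue
        if p == 0.0 or q == 0.0:
            # One graph allows the outcome and the other forbids it: no finite eps.
            return math.inf
        eps = max(eps, abs(math.log(p / q)))
    return eps


# Probability vectors taken from the slide's example.
eps_r1 = min_epsilon([0.5, 0.2, 0.2, 0.1], [0.286, 0.286, 0.286, 0.143])
eps_r2 = min_epsilon([0.25] * 4, [0.25] * 4)
print(round(eps_r1, 2), eps_r2)
```

The infinite case also illustrates why a deterministic rule such as $R_{best}$ cannot be differentially private whenever one edge can change which node is best.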


Problem Statement: given u, determine a recommendation algorithm that (a) satisfies the $\epsilon$-differential privacy constraints and (b) maximizes the accuracy of recommendations.

Generic Privacy Lower Bounds. Our focus: theoretically determine bounds on the maximum accuracy $(1-\delta)$ achievable by any algorithm that satisfies $\epsilon$-differential privacy.

Exchangeability: let G be a graph and let h be an isomorphism on the nodes giving graph $G_h$, s.t. for target node r, h(r) = r. Then $\forall i: u_i^{G,r} = u_{h(i)}^{G_h,r}$.

Exchangeability. [Figure: example graph with node utilities 0, 1, 1, 2.]

Monotonicity: R is monotonic if $\forall i, j: u_i \ge u_j \Rightarrow p_i \ge p_j$.

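Monotonicity example (candidate utilities 0, 1, 2, 1): $R_1$: $2 \cdot 0.5 + 1 \cdot 0.2 + 1 \cdot 0.2 + 0 \cdot 0.1 = 1.4$; $R_2$: $2 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{4} = 1$.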

General Lower Bounds: a monotonic recommendation algorithm that achieves a constant accuracy $(1-\delta)$ and is based on a utility function that satisfies exchangeability has a lower bound on the $\epsilon$ parameter of its differential privacy guarantee, $\Pr[R(G) \in S] \le e^{\epsilon} \cdot \Pr[R(G') \in S]$.

General Lower Bounds: split V into two groups by utility, for $c \in (0, 1)$: $V_{high} = \{i \in V : u_i > (1-c)\,u_{max}\}$ with k nodes, and $V_{low} = \{i \in V : u_i \le (1-c)\,u_{max}\}$ with n - k nodes (n nodes in total). [Figure: example graph with node utilities 0, 1, 2, 1.]

General Lower Bounds: how much probability should we give each group to achieve $(1-\delta)$ accuracy? Let $p_{high}$ be the total probability of $V_{high} = \{i \in V : u_i > (1-c)\,u_{max}\}$ and $p_{low}$ the total probability of $V_{low} = \{i \in V : u_i \le (1-c)\,u_{max}\}$, so $p_{high} + p_{low} = 1$. Then $\sum_i u_i p_i \le p_{high}\,u_{max} + p_{low}\,(1-c)\,u_{max}$, while accuracy requires $\sum_i u_i p_i \ge (1-\delta)\,u_{max}$. Hence $p_{high} \ge \frac{c-\delta}{c}$ and $p_{low} \le \frac{\delta}{c}$.

Example: for $1 - c = 0.2$ and $1 - \delta = 0.9$ (so $c = 0.8$, $\delta = 0.1$): $p_{high} \ge \frac{0.8 - 0.1}{0.8} = 0.875$ and $p_{low} \le \frac{0.1}{0.8} = 0.125$.

General Lower Bounds: t is the number of edges that need to be added to turn a node with the smallest probability of being recommended from the low-utility group $V_{low}$ into the node of maximum utility in the modified graph G'. [Figure: example graph G' with node utilities; here t = 3.]

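[Figure: graphs G and G'; node x moves from the low-utility group in G to the top of the high-utility group in G'.] In G, $V^{G}_{high}$ has k nodes and $V^{G}_{low}$ has n - k nodes; in G', $V^{G'}_{high}$ has k + 1 nodes and $V^{G'}_{low}$ has n - k - 1 nodes. Since $p_{high} \ge \frac{c-\delta}{c}$ and $p_{low} \le \frac{\delta}{c}$, there is a node $x \in V^{G}_{low}$ with $p_x^{G} \le \frac{\delta}{c\,(n-k)}$, and in G', where x has maximum utility, $p_x^{G'} \ge \frac{c-\delta}{c\,(k+1)}$.

General Lower Bounds, from differential privacy: along the chain of graphs $G = G_0, G_1, \dots, G_t = G'$, each obtained from the previous one by adding one edge, $p_x^{G_1} \le e^{\epsilon} p_x^{G}$, $p_x^{G_2} \le e^{\epsilon} p_x^{G_1} \le e^{2\epsilon} p_x^{G}$, $p_x^{G_3} \le e^{3\epsilon} p_x^{G}$, and so on, giving $p_x^{G'} \le e^{\epsilon t} p_x^{G}$.

General Lower Bounds: let's put it all together! From $p_x^{G'} \le e^{\epsilon t} p_x^{G}$, $p_x^{G} \le \frac{\delta}{c\,(n-k)}$ and $p_x^{G'} \ge \frac{c-\delta}{c\,(k+1)}$ we get $\frac{c-\delta}{c\,(k+1)} \le e^{\epsilon t} \frac{\delta}{c\,(n-k)}$, i.e. $e^{\epsilon t} \ge \frac{(c-\delta)(n-k)}{\delta\,(k+1)}$, so $\epsilon \ge \frac{1}{t}\left(\ln\frac{c-\delta}{\delta} + \ln\frac{n-k}{k+1}\right)$. Lower bound!

General Lower Bounds, important result: solving the same inequality for $\delta$ gives $1 - \delta \le 1 - \frac{c\,(n-k)}{n - k + (k+1)\,e^{\epsilon t}}$, an upper bound on accuracy!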

General Lower Bounds, example: a social network with 400 million nodes, $n = 4 \cdot 10^8$. Let's assume that for c = 0.99 we have k = 100, and consider t = 150 (which is about the average degree in some social networks). Suppose we want to guarantee 0.1-differential privacy; then we compute the bound on the accuracy: $1 - \delta \le 1 - \frac{0.99\,(4 \cdot 10^8 - 100)}{4 \cdot 10^8 - 100 + (100 + 1)\,e^{0.1 \cdot 150}} \approx 0.46$. This suggests that for a differential privacy guarantee of 0.1, no algorithm can guarantee an accuracy better than 0.46.
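The number in this example can be reproduced with a few lines. This is a sketch that simply evaluates the upper bound $1 - \frac{c\,(n-k)}{n-k+(k+1)\,e^{\epsilon t}}$ with the parameter values quoted above; the function name is an illustrative choice.

```python
import math


def accuracy_upper_bound(n, k, c, t, epsilon):
    """Upper bound on the accuracy (1 - delta) from the general lower-bound argument."""
    return 1.0 - c * (n - k) / ((n - k) + (k + 1) * math.exp(epsilon * t))


bound = accuracy_upper_bound(n=4 * 10**8, k=100, c=0.99, t=150, epsilon=0.1)
print(round(bound, 2))  # 0.46
```

Lowering t or epsilon in the call shows how quickly the bound tightens: the exponent $\epsilon t$ is what loosens the bound.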


Privacy-preserving algorithms: an algorithm that satisfies differential privacy must recommend every node, even the ones that have zero utility, with non-zero probability.

Privacy-preserving algorithms: let's add some noise! Two options: the Exponential smoothing mechanism and the Laplace noise addition mechanism.

Exponential smoothing mechanism: creates a smooth probability distribution from the utility vector and samples from it. Given the utility vector $(u_1, \dots, u_n)$, algorithm $A_E$ recommends node i with probability $p_i = \frac{e^{\epsilon u_i / \Delta f}}{\sum_{j=1}^{n} e^{\epsilon u_j / \Delta f}}$. In particular, a node with $u_i = 0$ still gets $p_i = \frac{1}{\sum_{j=1}^{n} e^{\epsilon u_j / \Delta f}} > 0$. $\Delta f$ is the sensitivity of the utility function (the maximum utility difference caused by adding or removing one edge).
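A minimal sketch of the exponential smoothing mechanism as defined on this slide. The function names and the toy utility vector are assumptions; subtracting the maximum utility before exponentiating is only a numerical-stability step and does not change the resulting distribution.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """p_i proportional to exp(epsilon * u_i / sensitivity)."""
    scale = epsilon / sensitivity
    u_max = max(utilities)
    # Shifting by u_max cancels in the ratio and avoids overflow for large utilities.
    weights = [math.exp(scale * (u - u_max)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def recommend(utilities, epsilon, sensitivity, rng):
    """Sample one node index from the smoothed distribution."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


probs = exponential_mechanism_probs([2, 1, 1, 0], epsilon=1.0, sensitivity=1.0)
choice = recommend([2, 1, 1, 0], epsilon=1.0, sensitivity=1.0, rng=random.Random(0))
print(probs, choice)
```

Every node, including the zero-utility one, receives non-zero probability, and higher-utility nodes are exponentially more likely to be picked.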

Laplace noise addition mechanism: unlike the Exponential mechanism, the Laplace mechanism more closely mimics the optimal mechanism $R_{best}$. Given nodes with utilities $(u_1, \dots, u_n)$, algorithm $A_L$ first computes a modified utility vector $(u'_1, \dots, u'_n)$ as follows: $u'_i = u_i + r_i$, where $r_i$ is a random variable chosen from the Laplace distribution independently at random for each i. Then $A_L$ recommends the node with the largest noisy utility, $\arg\max_i u'_i$.

Laplace noise addition mechanism, Laplace distribution: the density function is $f(x) = \frac{1}{2b}\,e^{-|x| / b}$, with scale $b = \frac{\Delta f}{\epsilon}$ and mean $\mu = 0$.
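A minimal sketch of the Laplace noise addition mechanism, assuming scale $b = \Delta f / \epsilon$ and zero mean. The Laplace sample is drawn as the difference of two independent exponential variables with mean b, which is a standard way to obtain Laplace(0, b) noise using only the standard library; the function names are illustrative.

```python
import random


def laplace_noise(rng, b):
    """Draw from Laplace(0, b) as the difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def laplace_mechanism(utilities, epsilon, sensitivity, rng):
    """Add independent Laplace noise to each utility and return the argmax index."""
    b = sensitivity / epsilon
    noisy = [u + laplace_noise(rng, b) for u in utilities]
    return max(range(len(noisy)), key=noisy.__getitem__)


rng = random.Random(0)
picks = [laplace_mechanism([2, 1, 1, 0], epsilon=0.5, sensitivity=1.0, rng=rng)
         for _ in range(10)]
print(picks)
```

With a very large epsilon the noise vanishes and the mechanism behaves like $R_{best}$; with a small epsilon the picks approach a uniform draw.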


Experiments. Our goal: compare the theoretical upper bound on accuracy with the accuracy actually achieved.

Experiments, settings. Real network graphs: Wikipedia vote network ($G_{WV}$), Twitter connections network ($G_T$). Utility functions: common neighbors, number of paths. Privacy tools: Exponential smoothing mechanism, Laplace noise addition mechanism.

Wikipedia vote network: some Wikipedia users are administrators, who have access to additional technical features. Users are elected to be administrators via a public vote of other users and administrators. $G_{WV}$: V = users, E = votes; 7,115 nodes, 100,762 edges.

Twitter connections network: users follow other users. $G_T$: V = users, E = follows; 96,403 nodes, 489,986 edges.

Experiments, steps: (1) Select target node r uniformly at random. (2) Compute utility according to both functions. (3) Fix the $\epsilon$ of differential privacy. (4) Compute expected accuracy: Exponential smoothing: directly; Laplace noise addition: average utility of 1000 independent trials. (5) Compute the theoretical upper bound using $1 - \delta \le 1 - \frac{c\,(n-k)}{n - k + (k+1)\,e^{\epsilon t}}$.
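Step (4) can be sketched as follows: the exponential mechanism's expected accuracy is computed directly from its probabilities, while the Laplace mechanism's accuracy is estimated by averaging over 1000 independent trials. The utility vector is a hypothetical stand-in for one target node, not data from the Wikipedia or Twitter graphs, and the function names are illustrative.

```python
import math
import random


def exponential_accuracy(utilities, epsilon, sensitivity):
    """Exact expected accuracy of the exponential smoothing mechanism."""
    scale = epsilon / sensitivity
    u_max = max(utilities)
    weights = [math.exp(scale * (u - u_max)) for u in utilities]
    expected = sum(u * w for u, w in zip(utilities, weights)) / sum(weights)
    return expected / u_max


def laplace_accuracy(utilities, epsilon, sensitivity, trials=1000, seed=0):
    """Monte Carlo estimate of the Laplace mechanism's accuracy."""
    rng = random.Random(seed)
    b = sensitivity / epsilon
    u_max = max(utilities)
    total = 0.0
    for _ in range(trials):
        # Laplace(0, b) noise as a difference of two exponentials with mean b.
        noisy = [u + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
                 for u in utilities]
        best = max(range(len(utilities)), key=noisy.__getitem__)
        total += utilities[best]
    return total / (trials * u_max)


# Hypothetical target node: a few useful candidates among many zero-utility ones.
utilities = [2, 1, 1, 0] + [0] * 96
acc_exp = exponential_accuracy(utilities, epsilon=1.0, sensitivity=1.0)
acc_lap = laplace_accuracy(utilities, epsilon=1.0, sensitivity=1.0)
print(acc_exp, acc_lap)
```

Repeating this over many randomly chosen targets would give the per-node accuracy values whose distribution the results slides plot.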

Results: experiments show that the Laplace mechanism achieves nearly the same accuracy as the Exponential mechanism. Because the Exponential mechanism's accuracy can be computed more efficiently, we compare our theoretical bound on accuracy only to its actual accuracy.

Results, # of common neighbors. [Figures: Wikipedia vote network; Twitter connections network.] The x-axis is the accuracy $(1-\delta)$; the y-axis is the % of nodes receiving recommendations with accuracy < $(1-\delta)$.

Results, # of weighted paths. [Figures: Wikipedia vote network; Twitter connections network.] The x-axis is the accuracy $(1-\delta)$; the y-axis is the % of nodes receiving recommendations with accuracy < $(1-\delta)$.

Results: the low-degree nodes are also the most vulnerable to receiving low-accuracy recommendations. [Figure.] The x-axis is the target node's degree; the y-axis is the accuracy $(1-\delta)$.