Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks

Size: px

Start display at page:

Download "Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks"

Quentin Conley
5 years ago
Views:

1 Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks Josiane Xavier Parreira, Debora Donato, Carlos Castillo, Gerhard Weikum Max-Planck Institute for Informatics Yahoo! Research May 8, 27

2 Introduction Motivation P2P systems for storing and sharing information. Decentralized nature opens doors to malicious behaviors from peers. Previous Work JXP algorithm for computing decentralized PageRank-style authority scores in a P2P network [VLDB 6]. Assumes peers are always honest. Contribution Decentralized reputation system to be integrated into JXP. Allows computation of trusted authority scores.

3 JXP Algorithm [VLDB 6] A A F D C W B E Peer X E G W node: G C J E F E F A E B W W node: K E L G G A A F G C Peer Y D C W W node: B E E G Peer X G C J E F A F E K E F G F F A F E W node: K E Subgraph relevant to Peer X

4 TrustJXP Algorithm Idea Detect when peers report false scores at the meeting phase. Analyze peer s deviation from common features that constitute usual peer profile. Forms of attack addressed Peers report higher scores for a subset of their local pages. Peers permute the scores of its local pages.

5 Malicious Increase of Scores Why peers cheat High authority scores for local pages can bring benefits to a peer. Our approach Analyse the distribution of the scores reported by a peer. Use histograms to store and compare score distributions. Motivation: Web graph is self-similar local scores distribution should resemble global distribution after a few iterations.

6 Histograms Histograms Each peer stores a histogram H. Scores from other peers are inserted after each meeting. A novelty factor accounts for the dynamics of the scores. H (t+) = ( ρ)h t + ρd D is the score distribution of the other peer, and ρ is the novelty factor.

7 Histograms Comparing Histograms Hellinger Distance HD i,j = 2 [ k ( H i (k) D j (k)) 2 ] 2 k = total number of buckets H i (k) and D j (k) = number of elements at bucket k at the two distributions

8 Malicious Permutation of Scores Problem Peers can cheat and yet keep the original score distribution. Histogram comparison not effective in this case. Our approach Compare the rankings from both peers for the overlapping graph. Observation: Relative order of scores very close to the actual ordering, after few meetings.

9 Comparing Rankings Tolerant Kendall s Tau Distance K i,j = (a, b) : a < b score i (a) score i (b) τ i (a) < τ i (b) τ j (a) > τ j (b) score i (a), score i (b) = scores of pages a and b at peer i τ i, τ j = rankings of pages at peers i and j = tolerance threshold

10 TrustJXP Algorithm Computing Trust Scores Idea: Combine previous measures to assign trust scores to peers. Each peer assigns its own trust score to another peer, at each meeting step. How to combine the measures? We take a safer approach. θ i,j = min( HD i,j, K i,j) Trust score is integrated to the JXP computing, at the merging lists phase.

11 Integrating Trust Scores and JXP Scores Integrating Trust Scores and JXP Scores When merging lists, scores from both lists can be combined by either averaging or taking the max score. If page is not present on a list score =. Averaging the scores JXP: L (i) = (L A (i) + L B (i))/2 TrustJXP: L (i) = ( θ/2) L A (i) + θ/2 L B (i) Taking max score JXP: L (i) = max(l A (i), L B (i)) TrustJXP: L (i) = max(l A (i), θ L B (i))

12 Experimental Results Web collection Obtained using a focused crawler. Setup 34,45 pages,,95,4 links. categories. honest peers, peers/category. Malicious peers Perform JXP meetings and local PR computation like a normal peer. Lie when asked by another peer about the local scores, according to attacks previously described.

13 Experimental Results Evaluation Measures Global JXP ranking vs. Global PageRank ranking. Spearman s Footrule Distance at top-k. Linear error score at top-k. Cosine at full ranking. L norm of full JXP ranking (L norm of Global PR always ).

14 JXP Performance - No Malicious Peers Spearman s Footrule Distance.4 Peers No Malicious x Linear Error Score Peers No Malicious Cosine Peers No Malicious.8.6 L Norm Peers No Malicious

15 Impact of Malicious Peers (Peers report 2x the true score value for all local pages) Spearman s Footrule Distance Peers Malicious 5 Peers 5 Malicious Cosine.8 Peers Malicious 5 Peers 5 Malicious x Linear Score Error 4 Peers Malicious 5 Peers 5 Malicious L Norm Peers Malicious 5 Peers 5 Malicious 2 3 4

16 Averaging the Scores Spearman s Footrule Distance.4 Peers reporting 2x the scores Peers reporting 5x the scores.3 2 x Linear Score Error 4 Peers reporting 2x the scores Peers reporting 5x the scores Cosine.5.25 L Norm Peers reporting 2x the scores Peers reporting 5x the scores.9.85 Peers reporting 2x the scores Peers reporting 5x the scores

17 Trust Model Histograms Divergence Rank Divergence Histograms Divergence Rank Divergence (a) (b) (c) (d) Figure: Increased-scores attack: (a) and (b). Permuted-scores attack: (c) and (d). A green circle ( ) represents a meeting between two honest peers, and a red cross ( ) a meeting between an honest and a dishonest peers.

18 Trust Scores (Random Attacks) Histograms Divergence Rank Divergence Trust Scores Max. Detection False θ rate positives % 4.7% % 2.%.6 98.% 54.5%

19 Trust JXP Spearman s Footrule Distance Ideal Trust Model Our Trust Model No Trust Model Cosine Ideal Trust Model Our Trust Model No Trust Model x Linear Score Error Ideal Trust Model Our Trust Model No Trust Model L Norm Ideal Trust Model Our Trust Model No Trust Model * 5 Peers - 5 Malicious; Mixed malicious behavior

20 Conclusion TrustJXP algorithm for identifying and reducing the impact of cheating peers. Uses scores distribution and ranking analysis to detect malicious behavior. Experiments demonstrate viability of the method. Future Work Detect other types of malicious behaviors. Network dynamics.

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a