Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Similar documents
Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search

1 Searching the World Wide Web

Link Analysis Ranking

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Data Mining and Matrices

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Link Analysis. Leonid E. Zhukov

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Introduction to Data Mining

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

The Second Eigenvalue of the Google Matrix

Lecture 12: Link Analysis for Web Retrieval

Inf 2B: Ranking Queries on the WWW

Slides based on those in:

Node Centrality and Ranking on Networks

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

STA141C: Big Data & High Performance Statistical Computing

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

DS504/CS586: Big Data Analytics Graph Mining II

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

1998: enter Link Analysis

Web Structure Mining Nodes, Links and Influence

Finding central nodes in large networks

Link Analysis. Stony Brook University CSE545, Fall 2016

DS504/CS586: Big Data Analytics Graph Mining II

0.1 Naive formulation of PageRank

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Analysis and Computation of Google s PageRank

CS6220: DATA MINING TECHNIQUES

Data Mining Recitation Notes Week 3

How does Google rank webpages?

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

Google Page Rank Project Linear Algebra Summer 2012

Manipulability of PageRank under Sybil Strategies

A Note on Google s PageRank

Authoritative Sources in a Hyperlinked Enviroment

Link Mining PageRank. From Stanford C246

ECEN 689 Special Topics in Data Science for Communications Networks

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page

Computing PageRank using Power Extrapolation

CS6220: DATA MINING TECHNIQUES

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

The Dynamic Absorbing Model for the Web

CS47300: Web Information Search and Management

Data and Algorithms of the Web

IR: Information Retrieval

Models and Algorithms for Complex Networks. Link Analysis Ranking

Calculating Web Page Authority Using the PageRank Algorithm

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

Combating Web Spam with TrustRank

Node and Link Analysis

Jeffrey D. Ullman Stanford University

Uncertainty and Randomization

Jeffrey D. Ullman Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof.

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Links between Kleinberg s hubs and authorities, correspondence analysis, and Markov chains

Some relationships between Kleinberg s hubs and authorities, correspondence analysis, and the Salsa algorithm

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Lecture 7 Mathematics behind Internet Search

CS249: ADVANCED DATA MINING

Analysis of Google s PageRank

Applications to network analysis: Eigenvector centrality indices Lecture notes

Cross-lingual and temporal Wikipedia analysis

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Some relationships between Kleinberg s hubs and authorities, correspondence analysis, and the Salsa algorithm

Intelligent Data Analysis. PageRank. School of Computer Science University of Birmingham

Finding Authorities and Hubs From Link Structures on the World Wide Web

Algebraic Representation of Networks

The Static Absorbing Model for the Web a

Data science with multilayer networks: Mathematical foundations and applications

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation

c 2005 Society for Industrial and Applied Mathematics

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Perturbation of the hyper-linked environment

PROBABILISTIC LATENT SEMANTIC ANALYSIS

arxiv:cond-mat/ v1 3 Sep 2004

Complex Social System, Elections. Introduction to Network Analysis 1

Graph Models The PageRank Algorithm

Machine Learning CPSC 340. Tutorial 12

Updating PageRank. Amy Langville Carl Meyer

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Faloutsos, Tong ICDE, 2009

googling it: how google ranks search results Courtney R. Gibbons October 17, 2017

Influence Diffusion Dynamics and Influence Maximization in Social Networks with Friend and Foe Relationships

Transcription:

Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a group of people, or an organization Caltech http://yury.name Caltech CMI Seminar March 4, 2008 1 / 32 2 / 32 Outline 1 Intro 2 Reputations in Hyperlink Graphs HITS PageRank SALSA 3 Trust Reputations ebay EigenTrust 1 Introduction to Reputations 4 Personal Reputations VKontakte 3 / 32 4 / 32

Applications Aspects Search Trust and recommendations Motivating openness & contribution Keeping users engaged Spam protection Loyalty programs Input information Benefits of reputation Centralized/decentralized Spam protection mechanisms Online systems: Slashdot, epinions, Amazon, ebay, Yahoo! Answers, Digg, Wikipedia, World of Warcraft, BizRate. Russian systems: Habr, VKontakte, Photosight 5 / 32 6 / 32 Main Ideas Challenges Random walk model Rights, limits and thresholds Real name, photo, contact and profile information Spam protection Fast computing General theory, taxonomy of existing systems Reputation exchange market What s inside the real systems? 7 / 32 8 / 32

Challenge 2 Reputations in Hyperlink Graphs How to define the most relevant webpage to Bill Gates? Naive ideas By frequency of query words in a webpage By number of links from other relevant pages 9 / 32 10 / 32 Web Search: Formal Settings HITS Algorithm Every webpage is represented as a weighted set of keywords There are hyperlinks (directed edges) between webpages 1 Given a query construct a focused subgraph F(q) of the web 2 Compute hubs and authorities ranks for all vertices in F(q) Conceptual problem: define a relevance rank based on keyword weights and link structure of the web Focused subgraph: pages with highest weights of query words and pages hyperlinked with them 11 / 32 12 / 32

Hubs and Authorities Hubs and Authorities: Equations Mutual reinforcing relationship: A good hub is a webpage with many links to query-authoritative pages A good authority is a webpage with many links from query-related hubs a(p) h(q) q:(q,p) E h(p) a(q) q:(p,q) E 13 / 32 14 / 32 Hubs and Authorities: Solution Initial estimate: Convergence Theorem Iteration: p : a 0 (p) = 1, h 0 (p) = 1 a k+1 (p) = h k (q) q:(q,p) E h k+1 (p) = a k (q) q:(p,q) E Theorem Let M be the adjacency matrix of focused subgraph F(query). Then ā k converges to principal eigenvector of M T M and h k converges to principal eigenvector of MM T We normalize ā k, h k after every step 15 / 32 16 / 32

Lessons from HITS PageRank: Problem Statement Link structure is useful for relevance sorting Link popularity is defined by linear equations Solution can be computed by iterative algorithm Compute quality of every page Idea: base quality on the number of referring pages and their own quality Other factors: Frequency of updates Number of visitors Registration in affiliated directory 17 / 32 18 / 32 Random Walk Model Network: Nodes Directed edges (hyperlinks) Model of random surfer Start in a random node Use a random outgoing edge with probability 1 ϵ Move to a random node with probability ϵ Limit probabilities For every k the value PR k (i) is defined as probability to be in the node i after k steps Fact: lim k PR k (i) = PR(i), i.e. all probabilities converge to some limit ones 19 / 32 20 / 32

PageRank Equation Let T 1,..., T n be the nodes referring to i Let C(X) denote the out-degree of X Claim: PR(i) = ϵ/n + (1 ϵ) n PR(T i ) i=1 C(T i ) Proof? By definition of PR k (i): PR 0 (i) = 1/N PR k (i) = ϵ/n + (1 ϵ) n PR k 1 (T i ) i=1 C(T i ) Then just take the limits of both sides Practical solution: to use PR 50 (i) computed via iterative formula instead of PR(i) 21 / 32 PageRank as an Eigenvector Let us define a matrix L: l ij := ϵ/n, if there is no edge from i to j l ij := ϵ/n + (1 ϵ), if there is an edge 1 C(j) Notation: PR k = (PR k (1),..., PR k (N)) PR = (PR(1),..., PR(N)) We have: PR k = L k PR 0 PR = L PR 22 / 32 SALSA Construct query-specific directed graph F(q) Transform F(q) into undirected bipartite undirected graph W Define its column weighted and row weighted versions W c, W r Consider hub-authority random walk: a (k+1) = W T c W ra (k) Define authorities as the limit value of a (k) vector 23 / 32 3 Trust Reputations 24 / 32

ebay Buyers and sellers Bidirectional feedback evaluation after every transaction ebay Feedback: +/-, four criteria-specific ratings, text comment Total score: sum of +/- Feedback points 1, 6, 12, months and lifetime versions EigenTrust Local trust c i j 0 is based on personal experience Normalization n j=1 c ij = 1 Experience matrix C Trust equation t (k) i t (k) i = (C T ) n c i = n c j=1 ij t (k 1) j Trust vector t is the principle eigenvector of C: t = lim t (k) i 25 / 32 26 / 32 EigenTrust: Pre-Trusted Nodes Starting vector. Let P is the set of pre-trusted nodes. Use t (0) = 1/ P Local trust. Assume ϵ local trust from any node to any pre-trusted node 4 Personal Reputations 27 / 32 28 / 32

VKontakte What is VKontakte.ru? Russian Facebook-style website Name means in touch in Russian 8.5M users (February 2008) Working on English language version VKontakte Rating 1 First 100 points: real name and photo, profile completeness 2 Then: paid points (via SMS) gifted by your supporters 3 Any person has 1 free reference link, initially pointing to a person who invited him to VKontakte. Bonus points (acquired by rules 2 and 3) are propagating with 1/4 factor by reference links. Rating benefits: Basis for sorting: friends lists, group members, event attendees Bias for random six friends selection 29 / 32 30 / 32 References J. Kleinberg Authoritative sources in a hyperlinked environment L. Page, S. Brin, R. Motwani, T. Winograd The Pagerank citation ranking: Bringing order to the web R. Lempel, S. Moran The stochastic approach for link-structure analysis (SALSA) and the TKC effect D. Houser, J. Wooders Reputation in Auctions: Theory, and Evidence from ebay S.D. Kamvar, M.T. Schlosser, H. Garcia-Molina The Eigentrust algorithm for reputation management in P2P networks VKontakte Team http://vkontakte.ru/rate.php?act=help (in Russian) http://yury.name Ongoing project: http://businessconsumer.net Thanks for your attention! Questions? Second part (March 11, 4pm): Spam protection for reputations Open problems 31 / 32 32 / 32