Authoritative Sources in a Hyperlinked Enviroment

Similar documents
Data Mining and Matrices

1 Searching the World Wide Web

Link Analysis Ranking

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page

A Note on Google s PageRank

Online Social Networks and Media. Link Analysis and Web Search

Link Analysis. Leonid E. Zhukov

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Faloutsos, Tong ICDE, 2009

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

How does Google rank webpages?

Applications to network analysis: Eigenvector centrality indices Lecture notes

Detailed Proof of The PerronFrobenius Theorem

Spectra of Adjacency and Laplacian Matrices

Online Social Networks and Media. Link Analysis and Web Search

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

1 Matrix notation and preliminaries from spectral graph theory

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

STA141C: Big Data & High Performance Statistical Computing

Chapter 3 Transformations

Lecture 15 Perron-Frobenius Theory

Fiedler s Theorems on Nodal Domains

Google Page Rank Project Linear Algebra Summer 2012

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

(a) II and III (b) I (c) I and III (d) I and II and III (e) None are true.

Complex Social System, Elections. Introduction to Network Analysis 1

ECEN 689 Special Topics in Data Science for Communications Networks

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

Linear Algebra: Characteristic Value Problem

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Linear algebra and applications to graphs Part 1

Eigenvalues of Exponentiated Adjacency Matrices

Conceptual Questions for Review

Node and Link Analysis

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Web Structure Mining Nodes, Links and Influence

Lecture 7 Mathematics behind Internet Search

PROBABILISTIC LATENT SEMANTIC ANALYSIS

1 Inner Product and Orthogonality

googling it: how google ranks search results Courtney R. Gibbons October 17, 2017

IR: Information Retrieval

Spectral Analysis of k-balanced Signed Graphs

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Chapter 6. Eigenvalues. Josef Leydold Mathematical Methods WS 2018/19 6 Eigenvalues 1 / 45

22m:033 Notes: 7.1 Diagonalization of Symmetric Matrices

Parallel Singular Value Decomposition. Jiaxing Tan

Node Centrality and Ranking on Networks

Math 1553, Introduction to Linear Algebra

Quadratic forms. Here. Thus symmetric matrices are diagonalizable, and the diagonalization can be performed by means of an orthogonal matrix.

1 Last time: least-squares problems

series. Utilize the methods of calculus to solve applied problems that require computational or algebraic techniques..

Link Analysis and Web Search

LINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Spectral Graph Theory

Notes on Linear Algebra and Matrix Theory

Calculating Web Page Authority Using the PageRank Algorithm

1. The Polar Decomposition

Latent Semantic Analysis. Hongning Wang

Final Review Written by Victoria Kala SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Principal Component Analysis

On the mathematical background of Google PageRank algorithm

Bruce Hendrickson Discrete Algorithms & Math Dept. Sandia National Labs Albuquerque, New Mexico Also, CS Department, UNM

Multimedia Databases - 68A6 Final Term - exercises

7. Symmetric Matrices and Quadratic Forms

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Chapter 7: Symmetric Matrices and Quadratic Forms

MA 265 FINAL EXAM Fall 2012

Properties of Matrices and Operations on Matrices

MATH 829: Introduction to Data Mining and Analysis Clustering II

Riemann Hypotheses. Alex J. Best 4/2/2014. WMS Talks

Introduction. Chapter One

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Bounds on the Ratio of Eigenvalues

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Index. Copyright (c)2007 The Society for Industrial and Applied Mathematics From: Matrix Methods in Data Mining and Pattern Recgonition By: Lars Elden

On the convergence of weighted-average consensus

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Link Mining PageRank. From Stanford C246

MATH36001 Perron Frobenius Theory 2015

0.1 Naive formulation of PageRank

1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u)

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

1.3 Convergence of Regular Markov Chains

Nonnegative Matrices I

Math 4153 Exam 3 Review. The syllabus for Exam 3 is Chapter 6 (pages ), Chapter 7 through page 137, and Chapter 8 through page 182 in Axler.

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det

Validated computation tool for the Perron-Frobenius eigenvalue using graph theory

Transcription:

Authoritative Sources in a Hyperlinked Enviroment Dimitris Sakavalas June 15, 2010

Outline Introduction Problem description Authorities and Broad-topic queries Algorithm outline Construction of a Focused Subgraph of the www Expanding R into the base set S Heuristics impoving base set S Computing Hubs and Authorities Iterative Algorithm Elements of Linear Algebra Convergence of the Iterative Algorithm Similar page queries Multiple Sets of Hubs and Authorities Non-Principal Eigenvectors and Clustering of G Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 2/25

Introduction Problem description www-search Discover pages relevant to a given query string Discover the most authoritative pages (subjective, requires human evaluation) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 3/25

Introduction Problem description www-search Discover pages relevant to a given query string Discover the most authoritative pages (subjective, requires human evaluation) Queries Specic queries Scarcity Problem Broad-topic queries Abundance Problem Similar-page queries Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 3/25

Introduction Authorities and Broad-topic queries In search of a denition of authority Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 4/25

Introduction Authorities and Broad-topic queries In search of a denition of authority No endogenous measure of a page to assess authority (ex. Text-based ranking functions for queries: \Harvard", \search engines", \automobile manufacturers", images) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 4/25

Introduction Authorities and Broad-topic queries In search of a denition of authority No endogenous measure of a page to assess authority (ex. Text-based ranking functions for queries: \Harvard", \search engines", \automobile manufacturers", images) Solution: Analysis of the link structure Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 4/25

Introduction Authorities and Broad-topic queries In search of a denition of authority No endogenous measure of a page to assess authority (ex. Text-based ranking functions for queries: \Harvard", \search engines", \automobile manufacturers", images) Solution: Analysis of the link structure Pitfalls: Advertising links Navigational links Universal Popularity links Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 4/25

Introduction Authorities and Broad-topic queries In search of a denition of authority No endogenous measure of a page to assess authority (ex. Text-based ranking functions for queries: \Harvard", \search engines", \automobile manufacturers", images) Solution: Analysis of the link structure Pitfalls: Advertising links Navigational links Universal Popularity links Question.How reliable is page p? Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 4/25

Introduction Algorithm outline Hubs Pages that link to many Authorities Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 5/25

Introduction Algorithm outline Hubs Pages that link to many Authorities Outline of the algorithm 1 Create a focused subgraph of the www from the output of a text-based engine (Small collection of pages likely to contain the most authoritative pages) 2 Identify hubs and authorities Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 5/25

Construction of a Focused Subgraph of the www Directed graph G = (V; E) of the www. V the collection of hyperlinked pages of the www V : (nodes correspond to the pages) (p; q) E There is a link from p to q G[W]. Subgraph of G induced on W V Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 6/25

Construction of a Focused Subgraph of the www Directed graph G = (V; E) of the www. V the collection of hyperlinked pages of the www V : (nodes correspond to the pages) (p; q) E There is a link from p to q G[W]. Subgraph of G induced on W V Goal Construct G = G[S ] such that 1 S is relatively small (computational cost) 2 S rich in relevant pages (search quality) 3 S contains most of the strongest authorities Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 6/25

Construction of a Focused Subgraph of the www Directed graph G = (V; E) of the www. V the collection of hyperlinked pages of the www V : (nodes correspond to the pages) (p; q) E There is a link from p to q G[W]. Subgraph of G induced on W V Goal Construct G = G[S ] such that 1 S is relatively small (computational cost) 2 S rich in relevant pages (search quality) 3 S contains most of the strongest authorities Observation S rich in relevant pages many links to authorities Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 6/25

Construction of a Focused Subgraph of the www Notation Γ + (p): the set of all pages p points to Γ (p): the set of all pages pointing to p E: a text-based search engine Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 7/25

Construction of a Focused Subgraph of the www Notation Γ + (p): the set of all pages p points to Γ (p): the set of all pages pointing to p E: a text-based search engine 1st Step Root set R ={t highest ranked pages for the query from E} R satises (i); (ii). Generally doesn't satisfy (iii) and is structureless (few links) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 7/25

Construction of a Focused Subgraph of the www Notation Γ + (p): the set of all pages p points to Γ (p): the set of all pages pointing to p E: a text-based search engine 1st Step Root set R ={t highest ranked pages for the query from E} R satises (i); (ii). Generally doesn't satisfy (iii) and is structureless (few links) Observation If a = R, strong authority, there is a high probability that p R such that a Γ + (p) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 7/25

Expanding R into the base set S 2nd Step Idea:Expand R into the Base set S by adding strong authorities and relevant pages Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 8/25

Expanding R into the base set S 2nd Step Idea:Expand R into the Base set S by adding strong authorities and relevant pages Subgraph(; E ; t ; d) t; d N Set S := R For each page p R Add all pages in Γ + (p) to S If Γ (p) d then, Add all pages in Γ (p) to S Else Add an arbitary set of d pages from Γ (p) to S End. Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 8/25

Expanding R into the base set S Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 9/25

Heuristics impoving S Links Transverse: link between pages with dierent domain name Intrinsic: link between pages with the same domain name (navigational) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 10/25

Heuristics impoving S Links Transverse: link between pages with dierent domain name Intrinsic: link between pages with the same domain name (navigational) 3rd Step:Heuristics 1 Delete all Intrinsic links from the graph G 2 Allow up to a small number m pages from a single domain to point to any given page p Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 10/25

Heuristics impoving S Links Transverse: link between pages with dierent domain name Intrinsic: link between pages with the same domain name (navigational) 3rd Step:Heuristics 1 Delete all Intrinsic links from the graph G 2 Allow up to a small number m pages from a single domain to point to any given page p Finally we obtain a small subgraph G, relatively focused on the query, containing many relevant pages and strong authorities. Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 10/25

Hubs and Authorities Purely In-degree ordering of Authorities Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 11/25

Hubs and Authorities Purely In-degree ordering of Authorities Problem. \Universally popular" pages, large in-degree regardless of the underlying query topic (ex.amazon Books) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 11/25

Hubs and Authorities Purely In-degree ordering of Authorities Problem. \Universally popular" pages, large in-degree regardless of the underlying query topic (ex.amazon Books) Observation Authorities: high in-degree, considerable overlap in the sets of pages that point to them(hubs) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 11/25

Hubs and Authorities Observation (mutually reinforcing relationship) Good Hub: a page that points to many good authorities Good Authority: a page that is pointed by many good hubs Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 12/25

Hubs and Authorities Observation (mutually reinforcing relationship) Good Hub: a page that points to many good authorities Good Authority: a page that is pointed by many good hubs Denitions p S = {1; 2; :::; n} assign: authority weight x <p> 0 hub weight y <p> 0 Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 12/25

Hubs and Authorities Observation (mutually reinforcing relationship) Good Hub: a page that points to many good authorities Good Authority: a page that is pointed by many good hubs Denitions p S = {1; 2; :::; n} assign: authority weight x <p> 0 hub weight y <p> 0 corresponding vectors x = x <1> x <2>. x <n> and y = y <1> y <2>. y <n> Normalized such that x = y = 1 example.for norm 1 ; n i=1 x <i> = n i=1 y <i> = 1 Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 12/25

Hubs and Authorities Denitions I(x ; y) (x -weight update): x <p> ; x <p> q:(q;p) E y <q> Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 13/25

Hubs and Authorities Denitions I(x ; y) (x -weight update): x <p> ; x <p> q:(q;p) E O(x ; y) (y-weight update): y <p> ; y <p> q:(p;q) E y <q> x <q> ; Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 13/25

Hubs and Authorities Denitions I(x ; y) (x -weight update): x <p> ; x <p> q:(q;p) E O(x ; y) (y-weight update): y <p> ; y <p> q:(p;q) E y <q> x <q> ; Idea Find the disired \equilibrium" values for the weights by applying I; O operations iterativelly. Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 13/25

Hubs and Authorities Denitions I(x ; y) (x -weight update): x <p> ; x <p> q:(q;p) E O(x ; y) (y-weight update): y <p> ; y <p> q:(p;q) E y <q> x <q> ; Idea Find the disired \equilibrium" values for the weights by applying I; O operations iterativelly. Question. Does this \equilibrium" exist? Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 13/25

Hubs and Authorities Denitions I(x ; y) (x -weight update): x <p> ; x <p> q:(q;p) E O(x ; y) (y-weight update): y <p> ; y <p> q:(p;q) E y <q> x <q> ; Idea Find the disired \equilibrium" values for the weights by applying I; O operations iterativelly. Question. Does this \equilibrium" exist? YES (linear algebra) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 13/25

Iterative algorithm Iterate(G ; k) z := (1; 1; :::; 1) R n Set x 0 := z Set y 0 := z For i = 1; 2; :::; k Apply the I(x i 1 ; y i 1 ) operation to obtain new x weights x i Apply the O(x i ; y i 1) operation to obtain new y weights y i Normalize x i, obtaining x i Normalize y i, obtaining y i End Return (x k ; y k ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 14/25

Iterative algorithm Iterate(G ; k) z := (1; 1; :::; 1) R n Set x 0 := z Set y 0 := z For i = 1; 2; :::; k Apply the I(x i 1 ; y i 1 ) operation to obtain new x weights x i Apply the O(x i ; y i 1) operation to obtain new y weights y i Normalize x i, obtaining x i Normalize y i, obtaining y i End Return (x k ; y k ) Filter(G ; k ; c) (x k ; y k ) :=Iterate(G ; k) Report the pages with the c largest coordinates in x k as authorities Report the pages with the c largest coordinates in y k as hubs Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 14/25

Elements of Linear Algebra Notation Matrix M C n n Eigenvalues i (M ) Eigenvectors! i (M ) eigenspace V subspace of C n multiplicity of : m = dim(v ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 15/25

Elements of Linear Algebra Notation Matrix M C n n Eigenvalues i (M ) Eigenvectors! i (M ) eigenspace V subspace of C n multiplicity of : m = dim(v ) Denitions Positive matrix M : If M i;j > 0; i ; j Primitive matrix M : If m N; M m : positive Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 15/25

Elements of Linear Algebra Let M R n n symmetric, nonnegative matrix (M i;j 0; i ; j ) Theorems 1 M has at most n distinct eigenvalues and = n m i Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 16/25

Elements of Linear Algebra Let M R n n symmetric, nonnegative matrix (M i;j 0; i ; j ) Theorems 1 M has at most n distinct eigenvalues and m i = n 2 M has only real eigenvalues i Denote them 1 (M ) 2 (M ) ::: n (M ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 16/25

Elements of Linear Algebra Let M R n n symmetric, nonnegative matrix (M i;j 0; i ; j ) Theorems 1 M has at most n distinct eigenvalues and m i = n 2 M has only real eigenvalues i Denote them 1 (M ) 2 (M ) ::: n (M ) 3 (Perron-Frobenius) If M is primitive then: i. Largest eigenvalue 1 (M ) > 0 and m = 1 ii. 1 > i i 1 iii. 1 has a corresponding eigenvector! 1 (M ) with all entries positive (the principal eigenvector) Consider! 1 (M ) normalized,! 1 (M ) = 1 Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 16/25

Elements of Linear Algebra Let M R n n symmetric, nonnegative matrix (M i;j 0; i ; j ) Theorems 1 M has at most n distinct eigenvalues and m i = n 2 M has only real eigenvalues i Denote them 1 (M ) 2 (M ) ::: n (M ) 3 (Perron-Frobenius) If M is primitive then: i. Largest eigenvalue 1 (M ) > 0 and m = 1 ii. 1 > i i 1 iii. 1 has a corresponding eigenvector! 1 (M ) with all entries positive (the principal eigenvector) Consider! 1 (M ) normalized,! 1 (M ) = 1 4 If M is primitive and vector v not orthogonal to the principal eigenvector! 1 (M ), let v k be the unit vector in the direction of M k k v. Then v k! 1 (M ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 16/25

Elements of Linear Algebra Observation Note that we can consider! 1 (M ) as a vector of the orthonormal base of V 1. Moreover dimv 1 = m 1 = 1 and all entries in! 1 (M ) are positive. Hence the principal eigenvector! 1 (M ) is unique. Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 17/25

Connection of Iterative algorithm to linear Algebra Let A be the adjacency matrix of G Observation Operations I(x ; y) and O(x ; y) can be written as x A y and y Ax respectivelly Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 18/25

Connection of Iterative algorithm to linear Algebra Let A be the adjacency matrix of G Observation Operations I(x ; y) and O(x ; y) can be written as x A y and y Ax respectivelly Iterate Algorithm The algorithm initializes x 0 = y 0 = z =. 1 step of thalgorithm we have the following steps: I(x i ; y i 1) x i = A y i 1 O(x i ; y i ) y i = Ax i 1. In the and i th Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 18/25

Connection of Iterative algorithm to linear Algebra Let A be the adjacency matrix of G Observation Operations I(x ; y) and O(x ; y) can be written as x A y and y Ax respectivelly Iterate Algorithm The algorithm initializes x 0 = y 0 = z =. 1 step of thalgorithm we have the following steps: I(x i ; y i 1) x i = A y i 1 1. In the and i th O(x i ; y i ) y i = Ax i x i = A Ax i 1 = (A A) i x 0 y i = AA y i 1 = (AA ) i y 0 } x i = (A A) i z y i = (AA ) i z ( ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 18/25

Convergence of the Iterative Algorithm Observation We consider x i ; y i normalized in each step as in Iterate algorithm. So in every step x i = y i = 1 For simplicity, matrices AA and A A are considered primitive Obviously z c 0; c R n 0 Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 19/25

Convergence of the Iterative Algorithm Observation We consider x i ; y i normalized in each step as in Iterate algorithm. So in every step x i = y i = 1 For simplicity, matrices AA and A A are considered primitive Obviously z c 0; c R n 0 Theorem The sequences {x i } and {y i } produced by Iterate Algorithm converge to limits x ; y respectively Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 19/25

Convergence of the Iterative Algorithm Observation We consider x i ; y i normalized in each step as in Iterate algorithm. So in every step x i = y i = 1 For simplicity, matrices AA and A A are considered primitive Obviously z c 0; c R n 0 Theorem The sequences {x i } and {y i } produced by Iterate Algorithm converge to limits x ; y respectively Proof. A A primitive, z not orthogonal to the principal eigenvector! 1 (A A) = x, x i the unit vector in the direction (A A) i z Hence from Theorem 4 {x i } x, simirarly {y i } y =! 1 (AA ) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 19/25

Conclusion Conclusion The process of iterated I=O operations will converge. The desired equilibrium values for the x ; y weights, are the values x ; y. Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 20/25

Conclusion Conclusion The process of iterated I=O operations will converge. The desired equilibrium values for the x ; y weights, are the values x ; y. Observation In practice the convergence of Iterate is quite rapid, so one can compute x ; y weights by starting from any initial vectors x 0 ; y 0, and performing a xed number of I; O operations Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 20/25

Conclusion Conclusion The process of iterated I=O operations will converge. The desired equilibrium values for the x ; y weights, are the values x ; y. Observation In practice the convergence of Iterate is quite rapid, so one can compute x ; y weights by starting from any initial vectors x 0 ; y 0, and performing a xed number of I; O operations Hyperlink-Induced Topic Search (HITS) algorithm (Ask.com) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 20/25

Similar-Page Queries Query. Find pages similar to p Idea Initiate search with the page p instead of a query string Use the link structure to infer \similarity among pages" Similar pages to p Strongest authorities in local region of p Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 21/25

Similar-Page Queries Query. Find pages similar to p Idea Initiate search with the page p instead of a query string Use the link structure to infer \similarity among pages" Similar pages to p Strongest authorities in local region of p Adaptation of the broad-topic query method 1 Find t pages pointing to p Assemble root set R p 2 Expand R p to base set S p as before 3 Search for authorities and hubs in the focused subgraph G p Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 21/25

Multiple Sets of Hubs and Authorities Situation. For a query string, relevant pages grouped into clusters. Reasons Dierent meanings of. ex. \jaguar\ String arises as term in multiple technical communities. ex. \randomized algorithms" String refers to a highly polarized issue. ex.\abortion" Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 22/25

Multiple Sets of Hubs and Authorities Situation. For a query string, relevant pages grouped into clusters. Reasons Dierent meanings of. ex. \jaguar\ String arises as term in multiple technical communities. ex. \randomized algorithms" String refers to a highly polarized issue. ex.\abortion" Problem. Identify the clusters Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 22/25

Multiple Sets of Hubs and Authorities Situation. For a query string, relevant pages grouped into clusters. Reasons Dierent meanings of. ex. \jaguar\ String arises as term in multiple technical communities. ex. \randomized algorithms" String refers to a highly polarized issue. ex.\abortion" Problem. Identify the clusters Idea Clusters represented in focused subgraph G as densely linked subgraphs(collection of hubs and authorities) Extract densely linked collections of hubs and authorities through non-principal eigenvectors of AA and A A Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 22/25

Non-Principal Eigenvectors and Clustering of G Proposition AA and A A have the same multiset of eigenvalues, and their eigenvectors can be chosen so that! i (AA ) = A! i (A A) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 23/25

Non-Principal Eigenvectors and Clustering of G Proposition AA and A A have the same multiset of eigenvalues, and their eigenvectors can be chosen so that! i (AA ) = A! i (A A) Observation For each pair of eigenvectors x i =! i (A A); y i =! i (AA ) operation I(x i ; y i ) keeps x weights parallel to x i : x = A! i (AA ) = A A! i (A A) = i x i Simirarly operation O(x i ; y i ) keeps x weights parallel to y i Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 23/25

Non-Principal Eigenvectors and Clustering of G Proposition AA and A A have the same multiset of eigenvalues, and their eigenvectors can be chosen so that! i (AA ) = A! i (A A) Observation For each pair of eigenvectors x i =! i (A A); y i =! i (AA ) operation I(x i ; y i ) keeps x weights parallel to x i : x = A! i (AA ) = A A! i (A A) = i x i Simirarly operation O(x i ; y i ) keeps x weights parallel to y i Non-principal eigenvectors have both positive and negative entries. Hence each pair of (x i ; y i ) provide us with two densely connected set of hubs and authorities Pages that correspond to the c coordinates with the most positive values Pages that correspond to the c coordinates with the most negative values Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 23/25

Non-Principal Eigenvectors and Clustering of G Description of the method For each of the st few non-principal eigenvectors (x i ; y i ) nd the two densely connected set of hubs and authorities through an algorithm similar(but less clean conceptually) to the Iterate algorithm Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 24/25

Non-Principal Eigenvectors and Clustering of G Description of the method For each of the st few non-principal eigenvectors (x i ; y i ) nd the two densely connected set of hubs and authorities through an algorithm similar(but less clean conceptually) to the Iterate algorithm The pages with large coordinates in the rst few nonprincipal eigenvectors tend to recur, so that essentially the same collection of hubs and authorities will often be generated by several of the strongest nonprincipal eigenvectors Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 24/25

Similar Concepts in Other Areas Social networks \Standing" Bibliometrics \Impact" Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 25/25

Similar Concepts in Other Areas Social networks \Standing" Bibliometrics \Impact" Economics (Wassily Leontief 1941-Nobel prize 1973) Dimitris Sakavalas, Authoritative Sources in a Hyperlinked Enviroment 25/25