Statistical ranking problems and Hodge decompositions of graphs and skew-symmetric matrices

Statistical ranking problems and Hodge decompositions of graphs and skew-symmetric matrices

Lek-Heng Lim and Yuan Yao
2008 Workshop on Algorithms for Modern Massive Data Sets
June 28, 2008
(Contains joint work with Qi-xing Huang and Xiaoye Jiang)

L.-H. Lim and Y. Yao (MMDS 2008), Rank aggregation and Hodge decomposition, June 28, 2008, 1 / 32

Hodge decomposition

Vector calculus: Helmholtz's decomposition, i.e. vector fields on nice domains may be resolved into irrotational (curl-free) and solenoidal (divergence-free) component vector fields,

    F = ∇ϕ + ∇ × A,

with ϕ the scalar potential and A the vector potential.

Linear algebra: additive orthogonal decomposition of a skew-symmetric matrix into three skew-symmetric matrices,

    W = W_1 + W_2 + W_3,

where W_1 = v e^T − e v^T, W_2 is clique-consistent, and W_3 is inconsistent.

Graph theory: orthogonal decomposition of network flows into acyclic and cyclic components.

Ranking on networks (graphs)

Multicriteria rank/decision systems:
- Amazon or Netflix's recommendation system (user-product)
- Interest ranking in social networks (person-interest)
- S&P index (time-price)
- Voting (voter-candidate)

Peer review systems:
- Publication citation systems (paper-paper)
- Google's webpage ranking (web-web)
- eBay's reputation system (customer-customer)

Characteristics

The aforementioned ranking data are often
- incomplete: typically about 1% sampled (cf. earlier talks by Agarwal, Banerjee)
- imbalanced: power-law, heavy-tail distributed votes (cf. earlier talks by Faloutsos, Mahoney, Mihail)
- cardinal: given in terms of scores or stochastic choices

Implicitly or explicitly, ranking data may be viewed as living on a simple graph G = (V, E), where
- V: set of alternatives (products, interests, etc.) to be ranked
- E: pairs of alternatives to be compared

Example: Netflix customer-product rating

Example (Netflix Customer-Product Rating)
- The 480189-by-17770 customer-product rating matrix X is incomplete: 98.82% of its values are missing.
- Yet the pairwise comparison graph G = (V, E) is very dense: only 0.22% of edges are missing, so G is almost a complete graph.
- Rank aggregation may be carried out without estimating missing values.
- Imbalanced: the number of raters on each e ∈ E varies.

Caveat: we are not trying to solve the Netflix prize problem.

Netflix example continued

The first-order statistic, the mean score of each product, is often inadequate because:
- most customers rate only a very small portion of the products
- different products may have different raters, so mean scores carry noise due to arbitrary individual rating scales
- customers give ratings instead of orderings

How about higher-order statistics?

From 1st order to 2nd order: pairwise rankings

Linear Model: the average score difference between products i and j over all customers who have rated both of them,

    w_ij = Σ_k (X_kj − X_ki) / #{k : X_ki, X_kj exist}.

Invariant up to translation.

Log-linear Model: when all the scores are positive, the logarithmic average score ratio,

    w_ij = Σ_k (log X_kj − log X_ki) / #{k : X_ki, X_kj exist}.

Invariant up to a multiplicative constant.
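The linear model above is easy to sketch numerically. A minimal illustration, assuming ratings sit in a customers-by-products array with NaN marking missing entries (the array `X` and function name are hypothetical, not from the talk):

```python
import numpy as np

def pairwise_score_difference(X):
    """Linear model: w[i, j] = mean of (X[k, j] - X[k, i]) over all
    customers k who rated both products i and j.  NaN = unrated."""
    n_items = X.shape[1]
    W = np.full((n_items, n_items), np.nan)
    for i in range(n_items):
        for j in range(n_items):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.any():
                W[i, j] = np.mean(X[both, j] - X[both, i])
    return W

# Toy example: 3 customers, 3 movies.
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 4.0, 1.0]])
W = pairwise_score_difference(X)
```

Note that W is skew-symmetric wherever it is defined, which is exactly the structure exploited in the next slides.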

More invariants

Linear Probability Model: the probability that product j is preferred to i in excess of a purely random choice,

    w_ij = Pr{k : X_kj > X_ki} − 1/2.

Invariant up to monotone transformation.

Bradley-Terry Model: the logarithmic odds ratio (logit),

    w_ij = log [ Pr{k : X_kj > X_ki} / Pr{k : X_kj < X_ki} ].

Invariant up to monotone transformation.

Skew-symmetric matrices of pairwise rankings

Recall skew-symmetric matrices W ∈ R^{n×n}, W^T = −W:
- every A ∈ R^{n×n} decomposes as A = S + W, with S = (A + A^T)/2 symmetric and W = (A − A^T)/2 skew-symmetric
- W = {skew-symmetric matrices} = Λ²(R^n) = o_n(R)

All the previous models induce (sparse) skew-symmetric matrices of size |V|-by-|V|:

    w_ij = −w_ji if {i, j} ∈ E,   w_ij = ? (missing) otherwise,

where G = (V, E) is the pairwise comparison graph. Note: such a skew-symmetric matrix induces a pairwise ranking flow on the graph G.
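The symmetric/skew-symmetric split recalled above can be verified in a few lines; a minimal sketch with a random matrix:

```python
import numpy as np

# Any square matrix splits uniquely as A = S + W with
# S symmetric and W skew-symmetric.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = (A + A.T) / 2
W = (A - A.T) / 2

assert np.allclose(S, S.T)    # symmetric part
assert np.allclose(W, -W.T)   # skew-symmetric part
assert np.allclose(A, S + W)  # the split is exact
```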

Pairwise ranking graph for IMDb top 20 movies

Figure: Pairwise ranking flow of Netflix data restricted to top IMDb movies

Rank aggregation problem

Difficulties:
- Arrow's impossibility theorem
- the Kemeny-Snell optimal ordering is NP-hard to compute
- harmonic analysis on S_n is impractical for large n, since |S_n| = n!

Our approach:

Problem. Does there exist a global ranking function v : V → R such that

    w_ij = v_j − v_i =: (δ_0 v)(i, j)?

Equivalently, does there exist a scalar field v : V → R whose gradient field gives the flow w? I.e., is w integrable?

Answer: not always!

Multivariate calculus: there are non-integrable vector fields; cf. the blackboard problem in the film A Beautiful Mind:

    A = {F : R³ \ X → R³ | F smooth},  B = {F = ∇g},  dim(A/B) = ?

Figure: No global ranking v gives w_ij = v_j − v_i: (a) a triangular cyclic flow, where w_AB + w_BC + w_CA ≠ 0; (b) a flow containing the 4-node cycle A → C → D → E → A, yet on the 3-clique {A, B, C} (and likewise {A, E, F}), w_AB + w_BC + w_CA = 0.

Triangular transitivity

Fact. Let W = [w_ij] be skew-symmetric and associated with the graph G = (V, E). If w_ij = v_j − v_i for all {i, j} ∈ E, then w_ij + w_jk + w_ki = 0 for all 3-cliques {i, j, k}.

Transitivity subspace: {W skew-symmetric : w_ij + w_jk + w_ki = 0 for all 3-cliques}

In the example on the last slide, (a) lies outside this subspace; (b) lies in the subspace but is not a gradient flow.

Hodge theory: matrix theoretic

A skew-symmetric matrix W associated with G can be decomposed uniquely as

    W = W_1 + W_2 + W_3

where
- W_1 is integrable: W_1(i, j) = v_j − v_i for some v : V → R;
- W_2 is curl-free: W_2(i, j) + W_2(j, k) + W_2(k, i) = 0 for every 3-clique (i, j, k), and divergence-free: Σ_{j:(i,j)∈E} W_2(i, j) = 0;
- W_3 ⊥ W_1 and W_3 ⊥ W_2.

Hodge theory: graph theoretic

Orthogonal decomposition of network flows on G into

    gradient flow + globally cyclic + locally cyclic

where
- the first two components make up the transitive component
- the gradient flow is integrable, giving a global ranking
- example (b) is locally (triangularly) acyclic, but cyclic on a large scale
- example (a) is locally (triangularly) cyclic

Clique complex of a graph

Extend the graph G to a simplicial complex K(G) by attaching triangles:
- 0-simplices K_0(G): V
- 1-simplices K_1(G): E
- 2-simplices K_2(G): triangles {i, j, k} such that every edge is in E
- k-simplices K_k(G): (k + 1)-cliques {i_0, ..., i_k} of G

For ranking problems, it suffices to construct K(G) up to dimension 2!
- global ranking v : V → R: 0-forms, i.e. vectors
- pairwise ranking w(i, j) = −w(j, i) for (i, j) ∈ E: 1-forms, i.e. skew-symmetric matrices

Clique complex

k-forms:

    C^k(K(G), R) = {u : K_{k+1}(G) → R : u_{i_σ(0),...,i_σ(k)} = sign(σ) u_{i_0,...,i_k}}

for (i_0, ..., i_k) ∈ K_{k+1}(G), where σ ∈ S_{k+1} is a permutation of (0, ..., k).

One may put metrics/inner products on C^k(K(G), R). The following metric on 1-forms is useful for the imbalance issue:

    ⟨w, ω⟩_D = Σ_{(i,j)∈E} D_ij w_ij ω_ij

where D_ij = #{customers who rate both i and j}.

Discrete exterior derivatives (coboundary maps)

k-coboundary maps δ_k : C^k(K(G), R) → C^{k+1}(K(G), R) are defined as the alternating difference operators

    (δ_k u)(i_0, ..., i_{k+1}) = Σ_{j=0}^{k+1} (−1)^{j+1} u(i_0, ..., i_{j−1}, i_{j+1}, ..., i_{k+1})

- δ_k plays the role of differentiation
- δ_{k+1} ∘ δ_k = 0

In particular,
- (δ_0 v)(i, j) = v_j − v_i =: (grad v)(i, j)
- (δ_1 w)(i, j, k) = ±(w_ij + w_jk + w_ki) =: (curl w)(i, j, k) (the triangular trace of the skew-symmetric matrix [w_ij])
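The maps δ_0 and δ_1 can be realized concretely as incidence matrices. A small sketch for a hand-built 4-vertex clique complex (the graph, edge orientations, and triangle orientations are chosen arbitrarily for illustration), verifying the identity δ_1 ∘ δ_0 = 0:

```python
import numpy as np

# Discrete grad (delta_0) and curl (delta_1) as incidence matrices.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2), (1, 2, 3)]

d0 = np.zeros((len(edges), len(vertices)))   # (delta_0 v)(i,j) = v_j - v_i
for e, (i, j) in enumerate(edges):
    d0[e, i], d0[e, j] = -1.0, 1.0

d1 = np.zeros((len(triangles), len(edges)))  # (delta_1 w)(i,j,k) = w_ij + w_jk + w_ki
eidx = {e: n for n, e in enumerate(edges)}
for t, (i, j, k) in enumerate(triangles):
    for a, b in [(i, j), (j, k), (k, i)]:
        if (a, b) in eidx:
            d1[t, eidx[(a, b)]] += 1.0
        else:
            d1[t, eidx[(b, a)]] -= 1.0       # reversed orientation flips the sign

# Fundamental identity: curl of a gradient vanishes.
assert np.allclose(d1 @ d0, 0.0)
```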

Div, Grad, Curl

For each triangle {i, j, k}, the curl

    (curl w)(i, j, k) = (δ_1 w)(i, j, k) = w_ij + w_jk + w_ki

measures the total flow-sum along the loop i → j → k → i. (δ_1 w)(i, j, k) = 0 implies the flow is path-independent, which defines the triangular transitivity subspace.

For each alternative i ∈ V, the divergence

    (div w)(i) := (δ_0^T w)(i) = Σ_{j:(i,j)∈E} w_ij

measures the inflow-outflow sum at i. (δ_0^T w)(i) = 0 implies alternative i is preference-neutral in all pairwise comparisons.

A divergence-free flow, δ_0^T w = 0, is cyclic.
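For instance, the divergence of a purely cyclic flow vanishes. A tiny sketch on a 3-cycle (the edge orientation convention here is my own, so individual signs may differ from the slides up to a global sign):

```python
import numpy as np

# Divergence as delta_0^T applied to a unit circular flow 0 -> 1 -> 2 -> 0:
# inflow equals outflow at every vertex, so div w = 0.
edges = [(0, 1), (1, 2), (0, 2)]
d0 = np.zeros((3, 3))
for e, (i, j) in enumerate(edges):
    d0[e, i], d0[e, j] = -1.0, 1.0

w = np.array([1.0, 1.0, -1.0])   # w_01 = w_12 = 1, w_02 = -1 (flow 2 -> 0)
div_w = d0.T @ w
assert np.allclose(div_w, 0.0)   # the cyclic flow is divergence-free
```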

Boundary of a boundary is empty

Fundamental tenet of topology: δ_{k+1} ∘ δ_k = 0. For k = 0,

    C^0 --δ_0--> C^1 --δ_1--> C^2,

i.e.

    Global --grad--> Pairwise --curl--> Triplewise

and, taking adjoints,

    Global <--grad^T (=: div)-- Pairwise <--curl^T-- Triplewise.

We have

    curl ∘ grad(Global Rankings) = 0.

This implies global rankings are transitive/consistent; there is no need to consider rankings beyond triplewise.

Combinatorial Laplacians

The k-dimensional combinatorial Laplacian Δ_k : C^k → C^k is

    Δ_k = δ_{k−1} δ_{k−1}^T + δ_k^T δ_k,  k > 0

- k = 0: the graph Laplacian or vertex Laplacian, Δ_0 = δ_0^T δ_0
- k = 1: the vector Laplacian (whose first term is the edge Laplacian),

    Δ_1 = δ_0 δ_0^T + δ_1^T δ_1 = grad ∘ div + curl^T ∘ curl

Important properties:
- Δ_k is positive semidefinite
- ker(Δ_k) = ker(δ_{k−1}^T) ∩ ker(δ_k): the harmonic forms
- Hodge decomposition
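As a sanity check, δ_0^T δ_0 reproduces the familiar vertex Laplacian D − A; a sketch on a small hand-built graph:

```python
import numpy as np

# delta_0^T delta_0 equals the graph Laplacian D - A.
edges = [(0, 1), (0, 2), (1, 2), (1, 3)]
n = 4
d0 = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    d0[e, i], d0[e, j] = -1.0, 1.0

L0 = d0.T @ d0
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
assert np.allclose(L0, D - A)

# Positive semidefinite: all eigenvalues are nonnegative.
assert np.all(np.linalg.eigvalsh(L0) >= -1e-12)
```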

Hodge decomposition for combinatorial Laplacians

Every combinatorial Laplacian Δ_k has an associated Hodge decomposition. For k = 1, this is the decomposition (of discrete vector fields / skew-symmetric matrices / network flows) that we have been discussing.

Theorem (Hodge decomposition for pairwise ranking). The space of pairwise rankings, C^1(K(G), R), admits an orthogonal decomposition into three components:

    C^1(K(G), R) = im(δ_0) ⊕ H_1 ⊕ im(δ_1^T)

where H_1 = ker(δ_1) ∩ ker(δ_0^T) = ker(Δ_1).

Illustration

Figure: Hodge decomposition for pairwise rankings

Harmonic rankings: locally consistent but globally inconsistent

Figure: A locally consistent but globally cyclic harmonic ranking.

Figure: A harmonic ranking from a truncated Netflix movie-movie network

Rank aggregation as projection

The rank aggregation problem reduces essentially to linear least squares.

Corollary. Every pairwise ranking admits a unique orthogonal decomposition

    w = proj_{im(δ_0)} w + proj_{ker(δ_0^T)} w,

i.e., pairwise = grad(global) + cyclic.

In particular, the first projection, grad(global), yields a global ranking

    x = (δ_0^T δ_0)^† δ_0^T w = Δ_0^† div(w)

O(n³) flops complexity, with great algorithms (dense: Givens/Householder QR, Golub-Reinsch SVD; sparse: CGLS, LSQR; sampling: DMM '06).
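The projection can be sketched as an ordinary least-squares solve; a toy example with an exactly integrable flow (at Netflix scale one would use a sparse solver such as LSQR rather than a dense solve):

```python
import numpy as np

# Global ranking as least squares: minimize ||delta_0 x - w||.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
d0 = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    d0[e, i], d0[e, j] = -1.0, 1.0

# A perfectly integrable pairwise ranking: w_ij = v_j - v_i for v = (0, 1, 3).
v_true = np.array([0.0, 1.0, 3.0])
w = d0 @ v_true

x, *_ = np.linalg.lstsq(d0, w, rcond=None)
# Recovered up to an additive constant (gradients annihilate the all-ones vector).
assert np.allclose(x - x[0], v_true - v_true[0])
```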

Erdős-Rényi random graphs

Heuristic justification from Erdős-Rényi random graphs (cf. earlier talks by Chung, Mihail, Saberi).

Theorem (Kahle '07). For an Erdős-Rényi random graph G(n, p) with n vertices and edges formed independently with probability p, the clique complex χ_G has vanishing 1-homology almost always, except when

    1/n² ≪ p ≪ 1/n.

Since the full Netflix movie-movie comparison graph is almost complete (0.22% of edges missing), one may expect the chance of a nontrivial harmonic ranking to be small.

Which pairwise ranking model might be better? Use the curl distribution.

Figure: Curl distributions of three pairwise rankings (pairwise score difference, pairwise probability difference, probability log-ratio), based on the 500 most popular movies. The pairwise score difference (in red) has the thinnest tail.

Comparisons of Netflix global rankings

                            Mean Score   Score Diff.   Prob. Diff.   Log Odds Ratio
    Mean Score                1.0000       0.9758        0.9731         0.9746
    Score Difference                       1.0000        0.9976         0.9977
    Probability Difference                               1.0000         0.9992
    Logarithmic Odds Ratio                                              1.0000
    Cyclic Residue               -          6.03%         7.16%          7.15%

Table: Kendall's rank correlation coefficients between different global rankings for Netflix. Note that the pairwise score difference has the smallest relative cyclic residue.

Why does pairwise ranking work for Netflix?

- Pairwise rankings are good approximations of gradient flows on movie-movie networks.
- In fact, at large scale the Netflix data behaves like a 1-dimensional curve in a high-dimensional space.
- To visualize this, we use a spectral embedding approach.

Spectral embedding

Technique proposed by Goel, Diaconis, and Holmes (cf. earlier talk by Jordan).
- Map every movie to a point in S^5 by movie m ↦ (p_1(m), ..., p_5(m)), where p_k(m) is the probability that movie m is rated k stars, k ∈ {1, ..., 5}. This yields a movie-by-star matrix Y.
- Compute the SVD of Y, which is equivalent to an eigendecomposition of the linear kernel K(s, t) = ⟨s, t⟩^d, d = 1.
- Since K(s, t) is nonnegative, the first eigenvector captures the centricity (density) of the data, and the second captures a tangent field of the manifold.
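The embedding step might be sketched as follows, with synthetic rating counts standing in for the Netflix data (all data here is illustrative, not from the talk):

```python
import numpy as np

# Each movie becomes its star-rating distribution (p_1, ..., p_5);
# the SVD of the movie-by-star matrix Y gives embedding coordinates.
rng = np.random.default_rng(1)
counts = rng.integers(1, 50, size=(10, 5)).astype(float)  # 10 movies, 5 stars
Y = counts / counts.sum(axis=1, keepdims=True)            # rows are probabilities

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
embedding = U[:, :2] * s[:2]   # first two singular coordinates per movie
```

In the talk, plotting the second coordinate against the mean score reveals the horseshoe curve described on the next slide.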

SVD embedding

Figure: The second singular vector is monotonic in the mean score, indicating that the intrinsic parameter of the horseshoe curve is driven by the mean score.

Conclusions

- Ranking as 1-dimensional scaling of data.
- Pairwise rankings as approximate gradient fields or flows on graphs.
- Hodge theory provides an orthogonal decomposition for pairwise ranking flows.
- This decomposition helps characterize the local (triangular) vs. global consistency of pairwise rankings, and gives a natural rank aggregation scheme.