Combinatorial Hodge Theory and a Geometric Approach to Ranking


Yuan Yao, with Lek-Heng Lim et al.
2008 SIAM Annual Meeting, San Diego, July 7, 2008

Outline
1. Ranking on networks (graphs)
   - Netflix example
   - Skew-symmetric matrices and network flows
2. Combinatorial Hodge Theory
   - Discrete Differential Geometry
   - Hodge Decomposition Theorem
   - Rank aggregation as a linear projection
3. Why pairwise works for Netflix: a spectral embedding view
4. Conclusions

Ranking on Networks (Graphs)
Multi-criteria rank/decision systems:
- Amazon's or Netflix's recommendation system (user-product)
- Interest ranking in social networks (person-interest)
- Financial analyst recommendation system (analyst-stock)
- Voting (voter-candidate)
Peer-review systems:
- Publication citation systems (paper-paper)
- Google's webpage ranking (web-web)
- eBay's reputation system (customer-customer)

Ranking on Networks (Graphs), continued
Ranking data are
- incomplete: partial lists or pairwise comparisons (e.g. 1% in Netflix)
- unbalanced: unevenly distributed votes (e.g. power law)
- cardinal: scores or stochastic choices
Implicitly or explicitly, ranking data may be viewed as living on a simple graph $G = (V, E)$:
- $V$: set of alternatives (products, interests, ...) to be ranked
- $E$: pairs of alternatives compared

Netflix example
Example (Netflix Customer-Product Rating)
- 480189-by-17770 customer-product 5-star rating matrix $X$
- $X$ is incomplete, with 98.82% of its values missing
However,
- the pairwise comparison graph $G = (V, E)$ is very dense: only 0.22% of the edges are missing, so it is almost a complete graph
- rank aggregation is possible without estimating missing values
- unbalanced: the number of raters on each edge $e \in E$ varies
Caveat: we are not trying to solve the Netflix Prize problem.

Netflix example, continued
The first-order statistic, the mean score for each product, is often inadequate because:
- most customers rate only a very small portion of the products
- different products might have different raters, so mean scores involve noise due to arbitrary individual rating scales
- it cannot characterize inconsistency (ubiquitous in rank aggregation: Arrow's impossibility theorem)
How about higher-order statistics?

Skew-symmetric matrices and network flows
From 1st Order to 2nd Order: Pairwise Rankings
Linear Model: the average score difference between products $i$ and $j$ over all customers who have rated both of them,
$$w_{ij} = \frac{\sum_k (X_{kj} - X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}},$$
is translation invariant.
Log-linear Model: when all the scores are positive, the logarithmic average score ratio,
$$w_{ij} = \frac{\sum_k (\log X_{kj} - \log X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}},$$
is invariant up to a multiplicative constant.
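The Linear Model above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `pairwise_score_difference`, the use of `np.nan` to mark missing ratings, and the tiny 3-by-3 matrix are all assumptions made for the example.

```python
import numpy as np

def pairwise_score_difference(X):
    """Linear model: w[i, j] = mean over common raters k of (X[k, j] - X[k, i]).

    X is a customers-by-products array with np.nan for missing ratings
    (an assumed encoding). Returns (w, count), where count[i, j] is the
    number of customers who rated both i and j, and w[i, j] is np.nan
    when that count is zero.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    R = np.where(observed, X, 0.0)     # ratings with missing entries zeroed
    M = observed.astype(float)         # 0/1 mask of observed entries
    count = M.T @ M                    # count[i, j] = #{k : X_ki, X_kj exist}
    # (M.T @ R)[i, j] sums X_kj over common raters; (R.T @ M)[i, j] sums X_ki.
    total = M.T @ R - R.T @ M
    with np.errstate(invalid="ignore", divide="ignore"):
        w = np.where(count > 0, total / count, np.nan)
    return w, count

# Hypothetical 3 customers x 3 products rating matrix.
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 2.0, 1.0]])
w, count = pairwise_score_difference(X)
```

Adding a different constant to each customer's row leaves `w` unchanged, which is the translation invariance stated on the slide. Replacing `X` by `np.log(X)` gives the Log-linear Model when all scores are positive.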

More Invariants
Linear Probability Model: the probability that product $j$ is preferred to $i$, in excess of a purely random choice,
$$w_{ij} = \Pr\{k : X_{kj} > X_{ki}\} - \tfrac{1}{2}.$$
This is invariant up to a monotone transformation.
Bradley-Terry Model: the logarithmic odds ratio (logit),
$$w_{ij} = \log \frac{\Pr\{k : X_{kj} > X_{ki}\}}{\Pr\{k : X_{kj} < X_{ki}\}}.$$
This is invariant up to a monotone transformation.
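Both probability-based models can be estimated from empirical frequencies among common raters. The sketch below is illustrative only: the function name `preference_flows` and the toy data are hypothetical, and splitting ties evenly in the linear probability model is an assumption added here so that the result stays skew-symmetric (the slide does not say how ties are handled).

```python
import numpy as np

def preference_flows(X):
    """Empirical linear-probability and log-odds flows from a ratings matrix.

    X is customers-by-products with np.nan for missing ratings (assumed).
    Entries with no common raters, or with an undefined log-odds, stay np.nan.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    lin_prob = np.full((n, n), np.nan)
    log_odds = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if i == j or not both.any():
                continue
            diff = X[both, j] - X[both, i]
            p_j = np.mean(diff > 0)      # fraction preferring j to i
            p_i = np.mean(diff < 0)      # fraction preferring i to j
            p_tie = np.mean(diff == 0)
            # Assumption: ties are split evenly between the two products.
            lin_prob[i, j] = p_j + 0.5 * p_tie - 0.5
            if p_j > 0 and p_i > 0:
                log_odds[i, j] = np.log(p_j / p_i)
    return lin_prob, log_odds

# Hypothetical ratings: 4 customers, 2 products.
X = np.array([[5.0, 3.0], [4.0, 2.0], [1.0, 2.0], [4.0, 1.0]])
lin_prob, log_odds = preference_flows(X)
```

Because only the order of each customer's two ratings matters, applying any strictly increasing function to `X` (for example `X ** 3` on positive scores) returns the same two matrices.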

Skew-Symmetric Matrices of Pairwise Rankings
All such models induce (sparse) skew-symmetric matrices of size $|V|$-by-$|V|$ (or 2-alternating tensors),
$$w_{ij} = \begin{cases} -w_{ji}, & \{i,j\} \in E \\ \text{NA}, & \text{otherwise} \end{cases}$$
where $G = (V, E)$ is a pairwise comparison graph.
Note: such a skew-symmetric matrix induces a pairwise ranking network flow on the graph $G$.

Pairwise Ranking Flow for IMDB Top 20 Movies
Figure: Pairwise ranking flow on a complete graph

Rank Aggregation Problem
Hardness:
- Arrow-Sen impossibility theorems in social choice theory
- Kemeny-Snell optimal ordering is NP-hard to compute
- Spectral analysis on $S_n$ is impractical for large $n$, since $|S_n| = n!$
Our approach:
Problem (Rank Aggregation). Does there exist a global ranking function $v : V \to \mathbb{R}$ such that
$$w_{ij} = v_j - v_i =: \delta_0(v)(i,j)?$$
Equivalently, does there exist a scalar field $v : V \to \mathbb{R}$ whose gradient field gives the flow $w$?

Answer: Not Always!
From multivariate calculus, there are non-integrable vector fields (cf. the movie A Beautiful Mind). A combinatorial version:
[Figure: two small weighted flow graphs, panels (a) and (b)]
Figure: No global ranking $v$ gives $w_{ij} = v_j - v_i$: (a) cyclic ranking, note $w_{AB} + w_{BC} + w_{CA} \neq 0$; (b) contains a 4-node cyclic flow $A \to C \to D \to E \to A$, note that on the 3-clique $\{A, B, C\}$ (also $\{A, E, F\}$), $w_{AB} + w_{BC} + w_{CA} = 0$.

Triangular Transitivity: null triangular-trace
Fact. For a skew-symmetric matrix $W = (w_{ij})$ associated with graph $G$,
$$\exists\, v : w_{ij} = v_j - v_i \;\Rightarrow\; w_{ij} + w_{jk} + w_{ki} = 0, \quad \forall\ \text{3-clique } \{i,j,k\}.$$
Note: the transitivity subspace (null triangular-trace, or curl-free) is
$$\{W : w_{ij} + w_{jk} + w_{ki} = 0, \ \forall\ \text{3-clique } \{i,j,k\}\}.$$
In the example on the last slide, (a) lies outside this subspace; (b) lies in this subspace, but is not a gradient flow.
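The triangular-trace condition is easy to check numerically. The sketch below uses three toy flows chosen for illustration (they are not the graphs from the figure): a gradient flow, a 3-cycle, and a unit flow around a 4-cycle with no triangles. The name `triangle_curls` and the `np.nan` encoding of missing edges are assumptions.

```python
import numpy as np
from itertools import combinations

def triangle_curls(W):
    """Return {(i, j, k): w_ij + w_jk + w_ki} over all 3-cliques.

    W is skew-symmetric, with np.nan marking pairs that were never compared.
    """
    W = np.asarray(W, dtype=float)
    curls = {}
    for i, j, k in combinations(range(W.shape[0]), 3):
        c = W[i, j] + W[j, k] + W[k, i]
        if not np.isnan(c):              # skip triples with a missing edge
            curls[(i, j, k)] = c
    return curls

# Gradient flow of hypothetical scores v: every triangle sums to zero.
v = np.array([0.0, 1.0, 3.0])
W_grad = v[None, :] - v[:, None]         # W_grad[i, j] = v_j - v_i

# Cyclic ranking 0 -> 1 -> 2 -> 0 with unit flow on each edge.
W_cyc = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])

# Unit flow around the 4-cycle 0 -> 1 -> 2 -> 3 -> 0, no diagonals compared.
W_loop = np.full((4, 4), np.nan)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    W_loop[a, b], W_loop[b, a] = 1.0, -1.0

print(triangle_curls(W_grad))   # curl 0 on the single triangle
print(triangle_curls(W_cyc))    # curl 3 on the single triangle
print(triangle_curls(W_loop))   # no triangles at all
```

The last case is vacuously curl-free because the graph has no 3-cliques, yet the flow sums to 4 around the loop, so no score vector can reproduce it. This mirrors case (b): lying in the transitivity subspace does not by itself guarantee a gradient flow.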

Hodge decomposition for skew-symmetric matrices
A skew-symmetric matrix $W$ associated with $G$ has a unique orthogonal decomposition
$$W = W_1 + W_2 + W_3$$
where
- $W_1$ satisfies ("integrable"): $W_1(i,j) = v_j - v_i$ for some $v : V \to \mathbb{R}$;
- $W_2$ satisfies ("curl-free") $W_2(i,j) + W_2(j,k) + W_2(k,i) = 0$ for all 3-cliques $\{i,j,k\}$, and ("divergence-free") $\sum_{j : \{i,j\} \in E} W_2(i,j) = 0$ for all $i \in V$;
- $W_3 \perp W_1$ and $W_3 \perp W_2$.

Hodge decomposition for network flows
A network flow (e.g. pairwise rankings) on a graph $G$ has an orthogonal decomposition into
    gradient flow + locally acyclic (harmonic) + locally cyclic
where
- the first two components lie in the transitivity subspace
- the gradient flow is integrable to give a global ranking
- example (b) is locally acyclic, but cyclic on a large scale (harmonic)
- example (a) is locally cyclic

Combinatorial Hodge Theory
Discrete Differential Geometry
Clique Complex and Discrete Differential Forms
We extend the graph $G$ to a simplicial complex $\chi_G$ by attaching triangles:
- 0-simplices $\chi_G^0$: vertices $V$
- 1-simplices $\chi_G^1$: edges $E$, such that a comparison (i.e. pairwise ranking) between $i$ and $j$ exists
- 2-simplices $\chi_G^2$: triangles $\{i,j,k\}$ such that every edge exists
Note: it suffices here to construct $\chi_G$ up to dimension 2!
- global ranking $v : V \to \mathbb{R}$: 0-forms, vectors
- pairwise ranking $w(i,j) = -w(j,i)$ for all $(i,j) \in E$: 1-forms, skew-symmetric matrices, network flows

Space of k-forms (k-cochains) and Metrics
k-forms:
$$C^k(\chi_G, \mathbb{R}) = \{u : \chi_G^{k+1} \to \mathbb{R},\ u_{\sigma(i_0),\ldots,\sigma(i_k)} = \mathrm{sign}(\sigma)\, u_{i_0,\ldots,i_k}\}$$
for $(i_0,\ldots,i_k) \in \chi_G^{k+1}$, where $\sigma \in S_{k+1}$ is a permutation on $(0,\ldots,k)$. These are also $(k+1)$-alternating tensors.
One may equip $C^k(\chi_G, \mathbb{R})$ with inner products. In particular, the following inner product on 1-forms addresses the unbalance issue in pairwise ranking:
$$\langle w, \omega \rangle_D = \sum_{(i,j) \in E} D_{ij}\, w_{ij}\, \omega_{ij}, \quad w, \omega \in C^1(\chi_G, \mathbb{R}),$$
where $D_{ij} = |\{\text{customers who rate both } i \text{ and } j\}|$.

Discrete Exterior Derivatives (Coboundary Maps)
The $k$-coboundary maps $\delta_k : C^k(\chi_G, \mathbb{R}) \to C^{k+1}(\chi_G, \mathbb{R})$ are defined as the alternating difference operator
$$(\delta_k u)(i_0,\ldots,i_{k+1}) = \sum_{j=0}^{k+1} (-1)^{j+1}\, u(i_0,\ldots,i_{j-1},i_{j+1},\ldots,i_{k+1}).$$
$\delta_k$ plays the role of differentiation. In particular, up to an overall sign convention,
- $(\delta_0 v)(i,j) = v_j - v_i =: (\mathrm{grad}\, v)(i,j)$
- $(\delta_1 w)(i,j,k) = (\pm)(w_{ij} + w_{jk} + w_{ki}) =: (\mathrm{curl}\, w)(i,j,k)$ (the triangular-trace of the skew-symmetric matrix $(w_{ij})$)

Curl
Definition. For each triangle $\{i,j,k\}$, the curl (triangular trace) $w_{ij} + w_{jk} + w_{ki}$ measures the total flow-sum along the loop $i \to j \to k \to i$.
$(\delta_1 w)(i,j,k) = 0$ implies the flow is path-independent, which defines the triangular transitivity subspace.

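Two directions of cochain maps
Forward:
$$C^0 \xrightarrow{\ \delta_0\ } C^1 \xrightarrow{\ \delta_1\ } C^2,$$
in other words,
$$\text{Global} \xrightarrow{\ \mathrm{grad}\ } \text{Pairwise} \xrightarrow{\ \mathrm{curl}\ } \text{Triplewise}$$
Backward:
$$\text{Global} \xleftarrow{\ \delta_0^* = \mathrm{grad}^*\ } \text{Pairwise} \xleftarrow{\ \delta_1^* = \mathrm{curl}^*\ } \text{3-alternating}$$
- $\mathrm{grad}^* = \delta_0^*$ is the negative divergence
- $\mathrm{curl}^* = \delta_1^*$ is the boundary operator; it gives triangular (locally) cyclic pairwise rankings along triangles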

Divergence
Definition. For each alternative $i \in V$, the divergence
$$(\mathrm{div}\, w)(i) := -(\delta_0^T w)(i) := \sum_j w_{ij}$$
measures the inflow-outflow sum at $i$.
- $(\delta_0^T w)(i) = 0$ implies alternative $i$ is preference-neutral in all pairwise comparisons
- a divergence-free flow, $\delta_0^T w = 0$, is cyclic
With metric $D$, the conjugate operator gives the weighted flow-sum
$$(\delta_0^* w)(i) = -\sum_j w_{ij} D_{ij} = (\delta_0^T D w)(i).$$

A Fundamental Property: Closed Map Property
"Boundary of boundary is empty":
$$\delta_{k+1} \circ \delta_k = 0;$$
in particular,
- a gradient flow is curl-free: $\mathrm{curl} \circ \mathrm{grad} = \delta_1 \circ \delta_0 = 0$
- a circular flow is divergence-free: $\mathrm{div} \circ \mathrm{curl}^* = 0$, i.e. $\delta_0^* \circ \delta_1^* = 0$
This leads to the powerful combinatorial Laplacians.
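For a small graph the coboundary maps are just sparse sign matrices, and the closed map property can be verified directly. The following is a minimal sketch: the function name `coboundary_matrices`, the convention that every edge is stored as $(i, j)$ with $i < j$, and the complete graph on 4 vertices used as the example are all assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def coboundary_matrices(n, edges):
    """Build matrices for delta_0 (grad) and delta_1 (curl) on a clique complex.

    Edges are stored as (i, j) with i < j; the edge coordinate is w_ij.
    Triangles are all triples whose three edges are present.
    """
    edges = sorted(tuple(sorted(e)) for e in edges)
    index = {e: r for r, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in index for p in combinations(t, 2))]
    d0 = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0            # (d0 v)(i, j) = v_j - v_i
    d1 = np.zeros((len(triangles), len(edges)))
    for r, (i, j, k) in enumerate(triangles):
        d1[r, index[(i, j)]] = 1.0                # + w_ij
        d1[r, index[(j, k)]] = 1.0                # + w_jk
        d1[r, index[(i, k)]] = -1.0               # + w_ki = - w_ik
    return d0, d1, edges, triangles

# Complete graph on 4 vertices: 6 edges, 4 triangles.
d0, d1, edges, triangles = coboundary_matrices(4, combinations(range(4), 2))
print(np.abs(d1 @ d0).max())   # 0.0: curl of a gradient vanishes
```

Transposing the identity gives `d0.T @ d1.T == 0`, which is the unweighted form of the statement that flows in the image of the adjoint of curl are divergence-free.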

Combinatorial Laplacians
Define the $k$-dimensional combinatorial Laplacian $\Delta_k : C^k \to C^k$ by
$$\Delta_k = \delta_{k-1} \delta_{k-1}^* + \delta_k^* \delta_k, \quad k > 0.$$
- $k = 0$: $\Delta_0 = \delta_0^T \delta_0$ is the well-known graph Laplacian
- $k = 1$: $\Delta_1 = \mathrm{curl}^* \circ \mathrm{curl} - \mathrm{grad} \circ \mathrm{div}$
Important properties:
- $\Delta_k$ is positive semi-definite
- $\ker(\Delta_k) = \ker(\delta_{k-1}^T) \cap \ker(\delta_k)$: divergence-free and curl-free, called harmonics
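A small numerical illustration of these properties, under stated assumptions: unit weights (so adjoints are plain transposes), hand-written coboundary matrices, and two toy graphs chosen for the example (a 4-cycle, and the same cycle with one diagonal added so that two triangles fill the hole). The helper names `laplacians` and `nullity` are hypothetical.

```python
import numpy as np

def laplacians(d0, d1):
    """Unweighted Delta_0 = d0^T d0 and Delta_1 = d0 d0^T + d1^T d1."""
    return d0.T @ d0, d0 @ d0.T + d1.T @ d1

def nullity(A, tol=1e-9):
    """Dimension of the kernel of a symmetric matrix."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(A)) < tol))

# 4-cycle with edges (0,1), (1,2), (2,3), (0,3); it has no triangles.
d0_cycle = np.array([[-1, 1, 0, 0],
                     [0, -1, 1, 0],
                     [0, 0, -1, 1],
                     [-1, 0, 0, 1]], dtype=float)
d1_cycle = np.zeros((0, 4))
L0_cycle, L1_cycle = laplacians(d0_cycle, d1_cycle)

# Same cycle plus the diagonal (0,2); triangles (0,1,2) and (0,2,3).
d0_filled = np.vstack([d0_cycle, [-1, 0, 1, 0]])
d1_filled = np.array([[1, 1, 0, 0, -1],      # w_01 + w_12 - w_02
                      [0, 0, 1, -1, 1]],     # w_02 + w_23 - w_03
                     dtype=float)
L0_filled, L1_filled = laplacians(d0_filled, d1_filled)

print(L0_cycle)             # degree matrix minus adjacency matrix
print(nullity(L1_cycle))    # 1: one harmonic flow around the hole
print(nullity(L1_filled))   # 0: the triangles fill the hole
```

The kernel dimension of $\Delta_1$ counts independent harmonic flows, so it drops from 1 to 0 once the 4-cycle is covered by triangles.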

Hodge Decomposition Theorem
Theorem. The space of pairwise rankings, $C^1(\chi_G, \mathbb{R})$, admits an orthogonal decomposition into three components,
$$C^1(\chi_G, \mathbb{R}) = \mathrm{im}(\delta_0) \oplus H_1 \oplus \mathrm{im}(\delta_1^*),$$
where
$$H_1 = \ker(\delta_1) \cap \ker(\delta_0^*) = \ker(\Delta_1).$$
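The three components can be computed with two least-squares projections. This is a sketch under assumptions, not the authors' implementation: it uses the unweighted inner product (the slides use the $D$-weighted one), edges given as $(i, j)$ with $i < j$, and a hypothetical toy graph made of one triangle $\{0,1,2\}$ glued to an unfilled 4-cycle $0, 2, 3, 4$.

```python
import numpy as np
from itertools import combinations

def hodge_decompose(n, edges, w):
    """Split an edge flow w into gradient + harmonic + curl parts (unit weights).

    edges: list of (i, j) with i < j; w[r] is the flow from i to j on edges[r].
    """
    index = {e: r for r, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in index for p in combinations(t, 2))]
    d0 = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0
    d1 = np.zeros((len(triangles), len(edges)))
    for r, (i, j, k) in enumerate(triangles):
        d1[r, index[(i, j)]], d1[r, index[(j, k)]], d1[r, index[(i, k)]] = 1.0, 1.0, -1.0
    w = np.asarray(w, dtype=float)
    # Orthogonal projection onto im(d0): the gradient (globally consistent) part.
    grad = d0 @ np.linalg.lstsq(d0, w, rcond=None)[0]
    # Orthogonal projection onto im(d1^T): the locally cyclic part.
    if triangles:
        curl = d1.T @ np.linalg.lstsq(d1.T, w, rcond=None)[0]
    else:
        curl = np.zeros_like(w)
    harmonic = w - grad - curl
    return grad, harmonic, curl

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (0, 4)]
# Unit flow around the loop 0 -> 2 -> 3 -> 4 -> 0 (so w_04 = -1).
w = np.array([0.0, 0.0, 1.0, 1.0, 1.0, -1.0])
grad, harmonic, curl = hodge_decompose(5, edges, w)
```

For this flow the gradient part vanishes because the flow is divergence-free, the curl part is supported on the triangle's three edges, and the remainder is a harmonic flow that circulates around the unfilled 4-cycle.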

Hodge Decomposition Illustration
[Figure: diagram of the three components $\mathrm{im}\,\delta_0$ (Global; globally acyclic), $H_1$ (Harmonic; locally acyclic), and $\mathrm{im}\,\delta_1^*$ (Locally cyclic). The region labeled CYCLIC (divergence-free) is $\ker \delta_0^*$; the region labeled LOCALLY CONSISTENT (curl-free) is $\ker \delta_1$.]
Figure: Hodge decomposition for pairwise rankings

Harmonic Rankings: Locally but NOT Globally Consistent
Figure: (a) a harmonic ranking; (b) from truncated Netflix network

Rank aggregation as a linear projection
Corollary. Every pairwise ranking admits a unique orthogonal decomposition,
$$w = \mathrm{proj}_{\mathrm{im}\,\delta_0}\, w + \mathrm{proj}_{\ker(\delta_0^*)}\, w,$$
i.e.
    pairwise = grad(global) + cyclic.
In particular, the first projection, grad(global), gives a global ranking
$$x^* = (\delta_0^* \delta_0)^{\dagger}\, \delta_0^* w = -(\Delta_0)^{\dagger}\, \mathrm{div}\, w.$$
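The projection formula amounts to a weighted least-squares problem on the graph. A minimal sketch, assuming edges stored as $(i, j)$ with $i < j$, edge weights playing the role of $D_{ij}$, and a pseudoinverse via `np.linalg.pinv` (fine for small graphs; a sparse solver would be the natural choice at Netflix scale). The function name `hodge_rank` and the toy data are hypothetical.

```python
import numpy as np

def hodge_rank(n, edges, w, weights=None):
    """Global scores x = (d0^T D d0)^+ d0^T D w and the cyclic residual w - d0 x.

    edges: list of (i, j) with i < j; w[r] estimates x_j - x_i on edges[r];
    weights[r] plays the role of D_ij (defaults to 1 for every edge).
    """
    d0 = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0
    w = np.asarray(w, dtype=float)
    d = np.ones(len(edges)) if weights is None else np.asarray(weights, dtype=float)
    D = np.diag(d)
    L0 = d0.T @ D @ d0                       # weighted graph Laplacian
    x = np.linalg.pinv(L0) @ (d0.T @ D @ w)  # minimum-norm solution
    return x, w - d0 @ x

# Hypothetical data: pairwise differences generated exactly from scores v.
v = np.array([0.0, 1.0, 3.0, 2.0])
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
w = np.array([v[j] - v[i] for i, j in edges])
x, residual = hodge_rank(4, edges, w, weights=[5, 1, 2, 7, 3])
```

When `w` is an exact gradient flow on a connected graph, `x` recovers `v` up to an additive constant (the pseudoinverse picks the zero-mean representative) and `residual` is zero. With noisy or inconsistent `w`, the residual is the cyclic part of the decomposition.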

When Harmonic Ranking Vanishes
Corollary. If the clique complex $\chi_G$ has trivial 1-homology (no holes of boundary length $\geq 4$), then triangular transitivity ($\mathrm{curl}\, w = 0$) implies the existence of a global ranking, i.e. $w \in \mathrm{im}\,\delta_0$.
- On simply connected domains, curl-free vector fields are integrable.
- When $\chi_G$ has trivial 1-homology, local consistency implies global consistency.
- A particular case is when $G$ is a complete graph: the projection onto the transitivity subspace gives a unique global ranking, called the Borda count (1782) in social choice theory.
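On a complete graph with unit weights, the least-squares scores reduce to an average of pairwise margins, which is one way to read the Borda count remark. The check below is a sketch under assumptions: unweighted edges, and a random skew-symmetric matrix `W` standing in for real pairwise data.

```python
import numpy as np
from itertools import combinations

n = 5
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
W = A - A.T                                  # random skew-symmetric flow, W[i, j] = w_ij

edges = list(combinations(range(n), 2))      # complete graph, edges (i, j) with i < j
d0 = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    d0[r, i], d0[r, j] = -1.0, 1.0
w = np.array([W[i, j] for i, j in edges])

# Least-squares global ranking from the projection formula.
x_ls = np.linalg.pinv(d0.T @ d0) @ (d0.T @ w)

# Average margin of each alternative j over all others: (1/n) * sum_i w_ij.
x_avg = W.sum(axis=0) / n

print(np.allclose(x_ls, x_avg))
```

The agreement follows because the Laplacian of the complete graph is $nI - \mathbf{1}\mathbf{1}^T$, which acts as multiplication by $n$ on zero-sum vectors.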

Example: Erdős-Rényi Random Graph
Theorem (Kahle '06). For an Erdős-Rényi random graph $G(n,p)$ with $n$ vertices and each edge independently emerging with probability $p$, its clique complex $\chi_G$ almost always has zero 1-homology, except when
$$\frac{1}{n^2} \ll p \ll \frac{1}{n}.$$
Note that the full Netflix movie-movie comparison graph is almost complete (0.22% missing edges); is it an Erdős-Rényi random graph?
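Whether the 1-homology of a clique complex vanishes can be checked numerically on small graphs by a rank count, $\beta_1 = |E| - \mathrm{rank}\,\delta_0 - \mathrm{rank}\,\delta_1$. The sketch below is an illustration, not part of the talk: the function names and the sampled graph size are assumptions, and dense rank computations only scale to small $n$.

```python
import numpy as np
from itertools import combinations

def first_betti_number(n, edges):
    """beta_1 of the clique complex: |E| - rank(d0) - rank(d1) over the reals."""
    edges = sorted({tuple(sorted(e)) for e in edges})
    if not edges:
        return 0
    index = {e: r for r, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in index for p in combinations(t, 2))]
    d0 = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        d0[r, i], d0[r, j] = -1.0, 1.0
    rank1 = 0
    if triangles:
        d1 = np.zeros((len(triangles), len(edges)))
        for r, (i, j, k) in enumerate(triangles):
            d1[r, index[(i, j)]], d1[r, index[(j, k)]], d1[r, index[(i, k)]] = 1.0, 1.0, -1.0
        rank1 = int(np.linalg.matrix_rank(d1))
    return len(edges) - int(np.linalg.matrix_rank(d0)) - rank1

def erdos_renyi_edges(n, p, rng):
    """Sample the edge list of G(n, p)."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

complete_5 = list(combinations(range(5), 2))
cycle_5 = [(i, (i + 1) % 5) for i in range(5)]
print(first_betti_number(5, complete_5))   # 0: no holes
print(first_betti_number(5, cycle_5))      # 1: one hole of length 5

rng = np.random.default_rng(0)
sample = erdos_renyi_edges(12, 0.9, rng)
print(first_betti_number(12, sample))      # varies with the sampled graph
```

The two deterministic cases bracket the behaviour: a complete graph has no harmonic flows, while a bare cycle of length at least 4 has exactly one.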

Which Pairwise Ranking Model is More Consistent?
The curl distribution measures the intrinsic local inconsistency in a pairwise ranking.
[Figure: "Curl Distribution of Pairwise Rankings", with curves for Pairwise Score Difference, Pairwise Probability Difference, and Probability Log Ratio]
Figure: Curl distribution of three pairwise rankings, based on the 500 most popular movies. The pairwise score difference (the Linear model), in red, has the thinnest tail.

Comparisons of Netflix Global Rankings

                              (1)      (2)      (3)      (4)
(1) Mean Score                1.0000   0.9758   0.9731   0.9746
(2) Score Difference                   1.0000   0.9976   0.9977
(3) Probability Difference                      1.0000   0.9992
(4) Logarithmic Odds Ratio                               1.0000
Cyclic Residue                -        6.03%    7.16%    7.15%

Table: Kendall's rank correlation coefficients between different global rankings for Netflix. Note that the pairwise score difference (the Linear model) has the smallest relative residue.
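For reference, Kendall's rank correlation between two score vectors can be computed by counting concordant and discordant pairs. The sketch below implements the basic tau-a variant; which variant (and which tie handling) was used for the table is not stated in the slides, so this choice is an assumption, as are the function name and the toy score vectors.

```python
import numpy as np

def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / (number of pairs)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return s / (n * (n - 1) / 2)

# Hypothetical global scores for four movies under two models.
scores_a = [1.0, 2.0, 3.0, 4.0]
scores_b = [1.0, 3.0, 2.0, 4.0]
print(kendall_tau_a(scores_a, scores_a))   # 1.0
print(kendall_tau_a(scores_a, scores_b))   # 5 concordant, 1 discordant of 6 pairs
```

The quadratic loop is fine for a few hundred items; for full catalogues a sort-based $O(n \log n)$ method would be preferable.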

Why pairwise works for Netflix: a spectral embedding view
Why Does Pairwise Ranking Work for Netflix?
- Pairwise rankings are good approximations of gradient flows on movie-movie networks
- In fact, Netflix data on the large scale behave like a 1-dimensional curve in a high-dimensional space
- To visualize this, we may use a spectral embedding approach

Spectral Embedding
Map every movie to a point in $S^4$ by
$$\text{movie } m \mapsto \left(\sqrt{p_1(m)}, \ldots, \sqrt{p_5(m)}\right),$$
where $p_k(m)$ is the probability that movie $m$ is rated with $k$ stars, $k \in \{1, \ldots, 5\}$. This gives a movie-by-star matrix $Y$.
Do SVD on $Y$, which is equivalent to an eigenvalue decomposition of the linear kernel
$$K(s,t) = \langle s, t \rangle^d, \quad d = 1, \quad s, t \in S^4.$$
$K(s,t)$ is non-negative, whence the first eigenvector captures the centricity (density) of the data and the second captures a tangent field of the manifold.
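The embedding can be tried on synthetic data. The sketch below is illustrative only: the star distributions are generated from a hypothetical one-parameter binomial family (not Netflix data), and the variable names are assumptions. It checks that each row of $Y$ lies on the unit sphere and that the SVD of $Y$ matches the eigendecomposition of the linear kernel.

```python
import numpy as np
from math import comb

# Synthetic star distributions: 20 "movies" whose ratings follow a
# Binomial(4, q) law shifted to stars 1..5, with quality q in [0.1, 0.9].
q = np.linspace(0.1, 0.9, 20)
P = np.array([[comb(4, k) * qm**k * (1 - qm)**(4 - k) for k in range(5)]
              for qm in q])                 # P[m, k] = p_{k+1}(m), rows sum to 1

Y = np.sqrt(P)                              # movie-by-star matrix; rows lie on S^4
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

K = Y @ Y.T                                 # linear kernel K(s, t) = <s, t>
top_eigs = np.linalg.eigvalsh(K)[::-1][:5]  # five largest eigenvalues, descending

mean_score = P @ np.arange(1, 6)            # first-order statistic per movie

print(np.allclose(np.linalg.norm(Y, axis=1), 1.0))   # rows are unit vectors
print(np.allclose(s**2, top_eigs, atol=1e-8))        # SVD vs. kernel eigenvalues
```

Since every entry of `K` is positive here, the leading singular vector `U[:, 0]` has entries of a single sign. Plotting `U[:, 1]` against `mean_score` is the natural way to compare with the figure on the next slide.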

SVD Embedding
Figure: The second singular vector is monotonic in the mean score, indicating that the intrinsic parameter of the horseshoe curve is driven by the mean score.

Conclusions
- Ranking as 1-dimensional scaling of data
- Pairwise ranking as approximate gradient fields or flows on graphs
- Hodge theory provides an orthogonal decomposition for pairwise ranking flows
- This decomposition helps characterize the local (triangular) vs. global consistency of pairwise rankings, and gives a natural rank aggregation scheme
- Geometry and topology play important roles, but everything is just linear algebra!

Acknowledgements
Gunnar Carlsson, Vin de Silva, Persi Diaconis, Leo Guibas, Fei Han, Susan Holmes, Qixing Huang, Xiaoye Jiang, Ming Ma, Jason Morton, Art Owen, Michael Saunders, Harlan Sexton, Steve Smale, Shmuel Weinberger, Ya Xu