
Devavrat Shah, Laboratory for Information and Decision Systems, Department of EECS, Massachusetts Institute of Technology, http://web.mit.edu/devavrat/www (list of relevant references in the last set of slides)

o Ideally
  o Graphical models
  o Belief propagation
  o Connections to Probability, Statistics, EE, CS, ...
o In reality
  o A set of very exciting (to me, maybe others) questions at the interface of all of the above and more
  o Seemingly unrelated to graphical models
  o However, they provide fertile ground for understanding everything about graphical models (algorithms, analysis)

o Recommendations
  o What movie to watch
  o Which restaurant to eat at
  o ...
o Precisely,
  o Suggest what you may like
  o Given what others have liked
  o By finding others like you and what they liked

o Ranking
  o Players and/or teams, based on the outcomes of games
  o Papers at a competitive conference, using reviews
  o Graduate admissions, from the feedback of professors
o Precisely,
  o Global ranking of objects from partial preferences

o Partial preferences are revealed in different forms
  o Sports: win and loss
  o Social: starred ratings
  o Conferences: scores
o All can be viewed as pairwise comparisons
  o IND beats AUS: IND ≻ AUS
  o Clio ***** vs No 9 Park ****: Clio ≻ No 9 Park
  o Ranking paper 9/10 vs other paper 5/10: Ranking ≻ Other

o Revealed preferences lead to
  o A bag of pairwise comparisons
o Questions of interest
  o Recommendations: suggest what you may like given what others have liked
  o Ranking: global ranking of objects given the outcomes of games/...
o Both require understanding (computing) a choice model
  o What people like/dislike, from pairwise comparisons

o Rational view: axiom of revealed preferences [Samuelson 37]
  o There is one ordering over all objects, consistent across the population
  o Unlikely (lack of transitivity in people's preferences)
o Meaningful view: discrete choice model
  o A distribution over orderings of objects,
  o consistent with the population's revealed preferences
o [Figure: Data → Choice Model → Decision; observed comparisons over {A, B, C} explained by a distribution over orderings, e.g. weight 0.25 on one ordering and 0.75 on another]

o Object tracking (cf. Huang, Guestrin, Guibas 08)
  o Noisy observations of locations
  o Feasible to maintain only partial information
  o Q = [Q_ij]: first-order information, e.g. Q_11 = P(object 1 → location P1)
o [Figure: bipartite graph between objects {1, 2, 3} and locations {P1, P2, P3}, edges weighted by Q_11, Q_12, Q_13, ...]

o Object tracking
  o Noisy observations of locations
  o Feasible to maintain only partial information
  o Q = [Q_ij]: first-order information
o [Figure: Data → Choice Model → Decision, over the objects/locations bipartite graph]

o Recommendation
o Ranking
o Object tracking
o Policy making
o Business operations (assortment optimization)
o Display advertising
o Polling, ...
o Canonical question
  o Decision using a choice model learnt from partial preference data

[Figure: objects/locations bipartite graph with weights Q_11, Q_12, Q_13, ...]

o [Figure: weighted bipartite graph of objects {1, 2, 3} and locations {P1, P2, P3}, weights Q_11, Q_12, Q_13, ...]
o Q. Given a weighted bipartite graph G = (V, E, Q)
  o Find the matching of objects to positions
  o that is "most likely"

o [Figure: a perfect matching between objects {1, 2, 3} and locations {P1, P2, P3}]
o Answer: maximum weight matching
  o The weight of a matching equals the sum of the Q-entries of the edges participating in the matching (a sketch follows below)
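The following is a minimal sketch (mine, not the tutorial's) of this step: given the first-order marginals Q, the most likely assignment is a maximum weight matching. The 3×3 matrix Q is a made-up example, and the Hungarian-method solver is only a stand-in to make the objective concrete; the tutorial instead computes the matching by belief propagation (max-product).

```python
# Minimal sketch (not from the tutorial): the most likely object-to-location
# assignment as a maximum weight matching on the bipartite graph G = (V, E, Q).
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical first-order information: Q[i, j] = P(object i -> location j).
Q = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# linear_sum_assignment minimizes total cost, so negate Q to maximize weight.
objects, locations = linear_sum_assignment(-Q)
print(list(zip(objects, locations)))  # matched (object, location) pairs
print(Q[objects, locations].sum())    # weight = sum of Q-entries of matched edges
```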

[Figure: a bag of pairwise comparisons over objects {A, B, C}]

o [Figure: comparison graph on nodes 1-6; the edge (1, 2) carries counts A_12, A_21, where A_12 = # times 1 defeats 2]
o Q1. Given a weighted comparison graph G = (V, E, A)
  o Find a ranking of / scores associated with the objects
o Q2. When possible (e.g. conferences/crowd-sourcing), choose G so as to
  o minimize the number of comparisons required to find the ranking/scores

o [Figure: comparison graph on nodes 1-6 with edge counts A_12, A_21]
o Random walk on the comparison graph G = (V, E, A)
  o d = max (undirected) vertex degree of G
  o For each edge (i, j): P_ij = (A_ji + 1)/(A_ij + A_ji + 2) × 1/d
  o For each node i: P_ii = 1 − Σ_{j≠i} P_ij
o Let G be connected
o Let s be the unique stationary distribution of the random walk P, i.e. sᵀ = sᵀP
o Ranking: use s as the scores of the objects (a power-iteration sketch follows below)
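A minimal sketch, not from the tutorial, of this random walk computed by power iteration; the input convention A[i, j] = number of times i defeats j and all names are my assumptions.

```python
# Minimal sketch of Rank Centrality: build the transition matrix P from the
# pairwise win counts A and find its stationary distribution by power iteration.
import numpy as np

def rank_centrality(A, iters=1000):
    n = A.shape[0]
    d = int((A + A.T > 0).sum(axis=1).max())   # max undirected degree of G
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and A[i, j] + A[j, i] > 0:
                # step i -> j with probability proportional to j's wins over i
                P[i, j] = (A[j, i] + 1) / (A[i, j] + A[j, i] + 2) / d
        P[i, i] = 1.0 - P[i].sum()              # self-loop keeps P row-stochastic
    s = np.full(n, 1.0 / n)
    for _ in range(iters):
        s = s @ P                               # power iteration: s^T <- s^T P
    return s                                    # stationary distribution = scores
```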

o Random walk on the comparison graph G = (V, E, A)
  o d = max (undirected) vertex degree of G
  o For each edge (i, j): P_ij = (A_ji + 1)/(A_ij + A_ji + 2) × 1/d
  o For each node i: P_ii = 1 − Σ_{j≠i} P_ij
o Ranking: use s as the scores of the objects, where
  o s is the unique stationary distribution of the random walk P: sᵀ = sᵀP
o Choice of graph G
  o Subject to constraints, choose G so that the spectral gap of the natural random walk on G is maximized
  o SDP [Boyd, Diaconis, Xiao 04]

o [Figure: the objects/locations bipartite graph and the bag of comparisons over {A, B, C}]
o Maximum weight matching
  o How to compute it? Belief propagation
  o Why does it make sense? Max-likelihood estimation w.r.t. an exponential family
o Rank Centrality
  o How to compute it? Power iteration
  o Why does it make sense? Mode for the Bradley-Terry-Luce (or MNL) model


(all of the below explained on the class board)
o Computation
  o Belief propagation: the algorithm, and why it works
o Model
  o Maximum-entropy (max-ent) distribution consistent with the data
  o Maximum likelihood in an exponential family
o Maximum weight matching
  o First-order approximation of the mode of this distribution
o Exact computation of max-ent
  o Via dual gradient
  o Belief propagation/MCMC to the rescue


o Choice model (distribution over permutations) [Bradley-Terry-Luce (BTL) or MNL (cf. McFadden) model]
  o Each object i has an associated weight w_i > 0
  o When objects i and j are compared: P(i ≻ j) = w_i/(w_i + w_j)
o Sampling model (a simulation sketch follows below)
  o Edges E of graph G are selected
  o For each (i, j) ∈ E, sample k pairwise comparisons
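A minimal simulation sketch of this sampling model, under my own assumptions (Erdős-Rényi edge selection with probability p_edge, uniformly drawn weights); feeding the resulting A into the rank_centrality sketch above yields scores s that can be compared against w.

```python
# Minimal sketch (an assumption, not from the tutorial): simulate the BTL
# sampling model on a random comparison graph.
import numpy as np

rng = np.random.default_rng(0)
n, k, p_edge = 50, 32, 0.2
w = rng.uniform(1.0, 3.0, size=n)            # BTL weights w_i > 0

A = np.zeros((n, n))                         # A[i, j] = # times i defeats j
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p_edge:            # edge (i, j) selected for G
            wins = rng.binomial(k, w[i] / (w[i] + w[j]))   # P(i beats j)
            A[i, j], A[j, i] = wins, k - wins
```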

o Error(s) = (1/‖w‖) · ( Σ_{ij} (w_i − w_j)² I{(s(i) − s(j))(w_i − w_j) < 0} )^{1/2} (a sketch of this metric follows below)
o G: Erdős-Rényi graph with edge probability d/n
o [Plots: Error(s) vs. k and vs. d/n on log-log axes, comparing Ratio Matrix, L1 ranking, Rank Centrality, and the ML estimate]
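A small sketch of the error metric, under my reading of the reconstructed formula: the sum runs over pairs whose order under the scores s contradicts the order under the true weights w.

```python
# Minimal sketch (an assumption): Error(s) penalizes each pair (i, j) whose
# order under the scores s contradicts the order under the true weights w.
import numpy as np

def error(s, w):
    dw = w[:, None] - w[None, :]             # w_i - w_j for all pairs
    ds = s[:, None] - s[None, :]             # s_i - s_j for all pairs
    wrong = (ds * dw) < 0                    # pair (i, j) is mis-ordered
    # count each unordered pair once via the upper triangle
    return np.sqrt(np.triu(dw**2 * wrong, k=1).sum()) / np.linalg.norm(w)
```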

o Theorem 1.
  o Let R = max_{ij} w_i/w_j.
  o Let G be an Erdős-Rényi graph.
  o Under Rank Centrality, with d = Ω(log n),
    ‖s − w‖ / ‖w‖ ≤ C √( R⁵ log n / (kd) )
o That is, it is sufficient to have O(R⁵ n log n) samples
o Optimal dependence on n for the ER graph
o Dependence on R?

o Theorem 1.
  o Let R = max_{ij} w_i/w_j.
  o Let G be an Erdős-Rényi graph.
  o Under Rank Centrality, with d = Ω(log n),
    ‖s − w‖ / ‖w‖ ≤ C √( R⁵ log n / (kd) )
o Information-theoretic lower bound: for any algorithm,
    ‖s − w‖ / ‖w‖ ≥ C′ √( 1 / (kd) )

o Theorem 2.
  o Let R = max_{ij} w_i/w_j.
  o Let G be any connected graph:
    o L = D⁻¹E: the natural random-walk transition matrix (D the degree matrix, E the adjacency matrix)
    o Δ = 1 − λ_max(L): the spectral gap
    o κ = d_max/d_min
  o Under Rank Centrality, with kd = Ω(log n),
    ‖s − w‖ / ‖w‖ ≤ C (κ/Δ) √( R⁵ log n / (kd) )
o That is, the number of samples required is O(R⁵ κ² n log n × Δ⁻²)
o Graph structure plays a role through its Laplacian

o Theorem 2.
  o Under Rank Centrality, with kd = Ω(log n),
    ‖s − w‖ / ‖w‖ ≤ C (κ/Δ) √( R⁵ log n / (kd) )
  o That is, the number of samples required is O(R⁵ κ² n log n × Δ⁻²)
o Choice of graph G
  o Subject to constraints, choose G so that the spectral gap Δ is maximized
  o SDP [Boyd, Diaconis, Xiao 04]

o Bound on the spectral gap
  o Use of a comparison theorem [Diaconis-Saloff-Coste 94]++
o Bound on the fluctuation of the empirical transition matrix
  o Use of a (modified) concentration-of-measure inequality for matrices
o Finally, use these to further bound Error(s)

Washington Post: Allourideas

o Ground truth: the algorithm's result with complete data
o Error: average position discrepancy
o [Plot: average position discrepancy (y-axis, 0-20) against x in [0, 1], for L1 ranking and Rank Centrality]

o Input: complete preferences (not comparisons)
o Axiomatic impossibility [Arrow 51]
o Some algorithms
  o Kemeny optimal: minimize disagreements
    o Extended Condorcet criteria
    o NP-hard; 2-approx algorithm [Dwork et al 01]
  o Borda count: average position as the score
    o Simple
    o Useful axiomatic properties [Young 74]
o [Figure: full rankings over objects 1-6 and the comparison graph with counts A_12, A_21]

o Algorithms with partial data
  o Suppose pairwise data is available for all pairs
o Kemeny distance depends on pairwise marginals only:
    argmin_σ Σ_{ij} A_ij I(σ(i) < σ(j))
o Data is consistent with a distribution over permutations
  o For example, obtained as the max-ent approximation
  o The Kemeny optimum of this distribution is the same as the above
  o NP-hard
  o A 2-approx for this distribution acts as a 2-approx for the above [Ammar-Shah 11, 12]
  o (a brute-force sketch follows below)
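To make the objective concrete, here is a brute-force sketch (mine; the problem is NP-hard, so exhaustive search is viable only for tiny n). The convention A[i, j] = number of comparisons preferring i over j is an assumption, and "disagreements" counts comparisons contradicted by the ordering.

```python
# Minimal sketch (an assumption): exhaustive Kemeny aggregation from pairwise
# counts. Exponential in n -- for illustration only.
import numpy as np
from itertools import permutations

def kemeny(A):
    n = A.shape[0]
    best, best_cost = None, float("inf")
    for order in permutations(range(n)):     # order[0] is ranked best
        pos = {obj: r for r, obj in enumerate(order)}
        # disagreements: data says i beats j, but the order places i below j
        cost = sum(A[i, j] for i in range(n) for j in range(n)
                   if pos[i] > pos[j])
        if cost < best_cost:
            best, best_cost = order, cost
    return best
```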

o Borda count
  o Average position
  o But comparisons do not carry position information
o Given pairwise marginals p_ij for all i ≠ j
  o For any distribution consistent with the pairwise marginals,
  o the Borda count is given as c(i) = Σ_j p_ij [Ammar-Shah 11, 12] (a sketch follows below)
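A minimal sketch of this observation, assuming p[i, j] = P(i is preferred over j); the example matrix is made up.

```python
# Minimal sketch (an assumption): Borda scores from pairwise marginals,
# c(i) = sum_j p_ij.
import numpy as np

def borda(p):
    p = np.asarray(p, dtype=float).copy()
    np.fill_diagonal(p, 0.0)     # no self-comparisons
    return p.sum(axis=1)         # larger score = better average position

p = np.array([[0.0, 0.8, 0.6],
              [0.2, 0.0, 0.7],
              [0.4, 0.3, 0.0]])
print(borda(p))                  # [1.4, 0.9, 0.7] -> ranking 0 > 1 > 2
```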

o Finding the winner under the BTL choice model [Adler, Gemmell, Harchol-Balter, Karp, Kenyon 87]
  o O(log n)-iteration adaptive algorithm, O(n) total comparisons
o Noisy sorting and the Mallows model [Braverman, Mossel 09]
  o O(n log n) samples in total (complete ordering)
  o Average-position (Borda count) algorithm++
  o Polynomial(n)-time algorithm

o Choice models
  o A powerful model class for tackling a range of questions
  o Many applications are in their infancy (e.g. recommendations)
  o The challenge is computation + statistics
  o An excellent playground for resolving the challenges of graphical models
o Two examples in this tutorial
  o Object tracking: learning from first-order marginals
  o Ranking: using pairwise comparisons

o Open direction:
  o Learning graphical models efficiently, computationally and statistically
o A concrete question:
  o Given pairwise comparison data, when can we learn the choice model efficiently?
  o For example, if exact pairwise comparison marginals are available,
  o then one can learn a `sparse' choice model up to o(log n) sparsity [Farias + Jagabathula + S 09, 12]
  o But what about the noisy setting? Or max-ent learning?

o Part I:
  o A. Ammar, D. Shah, "Efficient rank aggregation from partial data", Proceedings of ACM Sigmetrics, 2012.
  o M. Bayati, D. Shah, M. Sharma, "Max-product for maximum weight matching: convergence, correctness and LP duality", IEEE Transactions on Information Theory, 2008.
  o S. Jagabathula, D. Shah, "Inferring rankings using constrained sensing", IEEE Transactions on Information Theory, 2011.
  o S. Agrawal, Z. Wang, Y. Ye, "Parimutuel betting on permutations", Internet and Network Economics, 2008.
  o J. Huang, C. Guestrin, L. Guibas, "Fourier theoretic probabilistic inference over permutations", Journal of Machine Learning Research, 2009.

o Part II:
  o S. Negahban, S. Oh, D. Shah, "Iterative ranking using pair-wise comparisons", Proceedings of NIPS, 2012.
  o V. Farias, S. Jagabathula, D. Shah, "Data driven approach to modeling choice", Proceedings of NIPS, 2009. Also Management Science, 2012 (and arXiv).
  o V. Farias, S. Jagabathula, D. Shah, "Sparse choice model", available on arXiv, 2012.
  o A. Ammar, D. Shah, "Compare, don't score", Proceedings of Allerton, 2011.

o At large:
  o H. Varian, "Revealed preferences, Samuelsonian economics and the twenty-first century", 2006.
  o D. McFadden, "Disaggregate Behavioral Travel Demand's RUM Side, A 30-year Retrospective", available online: http://emlab.berkeley.edu/pub/wp/mcfadden0300.pdf
  o P. Diaconis, "Group representation in probability and statistics", Lecture Notes-Monograph Series, 1988.
  o M. Wainwright, M. Jordan, "Graphical models, exponential families, and variational inference", Foundations and Trends in Machine Learning, 2008.