Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs


Transcription:

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs
Jiaming Xu, joint work with Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, and Lei Ying
University of Illinois, Urbana-Champaign
June 7, 2014

Motivation

Data matrices with both row and column cluster structure arise in many applications.

[Figure: heat maps of a user rating matrix and a gene expression matrix, each showing block structure]

Goal: cluster the rows and columns based on a noisy, partially observed matrix.

Simple model

1. like: +1; dislike: −1
2. the n users (and likewise the n movies) form r clusters of equal size K
3. users in the same cluster give the same rating to movies in the same cluster
4. each block rating is +1 or −1 with equal probability

Ground truth Y: binary block-constant matrix.
Partial and noisy observation R: each entry is erased with probability ε and flipped with probability p.

[Figure: a block-constant ±1 matrix and its partial, noisy observation]
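To make the model concrete, here is a minimal NumPy sketch of the generative process just described. The function name, the default parameter values, and the convention of encoding erasures as 0 are our own illustration, not from the talk.

    import numpy as np

    def generate_model(n=900, r=3, eps=0.5, p=0.1, seed=0):
        """Sample the simple model: n users and n movies, each split into
        r clusters of equal size K = n // r; Y is block-constant with
        i.i.d. +1/-1 blocks; R erases each entry w.p. eps and flips each
        observed entry w.p. p (erasures are encoded as 0)."""
        assert n % r == 0, "clusters are assumed to have equal size"
        rng = np.random.default_rng(seed)
        K = n // r
        labels = np.repeat(np.arange(r), K)          # cluster label per row/column
        block = rng.choice([-1, 1], size=(r, r))     # +1 or -1 with equal prob.
        Y = block[np.ix_(labels, labels)]            # block-constant ground truth
        observed = rng.random((n, n)) > eps          # keep each entry w.p. 1 - eps
        flips = np.where(rng.random((n, n)) < p, -1, 1)
        R = Y * flips * observed                     # 0 marks an erased entry
        return Y, R, labels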

When is cluster recovery possible (impossible)?

Assume that 0 ≤ p < 1/2 is a constant. Our results apply to the general setting, allowing any K and ε.

[Phase diagram: cluster size K = n^β on the vertical axis, from small to large clusters; erasure rate ε = 1 − n^{−α} on the horizontal axis, from low to high erasure. Large clusters with low erasure are easy; small clusters with high erasure are hard.]

Outline of the remainder

1. Impossible regime
2. Nearest-neighbor clustering
3. Spectral method
4. Convex method
5. Maximum likelihood estimation (MLE)

Impossible regime

Genie-aided argument: reveal to the estimator the set of flipped entries, then construct a new user clustering by swapping two rows in two different row clusters. If the swapped clustering is just as consistent with the observations as the true one, no estimator can distinguish the two, so cluster recovery is impossible.

[Phase diagram: the "impossible" region covers small clusters and high erasure, with marks at 1/2 on both axes; a region marked "?" is left open.]

Nearest-neighbor clustering

Similarity between two users: the number of movies with the same observed rating [Dabeer et al. '12].
Algorithm: each user finds the K most similar users (as sketched below).

[Phase diagram: the region where nearest-neighbor clustering succeeds is labeled "NN", with reference points A and B marked; a "?" region remains between "NN" and "impossible".]
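A minimal NumPy sketch of this similarity computation, assuming ratings in {−1, 0, +1} with 0 marking an erasure (as in the model sketch above); the neighbor-selection rule is from the talk, while the vectorized implementation is our own.

    import numpy as np

    def nearest_neighbors(R, K):
        """For each user (row of R), return the indices of the K - 1 most
        similar other users, where similarity is the number of movies on
        which both ratings are observed and agree."""
        obs = (R != 0).astype(float)
        # R @ R.T counts agreements minus disagreements on commonly
        # observed movies; obs @ obs.T counts the common observations,
        # so their average is exactly the number of agreements.
        agree = ((R @ R.T) + (obs @ obs.T)) / 2
        np.fill_diagonal(agree, -np.inf)             # exclude the user itself
        return np.argsort(-agree, axis=1)[:, :K - 1]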

Spectral method

1. Approximately cluster the rows and columns of the best rank-r approximation P_r(R) (sketched below)
2. Majority voting within each block of R
3. Recluster by assigning rows and columns to the nearest centers

[Phase diagram: the "spectral" region extends beyond "NN"; points A and B are marked, and a "?" region remains.]
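Here is a sketch of step 1 in NumPy/scikit-learn, under the same {−1, 0, +1} encoding. The majority-voting and re-clustering refinements (steps 2 and 3) are omitted for brevity, and using k-means as the approximate clustering subroutine is our choice of a standard method, not necessarily the one in the paper.

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_step(R, r, seed=0):
        """Cluster rows and columns of the best rank-r approximation of R."""
        U, s, Vt = np.linalg.svd(R.astype(float), full_matrices=False)
        P_r = (U[:, :r] * s[:r]) @ Vt[:r]            # best rank-r approximation
        km = lambda X: KMeans(n_clusters=r, n_init=10,
                              random_state=seed).fit_predict(X)
        return km(P_r), km(P_r.T)                    # row and column clusters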

Convex method

Cluster by first recovering the ground truth Y: Y → R → Ŷ.
The MLE finds a block-constant binary matrix Y matching R on as many observed entries as possible. A convex relaxation of the MLE replaces the block-constant constraint with nuclear-norm regularization (a CVXPY transcription follows):

    maximize_Y   Σ_{i,j} R_{ij} Y_{ij} − λ ‖Y‖_*
    subject to   Y_{ij} ∈ [−1, 1],   λ = C √((1 − ε) n),   C ≥ 3
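A direct transcription of this relaxation in CVXPY, practical only for small n since generic solvers scale poorly; the default C = 3 and the sign rounding at the end are our assumptions for illustration.

    import numpy as np
    import cvxpy as cp

    def convex_method(R, eps, C=3.0):
        """Solve max <R, Y> - lambda * ||Y||_* over Y with entries in
        [-1, 1], then round the solution to a +1/-1 matrix."""
        n = R.shape[0]
        lam = C * np.sqrt((1 - eps) * n)
        Y = cp.Variable((n, n))
        obj = cp.Maximize(cp.sum(cp.multiply(R, Y)) - lam * cp.normNuc(Y))
        prob = cp.Problem(obj, [Y >= -1, Y <= 1])
        prob.solve()
        return np.sign(Y.value)                      # estimate of the ground truth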

Performance of convex method

Assume that a technical conjecture holds (we come back to it later).

[Phase diagram: a "convex" region appears between "spectral" and "impossible"; points A and B are marked, and a "?" region remains.]

Performance of MLE

[Phase diagram: the "MLE (p = 0)" region extends beyond "convex", up to a point C; the "NN", "spectral", and "convex" regions and points A and B are as before.]

Conjecture: MLE succeeds all the way up to the gray (impossible) region.

Conjecture on convex method

Conjecture: for an r × r random sign matrix B with SVD B = U Σ V^T, ‖U V^T‖_∞ scales as √(log r / r).

[Plot: ‖U_B V_B^T‖_∞ √(r / log r) versus log₂ r, for log₂ r from 4 to 12; the curve stays between roughly 1.2 and 1.38, supporting the conjecture.]
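A quick numerical check of this conjecture in NumPy, our re-creation of the experiment behind the plot (matrix sizes and the seed are arbitrary):

    import numpy as np

    def conjecture_statistic(r, seed=0):
        """For an r x r random sign matrix B with SVD B = U S V^T, return
        ||U V^T||_inf * sqrt(r / log r), where ||.||_inf is the largest
        absolute entry; the conjecture says this stays bounded as r grows."""
        rng = np.random.default_rng(seed)
        B = rng.choice([-1.0, 1.0], size=(r, r))
        U, _, Vt = np.linalg.svd(B)
        return np.abs(U @ Vt).max() * np.sqrt(r / np.log(r))

    for k in range(4, 11):                           # r = 16, 32, ..., 1024
        print(2 ** k, conjecture_statistic(2 ** k))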

Please check our paper for the details. Thank you! Questions?